
# 🤖 Manus AI Clone

An AI-powered browser automation system that replicates the functionality of [Manus.im](https://manus.im) and [Scout.new](https://scout.new). This application allows users to control a web browser using natural language prompts, powered by LangChain and OpenAI's GPT models.

## ✨ Features

-   **Natural Language Browser Control**: Give instructions in plain English, and the AI will control the browser for you
-   **LangChain Integration**: Uses LangChain agents with custom tools for browser automation
-   **Playwright Browser Automation**: Full browser control with support for navigation, clicking, typing, and more
-   **Real-time Screenshots**: See what the browser is doing with automatic screenshots
-   **Action History**: Track all actions performed by the AI agent
-   **Beautiful Web UI**: Modern, responsive interface for interacting with the system
-   **RESTful API**: Programmatic access to browser automation capabilities

## 🛠️ Technology Stack

-   **Backend**: FastAPI (Python)
-   **AI Framework**: LangChain + OpenAI GPT models
-   **Browser Automation**: Playwright
-   **Frontend**: HTML/CSS/JavaScript (Vanilla)
-   **Package Management**: UV

## 📋 Prerequisites

-   Python 3.12+
-   UV package manager
-   OpenAI API key
-   Chromium browser (downloaded by Playwright in step 2 of the Quick Start)

## 🚀 Quick Start

### 1. Clone and Setup

```bash
# Navigate to project directory
cd manus_ai_clone

# Create and activate virtual environment using uv
uv venv
source .venv/bin/activate      # Linux/macOS
# On Windows, run this instead:
# .venv\Scripts\activate

# Install dependencies
uv pip install -e .
```

### 2. Install Playwright Browsers

```bash
# Install Playwright browser binaries
playwright install chromium
```

### 3. Configure Environment

```bash
# Copy the example environment file
cp .env.example .env

# Edit .env and add your OpenAI API key
# OPENAI_API_KEY=sk-your-actual-api-key-here
```

### 4. Run the Application

```bash
# Start the server
python main.py

# Or use uvicorn directly
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

### 5. Access the Application

Open your browser and navigate to:

```
http://localhost:8000
```

## 🎯 Usage Examples

### Via Web Interface

1. Open http://localhost:8000 in your browser
2. Enter a natural language prompt in the text area
3. Click "Execute Task"
4. Watch the AI control the browser and see the results

**Example Prompts:**

-   "Go to google.com and search for 'LangChain tutorial'"
-   "Navigate to github.com and find the trending repositories"
-   "Open wikipedia.org and search for 'Artificial Intelligence', then read the first paragraph"
-   "Go to Hacker News and get the top 5 story titles"
-   "Visit amazon.com and search for 'python books'"

### Via API

```bash
# Execute a browser automation task
curl -X POST "http://localhost:8000/execute" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Go to google.com and search for LangChain"}'

# Check health status
curl http://localhost:8000/health

# Get action history
curl http://localhost:8000/status
```

### Using Python

```python
import base64

import requests

# Execute a task (browser tasks can take a while, so allow a generous timeout)
response = requests.post(
    "http://localhost:8000/execute",
    json={"prompt": "Go to github.com and search for 'langchain'"},
    timeout=300,
)
response.raise_for_status()  # Fail early on HTTP errors

result = response.json()
print(f"Success: {result['success']}")
print(f"Output: {result['output']}")

# The screenshot, when present, is a base64-encoded image
if result.get("screenshot"):
    screenshot_data = base64.b64decode(result["screenshot"])
    with open("screenshot.png", "wb") as f:
        f.write(screenshot_data)
```
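
When scripting against the API, it can help to wait for the server to come up before sending the first task. The following is a minimal sketch using only the standard library. It assumes `/health` answers with HTTP 200 once the server is ready and does not inspect the response body. `http_status` and `wait_for_server` are illustrative names, not part of the project.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (for example 503).
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or socket timeout.
        return 0


def wait_for_server(
    url: str = "http://localhost:8000/health",
    attempts: int = 10,
    delay: float = 1.0,
    probe: Callable[[str], int] = http_status,
) -> bool:
    """Poll `url` until it answers 200 or the attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_server()` before the `requests.post(...)` call above avoids a connection error when the server is still starting. The `probe` parameter exists only so the polling logic can be exercised without a network.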

## 🏗️ Architecture

### Components

1. **Browser Agent (`browser_agent.py`)**

    - `BrowserController`: Low-level Playwright wrapper for browser operations
    - `BrowserAgent`: LangChain agent with custom tools for AI-powered automation

2. **API Server (`main.py`)**

    - FastAPI application with REST endpoints
    - Lifecycle management for browser agent
    - Web UI serving

3. **Tools Available to AI**
    - `navigate`: Go to URLs
    - `click`: Click elements by CSS selector
    - `type_text`: Fill input fields
    - `get_text`: Extract text from elements
    - `get_page_content`: Read page content
    - `scroll`: Scroll the page
    - `get_elements_info`: Inspect elements
    - `execute_javascript`: Run custom JavaScript

### How It Works

```
User Prompt → FastAPI → LangChain Agent → Tools → Playwright → Browser
                ↓                                                  ↓
            Response ← Agent Reasoning ← Tool Results ← Browser State
```

1. User submits a natural language prompt
2. LangChain agent breaks down the task into steps
3. Agent selects and executes appropriate tools
4. Playwright performs browser actions
5. Results are collected and returned with a screenshot

## ⚙️ Configuration

### Environment Variables

```bash
# Required
OPENAI_API_KEY=sk-your-api-key-here

# Optional
MODEL=gpt-4o-mini              # OpenAI model to use
HEADLESS=false                 # Set to true to run the browser without a visible window
HOST=0.0.0.0                   # Server host
PORT=8000                      # Server port
```
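
As an illustration of how these variables could be read and validated at startup, here is a minimal sketch. It assumes the example values above are also the defaults, which this README only confirms for `MODEL`. The `Settings` class and `load_settings` function are hypothetical names, not taken from `main.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    model: str = "gpt-4o-mini"
    headless: bool = False
    host: str = "0.0.0.0"
    port: int = 8000


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on a missing key."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise ValueError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=key,
        model=env.get("MODEL", "gpt-4o-mini"),
        # Accept a few common spellings of "true"; anything else means False.
        headless=env.get("HEADLESS", "false").strip().lower() in {"1", "true", "yes"},
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
    )
```

Failing at startup on a missing key gives a clearer error than a failed OpenAI request later on.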

### Model Options

-   `gpt-4o-mini` (default) - Fast and cost-effective
-   `gpt-4o` - More capable, higher cost
-   `gpt-4-turbo` - Older GPT-4 generation model, typically slower and more expensive than `gpt-4o`

## 🔧 Development

### Project Structure

```
manus_ai_clone/
├── main.py              # FastAPI application
├── browser_agent.py     # Browser automation logic
├── pyproject.toml       # Dependencies
├── .env.example         # Environment template
├── .gitignore          # Git ignore rules
└── README.md           # This file
```

### Adding Custom Tools

To add new browser automation capabilities:

1. Add a method to `BrowserController` class
2. Create a wrapper function in `BrowserAgent._create_tools()`
3. Add a `Tool` definition with name, function, and description

Example:

```python
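import asyncio

# Import path may differ depending on your LangChain version
from langchain_core.tools import Tool

# In BrowserController
async def custom_action(self, param: str) -> str:
    # Your implementation
    return "Result"

# In BrowserAgent._create_tools()
def custom_action_wrapper(param: str) -> str:
    # Bridge the synchronous tool interface to the async controller method.
    # Note: asyncio.run() raises RuntimeError if this thread already has a running event loop.
    return asyncio.run(self.browser.custom_action(param))

# Add to tools list
Tool(
    name="custom_action",
    func=custom_action_wrapper,
    description="Description of what this tool does",
)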
```

## 🐛 Troubleshooting

### Common Issues

**Import errors about missing packages:**

```bash
# Import errors are expected until the dependencies are installed
uv pip install -e .
playwright install chromium
```

**"Browser agent not initialized":**

-   Check that your `OPENAI_API_KEY` is set in `.env`
-   Make sure the `.env` file is in the project root

**Playwright errors:**

```bash
# Reinstall Playwright browsers
playwright install --force chromium
```

**Element not found errors:**

-   The AI might be using incorrect selectors
-   Try being more specific in your prompt
-   Some websites use dynamic class names or have anti-bot measures

**Timeout errors:**

-   Some pages load slowly
-   Try increasing timeout values in `browser_agent.py`
-   Or use simpler websites for testing
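
Besides raising timeout values, slow pages can be handled by retrying an action with a per-attempt time limit. The following is a minimal sketch using `asyncio.wait_for`. The `with_retries` helper and its parameters are hypothetical and not part of `browser_agent.py`. Playwright raises its own timeout exception for its built-in timeouts, which this sketch does not catch.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    action: Callable[[], Awaitable[T]],
    attempts: int = 3,
    timeout: float = 30.0,
    backoff: float = 1.0,
) -> T:
    """Run `action` with a per-attempt timeout, retrying when it times out."""
    for attempt in range(1, attempts + 1):
        try:
            # A fresh awaitable is created for every attempt.
            return await asyncio.wait_for(action(), timeout=timeout)
        except asyncio.TimeoutError:
            if attempt == attempts:
                raise  # Out of attempts: surface the timeout to the caller.
            # Wait a little longer after each failed attempt.
            await asyncio.sleep(backoff * attempt)
    raise ValueError("attempts must be at least 1")
```

A call could look like `await with_retries(lambda: controller.navigate(url), timeout=45)`, where `controller.navigate` stands for whichever async browser operation is timing out.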

## 🚀 Deployment

### Using Docker (Optional)

```dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install system dependencies for Playwright
RUN apt-get update && apt-get install -y \
    wget \
    gnupg \
    && rm -rf /var/lib/apt/lists/*

# Copy project files
COPY . .

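# Install Python dependencies and the Chromium binary with its system libraries
RUN pip install -e .
RUN playwright install --with-deps chromium

# No display is available in the container, so run the browser headless
ENV HEADLESS=true

# Port the server listens on (see PORT)
EXPOSE 8000

# Run the application
CMD ["python", "main.py"]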
```

### Environment Setup for Production

```bash
# Use headless mode in production
HEADLESS=true

# Use a more capable model if needed
MODEL=gpt-4o

# Secure your API
# Consider adding authentication middleware
```
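
As a starting point for the authentication suggestion above, here is a minimal sketch of an API key check. The `X-API-Key` header name and the `is_authorized` function are assumptions for illustration; the project does not define them.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Return True if the request carries the expected key in `X-API-Key`."""
    if not expected_key:
        # Refuse every request when no key has been configured.
        return False
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get("x-api-key", "")
    # compare_digest avoids leaking the key through timing differences.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())
```

In a FastAPI application, a check like this would typically be called from an HTTP middleware or a dependency that responds with status 401 when it returns `False`.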

## 📊 Performance Tips

1. **Use headless mode** (`HEADLESS=true`) for faster execution
2. **Choose the right model**: `gpt-4o-mini` for speed, `gpt-4o` for complex tasks
3. **Be specific in prompts**: Prompts that name the target site, the element to interact with, and the expected output generally produce better results
4. **Set appropriate timeouts**: Adjust based on your target websites

## 🤝 Contributing

Contributions are welcome! Areas for improvement:

-   Additional browser automation tools
-   Better error handling and recovery
-   Support for multiple concurrent browser sessions
-   Screenshot comparison and validation
-   Browser session persistence
-   Integration with other LLM providers

## 📝 License

This project is open source and available under the MIT License.

## 🙏 Acknowledgments

-   [Manus.im](https://manus.im) and [Scout.new](https://scout.new) for inspiration
-   [LangChain](https://www.langchain.com/) for the agent framework
-   [Playwright](https://playwright.dev/) for browser automation
-   [FastAPI](https://fastapi.tiangolo.com/) for the web framework

## 📧 Support

For issues, questions, or contributions, please open an issue on the project repository.

---

**Built with ❤️ using LangChain, Playwright, and FastAPI**