2025-11-05 01:03:10 +01:00

🤖 Manus AI Clone

An AI-powered browser automation system that replicates the functionality of Manus.im and Scout.new. This application allows users to control a web browser using natural language prompts, powered by LangChain and OpenAI's GPT models.

✨ Features

- Natural Language Browser Control: Give instructions in plain English, and the AI will control the browser for you
- LangChain Integration: Uses LangChain agents with custom tools for browser automation
- Playwright Browser Automation: Full browser control with support for navigation, clicking, typing, and more
- Real-time Screenshots: See what the browser is doing with automatic screenshots
- Action History: Track all actions performed by the AI agent
- Beautiful Web UI: Modern, responsive interface for interacting with the system
- RESTful API: Programmatic access to browser automation capabilities

🛠️ Technology Stack

- Backend: FastAPI (Python)
- AI Framework: LangChain + OpenAI GPT models
- Browser Automation: Playwright
- Frontend: HTML/CSS/JavaScript (Vanilla)
- Package Management: UV

📋 Prerequisites

- Python 3.12+
- UV package manager
- OpenAI API key
- Chromium browser (installed by the playwright install chromium command in step 2 of the Quick Start)

🚀 Quick Start

1. Clone and Set Up

# Clone the repository first, then navigate to the project directory
cd manus_ai_clone

# Create and activate virtual environment using uv
uv venv
source .venv/bin/activate  # On Linux/Mac
# or
.venv\Scripts\activate  # On Windows

# Install dependencies
uv pip install -e .

2. Install Playwright Browsers

# Install Playwright browser binaries
playwright install chromium

3. Configure Environment

# Copy the example environment file
cp .env.example .env

# Edit .env and add your OpenAI API key
# OPENAI_API_KEY=sk-your-actual-api-key-here

4. Run the Application

# Start the server
python main.py

# Or use uvicorn directly
uvicorn main:app --reload --host 0.0.0.0 --port 8000

5. Access the Application

Open your browser and navigate to:

http://localhost:8000
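
Before opening the UI or sending requests from a script, it can help to confirm that the server has finished starting. The following is a minimal sketch using only the standard library. It polls the /health endpoint listed in the API examples of this README; the function names, the retry settings, and the assumption that a healthy server answers with HTTP 200 are illustrative and not taken from the project.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        return False


def wait_for_server(
    url: str = "http://localhost:8000/health",
    attempts: int = 30,
    delay: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Call `probe(url)` until it succeeds or `attempts` tries have been made."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before the next try
    return False
```

Calling wait_for_server() with the defaults blocks for up to about 30 seconds and returns False if the server never answers. The probe parameter exists only so the polling logic can be exercised without a running server.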

🎯 Usage Examples

Via Web Interface

1. Open http://localhost:8000 in your browser
2. Enter a natural language prompt in the text area
3. Click "Execute Task"
4. Watch the AI control the browser and see the results

Example Prompts:

- "Go to google.com and search for 'LangChain tutorial'"
- "Navigate to github.com and find the trending repositories"
- "Open wikipedia.org and search for 'Artificial Intelligence', then read the first paragraph"
- "Go to hacker news and get the top 5 story titles"
- "Visit amazon.com and search for 'python books'"

Via API

# Execute a browser automation task
curl -X POST "http://localhost:8000/execute" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Go to google.com and search for LangChain"}'

# Check health status
curl http://localhost:8000/health

# Get action history
curl http://localhost:8000/status

Using Python

import base64

import requests

# Execute a task (browser tasks can be slow, so allow a generous timeout)
response = requests.post(
    "http://localhost:8000/execute",
    json={"prompt": "Go to github.com and search for 'langchain'"},
    timeout=300,
)
response.raise_for_status()  # Fail early on HTTP errors

result = response.json()
print(f"Success: {result['success']}")
print(f"Output: {result['output']}")

# The screenshot, when present, is a base64-encoded image
if result.get("screenshot"):
    screenshot_data = base64.b64decode(result["screenshot"])
    with open("screenshot.png", "wb") as f:
        f.write(screenshot_data)

🏗️ Architecture

Components

Browser Agent (browser_agent.py)
- BrowserController: Low-level Playwright wrapper for browser operations
- BrowserAgent: LangChain agent with custom tools for AI-powered automation

API Server (main.py)
- FastAPI application with REST endpoints
- Lifecycle management for browser agent
- Web UI serving

Tools Available to AI
- navigate: Go to URLs
- click: Click elements by CSS selector
- type_text: Fill input fields
- get_text: Extract text from elements
- get_page_content: Read page content
- scroll: Scroll the page
- get_elements_info: Inspect elements
- execute_javascript: Run custom JavaScript
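
To illustrate how a set of named tools with descriptions can be exposed to an agent, here is a minimal, framework-free sketch. It uses neither LangChain nor Playwright. ToolSpec, ToolRegistry, and the stub functions are hypothetical names that only mimic the name/function/description shape of the tools listed above; the project's actual definitions live in browser_agent.py and may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ToolSpec:
    """One tool: a unique name, a callable, and a description the model can read."""
    name: str
    func: Callable[[str], str]
    description: str


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        if spec.name in self._tools:
            raise ValueError(f"duplicate tool name: {spec.name}")
        self._tools[spec.name] = spec

    def describe(self) -> str:
        """Render one 'name: description' line per tool, in registration order."""
        return "\n".join(f"{t.name}: {t.description}" for t in self._tools.values())

    def call(self, name: str, argument: str) -> str:
        """Run a tool by name; an unknown name yields an error string, not an exception."""
        spec = self._tools.get(name)
        if spec is None:
            return f"Error: unknown tool '{name}'"
        return spec.func(argument)


# Stub implementations standing in for real browser operations.
registry = ToolRegistry()
registry.register(ToolSpec("navigate", lambda url: f"Navigated to {url}", "Go to URLs"))
registry.register(ToolSpec("click", lambda sel: f"Clicked {sel}", "Click elements by CSS selector"))
```

Returning an error string for an unknown tool, rather than raising, is a design choice in this sketch: it lets the caller feed the failure back to the model as an observation.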

How It Works

User Prompt → FastAPI → LangChain Agent → Tools → Playwright → Browser
                ↓                                                  ↓
            Response ← Agent Reasoning ← Tool Results ← Browser State

1. User submits a natural language prompt
2. LangChain agent breaks down the task into steps
3. Agent selects and executes appropriate tools
4. Playwright performs browser actions
5. Results are collected and returned with a screenshot
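
The steps above can be sketched as a simple loop. This is an illustration under stated assumptions, not the project's implementation: a scripted decide function stands in for the LLM, plain functions stand in for the Playwright-backed tools, and every name here (run_task, scripted_decide, fake_tools, the history field) is hypothetical. Only the success and output fields mirror the response fields used in the Python example earlier in this README.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str]  # (tool name, tool argument)
Decide = Callable[[str, List[str]], Optional[Step]]


def run_task(
    prompt: str,
    decide: Decide,
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> dict:
    """Ask `decide` for the next step until it returns None or the step budget runs out."""
    observations: List[str] = []
    history: List[dict] = []
    for _ in range(max_steps):
        step = decide(prompt, observations)
        if step is None:  # the "agent" considers the task finished
            output = observations[-1] if observations else ""
            return {"success": True, "output": output, "history": history}
        name, argument = step
        tool = tools.get(name)
        result = tool(argument) if tool else f"Error: unknown tool '{name}'"
        observations.append(result)
        history.append({"tool": name, "input": argument, "result": result})
    return {"success": False, "output": "step limit reached", "history": history}


def scripted_decide(prompt: str, observations: List[str]) -> Optional[Step]:
    """Stand-in for the LLM: replay a fixed plan, one step per call."""
    plan: List[Step] = [("navigate", "https://example.com"), ("get_text", "h1")]
    return plan[len(observations)] if len(observations) < len(plan) else None


fake_tools = {
    "navigate": lambda url: f"Navigated to {url}",
    "get_text": lambda selector: "Example Domain",
}
response = run_task("Open example.com and read the heading", scripted_decide, fake_tools)
```

The step limit guards against an agent that never decides it is done, which is a common failure mode for tool-using loops.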

⚙️ Configuration

Environment Variables

# Required
OPENAI_API_KEY=sk-your-api-key-here

# Optional
MODEL=gpt-4o-mini              # OpenAI model to use
HEADLESS=false                 # Set to true to run the browser without a visible window
HOST=0.0.0.0                   # Server host
PORT=8000                      # Server port
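
As an illustration of how these variables and their defaults could be read, here is a minimal sketch using only the standard library. It assumes the variables are already present in the process environment (it does not parse the .env file itself), and the Settings class, load_settings function, and the set of strings accepted as "true" are assumptions; how main.py actually parses these values is not shown in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Strings treated as "true" for HEADLESS (an assumption of this sketch)
TRUE_VALUES = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    model: str = "gpt-4o-mini"
    headless: bool = False
    host: str = "0.0.0.0"
    port: int = 8000


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=api_key,
        model=env.get("MODEL", "gpt-4o-mini"),
        headless=env.get("HEADLESS", "false").strip().lower() in TRUE_VALUES,
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),  # raises ValueError on a non-numeric PORT
    )
```

Failing at startup when the key is missing tends to give a clearer error than a failure on the first request.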

Model Options

- gpt-4o-mini (default) - Fast and cost-effective
- gpt-4o - More capable, higher cost
- gpt-4-turbo - Previous-generation GPT-4 model; typically slower and more expensive than gpt-4o

🔧 Development

Project Structure

manus_ai_clone/
├── main.py              # FastAPI application
├── browser_agent.py     # Browser automation logic
├── pyproject.toml       # Dependencies
├── .env.example         # Environment template
├── .gitignore          # Git ignore rules
└── README.md           # This file

Adding Custom Tools

To add new browser automation capabilities:

1. Add a method to the BrowserController class
2. Create a wrapper function in BrowserAgent._create_tools()
3. Add a Tool definition with a name, function, and description

Example:

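import asyncio

from langchain_core.tools import Tool  # Assumed import path; match the one used in browser_agent.py

# In BrowserController
async def custom_action(self, param: str) -> str:
    """Perform the new browser operation and return a text result for the agent."""
    # Your implementation
    return "Result"

# In BrowserAgent._create_tools()
def custom_action_wrapper(param: str) -> str:
    # Note: asyncio.run() raises RuntimeError if an event loop is already
    # running in this thread; follow the pattern of the existing wrappers.
    return asyncio.run(self.browser.custom_action(param))

# Add to tools list
Tool(
    name="custom_action",
    func=custom_action_wrapper,
    description="Description of what this tool does",
)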
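
One caveat with a synchronous wrapper around an async method: asyncio.run() raises RuntimeError when called from a thread that already has a running event loop, which can happen inside an async web server. A common alternative is to keep one event loop alive in a background thread and submit coroutines to it. The sketch below shows that pattern with the standard library only; AsyncBridge and fake_custom_action are hypothetical names, and the project's own wrappers may take a different approach.

```python
import asyncio
import threading
from typing import Any, Coroutine


class AsyncBridge:
    """Run coroutines on one long-lived event loop owned by a background thread."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro: Coroutine[Any, Any, Any], timeout: float = 30.0) -> Any:
        """Block the calling thread until `coro` finishes on the bridge loop."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout=timeout)

    def close(self) -> None:
        """Stop the loop, wait for its thread to exit, then release the loop."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def fake_custom_action(param: str) -> str:
    """Stand-in for an async BrowserController method."""
    await asyncio.sleep(0)
    return f"Result for {param}"


bridge = AsyncBridge()


def custom_action_wrapper(param: str) -> str:
    # Synchronous entry point, suitable as a tool's func argument
    return bridge.run(fake_custom_action(param))
```

Two limitations are worth keeping in mind. The wrapper blocks its calling thread until the coroutine finishes, so it must not be called from the bridge loop's own thread. Async browser objects are also generally tied to the loop they were created on, so under this approach the browser would likely need to be started through the same bridge.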

🐛 Troubleshooting

Common Issues

Import errors about missing packages:

# Import errors are expected until the dependencies are installed
uv pip install -e .
playwright install chromium

"Browser agent not initialized":

- Check that your OPENAI_API_KEY is set in .env
- Make sure the .env file is in the project root
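
Both checks can be automated with a small preflight script. This is a minimal sketch using only the standard library; check_env_file is a hypothetical helper, its .env parsing is deliberately simplified (no export prefixes or multi-line values), and the placeholder strings are the ones shown in this README.

```python
from pathlib import Path
from typing import List

# Placeholder values shown in this README's examples
PLACEHOLDERS = {"sk-your-actual-api-key-here", "sk-your-api-key-here"}


def check_env_file(path: str = ".env") -> List[str]:
    """Return a list of problems found with the .env file (empty list means OK)."""
    env_path = Path(path)
    if not env_path.is_file():
        return [f"{env_path} not found; run this from the project root"]
    value = None
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments, blank lines, and malformed lines
        key, _, val = line.partition("=")
        if key.strip() == "OPENAI_API_KEY":
            value = val.strip().strip("'\"")
    if value is None:
        return ["OPENAI_API_KEY is missing (or still commented out)"]
    if not value or value in PLACEHOLDERS:
        return ["OPENAI_API_KEY still looks like the placeholder value"]
    return []
```

Running check_env_file() from the project root and printing the returned list gives a quick answer before starting the server.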

Playwright errors:

# Reinstall Playwright browsers
playwright install --force chromium

Element not found errors:

- The AI might be using incorrect selectors
- Try being more specific in your prompt
- Some websites use dynamic class names or have anti-bot measures

Timeout errors:

- Some pages load slowly
- Try increasing timeout values in browser_agent.py
- Or use simpler websites for testing
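
Besides raising timeout values, slow pages can be handled by retrying an operation with a per-attempt time limit. The sketch below shows that idea with asyncio only. with_timeout_and_retry is a hypothetical helper that is not part of browser_agent.py, and returning an error string instead of raising is an assumption made so that a calling agent could see the failure as a tool result.

```python
import asyncio
from typing import Awaitable, Callable


async def with_timeout_and_retry(
    action: Callable[[], Awaitable[str]],
    timeout_s: float = 30.0,
    retries: int = 2,
    backoff_s: float = 1.0,
) -> str:
    """Await action() with a per-attempt timeout, retrying after each timeout.

    Waits backoff_s, then 2 * backoff_s, then 4 * backoff_s, ... between attempts.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    attempts = retries + 1
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(action(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt < retries:
                await asyncio.sleep(backoff_s * 2 ** attempt)
    return f"Error: timed out after {attempts} attempts"
```

Note that this only catches the timeout raised by asyncio.wait_for. Playwright raises its own TimeoutError class when one of its operations exceeds its timeout, so a real wrapper would need to handle that exception as well.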

🚀 Deployment

Using Docker (Optional)

FROM python:3.12-slim

WORKDIR /app

# Install system dependencies for Playwright
RUN apt-get update && apt-get install -y \
    wget \
    gnupg \
    && rm -rf /var/lib/apt/lists/*

# Copy project files
COPY . .

# Install Python dependencies
RUN pip install -e .
RUN playwright install --with-deps chromium

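# Containers have no display, so run the browser headless
ENV HEADLESS=true

# Port the server listens on (default PORT)
EXPOSE 8000

# Run the application
CMD ["python", "main.py"]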

Environment Setup for Production

# Use headless mode in production
HEADLESS=true

# Use a more capable model if needed
MODEL=gpt-4o

# Secure your API
# Consider adding authentication middleware
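
As one possible starting point for securing the API, the check at the heart of a shared-secret scheme can be written with the standard library alone. This is a sketch under assumptions: the X-API-Key header name and the is_authorized helper are hypothetical, and wiring the check into the FastAPI application (for example as middleware or a dependency) is not shown.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_key: Optional[str]) -> bool:
    """Return True only when the X-API-Key header matches `expected_key`.

    An unset or empty expected key rejects every request (fail closed).
    """
    if not expected_key:
        return False
    supplied = ""
    for name, value in headers.items():
        if name.lower() == "x-api-key":  # header names are case-insensitive
            supplied = value
            break
    # compare_digest avoids leaking how much of the key matched through timing
    return hmac.compare_digest(supplied.encode(), expected_key.encode())
```

The expected key would typically come from an environment variable so that it stays out of the source code. Since /execute drives a real browser on the host, rejecting unauthenticated requests matters more here than for a read-only API.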

📊 Performance Tips

- Use headless mode (HEADLESS=true) for faster execution
- Choose the right model: gpt-4o-mini for speed, gpt-4o for complex tasks
- Be specific in prompts: detailed prompts generally produce better results
- Set appropriate timeouts: adjust them to the load times of your target websites

🤝 Contributing

Contributions are welcome! Areas for improvement:

- Additional browser automation tools
- Better error handling and recovery
- Support for multiple concurrent browser sessions
- Screenshot comparison and validation
- Browser session persistence
- Integration with other LLM providers

📝 License

This project is open source and available under the MIT License.

🙏 Acknowledgments

- Manus.im and Scout.new for inspiration
- LangChain for the agent framework
- Playwright for browser automation
- FastAPI for the web framework

📧 Support

For issues, questions, or contributions, please open an issue on the project repository.

Built with ❤️ using LangChain, Playwright, and FastAPI
