2025-11-05 01:03:10 +01:00

🤖 Manus AI Clone

An AI-powered browser automation system modeled on Manus.im and Scout.new. It lets users control a web browser with natural language prompts, using LangChain and OpenAI's GPT models.

Features

  • Natural Language Browser Control: Give instructions in plain English, and the AI will control the browser for you
  • LangChain Integration: Uses LangChain agents with custom tools for browser automation
  • Playwright Browser Automation: Full browser control with support for navigation, clicking, typing, and more
  • Real-time Screenshots: See what the browser is doing with automatic screenshots
  • Action History: Track all actions performed by the AI agent
  • Beautiful Web UI: Modern, responsive interface for interacting with the system
  • RESTful API: Programmatic access to browser automation capabilities

🛠️ Technology Stack

  • Backend: FastAPI (Python)
  • AI Framework: LangChain + OpenAI GPT models
  • Browser Automation: Playwright
  • Frontend: HTML/CSS/JavaScript (Vanilla)
  • Package Management: UV

📋 Prerequisites

  • Python 3.12+
  • UV package manager
  • OpenAI API key
  • Chromium browser (installed with the playwright install chromium command in step 2)

🚀 Quick Start

1. Clone and Set Up

# Navigate to project directory
cd manus_ai_clone

# Create and activate virtual environment using uv
uv venv
source .venv/bin/activate  # On Linux/Mac
# or
.venv\Scripts\activate  # On Windows

# Install dependencies
uv pip install -e .

2. Install Playwright Browsers

# Install Playwright browser binaries
playwright install chromium

3. Configure Environment

# Copy the example environment file
cp .env.example .env

# Edit .env and add your OpenAI API key
# OPENAI_API_KEY=sk-your-actual-api-key-here

4. Run the Application

# Start the server
python main.py

# Or use uvicorn directly
uvicorn main:app --reload --host 0.0.0.0 --port 8000

5. Access the Application

Open your browser and navigate to:

http://localhost:8000

🎯 Usage Examples

Via Web Interface

  1. Open http://localhost:8000 in your browser
  2. Enter a natural language prompt in the text area
  3. Click "Execute Task"
  4. Watch the AI control the browser and see the results

Example Prompts:

  • "Go to google.com and search for 'LangChain tutorial'"
  • "Navigate to github.com and find the trending repositories"
  • "Open wikipedia.org and search for 'Artificial Intelligence', then read the first paragraph"
  • "Go to hacker news and get the top 5 story titles"
  • "Visit amazon.com and search for 'python books'"

Via API

# Execute a browser automation task
curl -X POST "http://localhost:8000/execute" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Go to google.com and search for LangChain"}'

# Check health status
curl http://localhost:8000/health

# Get action history
curl http://localhost:8000/status

Using Python

import base64

import requests

# Execute a task
response = requests.post(
    "http://localhost:8000/execute",
    json={"prompt": "Go to github.com and search for 'langchain'"},
)
response.raise_for_status()  # Fail early on HTTP errors

result = response.json()
print(f"Success: {result['success']}")
print(f"Output: {result['output']}")

# The screenshot, when present, is a base64-encoded image
if result.get("screenshot"):
    screenshot_data = base64.b64decode(result["screenshot"])
    with open("screenshot.png", "wb") as f:
        f.write(screenshot_data)

🏗️ Architecture

Components

  1. Browser Agent (browser_agent.py)

    • BrowserController: Low-level Playwright wrapper for browser operations
    • BrowserAgent: LangChain agent with custom tools for AI-powered automation
  2. API Server (main.py)

    • FastAPI application with REST endpoints
    • Lifecycle management for browser agent
    • Web UI serving
  3. Tools Available to AI

    • navigate: Go to URLs
    • click: Click elements by CSS selector
    • type_text: Fill input fields
    • get_text: Extract text from elements
    • get_page_content: Read page content
    • scroll: Scroll the page
    • get_elements_info: Inspect elements
    • execute_javascript: Run custom JavaScript

How It Works

User Prompt → FastAPI → LangChain Agent → Tools → Playwright → Browser
                ↓                                                  ↓
            Response ← Agent Reasoning ← Tool Results ← Browser State

  1. User submits a natural language prompt
  2. LangChain agent breaks down the task into steps
  3. Agent selects and executes appropriate tools
  4. Playwright performs browser actions
  5. Results are collected and returned with a screenshot
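
The steps above can be sketched in plain Python, without LangChain or Playwright. This is only an illustration under stated assumptions, not the project's implementation: FakeBrowser, plan_steps, and run_task are hypothetical names, the fixed plan stands in for the LLM's reasoning, and the "actions" key is invented for the sketch ("success" and "output" mirror the response fields used in the Python usage example).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for the Playwright-backed browser."""
    url: str = "about:blank"
    history: list[str] = field(default_factory=list)

    def navigate(self, url: str) -> str:
        self.url = url
        self.history.append(f"navigate:{url}")
        return f"Navigated to {url}"

    def get_page_content(self, _: str = "") -> str:
        self.history.append("get_page_content")
        return f"<content of {self.url}>"


def plan_steps(prompt: str) -> list[tuple[str, str]]:
    """Stand-in for agent reasoning: returns (tool_name, tool_input) pairs."""
    # A real agent would ask the LLM which tool to call next; this plan is fixed.
    return [("navigate", "https://example.com"), ("get_page_content", "")]


def run_task(prompt: str, browser: FakeBrowser) -> dict:
    """Dispatch each planned step to the matching tool and collect the results."""
    tools: dict[str, Callable[[str], str]] = {
        "navigate": browser.navigate,
        "get_page_content": browser.get_page_content,
    }
    observations: list[str] = []
    for name, tool_input in plan_steps(prompt):
        if name not in tools:
            return {"success": False, "output": f"Unknown tool: {name}"}
        observations.append(tools[name](tool_input))
    return {
        "success": True,
        "output": observations[-1] if observations else "",
        "actions": browser.history,
    }
```

In the real system the plan is not fixed: the agent chooses the next tool after seeing each tool result, which is what the return arrows in the diagram represent.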

⚙️ Configuration

Environment Variables

# Required
OPENAI_API_KEY=sk-your-api-key-here

# Optional
MODEL=gpt-4o-mini              # OpenAI model to use
HEADLESS=false                 # Set to true to run the browser without a visible window
HOST=0.0.0.0                   # Server host
PORT=8000                      # Server port
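
One way these variables could be read at startup is sketched below. Settings and load_settings are hypothetical names, the defaults simply mirror the example values above, and the set of strings accepted as "true" for HEADLESS is an assumption; the actual parsing in main.py may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    model: str = "gpt-4o-mini"
    headless: bool = False
    host: str = "0.0.0.0"
    port: int = 8000


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, applying the example defaults."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=api_key,
        model=env.get("MODEL", "gpt-4o-mini"),
        # Assumption: "true", "1", and "yes" (any case) enable headless mode.
        headless=env.get("HEADLESS", "false").strip().lower() in {"true", "1", "yes"},
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
    )
```

Passing a plain dict instead of relying on os.environ makes the parsing easy to check in isolation, for example load_settings({"OPENAI_API_KEY": "sk-test", "HEADLESS": "true"}).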

Model Options

  • gpt-4o-mini (default) - Fast and cost-effective
  • gpt-4o - More capable, higher cost
  • gpt-4-turbo - Advanced reasoning

🔧 Development

Project Structure

manus_ai_clone/
├── main.py              # FastAPI application
├── browser_agent.py     # Browser automation logic
├── pyproject.toml       # Dependencies
├── .env.example         # Environment template
├── .gitignore           # Git ignore rules
└── README.md            # This file

Adding Custom Tools

To add new browser automation capabilities:

  1. Add a method to BrowserController class
  2. Create a wrapper function in BrowserAgent._create_tools()
  3. Add a Tool definition with name, function, and description

Example:

# In BrowserController
async def custom_action(self, param: str) -> str:
    # Your implementation
    return "Result"

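# In BrowserAgent._create_tools()
def custom_action_wrapper(param: str) -> str:
    # asyncio.run() starts a new event loop, so call this wrapper only from
    # a thread that has no loop already running
    return asyncio.run(self.browser.custom_action(param))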

# Add to tools list
Tool(
    name="custom_action",
    func=custom_action_wrapper,
    description="Description of what this tool does"
)
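
For trying the three steps outside the project, here is a self-contained sketch of the same pattern. SimpleTool is a hypothetical stand-in for LangChain's Tool so the snippet runs without LangChain installed, and StubBrowserController, get_title, and make_title_tool are invented for illustration; a real controller method would call Playwright instead of returning a canned string.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimpleTool:
    """Hypothetical stand-in for LangChain's Tool (name, func, description)."""
    name: str
    func: Callable[[str], str]
    description: str


class StubBrowserController:
    """Stub controller; a real one would drive the browser here."""

    async def get_title(self, url: str) -> str:
        await asyncio.sleep(0)  # placeholder for real async browser work
        return f"Title of {url}"


def make_title_tool(browser: StubBrowserController) -> SimpleTool:
    """Wrap the async controller method in a synchronous tool function."""

    def get_title_wrapper(url: str) -> str:
        # Bridge sync to async: run the coroutine to completion, return its result.
        return asyncio.run(browser.get_title(url))

    return SimpleTool(
        name="get_title",
        func=get_title_wrapper,
        description="Return the title of the page at the given URL",
    )
```

The description matters as much as the function: it is the text the agent reads when deciding which tool fits a step, so it should say what input the tool expects.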

🐛 Troubleshooting

Common Issues

Import errors about missing packages:

# Import errors are expected until the dependencies are installed
uv pip install -e .
playwright install chromium

"Browser agent not initialized":

  • Check that your OPENAI_API_KEY is set in .env
  • Make sure the .env file is in the project root

Playwright errors:

# Reinstall Playwright browsers
playwright install --force chromium

Element not found errors:

  • The AI might be using incorrect selectors
  • Try being more specific in your prompt
  • Some websites use dynamic class names or have anti-bot measures

Timeout errors:

  • Some pages load slowly
  • Try increasing timeout values in browser_agent.py
  • Or use simpler websites for testing
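
Besides raising the timeout values, slow pages can be handled by retrying the action. The helper below is a generic sketch built on asyncio.wait_for, not code from browser_agent.py; with_timeout_and_retry and its parameters are hypothetical, and it is separate from any timeout options Playwright itself provides.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_timeout_and_retry(
    action: Callable[[], Awaitable[T]],
    timeout_s: float = 30.0,
    attempts: int = 3,
    backoff_s: float = 1.0,
) -> T:
    """Run action(), retrying when an attempt exceeds timeout_s seconds."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            # wait_for cancels the attempt and raises TimeoutError when it runs long.
            return await asyncio.wait_for(action(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the timeout to the caller
            # Wait a little longer before each retry.
            await asyncio.sleep(backoff_s * attempt)
    raise AssertionError("unreachable")
```

The action is passed as a zero-argument callable so that each retry creates a fresh coroutine; a coroutine object cannot be awaited twice.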

🚀 Deployment

Using Docker (Optional)

FROM python:3.12-slim

WORKDIR /app

# Install system dependencies for Playwright
RUN apt-get update && apt-get install -y \
    wget \
    gnupg \
    && rm -rf /var/lib/apt/lists/*

# Copy project files
COPY . .

# Install Python dependencies
RUN pip install -e .
RUN playwright install --with-deps chromium

# Run the application
CMD ["python", "main.py"]

Environment Setup for Production

# Use headless mode in production
HEADLESS=true

# Use a more capable model if needed
MODEL=gpt-4o

# Secure your API
# Consider adding authentication middleware
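
As one possible starting point for the authentication suggested above, the check below compares a client-supplied key against a configured one. It is a sketch under assumptions: the X-API-Key header name and the is_authorized function are hypothetical, and the project does not define them. The function is framework-neutral, so it could be called from a FastAPI middleware or dependency.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Return True when the request carries the expected API key."""
    if not expected_key:
        # Refuse everything if no key is configured, rather than allowing all.
        return False
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get("x-api-key", "")
    # compare_digest avoids leaking key contents through timing differences.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())
```

Rejecting all requests when no key is configured is a deliberate choice here: a missing environment variable then fails closed instead of silently exposing the browser to anyone who can reach the port.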

📊 Performance Tips

  1. Use headless mode (HEADLESS=true) for faster execution
  2. Choose the right model: gpt-4o-mini for speed, gpt-4o for complex tasks
  3. Be specific in prompts: More detailed prompts = better results
  4. Set appropriate timeouts: Adjust based on your target websites

🤝 Contributing

Contributions are welcome! Areas for improvement:

  • Additional browser automation tools
  • Better error handling and recovery
  • Support for multiple concurrent browser sessions
  • Screenshot comparison and validation
  • Browser session persistence
  • Integration with other LLM providers

📝 License

This project is open source and available under the MIT License.

🙏 Acknowledgments

📧 Support

For issues, questions, or contributions, please open an issue on the project repository.


Built with ❤️ using LangChain, Playwright, and FastAPI
