🤖 Manus AI Clone
An AI-powered browser automation system that replicates the functionality of Manus.im and Scout.new. This application allows users to control a web browser using natural language prompts, powered by LangChain and OpenAI's GPT models.
✨ Features
- Natural Language Browser Control: Give instructions in plain English, and the AI will control the browser for you
- LangChain Integration: Uses LangChain agents with custom tools for browser automation
- Playwright Browser Automation: Full browser control with support for navigation, clicking, typing, and more
- Real-time Screenshots: See what the browser is doing with automatic screenshots
- Action History: Track all actions performed by the AI agent
- Beautiful Web UI: Modern, responsive interface for interacting with the system
- RESTful API: Programmatic access to browser automation capabilities
🛠️ Technology Stack
- Backend: FastAPI (Python)
- AI Framework: LangChain + OpenAI GPT models
- Browser Automation: Playwright
- Frontend: HTML/CSS/JavaScript (Vanilla)
- Package Management: UV
📋 Prerequisites
- Python 3.12+
- UV package manager
- OpenAI API key
- Chromium browser (installed by Playwright in step 2 of the Quick Start)
🚀 Quick Start
1. Clone and Setup
# Navigate to project directory
cd manus_ai_clone
# Create and activate virtual environment using uv
uv venv
source .venv/bin/activate # On Linux/Mac
# or
.venv\Scripts\activate # On Windows
# Install dependencies
uv pip install -e .
2. Install Playwright Browsers
# Install Playwright browser binaries
playwright install chromium
3. Configure Environment
# Copy the example environment file
cp .env.example .env
# Edit .env and add your OpenAI API key
# OPENAI_API_KEY=sk-your-actual-api-key-here
4. Run the Application
# Start the server
python main.py
# Or use uvicorn directly
uvicorn main:app --reload --host 0.0.0.0 --port 8000
5. Access the Application
Open your browser and navigate to:
http://localhost:8000
🎯 Usage Examples
Via Web Interface
- Open http://localhost:8000 in your browser
- Enter a natural language prompt in the text area
- Click "Execute Task"
- Watch the AI control the browser and see the results
Example Prompts:
- "Go to google.com and search for 'LangChain tutorial'"
- "Navigate to github.com and find the trending repositories"
- "Open wikipedia.org and search for 'Artificial Intelligence', then read the first paragraph"
- "Go to hacker news and get the top 5 story titles"
- "Visit amazon.com and search for 'python books'"
Via API
# Execute a browser automation task
curl -X POST "http://localhost:8000/execute" \
-H "Content-Type: application/json" \
-d '{"prompt": "Go to google.com and search for LangChain"}'
# Check health status
curl http://localhost:8000/health
# Get action history
curl http://localhost:8000/status
Using Python
import base64

import requests

# Execute a task
response = requests.post(
    "http://localhost:8000/execute",
    json={"prompt": "Go to github.com and search for 'langchain'"},
)
response.raise_for_status()  # Fail early on HTTP errors
result = response.json()
print(f"Success: {result['success']}")
print(f"Output: {result['output']}")
# The screenshot, if present, is a base64-encoded image
if result.get("screenshot"):
    screenshot_data = base64.b64decode(result["screenshot"])
    with open("screenshot.png", "wb") as f:
        f.write(screenshot_data)
🏗️ Architecture
Components
- Browser Agent (browser_agent.py)
  - BrowserController: Low-level Playwright wrapper for browser operations
  - BrowserAgent: LangChain agent with custom tools for AI-powered automation
- API Server (main.py)
  - FastAPI application with REST endpoints
  - Lifecycle management for browser agent
  - Web UI serving
- Tools Available to AI
  - navigate: Go to URLs
  - click: Click elements by CSS selector
  - type_text: Fill input fields
  - get_text: Extract text from elements
  - get_page_content: Read page content
  - scroll: Scroll the page
  - get_elements_info: Inspect elements
  - execute_javascript: Run custom JavaScript
How It Works
User Prompt → FastAPI → LangChain Agent → Tools → Playwright → Browser
↓ ↓
Response ← Agent Reasoning ← Tool Results ← Browser State
- User submits a natural language prompt
- LangChain agent breaks down the task into steps
- Agent selects and executes appropriate tools
- Playwright performs browser actions
- Results are collected and returned with a screenshot
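The flow above can be sketched without LangChain or Playwright. This is only an illustration of the dispatch step: the tool functions are hypothetical stubs that return strings, and the plan is hard-coded, whereas in the real project the LangChain agent chooses each step from the prompt.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for the real browser tools. Each takes one string
# argument and returns a text result; no browser is involved here.
def navigate(url: str) -> str:
    return f"Navigated to {url}"


def get_text(selector: str) -> str:
    return f"Text of {selector}"


# Name-to-function registry, mirroring how an agent looks up a tool by name.
TOOLS: Dict[str, Callable[[str], str]] = {
    "navigate": navigate,
    "get_text": get_text,
}


def run_plan(plan: List[Tuple[str, str]]) -> List[str]:
    """Execute (tool_name, argument) steps in order and collect the results."""
    history: List[str] = []
    for tool_name, argument in plan:
        tool = TOOLS.get(tool_name)
        if tool is None:
            # Record the failure instead of raising, so later steps still run.
            history.append(f"Unknown tool: {tool_name}")
            continue
        history.append(tool(argument))
    return history


if __name__ == "__main__":
    for line in run_plan([("navigate", "https://example.com"), ("get_text", "h1")]):
        print(line)
```

Collecting every result in a list is one simple way to back an action history like the one exposed by the /status endpoint.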
⚙️ Configuration
Environment Variables
# Required
OPENAI_API_KEY=sk-your-api-key-here
# Optional
MODEL=gpt-4o-mini # OpenAI model to use
HEADLESS=false # Set to true to run the browser in headless mode
HOST=0.0.0.0 # Server host
PORT=8000 # Server port
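As a rough sketch of how these variables might be read at startup, the snippet below parses them into a typed settings object. The Settings class and the load_settings and parse_bool helpers are hypothetical names, not taken from the project, and the defaults simply mirror the values shown above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    openai_api_key: str
    model: str
    headless: bool
    host: str
    port: int


def parse_bool(value: str) -> bool:
    """Treat common truthy spellings as True; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=api_key,
        model=env.get("MODEL", "gpt-4o-mini"),  # assumed default
        headless=parse_bool(env.get("HEADLESS", "false")),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
    )
```

Failing fast on a missing key surfaces the problem at startup rather than on the first request.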
Model Options
- gpt-4o-mini (default) - Fast and cost-effective
- gpt-4o - More capable, higher cost
- gpt-4-turbo - Advanced reasoning
🔧 Development
Project Structure
manus_ai_clone/
├── main.py # FastAPI application
├── browser_agent.py # Browser automation logic
├── pyproject.toml # Dependencies
├── .env.example # Environment template
├── .gitignore # Git ignore rules
└── README.md # This file
Adding Custom Tools
To add new browser automation capabilities:
- Add a method to the BrowserController class
- Create a wrapper function in BrowserAgent._create_tools()
- Add a Tool definition with name, function, and description
Example:
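import asyncio

from langchain_core.tools import Tool  # Import path may differ by LangChain version

# In BrowserController
async def custom_action(self, param: str) -> str:
    # Your implementation
    return "Result"

# In BrowserAgent._create_tools()
def custom_action_wrapper(param: str) -> str:
    # Bridge the synchronous tool interface to the async browser method.
    # Note: asyncio.run() raises RuntimeError if an event loop is already
    # running in the calling thread.
    return asyncio.run(self.browser.custom_action(param))

# Add to tools list
Tool(
    name="custom_action",
    func=custom_action_wrapper,
    description="Description of what this tool does",
)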
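The wrapper above calls asyncio.run(...), which creates a fresh event loop on every call and cannot be used from a thread where a loop is already running. One alternative, sketched here as an assumption rather than as what the project does, is to keep a single long-lived loop on a background thread and submit every coroutine to it. LoopThread is a hypothetical helper name.

```python
import asyncio
import threading
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


class LoopThread:
    """Owns one event loop running on a background daemon thread."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()

    def run(self, coro: Coroutine[Any, Any, T]) -> T:
        """Schedule a coroutine on the background loop and block for its result.

        Must not be called from the background loop's own thread, or it
        would wait on itself and deadlock.
        """
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        return future.result()

    def stop(self) -> None:
        """Stop the loop, wait for the thread to exit, then close the loop."""
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()
```

With this in place, a wrapper could return runner.run(self.browser.custom_action(param)), where runner is a shared LoopThread instance. Because every coroutine runs on the same loop, objects that may be tied to the loop that created them stay usable across calls.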
🐛 Troubleshooting
Common Issues
Import errors about missing packages:
# Packages not installed yet (errors are normal before installation)
uv pip install -e .
playwright install chromium
"Browser agent not initialized":
- Check that your OPENAI_API_KEY is set in .env
- Make sure the .env file is in the project root
Playwright errors:
# Reinstall Playwright browsers
playwright install --force chromium
Element not found errors:
- The AI might be using incorrect selectors
- Try being more specific in your prompt
- Some websites use dynamic class names or have anti-bot measures
Timeout errors:
- Some pages load slowly
- Try increasing timeout values in browser_agent.py
- Or use simpler websites for testing
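Besides raising timeouts, slow pages can be handled by retrying an action a few times. The helper below is a generic sketch using only asyncio. with_timeout_and_retry is a hypothetical name, and the project's own timeout handling in browser_agent.py may look quite different.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def with_timeout_and_retry(
    action: Callable[[], Awaitable[T]],
    timeout_s: float = 30.0,
    attempts: int = 3,
) -> T:
    """Await action() with a per-attempt timeout, retrying on timeout."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            # A new awaitable is created for every attempt.
            return await asyncio.wait_for(action(), timeout=timeout_s)
        except asyncio.TimeoutError as exc:
            last_error = exc
            if attempt < attempts:
                # Short linear backoff before the next attempt.
                await asyncio.sleep(0.1 * attempt)
    raise TimeoutError(
        f"Action did not finish within {timeout_s}s after {attempts} attempts"
    ) from last_error
```

The action is passed as a zero-argument callable, for example a lambda wrapping a navigation call, so that each retry starts from a fresh coroutine.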
🚀 Deployment
Using Docker (Optional)
FROM python:3.12-slim
WORKDIR /app
# Install system dependencies for Playwright
RUN apt-get update && apt-get install -y \
wget \
gnupg \
&& rm -rf /var/lib/apt/lists/*
# Copy project files
COPY . .
# Install Python dependencies
RUN pip install -e .
RUN playwright install --with-deps chromium
# Run the application
CMD ["python", "main.py"]
Environment Setup for Production
# Use headless mode in production
HEADLESS=true
# Use a more capable model if needed
MODEL=gpt-4o
# Secure your API
# Consider adding authentication middleware
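One minimal form the suggested authentication could take is a shared-secret check. The function below is a sketch using only the standard library. API_TOKEN is a hypothetical environment variable, not one the project defines, and hooking the check into FastAPI (for example from a dependency that reads a request header) is left out.

```python
import hmac
import os
from typing import Optional


def is_authorized(provided_key: Optional[str], expected_key: Optional[str] = None) -> bool:
    """Return True only if the provided key matches the expected secret."""
    if expected_key is None:
        # Hypothetical variable name; adjust to your deployment.
        expected_key = os.environ.get("API_TOKEN", "")
    if not expected_key or not provided_key:
        # Reject when no secret is configured or no key was sent.
        return False
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(provided_key.encode(), expected_key.encode())
```

Rejecting requests when no secret is configured keeps a misconfigured deployment closed rather than open.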
📊 Performance Tips
- Use headless mode (HEADLESS=true) for faster execution
- Choose the right model: gpt-4o-mini for speed, gpt-4o for complex tasks
- Be specific in prompts: More detailed prompts = better results
- Set appropriate timeouts: Adjust based on your target websites
🤝 Contributing
Contributions are welcome! Areas for improvement:
- Additional browser automation tools
- Better error handling and recovery
- Support for multiple concurrent browser sessions
- Screenshot comparison and validation
- Browser session persistence
- Integration with other LLM providers
📝 License
This project is open source and available under the MIT License.
🙏 Acknowledgments
- Manus.im and Scout.new for inspiration
- LangChain for the agent framework
- Playwright for browser automation
- FastAPI for the web framework
📧 Support
For issues, questions, or contributions, please open an issue on the project repository.
Built with ❤️ using LangChain, Playwright, and FastAPI