# 🤖 Manus AI Clone

An AI-powered browser automation system that replicates the functionality of [Manus.im](https://manus.im) and [Scout.new](https://scout.new). This application allows users to control a web browser using natural language prompts, powered by LangChain and OpenAI's GPT models.

## ✨ Features

- **Natural Language Browser Control**: Give instructions in plain English, and the AI will control the browser for you
- **LangChain Integration**: Uses LangChain agents with custom tools for browser automation
- **Playwright Browser Automation**: Full browser control with support for navigation, clicking, typing, and more
- **Real-time Screenshots**: See what the browser is doing with automatic screenshots
- **Action History**: Track all actions performed by the AI agent
- **Beautiful Web UI**: Modern, responsive interface for interacting with the system
- **RESTful API**: Programmatic access to browser automation capabilities

## 🛠️ Technology Stack

- **Backend**: FastAPI (Python)
- **AI Framework**: LangChain + OpenAI GPT models
- **Browser Automation**: Playwright
- **Frontend**: HTML/CSS/JavaScript (Vanilla)
- **Package Management**: UV

## 📋 Prerequisites

- Python 3.12+
- UV package manager
- OpenAI API key
- Chromium browser (installed by Playwright; see step 2 of the Quick Start)

## 🚀 Quick Start

### 1. Clone and Setup

```bash
# Navigate to project directory
cd manus_ai_clone

# Create and activate virtual environment using uv
uv venv
source .venv/bin/activate  # On Linux/Mac
# or
.venv\Scripts\activate  # On Windows

# Install dependencies
uv pip install -e .
```

### 2. Install Playwright Browsers

```bash
# Install Playwright browser binaries
playwright install chromium
```

### 3. Configure Environment

```bash
# Copy the example environment file
cp .env.example .env

# Edit .env and add your OpenAI API key
# OPENAI_API_KEY=sk-your-actual-api-key-here
```

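Before starting the server, it can help to confirm that the key actually made it into `.env`. The following is a minimal sketch using only the standard library. The helper names `parse_env_text` and `has_api_key` are hypothetical and not part of this project, and the parser only understands simple `KEY=value` lines with optional trailing `# comments`.

```python
from pathlib import Path

# Placeholder values that appear in this README's templates (not real keys).
PLACEHOLDERS = {"sk-your-actual-api-key-here", "sk-your-api-key-here"}


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.split(" #", 1)[0]  # drop a trailing inline comment
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(values: dict[str, str]) -> bool:
    """True if OPENAI_API_KEY is set to something other than a placeholder."""
    key = values.get("OPENAI_API_KEY", "")
    return bool(key) and key not in PLACEHOLDERS


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    print("OPENAI_API_KEY configured:", has_api_key(parse_env_text(text)))
```

Run it from the project root so that the relative `.env` path resolves.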
### 4. Run the Application

```bash
# Start the server
python main.py

# Or use uvicorn directly
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

### 5. Access the Application

Open your browser and navigate to:

```
http://localhost:8000
```

## 🎯 Usage Examples

### Via Web Interface

1. Open http://localhost:8000 in your browser
2. Enter a natural language prompt in the text area
3. Click "Execute Task"
4. Watch the AI control the browser and see the results

**Example Prompts:**

- "Go to google.com and search for 'LangChain tutorial'"
- "Navigate to github.com and find the trending repositories"
- "Open wikipedia.org and search for 'Artificial Intelligence', then read the first paragraph"
- "Go to hacker news and get the top 5 story titles"
- "Visit amazon.com and search for 'python books'"

### Via API

```bash
# Execute a browser automation task
curl -X POST "http://localhost:8000/execute" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Go to google.com and search for LangChain"}'

# Check health status
curl http://localhost:8000/health

# Get action history
curl http://localhost:8000/status
```

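When scripting against the API, it is often useful to wait until the server is up before sending the first task. The sketch below polls the `/health` endpoint shown above using only the standard library. The function names, the default timings, and the assumption that a healthy server answers with HTTP 200 are illustrative, not taken from the project code.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(
    url: str = "http://localhost:8000/health",
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll `probe(url)` until it succeeds or `timeout_s` seconds have passed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    print("Server healthy:", wait_until_healthy(timeout_s=5))
```

The `probe` parameter exists so the polling logic can be exercised without a running server.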
### Using Python

```python
import base64

import requests

# Execute a task
response = requests.post(
    "http://localhost:8000/execute",
    json={"prompt": "Go to github.com and search for 'langchain'"},
)
response.raise_for_status()  # Stop early on an HTTP error response

result = response.json()
print(f"Success: {result['success']}")
print(f"Output: {result['output']}")

# Screenshot is available as a base64-encoded image
if result['screenshot']:
    screenshot_data = base64.b64decode(result['screenshot'])
    with open('screenshot.png', 'wb') as f:
        f.write(screenshot_data)
```

## 🏗️ Architecture

### Components

1. **Browser Agent (`browser_agent.py`)**

   - `BrowserController`: Low-level Playwright wrapper for browser operations
   - `BrowserAgent`: LangChain agent with custom tools for AI-powered automation

2. **API Server (`main.py`)**

   - FastAPI application with REST endpoints
   - Lifecycle management for browser agent
   - Web UI serving

3. **Tools Available to AI**
   - `navigate`: Go to URLs
   - `click`: Click elements by CSS selector
   - `type_text`: Fill input fields
   - `get_text`: Extract text from elements
   - `get_page_content`: Read page content
   - `scroll`: Scroll the page
   - `get_elements_info`: Inspect elements
   - `execute_javascript`: Run custom JavaScript

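To make the tool list more concrete, the sketch below shows one way a name-to-function registry could dispatch a tool call. It is an illustration only: `FakeBrowser`, `build_registry`, and `call_tool` are hypothetical names, the browser is a stub that records actions instead of driving Playwright, and the project's actual tools are LangChain `Tool` objects rather than a plain dictionary.

```python
from typing import Callable

ToolFunc = Callable[[str], str]


class FakeBrowser:
    """Stand-in for BrowserController that only records what it was asked to do."""

    def __init__(self) -> None:
        self.history: list[str] = []

    def navigate(self, url: str) -> str:
        self.history.append(f"navigate:{url}")
        return f"Navigated to {url}"

    def click(self, selector: str) -> str:
        self.history.append(f"click:{selector}")
        return f"Clicked {selector}"


def build_registry(browser: FakeBrowser) -> dict[str, ToolFunc]:
    """Map tool names (as the agent would refer to them) to bound methods."""
    return {"navigate": browser.navigate, "click": browser.click}


def call_tool(registry: dict[str, ToolFunc], name: str, argument: str) -> str:
    """Look up a tool by name and run it, reporting unknown names as text."""
    tool = registry.get(name)
    if tool is None:
        return f"Unknown tool: {name}"
    return tool(argument)


if __name__ == "__main__":
    browser = FakeBrowser()
    registry = build_registry(browser)
    print(call_tool(registry, "navigate", "https://example.com"))
    print(browser.history)
```

Returning an error string for an unknown tool, rather than raising, mirrors how an agent can be given feedback it is able to react to in its next step.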
### How It Works

```
User Prompt → FastAPI → LangChain Agent → Tools → Playwright → Browser
                 ↓                                                ↓
              Response ← Agent Reasoning ← Tool Results ← Browser State
```

1. User submits a natural language prompt
2. LangChain agent breaks down the task into steps
3. Agent selects and executes appropriate tools
4. Playwright performs browser actions
5. Results are collected and returned with a screenshot

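The loop behind steps 2 to 5 can be sketched without an LLM or a browser. In the example below, `run_agent` and `scripted_planner` are hypothetical names: the planner is a fixed script standing in for the model's reasoning, and the tools are lambdas standing in for Playwright actions. It shows the control flow only, not how LangChain implements its agents.

```python
from typing import Callable, Optional

Step = tuple[str, str]  # (tool name, tool argument)


def run_agent(
    prompt: str,
    plan_next: Callable[[str, list[str]], Optional[Step]],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> list[str]:
    """Ask the planner for a step, run the tool, feed the result back, repeat."""
    observations: list[str] = []
    for _ in range(max_steps):
        step = plan_next(prompt, observations)
        if step is None:  # the planner decided the task is finished
            break
        name, argument = step
        observations.append(tools[name](argument))
    return observations


def scripted_planner(prompt: str, observations: list[str]) -> Optional[Step]:
    """Stand-in for the LLM: a fixed two-step plan, one step per call."""
    script: list[Step] = [("navigate", "https://example.com"), ("get_text", "h1")]
    done = len(observations)
    return script[done] if done < len(script) else None


if __name__ == "__main__":
    demo_tools = {
        "navigate": lambda url: f"opened {url}",
        "get_text": lambda selector: f"text of {selector}",
    }
    print(run_agent("Read the heading on example.com", scripted_planner, demo_tools))
```

The `max_steps` cap is a common safeguard against a planner that never declares the task finished.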
## ⚙️ Configuration

### Environment Variables

```bash
# Required
OPENAI_API_KEY=sk-your-api-key-here

# Optional
MODEL=gpt-4o-mini  # OpenAI model to use
HEADLESS=false     # Set to true to run the browser in headless mode
HOST=0.0.0.0       # Server host
PORT=8000          # Server port
```

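One way to read these variables in Python is a small typed settings object. This is a sketch rather than the project's own loader: `Settings`, `parse_bool`, and `load_settings` are hypothetical names, and it assumes the example values shown above are also the defaults when a variable is unset.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    model: str = "gpt-4o-mini"
    headless: bool = False
    host: str = "0.0.0.0"
    port: int = 8000


def parse_bool(value: str) -> bool:
    """Treat common truthy spellings as True; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping of environment variables."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=api_key,
        model=env.get("MODEL", "gpt-4o-mini"),
        headless=parse_bool(env.get("HEADLESS", "false")),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
    )


if __name__ == "__main__":
    demo = load_settings({"OPENAI_API_KEY": "sk-demo", "HEADLESS": "true"})
    print(demo.model, demo.headless, demo.port)
```

Passing the mapping explicitly, instead of always reading `os.environ`, keeps the loader easy to test.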
### Model Options

- `gpt-4o-mini` (default) - Fast and cost-effective
- `gpt-4o` - More capable, higher cost
- `gpt-4-turbo` - Advanced reasoning

## 🔧 Development

### Project Structure

```
manus_ai_clone/
├── main.py              # FastAPI application
├── browser_agent.py     # Browser automation logic
├── pyproject.toml       # Dependencies
├── .env.example         # Environment template
├── .gitignore           # Git ignore rules
└── README.md            # This file
```

### Adding Custom Tools

To add new browser automation capabilities:

1. Add a method to the `BrowserController` class
2. Create a wrapper function in `BrowserAgent._create_tools()`
3. Add a `Tool` definition with name, function, and description

Example:

```python
import asyncio

# In BrowserController
async def custom_action(self, param: str) -> str:
    # Your implementation
    return "Result"

# In BrowserAgent._create_tools()
def custom_action_wrapper(param: str) -> str:
    # Note: asyncio.run() fails if an event loop is already running in this thread
    return asyncio.run(self.browser.custom_action(param))

# Add to tools list (Tool is the LangChain tool class used in browser_agent.py)
Tool(
    name="custom_action",
    func=custom_action_wrapper,
    description="Description of what this tool does"
)
```

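A caveat with the `asyncio.run` wrapper pattern: `asyncio.run` raises `RuntimeError` when called from a thread that already has a running event loop, which can happen if a tool is invoked from inside an async request handler. One possible alternative, sketched below, is to keep a single event loop in a background thread and submit coroutines to it. `LoopThread` is a hypothetical helper, not something this project provides, and `custom_action` here is a standalone placeholder.

```python
import asyncio
import threading
from typing import Any, Coroutine, Optional


class LoopThread:
    """Owns one event loop that runs forever in a daemon thread."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro: Coroutine[Any, Any, Any], timeout: Optional[float] = None) -> Any:
        """Submit a coroutine to the background loop and block for its result."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def stop(self) -> None:
        """Stop the loop, wait for the thread to exit, then close the loop."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def custom_action(param: str) -> str:
    await asyncio.sleep(0)  # placeholder for real browser work
    return f"Result for {param}"


if __name__ == "__main__":
    runner = LoopThread()
    print(runner.run(custom_action("demo")))
    runner.stop()
```

Because every coroutine runs on the same loop, objects created on that loop stay usable across calls. Note that `run` blocks the calling thread, so it should not be called from the background loop itself.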
## 🐛 Troubleshooting

### Common Issues

**Import errors about missing packages:**

```bash
# Packages not installed yet (errors are normal before installation)
uv pip install -e .
playwright install chromium
```

**"Browser agent not initialized":**

- Check that your `OPENAI_API_KEY` is set in `.env`
- Make sure the `.env` file is in the project root

**Playwright errors:**

```bash
# Reinstall Playwright browsers
playwright install --force chromium
```

**Element not found errors:**

- The AI might be using incorrect selectors
- Try being more specific in your prompt
- Some websites use dynamic class names or have anti-bot measures

**Timeout errors:**

- Some pages load slowly
- Try increasing timeout values in `browser_agent.py`
- Or use simpler websites for testing

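Besides raising timeout values, slow pages can be handled by retrying an action a few times. The sketch below wraps any async action with a per-attempt timeout and a linear backoff. `with_retries` and its parameters are hypothetical and not part of `browser_agent.py`; it catches only the timeout raised by `asyncio.wait_for`, so Playwright's own timeout exceptions would need separate handling.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    action: Callable[[], Awaitable[T]],
    timeout_s: float = 30.0,
    attempts: int = 3,
    backoff_s: float = 1.0,
) -> T:
    """Run `action` with a per-attempt timeout, retrying when it times out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(action(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the timeout
            await asyncio.sleep(backoff_s * attempt)  # linear backoff
    raise RuntimeError("unreachable")


async def _demo() -> str:
    async def quick_action() -> str:
        await asyncio.sleep(0.01)  # placeholder for a page load
        return "loaded"

    return await with_retries(quick_action, timeout_s=1.0)


if __name__ == "__main__":
    print(asyncio.run(_demo()))
```

The action is passed as a zero-argument callable so that each attempt gets a fresh coroutine.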
## 🚀 Deployment

### Using Docker (Optional)

```dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install system dependencies for Playwright
RUN apt-get update && apt-get install -y \
    wget \
    gnupg \
    && rm -rf /var/lib/apt/lists/*

# Copy project files
COPY . .

# Install Python dependencies
RUN pip install -e .
RUN playwright install --with-deps chromium

# Run the application
CMD ["python", "main.py"]
```

### Environment Setup for Production

```bash
# Use headless mode in production
HEADLESS=true

# Use a more capable model if needed
MODEL=gpt-4o

# Secure your API
# Consider adding authentication middleware
```

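As a starting point for the authentication suggestion, the core of an API-key check can be written with the standard library and later called from whatever middleware or dependency the server uses. This is a sketch under assumptions: the `X-API-Key` header name and the `is_authorized` function are hypothetical and not part of this project.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Compare the X-API-Key header with the expected secret in constant time."""
    if not expected_key:
        return False  # refuse all requests when no key is configured
    supplied = ""
    for name, value in headers.items():
        if name.lower() == "x-api-key":  # header names are case-insensitive
            supplied = value
            break
    return hmac.compare_digest(supplied.encode(), expected_key.encode())


if __name__ == "__main__":
    print(is_authorized({"X-API-Key": "s3cret"}, "s3cret"))
    print(is_authorized({"X-API-Key": "guess"}, "s3cret"))
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not reveal how many leading characters of a guess were correct.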
## 📊 Performance Tips

1. **Use headless mode** (`HEADLESS=true`) for faster execution
2. **Choose the right model**: `gpt-4o-mini` for speed, `gpt-4o` for complex tasks
3. **Be specific in prompts**: Detailed prompts tend to produce better results
4. **Set appropriate timeouts**: Adjust based on your target websites

## 🤝 Contributing

Contributions are welcome! Areas for improvement:

- Additional browser automation tools
- Better error handling and recovery
- Support for multiple concurrent browser sessions
- Screenshot comparison and validation
- Browser session persistence
- Integration with other LLM providers

## 📝 License

This project is open source and available under the MIT License.

## 🙏 Acknowledgments

- [Manus.im](https://manus.im) and [Scout.new](https://scout.new) for inspiration
- [LangChain](https://www.langchain.com/) for the agent framework
- [Playwright](https://playwright.dev/) for browser automation
- [FastAPI](https://fastapi.tiangolo.com/) for the web framework

## 📧 Support

For issues, questions, or contributions, please open an issue on the project repository.

---

**Built with ❤️ using LangChain, Playwright, and FastAPI**