# 🤖 Manus AI Clone

An AI-powered browser automation system that replicates the functionality of [Manus.im](https://manus.im) and [Scout.new](https://scout.new). Users control a web browser with natural language prompts, powered by LangChain and OpenAI's GPT models.

## ✨ Features

- **Natural Language Browser Control**: Give instructions in plain English, and the AI controls the browser for you
- **LangChain Integration**: Uses LangChain agents with custom tools for browser automation
- **Playwright Browser Automation**: Full browser control with support for navigation, clicking, typing, and more
- **Real-time Screenshots**: See what the browser is doing with automatic screenshots
- **Action History**: Track all actions performed by the AI agent
- **Beautiful Web UI**: Modern, responsive interface for interacting with the system
- **RESTful API**: Programmatic access to browser automation capabilities

## 🛠️ Technology Stack

- **Backend**: FastAPI (Python)
- **AI Framework**: LangChain + OpenAI GPT models
- **Browser Automation**: Playwright
- **Frontend**: HTML/CSS/JavaScript (Vanilla)
- **Package Management**: uv

## 📋 Prerequisites

- Python 3.12+
- uv package manager
- OpenAI API key
- Chromium browser (installed with `playwright install chromium` in Quick Start step 2)

## 🚀 Quick Start

### 1. Clone and Set Up

```bash
# After cloning the repository, navigate to the project directory
cd manus_ai_clone

# Create and activate virtual environment using uv
uv venv
source .venv/bin/activate  # On Linux/Mac
# or
.venv\Scripts\activate     # On Windows

# Install dependencies
uv pip install -e .
```

### 2. Install Playwright Browsers

```bash
# Install Playwright browser binaries
playwright install chromium
```

### 3. Configure Environment

```bash
# Copy the example environment file
cp .env.example .env

# Edit .env and add your OpenAI API key
# OPENAI_API_KEY=sk-your-actual-api-key-here
```

### 4. Run the Application

```bash
# Start the server
python main.py

# Or use uvicorn directly
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

### 5. Access the Application

Open your browser and navigate to:

```
http://localhost:8000
```

## 🎯 Usage Examples

### Via Web Interface

1. Open http://localhost:8000 in your browser
2. Enter a natural language prompt in the text area
3. Click "Execute Task"
4. Watch the AI control the browser and see the results

**Example Prompts:**

- "Go to google.com and search for 'LangChain tutorial'"
- "Navigate to github.com and find the trending repositories"
- "Open wikipedia.org and search for 'Artificial Intelligence', then read the first paragraph"
- "Go to hacker news and get the top 5 story titles"
- "Visit amazon.com and search for 'python books'"

### Via API

```bash
# Execute a browser automation task
curl -X POST "http://localhost:8000/execute" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Go to google.com and search for LangChain"}'

# Check health status
curl http://localhost:8000/health

# Get action history
curl http://localhost:8000/status
```

### Using Python

```python
import base64

import requests

# Execute a task
response = requests.post(
    "http://localhost:8000/execute",
    json={"prompt": "Go to github.com and search for 'langchain'"},
)
response.raise_for_status()  # Stop early on HTTP errors

result = response.json()
print(f"Success: {result['success']}")
print(f"Output: {result['output']}")

# Screenshot is available as a base64-encoded image
if result.get("screenshot"):
    screenshot_data = base64.b64decode(result["screenshot"])
    with open("screenshot.png", "wb") as f:
        f.write(screenshot_data)
```

## 🏗️ Architecture

### Components

1. **Browser Agent (`browser_agent.py`)**

   - `BrowserController`: Low-level Playwright wrapper for browser operations
   - `BrowserAgent`: LangChain agent with custom tools for AI-powered automation

2. **API Server (`main.py`)**

   - FastAPI application with REST endpoints
   - Lifecycle management for browser agent
   - Web UI serving

3. **Tools Available to AI**

   - `navigate`: Go to URLs
   - `click`: Click elements by CSS selector
   - `type_text`: Fill input fields
   - `get_text`: Extract text from elements
   - `get_page_content`: Read page content
   - `scroll`: Scroll the page
   - `get_elements_info`: Inspect elements
   - `execute_javascript`: Run custom JavaScript

### How It Works

```
User Prompt → FastAPI → LangChain Agent → Tools → Playwright → Browser
                               ↓                                  ↓
                   Response ← Agent Reasoning ← Tool Results ← Browser State
```

1. User submits a natural language prompt
2. LangChain agent breaks down the task into steps
3. Agent selects and executes appropriate tools
4. Playwright performs browser actions
5. Results are collected and returned with a screenshot
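
The numbered steps above can be sketched as a plain loop. This is a simplified illustration, not the project's actual code: `Step`, `RunResult`, `run_agent`, and the stub tools are hypothetical names, and the plan is fixed up front, whereas a LangChain agent would choose each step based on earlier tool results.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    """One planned action: which tool to call and with what argument (hypothetical)."""
    tool: str
    arg: str


@dataclass
class RunResult:
    """Final output plus a log of every action taken (hypothetical)."""
    output: str
    history: list[str] = field(default_factory=list)


def run_agent(plan: list[Step], tools: dict[str, Callable[[str], str]]) -> RunResult:
    """Execute each planned step with the matching tool and record the results."""
    history: list[str] = []
    output = ""
    for step in plan:
        tool = tools.get(step.tool)
        if tool is None:
            # Stop at the first step that names a tool we do not have
            output = f"Unknown tool: {step.tool}"
            history.append(output)
            break
        output = tool(step.arg)
        history.append(f"{step.tool}({step.arg!r}) -> {output}")
    return RunResult(output=output, history=history)


# Stub tools standing in for the Playwright-backed ones
fake_tools = {
    "navigate": lambda url: f"Navigated to {url}",
    "get_text": lambda selector: f"Text of {selector}",
}

plan = [Step("navigate", "https://example.com"), Step("get_text", "h1")]
result = run_agent(plan, fake_tools)
print(result.output)
for entry in result.history:
    print(entry)
```

Keeping the tools in a name-to-callable mapping mirrors how the agent picks a tool by name; the history list plays the role of the action history exposed by the API.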

## ⚙️ Configuration

### Environment Variables

```bash
# Required
OPENAI_API_KEY=sk-your-api-key-here

# Optional
MODEL=gpt-4o-mini   # OpenAI model to use
HEADLESS=false      # Set to true to run the browser in headless mode
HOST=0.0.0.0        # Server host
PORT=8000           # Server port
```
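
As a rough sketch of how these variables might be read at startup, the snippet below parses them from a mapping with the defaults listed above. `Settings` and `load_settings` are hypothetical names, the truthy values accepted for `HEADLESS` are an assumption, and the project itself may load `.env` differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Typed view of the documented environment variables (hypothetical)."""
    openai_api_key: str
    model: str = "gpt-4o-mini"
    headless: bool = False
    host: str = "0.0.0.0"
    port: int = 8000


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, applying the documented defaults."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=api_key,
        model=env.get("MODEL", "gpt-4o-mini"),
        # Assumption: "1", "true", and "yes" (any case) all enable headless mode
        headless=env.get("HEADLESS", "false").strip().lower() in {"1", "true", "yes"},
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
    )


settings = load_settings({"OPENAI_API_KEY": "sk-test", "HEADLESS": "true"})
print(settings.model, settings.headless, settings.port)  # gpt-4o-mini True 8000
```

Passing the mapping as a parameter keeps the function easy to test without touching the real process environment.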

### Model Options

- `gpt-4o-mini` (default) - Fast and cost-effective
- `gpt-4o` - More capable, higher cost
- `gpt-4-turbo` - Older GPT-4 generation model, typically slower and more expensive than `gpt-4o`

## 🔧 Development

### Project Structure

```
manus_ai_clone/
├── main.py              # FastAPI application
├── browser_agent.py     # Browser automation logic
├── pyproject.toml       # Dependencies
├── .env.example         # Environment template
├── .gitignore           # Git ignore rules
└── README.md            # This file
```

### Adding Custom Tools

To add new browser automation capabilities:

1. Add a method to the `BrowserController` class
2. Create a wrapper function in `BrowserAgent._create_tools()`
3. Add a `Tool` definition with name, function, and description

Example:

```python
# Assumes `asyncio` and `Tool` are already imported in browser_agent.py

# In BrowserController
async def custom_action(self, param: str) -> str:
    # Your implementation
    return "Result"

# In BrowserAgent._create_tools()
def custom_action_wrapper(param: str) -> str:
    # Note: asyncio.run() raises RuntimeError if an event loop
    # is already running in the calling thread
    return asyncio.run(self.browser.custom_action(param))

# Add to tools list
Tool(
    name="custom_action",
    func=custom_action_wrapper,
    description="Description of what this tool does",
)
```
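
Because `asyncio.run()` cannot be called from a thread whose event loop is already running, a synchronous wrapper may need another way to reach async code. One possible approach, sketched below with the standard library only, is to run a dedicated event loop in a background thread and submit coroutines to it. `AsyncBridge` is a hypothetical helper and not part of this project; the standalone `custom_action` here is a stand-in for the controller method. With Playwright, the browser would most likely have to be launched on that same loop, since its async objects are tied to the loop that created them.

```python
import asyncio
import threading
from typing import Any, Coroutine, Optional


class AsyncBridge:
    """Runs coroutines on a dedicated event loop in a background thread (hypothetical helper)."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro: Coroutine[Any, Any, Any], timeout: Optional[float] = None) -> Any:
        """Submit a coroutine to the background loop and block until it finishes."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def close(self) -> None:
        """Stop the background loop and release its resources."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def custom_action(param: str) -> str:
    """Stand-in for an async BrowserController method."""
    await asyncio.sleep(0)
    return f"Result for {param}"


bridge = AsyncBridge()


def custom_action_wrapper(param: str) -> str:
    # Synchronous entry point suitable for a Tool's `func`
    return bridge.run(custom_action(param), timeout=5)


print(custom_action_wrapper("demo"))
bridge.close()
```

The wrapper blocks its calling thread until the coroutine completes, so it should not be called from a coroutine running on the bridge's own loop.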

## 🐛 Troubleshooting

### Common Issues

**Import errors about missing packages:**

```bash
# Packages not installed yet (errors are normal before installation)
uv pip install -e .
playwright install chromium
```

**"Browser agent not initialized":**

- Check that your `OPENAI_API_KEY` is set in `.env`
- Make sure the `.env` file is in the project root

**Playwright errors:**

```bash
# Reinstall Playwright browsers
playwright install --force chromium
```

**Element not found errors:**

- The AI might be using incorrect selectors
- Try being more specific in your prompt
- Some websites use dynamic class names or have anti-bot measures

**Timeout errors:**

- Some pages load slowly
- Try increasing timeout values in `browser_agent.py`
- Or use simpler websites for testing
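
Besides raising timeout values, slow pages can be handled by retrying an action with a per-attempt timeout. The sketch below uses only `asyncio`; `with_retries` and `flaky_page_load` are hypothetical names, and the defaults are arbitrary. Playwright's own timeout error is a separate exception type that this sketch does not catch, so it would need its own `except` clause.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def with_retries(
    action: Callable[[], Awaitable[T]],
    timeout_s: float = 10.0,
    attempts: int = 3,
    backoff_s: float = 0.5,
) -> T:
    """Run `action` with a per-attempt timeout, retrying with exponential backoff."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(action(), timeout=timeout_s)
        except asyncio.TimeoutError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Wait a little longer before each new attempt
                await asyncio.sleep(backoff_s * (2 ** attempt))
    raise TimeoutError(f"Action timed out after {attempts} attempts") from last_error


calls = {"count": 0}


async def flaky_page_load() -> str:
    """Simulated page load that is too slow on the first call only."""
    calls["count"] += 1
    if calls["count"] == 1:
        await asyncio.sleep(1.0)  # Exceeds the timeout used below
    return "loaded"


print(asyncio.run(with_retries(flaky_page_load, timeout_s=0.1, backoff_s=0.01)))
```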

## 🚀 Deployment

### Using Docker (Optional)

```dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install system dependencies for Playwright
RUN apt-get update && apt-get install -y \
    wget \
    gnupg \
    && rm -rf /var/lib/apt/lists/*

# Copy project files
COPY . .

# Install Python dependencies
RUN pip install -e .
RUN playwright install --with-deps chromium

# The container has no display, so run the browser headless
ENV HEADLESS=true

# Run the application
CMD ["python", "main.py"]
```

### Environment Setup for Production

```bash
# Use headless mode in production
HEADLESS=true

# Use a more capable model if needed
MODEL=gpt-4o

# Secure your API
# Consider adding authentication middleware
```
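
As one possible starting point for the authentication suggested above, the check below compares a request header against a configured secret in constant time. It is a framework-agnostic sketch: `is_authorized` and the `X-API-Key` header name are assumptions, not part of this project, and the function would still need to be wired into the server, for example from middleware or a request dependency.

```python
import hmac
from typing import Mapping


def is_authorized(
    headers: Mapping[str, str],
    expected_key: str,
    header_name: str = "x-api-key",  # Hypothetical header name
) -> bool:
    """Return True only if the request carries the expected API key."""
    if not expected_key:
        # Refuse everything when no key is configured, rather than allowing all
        return False
    supplied = ""
    for name, value in headers.items():
        # HTTP header names are case-insensitive
        if name.lower() == header_name:
            supplied = value
            break
    # Constant-time comparison avoids leaking key contents through timing
    return hmac.compare_digest(supplied.encode(), expected_key.encode())


print(is_authorized({"X-API-Key": "secret"}, "secret"))
print(is_authorized({"X-API-Key": "wrong"}, "secret"))
```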

## 📊 Performance Tips

1. **Use headless mode** (`HEADLESS=true`) for faster execution
2. **Choose the right model**: `gpt-4o-mini` for speed, `gpt-4o` for complex tasks
3. **Be specific in prompts**: Detailed prompts tend to produce better results
4. **Set appropriate timeouts**: Adjust based on your target websites

## 🤝 Contributing

Contributions are welcome! Areas for improvement:

- Additional browser automation tools
- Better error handling and recovery
- Support for multiple concurrent browser sessions
- Screenshot comparison and validation
- Browser session persistence
- Integration with other LLM providers

## 📝 License

This project is open source and available under the MIT License.

## 🙏 Acknowledgments

- [Manus.im](https://manus.im) and [Scout.new](https://scout.new) for inspiration
- [LangChain](https://www.langchain.com/) for the agent framework
- [Playwright](https://playwright.dev/) for browser automation
- [FastAPI](https://fastapi.tiangolo.com/) for the web framework

## 📧 Support

For issues, questions, or contributions, please open an issue on the project repository.

---

**Built with ❤️ using LangChain, Playwright, and FastAPI**