first commit

2025-11-05 01:03:10 +01:00
commit 5a802e7641
20 changed files with 6161 additions and 0 deletions
@@ -0,0 +1,350 @@
+# 🤖 Manus AI Clone
+
+An AI-powered browser automation system modeled on [Manus.im](https://manus.im) and [Scout.new](https://scout.new). It lets users control a web browser with natural language prompts, using LangChain agents and OpenAI's GPT models.
+
+## ✨ Features
+
+-   **Natural Language Browser Control**: Give instructions in plain English, and the AI will control the browser for you
+-   **LangChain Integration**: Uses LangChain agents with custom tools for browser automation
+-   **Playwright Browser Automation**: Full browser control with support for navigation, clicking, typing, and more
+-   **Real-time Screenshots**: See what the browser is doing with automatic screenshots
+-   **Action History**: Track all actions performed by the AI agent
+-   **Beautiful Web UI**: Modern, responsive interface for interacting with the system
+-   **RESTful API**: Programmatic access to browser automation capabilities
+
+## 🛠️ Technology Stack
+
+-   **Backend**: FastAPI (Python)
+-   **AI Framework**: LangChain + OpenAI GPT models
+-   **Browser Automation**: Playwright
+-   **Frontend**: HTML/CSS/JavaScript (Vanilla)
+-   **Package Management**: UV
+
+## 📋 Prerequisites
+
+-   Python 3.12+
+-   UV package manager
+-   OpenAI API key
+-   Chromium browser (downloaded by `playwright install chromium` in step 2 of the Quick Start)
+
+## 🚀 Quick Start
+
+### 1. Clone and Setup
+
+```bash
+# From a local clone of the repository, enter the project directory
+cd manus_ai_clone
+
+# Create and activate virtual environment using uv
+uv venv
+source .venv/bin/activate  # On Linux/Mac
+# or
+.venv\Scripts\activate  # On Windows
+
+# Install dependencies
+uv pip install -e .
+```
+
+### 2. Install Playwright Browsers
+
+```bash
+# Install Playwright browser binaries
+playwright install chromium
+```
+
+### 3. Configure Environment
+
+```bash
+# Copy the example environment file
+cp .env.example .env
+
+# Edit .env and add your OpenAI API key
+# OPENAI_API_KEY=sk-your-actual-api-key-here
+```
+
+### 4. Run the Application
+
+```bash
+# Start the server
+python main.py
+
+# Or use uvicorn directly
+uvicorn main:app --reload --host 0.0.0.0 --port 8000
+```
+
+### 5. Access the Application
+
+Open your browser and navigate to:
+
+```
+http://localhost:8000
+```
+
+## 🎯 Usage Examples
+
+### Via Web Interface
+
+1. Open http://localhost:8000 in your browser
+2. Enter a natural language prompt in the text area
+3. Click "Execute Task"
+4. Watch the AI control the browser and see the results
+
+**Example Prompts:**
+
+-   "Go to google.com and search for 'LangChain tutorial'"
+-   "Navigate to github.com and find the trending repositories"
+-   "Open wikipedia.org and search for 'Artificial Intelligence', then read the first paragraph"
+-   "Go to hacker news and get the top 5 story titles"
+-   "Visit amazon.com and search for 'python books'"
+
+### Via API
+
+```bash
+# Execute a browser automation task
+curl -X POST "http://localhost:8000/execute" \
+  -H "Content-Type: application/json" \
+  -d '{"prompt": "Go to google.com and search for LangChain"}'
+
+# Check health status
+curl http://localhost:8000/health
+
+# Get action history
+curl http://localhost:8000/status
+```
+
+### Using Python
+
+```python
+import base64
+
+import requests
+
+# Execute a task; browser tasks can take a while, so allow a generous timeout
+response = requests.post(
+    "http://localhost:8000/execute",
+    json={"prompt": "Go to github.com and search for 'langchain'"},
+    timeout=300,
+)
+response.raise_for_status()
+
+result = response.json()
+print(f"Success: {result['success']}")
+print(f"Output: {result['output']}")
+
+# The screenshot, when present, is a base64-encoded image
+if result.get("screenshot"):
+    with open("screenshot.png", "wb") as f:
+        f.write(base64.b64decode(result["screenshot"]))
+```
+
+## 🏗️ Architecture
+
+### Components
+
+1. **Browser Agent (`browser_agent.py`)**
+
+    - `BrowserController`: Low-level Playwright wrapper for browser operations
+    - `BrowserAgent`: LangChain agent with custom tools for AI-powered automation
+
+2. **API Server (`main.py`)**
+
+    - FastAPI application with REST endpoints
+    - Lifecycle management for browser agent
+    - Web UI serving
+
+3. **Tools Available to AI**
+    - `navigate`: Go to URLs
+    - `click`: Click elements by CSS selector
+    - `type_text`: Fill input fields
+    - `get_text`: Extract text from elements
+    - `get_page_content`: Read page content
+    - `scroll`: Scroll the page
+    - `get_elements_info`: Inspect elements
+    - `execute_javascript`: Run custom JavaScript
+
+### How It Works
+
+```
+User Prompt → FastAPI → LangChain Agent → Tools → Playwright → Browser
+                ↓                                                  ↓
+            Response ← Agent Reasoning ← Tool Results ← Browser State
+```
+
+1. User submits a natural language prompt
+2. LangChain agent breaks down the task into steps
+3. Agent selects and executes appropriate tools
+4. Playwright performs browser actions
+5. Results are collected and returned with a screenshot
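+
+The steps above can be sketched as a plain dispatch loop. This is a toy model under stated assumptions, not the project's code: `plan_steps` is a hypothetical stand-in for the LLM and returns a fixed plan, and the two tool functions are stubs that return strings instead of driving Playwright. A real LangChain agent picks each next step after seeing the previous tool result rather than planning everything up front.
+
+```python
+from typing import Callable
+
+
+# Hypothetical stubs: the real tools would call Playwright.
+def navigate(url: str) -> str:
+    return f"Navigated to {url}"
+
+
+def get_page_content(_: str) -> str:
+    return "Example Domain"
+
+
+# Tool registry: the agent refers to tools by name.
+TOOLS: dict[str, Callable[[str], str]] = {
+    "navigate": navigate,
+    "get_page_content": get_page_content,
+}
+
+
+def plan_steps(prompt: str) -> list[tuple[str, str]]:
+    """Stand-in for the LLM: returns a fixed list of (tool, argument) steps."""
+    return [("navigate", "https://example.com"), ("get_page_content", "")]
+
+
+def run_agent(prompt: str) -> dict:
+    """Run each planned step through the registry and record the results."""
+    history = []
+    for tool_name, arg in plan_steps(prompt):
+        result = TOOLS[tool_name](arg)
+        history.append({"tool": tool_name, "input": arg, "result": result})
+    return {"success": True, "output": history[-1]["result"], "history": history}
+```
+
+Calling `run_agent("...")` returns a dict with `success`, `output`, and `history` keys, loosely mirroring the response fields used in the Python usage example.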
+
+## ⚙️ Configuration
+
+### Environment Variables
+
+```bash
+# Required
+OPENAI_API_KEY=sk-your-api-key-here
+
+# Optional
+MODEL=gpt-4o-mini              # OpenAI model to use
+HEADLESS=false                 # Set to true to run the browser without a visible window
+HOST=0.0.0.0                   # Server host
+PORT=8000                      # Server port
+```
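+
+A minimal sketch of how these variables might be read and validated at startup. The `Settings` class and `load_settings` function are hypothetical names, not taken from `main.py`, and loading the `.env` file itself is not shown. The defaults match the values listed above.
+
+```python
+import os
+from collections.abc import Mapping
+from dataclasses import dataclass
+
+# Strings accepted as "true" for boolean flags (an assumption of this sketch).
+TRUE_VALUES = {"1", "true", "yes", "on"}
+
+
+@dataclass(frozen=True)
+class Settings:
+    openai_api_key: str
+    model: str
+    headless: bool
+    host: str
+    port: int
+
+
+def load_settings(env: Mapping[str, str] | None = None) -> Settings:
+    """Build Settings from a mapping, defaulting to the process environment."""
+    env = os.environ if env is None else env
+    api_key = env.get("OPENAI_API_KEY", "")
+    if not api_key:
+        raise RuntimeError("OPENAI_API_KEY is required")
+    return Settings(
+        openai_api_key=api_key,
+        model=env.get("MODEL", "gpt-4o-mini"),
+        headless=env.get("HEADLESS", "false").strip().lower() in TRUE_VALUES,
+        host=env.get("HOST", "0.0.0.0"),
+        port=int(env.get("PORT", "8000")),
+    )
+```
+
+Parsing `HEADLESS` explicitly avoids a common pitfall: any non-empty string, including `"false"`, is truthy in Python.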
+
+### Model Options
+
+-   `gpt-4o-mini` (default) - Fast and cost-effective
+-   `gpt-4o` - More capable, higher cost
+-   `gpt-4-turbo` - Older GPT-4 generation model, typically slower and more expensive than `gpt-4o`
+
+## 🔧 Development
+
+### Project Structure
+
+```
+manus_ai_clone/
+├── main.py              # FastAPI application
+├── browser_agent.py     # Browser automation logic
+├── pyproject.toml       # Dependencies
+├── .env.example         # Environment template
+├── .gitignore           # Git ignore rules
+└── README.md            # This file
+```
+
+### Adding Custom Tools
+
+To add new browser automation capabilities:
+
+1. Add a method to `BrowserController` class
+2. Create a wrapper function in `BrowserAgent._create_tools()`
+3. Add a `Tool` definition with name, function, and description
+
+Example:
+
+```python
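+import asyncio
+
+from langchain_core.tools import Tool  # assumed import path; match browser_agent.py
+
+# In BrowserController
+async def custom_action(self, param: str) -> str:
+    # Your implementation, typically one or more Playwright calls
+    return "Result"
+
+# In BrowserAgent._create_tools()
+def custom_action_wrapper(param: str) -> str:
+    # Synchronous bridge for the agent. asyncio.run() raises RuntimeError if
+    # this thread already has a running event loop, so the wrapper must be
+    # called from a thread without one.
+    return asyncio.run(self.browser.custom_action(param))
+
+# Add to tools list
+Tool(
+    name="custom_action",
+    func=custom_action_wrapper,
+    description="Description of what this tool does",
+)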
+```
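+
+`asyncio.run()` creates a fresh event loop on every call and raises `RuntimeError` when the calling thread already has a running loop. One possible alternative, sketched here as an assumption rather than the project's approach, is to keep a single loop alive in a background thread and submit coroutines to it with `asyncio.run_coroutine_threadsafe`. `LoopThread` is a hypothetical helper, and the module-level `custom_action` is a stand-in for `BrowserController.custom_action`.
+
+```python
+import asyncio
+import threading
+
+
+class LoopThread:
+    """Runs one event loop in a background thread until stop() is called."""
+
+    def __init__(self) -> None:
+        self.loop = asyncio.new_event_loop()
+        self._thread = threading.Thread(target=self.loop.run_forever, daemon=True)
+        self._thread.start()
+
+    def run(self, coro, timeout: float = 30.0):
+        """Submit a coroutine from synchronous code and block for its result."""
+        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
+        return future.result(timeout=timeout)
+
+    def stop(self) -> None:
+        """Stop the loop, wait for the thread to exit, then close the loop."""
+        self.loop.call_soon_threadsafe(self.loop.stop)
+        self._thread.join()
+        self.loop.close()
+
+
+async def custom_action(param: str) -> str:
+    # Stand-in for an async browser operation.
+    await asyncio.sleep(0)
+    return f"Result for {param}"
+
+
+runner = LoopThread()
+
+
+def custom_action_wrapper(param: str) -> str:
+    # Every call reuses the same loop, so objects created on it stay usable.
+    return runner.run(custom_action(param))
+```
+
+The wrapper blocks its caller, so it should be invoked from a worker thread, never from a coroutine running on the same loop (that would deadlock until the timeout expires).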
+
+## 🐛 Troubleshooting
+
+### Common Issues
+
+**Import errors about missing packages:**
+
+```bash
+# Import errors are expected until the dependencies have been installed
+uv pip install -e .
+playwright install chromium
+```
+
+**"Browser agent not initialized":**
+
+-   Check that your `OPENAI_API_KEY` is set in `.env`
+-   Make sure the `.env` file is in the project root
+
+**Playwright errors:**
+
+```bash
+# Reinstall Playwright browsers
+playwright install --force chromium
+```
+
+**Element not found errors:**
+
+-   The AI might be using incorrect selectors
+-   Try being more specific in your prompt
+-   Some websites use dynamic class names or have anti-bot measures
+
+**Timeout errors:**
+
+-   Some pages load slowly
+-   Try increasing timeout values in `browser_agent.py`
+-   Or use simpler websites for testing
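+
+The timeout advice above can be combined with a bounded retry. A minimal sketch, assuming the slow operation is exposed as an async callable; `with_timeout_and_retry` and its parameters are hypothetical and are not part of `browser_agent.py`:
+
+```python
+import asyncio
+from typing import Awaitable, Callable, TypeVar
+
+T = TypeVar("T")
+
+
+async def with_timeout_and_retry(
+    make_coro: Callable[[], Awaitable[T]],
+    timeout_s: float,
+    attempts: int = 3,
+    backoff_s: float = 0.5,
+) -> T:
+    """Await make_coro() with a per-attempt timeout, retrying on timeout."""
+    last_error: Exception | None = None
+    for attempt in range(1, attempts + 1):
+        try:
+            # wait_for cancels the inner coroutine when the timeout expires.
+            return await asyncio.wait_for(make_coro(), timeout=timeout_s)
+        except asyncio.TimeoutError as exc:
+            last_error = exc
+            if attempt < attempts:
+                # Linear backoff between attempts.
+                await asyncio.sleep(backoff_s * attempt)
+    raise TimeoutError(f"gave up after {attempts} attempts") from last_error
+```
+
+The helper takes a factory (`make_coro`) rather than a coroutine object because a coroutine can only be awaited once; each retry needs a fresh one.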
+
+## 🚀 Deployment
+
+### Using Docker (Optional)
+
+```dockerfile
+FROM python:3.12-slim
+
+WORKDIR /app
+
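+# Install basic download utilities; Playwright's own system libraries are
+# installed by `playwright install --with-deps` below
+RUN apt-get update && apt-get install -y \
+    wget \
+    gnupg \
+    && rm -rf /var/lib/apt/lists/*
+
+# Copy project files (list .env in .dockerignore so the API key is not
+# baked into the image)
+COPY . .
+
+# Install Python dependencies
+RUN pip install -e .
+RUN playwright install --with-deps chromium
+
+# Containers have no display, so run the browser headless
+ENV HEADLESS=true
+EXPOSE 8000
+
+# Run the application
+CMD ["python", "main.py"]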
+```
+
+### Environment Setup for Production
+
+```bash
+# Use headless mode in production
+HEADLESS=true
+
+# Use a more capable model if needed
+MODEL=gpt-4o
+
+# Secure your API: /execute lets any caller drive the browser and run
+# JavaScript, so add authentication before exposing the server publicly
+```
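+
+As a starting point for the authentication note above, here is a sketch of a shared-secret check that uses only the standard library. The `API_KEY` environment variable and the `x-api-key` header name are assumptions made for illustration; neither is defined by this project. The function could be called from a FastAPI dependency or middleware before a request reaches `/execute`.
+
+```python
+import hmac
+import os
+from collections.abc import Mapping
+
+
+def is_authorized(headers: Mapping[str, str], expected_key: str | None = None) -> bool:
+    """Return True only if the request carries the expected API key."""
+    if expected_key is None:
+        expected_key = os.environ.get("API_KEY", "")
+    if not expected_key:
+        # Fail closed: with no key configured, nobody is authorized.
+        return False
+    supplied = headers.get("x-api-key", "")
+    # compare_digest avoids leaking the match position through timing.
+    return hmac.compare_digest(supplied.encode(), expected_key.encode())
+```
+
+A shared key is the simplest option; it does not replace TLS or per-user credentials.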
+
+## 📊 Performance Tips
+
+1. **Use headless mode** (`HEADLESS=true`) for faster execution
+2. **Choose the right model**: `gpt-4o-mini` for speed, `gpt-4o` for complex tasks
+3. **Be specific in prompts**: Name the target site, the action, and the output you expect; vague prompts tend to produce wrong selectors and extra steps
+4. **Set appropriate timeouts**: Adjust based on your target websites
+
+## 🤝 Contributing
+
+Contributions are welcome! Areas for improvement:
+
+-   Additional browser automation tools
+-   Better error handling and recovery
+-   Support for multiple concurrent browser sessions
+-   Screenshot comparison and validation
+-   Browser session persistence
+-   Integration with other LLM providers
+
+## 📝 License
+
+This project is open source and available under the MIT License.
+
+## 🙏 Acknowledgments
+
+-   [Manus.im](https://manus.im) and [Scout.new](https://scout.new) for inspiration
+-   [LangChain](https://www.langchain.com/) for the agent framework
+-   [Playwright](https://playwright.dev/) for browser automation
+-   [FastAPI](https://fastapi.tiangolo.com/) for the web framework
+
+## 📧 Support
+
+For issues, questions, or contributions, please open an issue on the project repository.
+
+---
+
+**Built with ❤️ using LangChain, Playwright, and FastAPI**