first commit
@@ -0,0 +1,185 @@
# How It Works: Manus AI Clone - System Summary

## Overview

Manus AI Clone is an AI-powered browser automation system that lets users control a web browser with natural language prompts. It combines a web frontend, a FastAPI backend, a LangChain agent, and Playwright browser automation to interpret user intent and carry out multi-step browser tasks.

### Key Technologies

- **Frontend**: HTML5, CSS3, Vanilla JavaScript
- **Backend**: FastAPI (Python)
- **AI Framework**: LangChain
- **Browser Automation**: Playwright
- **LLM**: OpenAI GPT models (gpt-4o-mini, gpt-4o, etc.)

---

## System Architecture

The system follows a layered architecture:

1. **Frontend Layer** - User interface for input and results display
2. **Backend API Layer** - FastAPI server handling HTTP requests
3. **Browser Agent Layer** - LangChain agent that plans and executes tasks
4. **Browser Control Layer** - Playwright for browser automation

---

## How It Works

### Frontend Layer

The frontend provides a web-based user interface where users can:
- Enter natural language prompts describing browser tasks
- View example prompts for quick reference
- See real-time loading indicators during task execution
- View results including:
  - Success/error status
  - Agent output messages
  - Complete action history (all browser actions taken)
  - Screenshot of the final browser state
- Track execution statistics (total tasks, success rate, average time)

When a user submits a task, the frontend JavaScript:
1. Validates the input
2. Sends an HTTP POST request to the `/execute` endpoint with the prompt
3. Shows a loading indicator while waiting for the response
4. Upon receiving the response, displays all results in the UI
5. Updates statistics and shows notifications

Statistics are persisted in the browser's localStorage, so they survive page reloads and browser restarts.
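
The submission steps above amount to a single JSON POST. The frontend does this in JavaScript; the sketch below uses Python only to show the shape of the request, as a script or test client might build it. The `prompt` field name, the base URL, and the `build_execute_request` helper are assumptions for illustration and are not taken from the project's code.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local development address


def build_execute_request(prompt: str) -> urllib.request.Request:
    """Validate the prompt and build (but do not send) a POST /execute request."""
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("prompt must not be empty")
    # The JSON field name "prompt" is an assumption about the request schema.
    body = json.dumps({"prompt": cleaned}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_execute_request("Go to google.com and search for Python")
# Sending it requires a running server:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping validation and request construction separate from sending makes the client side easy to check without a running backend.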

### Backend API Layer

The FastAPI backend serves multiple purposes:

**API Endpoints**:
- `GET /` - Serves the frontend HTML interface
- `POST /execute` - Main endpoint that executes browser automation tasks
- `GET /status` - Returns current browser state and action history
- `GET /health` - Health check endpoint

**Lifecycle Management**:
- On startup, initializes a single `BrowserAgent` instance
- Loads configuration from environment variables (OpenAI API key, model selection, headless mode)
- Manages browser agent lifecycle (startup and shutdown)
- On shutdown, properly cleans up browser resources
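
As a rough illustration of the configuration step, the following sketch reads the three settings from environment variables. Only `OPENAI_API_KEY` is a name that appears in the project's setup notes; `OPENAI_MODEL`, `HEADLESS`, the defaults, and the `AgentConfig` and `load_config` names are assumptions made for this example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    openai_api_key: str
    model: str
    headless: bool


def load_config(env=os.environ) -> AgentConfig:
    """Read agent settings from a mapping of environment variables."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        # Fail at startup rather than on the first task.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return AgentConfig(
        openai_api_key=api_key,
        # Variable name and default model are assumptions.
        model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
        # Variable name is an assumption; anything other than 1/true/yes means headed mode.
        headless=env.get("HEADLESS", "true").strip().lower() in {"1", "true", "yes"},
    )


config = load_config({"OPENAI_API_KEY": "sk-test", "HEADLESS": "false"})
```

Passing the mapping as a parameter keeps the function testable without touching the real process environment.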

**Request Processing**:
When a task execution request is received, the backend:
1. Validates the request payload
2. Checks that the browser agent is initialized
3. Calls the agent's `execute_task()` method with the user's prompt
4. Formats and returns the response with success status, output text, screenshot, and action history
5. Handles errors appropriately with HTTP status codes
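
The five steps above can be sketched without any web framework as a function that maps a payload to a status code and a body. This is not the project's FastAPI handler: the status codes (422, 503, 500), the response field names, and the synchronous `execute_task()` signature are all assumptions chosen to make the control flow visible.

```python
from typing import Any, Optional, Protocol


class Agent(Protocol):
    def execute_task(self, prompt: str) -> dict[str, Any]: ...


def handle_execute(payload: dict[str, Any], agent: Optional[Agent]) -> tuple[int, dict[str, Any]]:
    """Return (HTTP status code, JSON body) for one task-execution request."""
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        return 422, {"detail": "prompt must be a non-empty string"}
    if agent is None:
        return 503, {"detail": "browser agent is not initialized"}
    try:
        result = agent.execute_task(prompt.strip())
    except Exception as exc:  # report unexpected agent failures as a server error
        return 500, {"detail": str(exc)}
    return 200, {
        "success": result.get("success", False),
        "output": result.get("output", ""),
        "screenshot": result.get("screenshot"),
        "action_history": result.get("action_history", []),
    }


class FakeAgent:
    """Stand-in agent so the sketch runs without a browser or an LLM."""

    def execute_task(self, prompt: str) -> dict[str, Any]:
        return {"success": True, "output": f"done: {prompt}", "action_history": []}


status, body = handle_execute({"prompt": "Open example.com"}, FakeAgent())
```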

### Browser Agent Layer

The Browser Agent consists of two main components:

#### BrowserController (Low-Level Playwright Wrapper)

This component provides direct access to Playwright browser operations. It handles:
- Browser initialization (launching Chromium, creating context and page)
- Navigation to URLs
- Clicking elements by CSS selectors
- Typing text into input fields
- Extracting text from page elements
- Getting page content (title, URL, visible text)
- Taking screenshots
- Executing JavaScript on the page
- Finding and inspecting elements
- Scrolling the page

Every action is logged to an action history for transparency and debugging.
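
One simple way to keep such a history is an append-only list of dictionaries. The sketch below is an assumption about the general shape only; the `ActionLog` class, its `record()` method, and the entry fields are hypothetical names, not the project's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class ActionLog:
    """In-memory record of browser actions, oldest first."""

    entries: list[dict[str, Any]] = field(default_factory=list)

    def record(self, action: str, **details: Any) -> dict[str, Any]:
        """Append one entry with the action type, a UTC timestamp, and any details."""
        entry = {
            "action": action,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            **details,
        }
        self.entries.append(entry)
        return entry


log = ActionLog()
log.record("navigate", url="https://example.com")
log.record("click", selector="button[type=submit]")
```

Because each entry is a plain dictionary, the whole list can be returned in a JSON response as-is.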

#### BrowserAgent (High-Level LangChain Agent)

This component uses LangChain to create an intelligent AI agent that can:
- Understand natural language prompts
- Break down complex tasks into steps
- Select appropriate tools for each step
- Execute tools in a logical sequence
- Reason about results and adjust actions accordingly
- Verify task completion

The agent has access to 8 tools that correspond to browser operations:
1. **navigate** - Go to URLs
2. **click** - Click elements by CSS selector
3. **type_text** - Fill input fields (uses format: "selector|text")
4. **get_text** - Extract text from specific elements
5. **get_page_content** - Read current page content
6. **scroll** - Scroll page in different directions
7. **get_elements_info** - Find and inspect elements
8. **execute_javascript** - Run custom JavaScript
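
Most of these tools take a single string, which is why `type_text` packs two values into the "selector|text" format. A minimal parser for that format might look like the following. Splitting at the first `|` is an assumption: it lets the typed text contain pipes, but it would mis-split selectors that themselves contain one (for example `[lang|=en]`). The function name is hypothetical.

```python
def parse_type_text_input(tool_input: str) -> tuple[str, str]:
    """Split a "selector|text" string into (selector, text) at the first "|"."""
    selector, separator, text = tool_input.partition("|")
    selector = selector.strip()
    if not separator or not selector:
        raise ValueError('expected input in the form "selector|text"')
    return selector, text


selector, text = parse_type_text_input("input[name=q]|Python tutorials")
```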

Each tool has a detailed description that helps the AI agent understand when and how to use it. The agent uses these descriptions to select the right tool for each task.

**System Prompt**: The agent is given comprehensive instructions on how to approach tasks, when to use each tool, how to verify actions, and how to use CSS selectors.

**Async/Sync Bridge**: The agent's tools are defined as synchronous functions, while Playwright operations are async, so each tool wrapper uses `asyncio.run()` to run the corresponding coroutine to completion.
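
The bridge pattern can be shown with a stand-in controller, so no browser is needed. `FakeController` and `navigate_tool` are hypothetical names; a real wrapper would await a Playwright call where the placeholder sleeps.

```python
import asyncio


class FakeController:
    """Stand-in for the async BrowserController; no real browser is launched."""

    async def navigate(self, url: str) -> str:
        await asyncio.sleep(0)  # placeholder for an awaited Playwright call
        return f"Navigated to {url}"


controller = FakeController()


def navigate_tool(url: str) -> str:
    """Synchronous tool wrapper: run the async operation to completion."""
    return asyncio.run(controller.navigate(url))


result = navigate_tool("https://example.com")
```

One caveat with this approach: `asyncio.run()` raises `RuntimeError` when called from a thread that already has a running event loop, so wrappers of this kind only work when the tool is invoked from a thread without one.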

### Task Execution Flow

When a user submits a task like "Go to google.com and search for Python":

1. **Frontend** sends the prompt to the backend API
2. **Backend** receives the request and calls the agent
3. **Agent** analyzes the prompt and breaks it down:
   - Navigate to google.com
   - Understand the page structure
   - Find the search input field
   - Type "Python" into the search field
   - Click the search button
   - Verify the results
4. **Agent** selects and executes tools in sequence:
   - Uses `navigate()` to go to Google
   - Uses `get_page_content()` to understand the page
   - Uses `get_elements_info()` to find the search input
   - Uses `type_text()` to enter the search query
   - Uses `click()` to submit the search
   - Uses `get_page_content()` again to verify success
5. **Playwright** performs each browser action through the BrowserController
6. **Results** flow back to the agent after each tool execution
7. **Agent** reasons about the results and determines when the task is complete
8. **Screenshot** is captured of the final browser state
9. **Response** is assembled with success status, output message, base64-encoded screenshot, and action history
10. **Frontend** displays all results to the user
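
Step 9 is where binary screenshot data becomes JSON-safe text. The sketch below shows one way to assemble such a response; the field names and the `assemble_response` helper are assumptions, and the screenshot bytes are a placeholder (just the PNG file signature) rather than a real image.

```python
import base64
from typing import Any


def assemble_response(
    success: bool,
    output: str,
    screenshot_png: bytes,
    action_history: list[dict[str, Any]],
) -> dict[str, Any]:
    """Bundle the agent's result into a JSON-serializable dict."""
    return {
        "success": success,
        "output": output,
        # Base64 turns raw image bytes into ASCII text that JSON can carry.
        "screenshot": base64.b64encode(screenshot_png).decode("ascii"),
        "action_history": action_history,
    }


placeholder_png = b"\x89PNG\r\n\x1a\n"  # PNG signature only, not a full image
response = assemble_response(
    True,
    "Searched for Python",
    placeholder_png,
    [{"action": "navigate", "url": "https://google.com"}],
)
```

A browser can display such a string directly by prefixing it with `data:image/png;base64,` and using the result as an image source.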

### Data Flow

The complete flow follows this pattern:

**User Input** → **Frontend JavaScript** → **HTTP POST Request** → **FastAPI Backend** → **LangChain Agent** → **Tool Selection** → **Playwright Browser Actions** → **Results Flow Back** → **Agent Reasoning** → **Screenshot Capture** → **Response Assembly** → **JSON Response** → **Frontend Display** → **User Views Results**

### Key Features

**Action History**: Every browser action is logged with details (action type, selectors, URLs, text entered, etc.). This provides full transparency of what the AI did.

**Screenshot Capture**: After task completion, a screenshot is taken and included in the response as a base64-encoded image, giving users visual confirmation of the results.

**Error Handling**: Errors are handled at every layer:
- Frontend catches network errors and displays user-friendly messages
- Backend validates requests and returns appropriate HTTP status codes
- Browser agent handles Playwright timeouts and element not found errors gracefully

**State Management**:
- Browser state persists between tasks (single browser instance)
- Frontend statistics persist in localStorage
- Action history accumulates throughout the session

**Modular Architecture**: Each layer is independent, making the system maintainable and extensible. New browser tools can be added by extending the BrowserController and creating corresponding tool wrappers.
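
To illustrate the extension point, here is a small registry sketch in plain Python. It stands in for whatever tool abstraction the agent framework provides; the `TOOLS` dictionary, the `register_tool` decorator, and the `get_title` tool are hypothetical and exist only to show a wrapper paired with a description.

```python
from typing import Callable

ToolFunc = Callable[[str], str]

# Maps a tool name to its wrapper function and the description shown to the agent.
TOOLS: dict[str, tuple[ToolFunc, str]] = {}


def register_tool(name: str, description: str) -> Callable[[ToolFunc], ToolFunc]:
    """Decorator that adds a wrapper function to the registry under `name`."""

    def decorator(func: ToolFunc) -> ToolFunc:
        TOOLS[name] = (func, description)
        return func

    return decorator


@register_tool("get_title", "Return the title of the current page. Input is ignored.")
def get_title(_: str) -> str:
    # A real wrapper would call a new BrowserController method here.
    return "Example Domain"


func, description = TOOLS["get_title"]
```

Adding a tool then touches two places only: one new controller method and one registered wrapper with its description.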

---

## Summary

Manus AI Clone transforms natural language instructions into browser automation through a six-step pipeline:

1. Users provide natural language prompts through a web interface
2. The FastAPI backend receives and validates requests
3. A LangChain AI agent interprets the task and plans a sequence of actions
4. The agent executes browser tools through Playwright
5. Results are collected, including screenshots and action history
6. Everything is displayed back to the user in the frontend

The system demonstrates how AI reasoning can be combined with browser automation: the agent navigates, clicks, and types on web pages much as a person would, with the speed and consistency of automation.
@@ -0,0 +1,378 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Frontend Architecture - Manus AI Clone</title>
    <style>
      body {
        font-family: "Segoe UI", Tahoma, Geneva, Verdana, sans-serif;
        max-width: 1200px;
        margin: 0 auto;
        padding: 20px;
        background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
        min-height: 100vh;
      }
      .container {
        background: white;
        border-radius: 16px;
        padding: 40px;
        box-shadow: 0 20px 60px rgba(0, 0, 0, 0.3);
      }
      h1 {
        color: #333;
        text-align: center;
        margin-bottom: 10px;
      }
      .subtitle {
        text-align: center;
        color: #666;
        margin-bottom: 30px;
      }
      .architecture {
        display: grid;
        grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
        gap: 20px;
        margin-bottom: 40px;
      }
      .component {
        background: #f9f9f9;
        border-radius: 12px;
        padding: 20px;
        border-left: 4px solid #667eea;
      }
      .component h2 {
        color: #667eea;
        font-size: 1.3em;
        margin-bottom: 15px;
      }
      .component h3 {
        color: #764ba2;
        font-size: 1.1em;
        margin-top: 15px;
        margin-bottom: 10px;
      }
      .file-tree {
        background: #2d2d2d;
        color: #f8f8f2;
        padding: 20px;
        border-radius: 8px;
        font-family: "Courier New", monospace;
        overflow-x: auto;
        margin-bottom: 30px;
      }
      .file-tree pre {
        margin: 0;
        line-height: 1.5;
      }
      .folder {
        color: #f1fa8c;
      }
      .file-py {
        color: #50fa7b;
      }
      .file-html {
        color: #ff79c6;
      }
      .file-css {
        color: #8be9fd;
      }
      .file-js {
        color: #ffb86c;
      }
      .file-config {
        color: #bd93f9;
      }

      .feature-grid {
        display: grid;
        grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
        gap: 15px;
        margin: 20px 0;
      }
      .feature {
        background: white;
        padding: 15px;
        border-radius: 8px;
        border: 2px solid #e0e0e0;
      }
      .feature-icon {
        font-size: 2em;
        margin-bottom: 10px;
      }
      .feature h4 {
        color: #333;
        margin-bottom: 5px;
      }
      .feature p {
        color: #666;
        font-size: 0.9em;
      }

      .flow-diagram {
        background: #f9f9f9;
        padding: 30px;
        border-radius: 12px;
        text-align: center;
        margin: 20px 0;
      }
      .flow-step {
        display: inline-block;
        background: white;
        padding: 15px 25px;
        border-radius: 8px;
        margin: 5px;
        box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
      }
      .flow-arrow {
        display: inline-block;
        color: #667eea;
        font-size: 1.5em;
        margin: 0 10px;
      }

      code {
        background: #f4f4f4;
        padding: 2px 6px;
        border-radius: 4px;
        font-family: "Courier New", monospace;
      }
      ul {
        line-height: 1.8;
      }
    </style>
  </head>
  <body>
    <div class="container">
      <h1>🏗️ Frontend Architecture</h1>
      <p class="subtitle">
        Manus AI Clone - Professional Template Structure
      </p>

      <div class="file-tree">
        <pre>
<span class="folder">manus_ai_clone/</span>
├── <span class="folder">templates/</span>          <em># Jinja2 Templates</em>
│   ├── <span class="file-html">base.html</span>       <em># Base layout with nav & footer</em>
│   └── <span class="file-html">index.html</span>      <em># Main application page</em>
│
├── <span class="folder">static/</span>             <em># Static Assets</em>
│   ├── <span class="folder">css/</span>
│   │   └── <span class="file-css">style.css</span>   <em># Main stylesheet (500+ lines)</em>
│   └── <span class="folder">js/</span>
│       └── <span class="file-js">main.js</span>     <em># Frontend logic (300+ lines)</em>
│
├── <span class="file-py">browser_agent.py</span>    <em># Browser automation core</em>
├── <span class="file-py">main.py</span>             <em># FastAPI application</em>
├── <span class="file-config">pyproject.toml</span>      <em># Dependencies</em>
├── <span class="file-config">.env</span>                <em># Environment variables</em>
└── <span class="file-config">README.md</span>           <em># Documentation</em>
</pre>
      </div>

      <div class="architecture">
        <div class="component">
          <h2>📄 Templates</h2>
          <h3>base.html</h3>
          <ul>
            <li>Common layout structure</li>
            <li>Navigation bar</li>
            <li>Footer</li>
            <li>Static file links</li>
            <li>Template blocks</li>
          </ul>

          <h3>index.html</h3>
          <ul>
            <li>Extends base.html</li>
            <li>Hero section</li>
            <li>Input form</li>
            <li>Results display</li>
            <li>Statistics cards</li>
          </ul>
        </div>

        <div class="component">
          <h2>🎨 CSS</h2>
          <h3>style.css (500+ lines)</h3>
          <ul>
            <li>CSS custom properties</li>
            <li>Responsive design</li>
            <li>Gradient backgrounds</li>
            <li>Card layouts</li>
            <li>Animations</li>
            <li>Mobile-first</li>
            <li>Custom scrollbars</li>
          </ul>
        </div>

        <div class="component">
          <h2>⚡ JavaScript</h2>
          <h3>main.js (300+ lines)</h3>
          <ul>
            <li>Task execution</li>
            <li>API communication</li>
            <li>Results rendering</li>
            <li>Statistics tracking</li>
            <li>Toast notifications</li>
            <li>localStorage persistence</li>
            <li>Event handlers</li>
          </ul>
        </div>
      </div>

      <h2 style="text-align: center; color: #667eea; margin: 40px 0 20px">
        ✨ Key Features
      </h2>

      <div class="feature-grid">
        <div class="feature">
          <div class="feature-icon">🎯</div>
          <h4>Jinja2 Templates</h4>
          <p>
            Modular template inheritance with blocks for easy
            customization
          </p>
        </div>

        <div class="feature">
          <div class="feature-icon">🎨</div>
          <h4>Modern CSS</h4>
          <p>
            Variables, gradients, animations, and responsive design
          </p>
        </div>

        <div class="feature">
          <div class="feature-icon">⚡</div>
          <h4>Interactive JS</h4>
          <p>
            Real-time updates, notifications, and persistent
            statistics
          </p>
        </div>

        <div class="feature">
          <div class="feature-icon">📱</div>
          <h4>Responsive</h4>
          <p>Mobile-first design that works on all devices</p>
        </div>

        <div class="feature">
          <div class="feature-icon">💾</div>
          <h4>State Management</h4>
          <p>localStorage for statistics and user preferences</p>
        </div>

        <div class="feature">
          <div class="feature-icon">🔔</div>
          <h4>Notifications</h4>
          <p>Toast messages for user feedback</p>
        </div>

        <div class="feature">
          <div class="feature-icon">📸</div>
          <h4>Screenshots</h4>
          <p>Live browser screenshots with results</p>
        </div>

        <div class="feature">
          <div class="feature-icon">📊</div>
          <h4>Statistics</h4>
          <p>Track tasks, success rate, and timing</p>
        </div>
      </div>

      <h2 style="text-align: center; color: #667eea; margin: 40px 0 20px">
        🔄 Request Flow
      </h2>

      <div class="flow-diagram">
        <div class="flow-step">User Input</div>
        <span class="flow-arrow">→</span>
        <div class="flow-step">JavaScript</div>
        <span class="flow-arrow">→</span>
        <div class="flow-step">FastAPI</div>
        <span class="flow-arrow">→</span>
        <div class="flow-step">LangChain Agent</div>
        <br /><br />
        <span class="flow-arrow">↓</span>
        <br /><br />
        <div class="flow-step">Playwright</div>
        <span class="flow-arrow">→</span>
        <div class="flow-step">Browser</div>
        <span class="flow-arrow">→</span>
        <div class="flow-step">Results</div>
        <span class="flow-arrow">→</span>
        <div class="flow-step">Display</div>
      </div>

      <h2 style="text-align: center; color: #667eea; margin: 40px 0 20px">
        🚀 Quick Start
      </h2>

      <div class="component">
        <h3>1. Install Dependencies</h3>
        <code>uv pip install -e .</code><br />
        <code>playwright install chromium</code>

        <h3>2. Configure Environment</h3>
        <code>cp .env.example .env</code><br />
        <em>Edit .env and add your OPENAI_API_KEY</em>

        <h3>3. Run Application</h3>
        <code>python main.py</code><br />
        <em>Or: uvicorn main:app --reload</em>

        <h3>4. Access Interface</h3>
        <code>http://localhost:8000</code>
      </div>

      <h2 style="text-align: center; color: #667eea; margin: 40px 0 20px">
        🛠️ Customization
      </h2>

      <div class="architecture">
        <div class="component">
          <h3>Add New Template</h3>
          <ol>
            <li>Create <code>templates/custom.html</code></li>
            <li>Extend <code>base.html</code></li>
            <li>Add route in <code>main.py</code></li>
          </ol>
        </div>

        <div class="component">
          <h3>Customize Styles</h3>
          <ol>
            <li>Edit CSS variables in <code>style.css</code></li>
            <li>Modify component styles</li>
            <li>Add custom classes</li>
          </ol>
        </div>

        <div class="component">
          <h3>Add Functionality</h3>
          <ol>
            <li>Add functions in <code>main.js</code></li>
            <li>Create API endpoints</li>
            <li>Update templates</li>
          </ol>
        </div>
      </div>

      <div
        style="
          text-align: center;
          margin-top: 40px;
          padding: 20px;
          background: #f9f9f9;
          border-radius: 12px;
        "
      >
        <h3 style="color: #667eea">Built with ❤️</h3>
        <p>LangChain • Playwright • FastAPI • Jinja2</p>
      </div>
    </div>
  </body>
</html>