Initial commit for deployment
This commit is contained in:
@@ -0,0 +1,173 @@
|
||||
# AI Service Workflow and Architecture
|
||||
|
||||
## Overview
|
||||
|
||||
The AI Service is a modular, API-driven system that provides document processing, embedding, and chat functionality with multiple AI models. It's designed to support a chatbot application with document training, private/team chat options, and model switching capabilities.
|
||||
|
||||
## System Architecture
|
||||
|
||||
```
|
||||
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
||||
│ │ │ │ │ │
|
||||
│ Client Apps │────▶│ AI Service API │────▶│ Vector Store │
|
||||
│ │ │ │ │ (Pinecone) │
|
||||
└─────────────────┘ └────────┬────────┘ └─────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────┐ ┌─────────────────┐
|
||||
│ │ │ │
|
||||
│ AI Models │────▶│ Local Storage │
|
||||
│ │ │ │
|
||||
└─────────────────┘ └─────────────────┘
|
||||
```
|
||||
|
||||
## Core Components
|
||||
|
||||
1. **Document Service**: Processes documents, splits them into chunks, and stores embeddings
|
||||
2. **Embedding Service**: Generates vector embeddings for text using sentence transformers
|
||||
3. **Model Service**: Manages different AI models and generates responses
|
||||
4. **Chat Service**: Handles chat creation, message history, and team chat functionality
|
||||
|
||||
## API Endpoints Workflow
|
||||
|
||||
### Health Check
|
||||
|
||||
- **Endpoint**: `GET /health`
|
||||
- **Purpose**: Simple health check to verify the service is running
|
||||
- **Response**: `{"status": "healthy"}`
|
||||
|
||||
### Document Management Workflow
|
||||
|
||||
1. **Process Document**
|
||||
- **Endpoint**: `POST /documents`
|
||||
- **Purpose**: Process a document for embedding
|
||||
- **Workflow**:
|
||||
- Client submits document content, title, and optional metadata
|
||||
- Document is split into chunks
|
||||
- Embeddings are generated for each chunk
|
||||
- Embeddings are stored in Pinecone
|
||||
- Document metadata is stored locally
|
||||
- **Response**: Document metadata including ID and chunk count
|
||||
|
||||
2. **Get All Documents**
|
||||
- **Endpoint**: `GET /documents`
|
||||
- **Purpose**: Retrieve all processed documents
|
||||
- **Response**: List of document metadata
|
||||
|
||||
3. **Get Document by ID**
|
||||
- **Endpoint**: `GET /documents/{doc_id}`
|
||||
- **Purpose**: Retrieve a specific document's metadata
|
||||
- **Response**: Document metadata
|
||||
|
||||
4. **Delete Document**
|
||||
- **Endpoint**: `DELETE /documents/{doc_id}`
|
||||
- **Purpose**: Remove a document and its embeddings
|
||||
- **Workflow**:
|
||||
- Document chunks are deleted from Pinecone
|
||||
- Document metadata is removed from local storage
|
||||
- **Response**: Success status
|
||||
|
||||
5. **Search Documents**
|
||||
- **Endpoint**: `POST /documents/search`
|
||||
- **Purpose**: Semantic search across document embeddings
|
||||
- **Workflow**:
|
||||
- Query text is converted to an embedding
|
||||
- Similar embeddings are found in Pinecone
|
||||
- Results are returned with metadata and similarity scores
|
||||
- **Response**: List of search results with metadata
|
||||
|
||||
### Model Management Workflow
|
||||
|
||||
1. **Get Available Models**
|
||||
- **Endpoint**: `GET /models`
|
||||
- **Purpose**: List all available AI models
|
||||
- **Response**: List of model information (ID, name, description, etc.)
|
||||
|
||||
2. **Get Model Information**
|
||||
- **Endpoint**: `GET /models/{model_id}`
|
||||
- **Purpose**: Get details about a specific model
|
||||
- **Response**: Model information
|
||||
|
||||
### Chat Workflow
|
||||
|
||||
1. **Create Chat**
|
||||
- **Endpoint**: `POST /chats`
|
||||
- **Purpose**: Create a new chat session
|
||||
- **Workflow**:
|
||||
- Client provides user ID, optional title, and model ID
|
||||
- System generates a unique chat ID
|
||||
- Chat metadata is stored locally
|
||||
- **Response**: Created chat information
|
||||
|
||||
2. **Get User Chats**
|
||||
- **Endpoint**: `GET /chats/user/{user_id}`
|
||||
- **Purpose**: Get all chats for a specific user
|
||||
- **Response**: List of chat information
|
||||
|
||||
3. **Get Chat by ID**
|
||||
- **Endpoint**: `GET /chats/{chat_id}`
|
||||
- **Purpose**: Get a specific chat's information and messages
|
||||
- **Response**: Chat information including message history
|
||||
|
||||
4. **Send Message**
|
||||
- **Endpoint**: `POST /chats/{chat_id}/messages`
|
||||
- **Purpose**: Send a message and get AI response
|
||||
- **Workflow**:
|
||||
- Client sends message with user ID and optional model parameters
|
||||
- User message is added to chat history
|
||||
- If RAG is enabled, relevant documents are retrieved
|
||||
- AI model generates a response based on chat history and context
|
||||
- Bot response is added to chat history
|
||||
- **Response**: Bot response message
|
||||
|
||||
5. **Team Chat Management**
|
||||
- **Add Team Member**: `POST /chats/{chat_id}/members/{user_id}`
|
||||
- **Remove Team Member**: `DELETE /chats/{chat_id}/members/{user_id}`
|
||||
- **Purpose**: Manage team chat participants
|
||||
- **Response**: Success status
|
||||
|
||||
6. **Delete Chat**
|
||||
- **Endpoint**: `DELETE /chats/{chat_id}`
|
||||
- **Purpose**: Remove a chat and its messages
|
||||
- **Response**: Success status
|
||||
|
||||
## Retrieval-Augmented Generation (RAG) Workflow
|
||||
|
||||
When RAG is enabled in a chat message request:
|
||||
|
||||
1. User message is processed
|
||||
2. Message is converted to an embedding
|
||||
3. Similar document chunks are retrieved from Pinecone
|
||||
4. Retrieved chunks are added as context to the prompt
|
||||
5. AI model generates a response using both the chat history and document context
|
||||
6. Response is returned to the user
|
||||
|
||||
## Model Parameters
|
||||
|
||||
The API supports customizing AI model behavior through parameters:
|
||||
|
||||
- `temperature`: Controls randomness (0.0-2.0)
|
||||
- `max_tokens`: Maximum response length
|
||||
- `top_p`: Nucleus sampling parameter (0.0-1.0)
|
||||
- `frequency_penalty`: Penalizes repeated tokens (-2.0-2.0)
|
||||
- `presence_penalty`: Penalizes repeated topics (-2.0-2.0)
|
||||
- `stop_sequences`: Sequences where generation stops
|
||||
- `system_prompt`: Custom system prompt to guide the model
|
||||
|
||||
## Deployment
|
||||
|
||||
The service is deployed using uvicorn:
|
||||
|
||||
```bash
|
||||
nohup uvicorn ai_service.run:app --host 0.0.0.0 --port 5251 &
|
||||
```
|
||||
|
||||
## Example Usage Flow
|
||||
|
||||
1. Process documents for knowledge base
|
||||
2. Create a new chat session
|
||||
3. Send messages with or without RAG
|
||||
4. Optionally add team members for collaborative chats
|
||||
5. Switch models as needed for different capabilities
|
||||
|
||||
This architecture provides a flexible, scalable foundation for building AI-powered chat applications with document training capabilities.
|
||||
Reference in New Issue
Block a user