174 lines
6.7 KiB
Markdown
174 lines
6.7 KiB
Markdown
|
|
# AI Service Workflow and Architecture
|
||
|
|
|
||
|
|
## Overview
|
||
|
|
|
||
|
|
The AI Service is a modular, API-driven system that provides document processing, embedding, and chat functionality with multiple AI models. It's designed to support a chatbot application with document training, private/team chat options, and model switching capabilities.
|
||
|
|
|
||
|
|
## System Architecture
|
||
|
|
|
||
|
|
```
|
||
|
|
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
||
|
|
│ │ │ │ │ │
|
||
|
|
│ Client Apps │────▶│ AI Service API │────▶│ Vector Store │
|
||
|
|
│ │ │ │ │ (Pinecone) │
|
||
|
|
└─────────────────┘ └────────┬────────┘ └─────────────────┘
|
||
|
|
│
|
||
|
|
▼
|
||
|
|
┌─────────────────┐ ┌─────────────────┐
|
||
|
|
│ │ │ │
|
||
|
|
│ AI Models │────▶│ Local Storage │
|
||
|
|
│ │ │ │
|
||
|
|
└─────────────────┘ └─────────────────┘
|
||
|
|
```
|
||
|
|
|
||
|
|
## Core Components
|
||
|
|
|
||
|
|
1. **Document Service**: Processes documents, splits them into chunks, and stores embeddings
|
||
|
|
2. **Embedding Service**: Generates vector embeddings for text using sentence transformers
|
||
|
|
3. **Model Service**: Manages different AI models and generates responses
|
||
|
|
4. **Chat Service**: Handles chat creation, message history, and team chat functionality
|
||
|
|
|
||
|
|
## API Endpoints Workflow
|
||
|
|
|
||
|
|
### Health Check
|
||
|
|
|
||
|
|
- **Endpoint**: `GET /health`
|
||
|
|
- **Purpose**: Simple health check to verify the service is running
|
||
|
|
- **Response**: `{"status": "healthy"}`
|
||
|
|
|
||
|
|
### Document Management Workflow
|
||
|
|
|
||
|
|
1. **Process Document**
|
||
|
|
- **Endpoint**: `POST /documents`
|
||
|
|
- **Purpose**: Process a document for embedding
|
||
|
|
- **Workflow**:
|
||
|
|
- Client submits document content, title, and optional metadata
|
||
|
|
- Document is split into chunks
|
||
|
|
- Embeddings are generated for each chunk
|
||
|
|
- Embeddings are stored in Pinecone
|
||
|
|
- Document metadata is stored locally
|
||
|
|
- **Response**: Document metadata including ID and chunk count
|
||
|
|
|
||
|
|
2. **Get All Documents**
|
||
|
|
- **Endpoint**: `GET /documents`
|
||
|
|
- **Purpose**: Retrieve all processed documents
|
||
|
|
- **Response**: List of document metadata
|
||
|
|
|
||
|
|
3. **Get Document by ID**
|
||
|
|
- **Endpoint**: `GET /documents/{doc_id}`
|
||
|
|
- **Purpose**: Retrieve a specific document's metadata
|
||
|
|
- **Response**: Document metadata
|
||
|
|
|
||
|
|
4. **Delete Document**
|
||
|
|
- **Endpoint**: `DELETE /documents/{doc_id}`
|
||
|
|
- **Purpose**: Remove a document and its embeddings
|
||
|
|
- **Workflow**:
|
||
|
|
- Document chunks are deleted from Pinecone
|
||
|
|
- Document metadata is removed from local storage
|
||
|
|
- **Response**: Success status
|
||
|
|
|
||
|
|
5. **Search Documents**
|
||
|
|
- **Endpoint**: `POST /documents/search`
|
||
|
|
- **Purpose**: Semantic search across document embeddings
|
||
|
|
- **Workflow**:
|
||
|
|
- Query text is converted to an embedding
|
||
|
|
- Similar embeddings are found in Pinecone
|
||
|
|
- Results are returned with metadata and similarity scores
|
||
|
|
- **Response**: List of search results with metadata
|
||
|
|
|
||
|
|
### Model Management Workflow
|
||
|
|
|
||
|
|
1. **Get Available Models**
|
||
|
|
- **Endpoint**: `GET /models`
|
||
|
|
- **Purpose**: List all available AI models
|
||
|
|
- **Response**: List of model information (ID, name, description, etc.)
|
||
|
|
|
||
|
|
2. **Get Model Information**
|
||
|
|
- **Endpoint**: `GET /models/{model_id}`
|
||
|
|
- **Purpose**: Get details about a specific model
|
||
|
|
- **Response**: Model information
|
||
|
|
|
||
|
|
### Chat Workflow
|
||
|
|
|
||
|
|
1. **Create Chat**
|
||
|
|
- **Endpoint**: `POST /chats`
|
||
|
|
- **Purpose**: Create a new chat session
|
||
|
|
- **Workflow**:
|
||
|
|
- Client provides user ID, optional title, and model ID
|
||
|
|
- System generates a unique chat ID
|
||
|
|
- Chat metadata is stored locally
|
||
|
|
- **Response**: Created chat information
|
||
|
|
|
||
|
|
2. **Get User Chats**
|
||
|
|
- **Endpoint**: `GET /chats/user/{user_id}`
|
||
|
|
- **Purpose**: Get all chats for a specific user
|
||
|
|
- **Response**: List of chat information
|
||
|
|
|
||
|
|
3. **Get Chat by ID**
|
||
|
|
- **Endpoint**: `GET /chats/{chat_id}`
|
||
|
|
- **Purpose**: Get a specific chat's information and messages
|
||
|
|
- **Response**: Chat information including message history
|
||
|
|
|
||
|
|
4. **Send Message**
|
||
|
|
- **Endpoint**: `POST /chats/{chat_id}/messages`
|
||
|
|
- **Purpose**: Send a message and get AI response
|
||
|
|
- **Workflow**:
|
||
|
|
- Client sends message with user ID and optional model parameters
|
||
|
|
- User message is added to chat history
|
||
|
|
- If RAG is enabled, relevant documents are retrieved
|
||
|
|
- AI model generates a response based on chat history and context
|
||
|
|
- Bot response is added to chat history
|
||
|
|
- **Response**: Bot response message
|
||
|
|
|
||
|
|
5. **Team Chat Management**
|
||
|
|
- **Add Team Member**: `POST /chats/{chat_id}/members/{user_id}`
|
||
|
|
- **Remove Team Member**: `DELETE /chats/{chat_id}/members/{user_id}`
|
||
|
|
- **Purpose**: Manage team chat participants
|
||
|
|
- **Response**: Success status
|
||
|
|
|
||
|
|
6. **Delete Chat**
|
||
|
|
- **Endpoint**: `DELETE /chats/{chat_id}`
|
||
|
|
- **Purpose**: Remove a chat and its messages
|
||
|
|
- **Response**: Success status
|
||
|
|
|
||
|
|
## Retrieval-Augmented Generation (RAG) Workflow
|
||
|
|
|
||
|
|
When RAG is enabled in a chat message request:
|
||
|
|
|
||
|
|
1. User message is processed
|
||
|
|
2. Message is converted to an embedding
|
||
|
|
3. Similar document chunks are retrieved from Pinecone
|
||
|
|
4. Retrieved chunks are added as context to the prompt
|
||
|
|
5. AI model generates a response using both the chat history and document context
|
||
|
|
6. Response is returned to the user
|
||
|
|
|
||
|
|
## Model Parameters
|
||
|
|
|
||
|
|
The API supports customizing AI model behavior through parameters:
|
||
|
|
|
||
|
|
- `temperature`: Controls randomness (0.0-2.0)
|
||
|
|
- `max_tokens`: Maximum response length
|
||
|
|
- `top_p`: Nucleus sampling parameter (0.0-1.0)
|
||
|
|
- `frequency_penalty`: Penalizes repeated tokens (-2.0-2.0)
|
||
|
|
- `presence_penalty`: Penalizes repeated topics (-2.0-2.0)
|
||
|
|
- `stop_sequences`: Sequences where generation stops
|
||
|
|
- `system_prompt`: Custom system prompt to guide the model
|
||
|
|
|
||
|
|
## Deployment
|
||
|
|
|
||
|
|
The service is deployed using uvicorn:
|
||
|
|
|
||
|
|
```bash
|
||
|
|
nohup uvicorn ai_service.run:app --host 0.0.0.0 --port 5251 &
|
||
|
|
```
|
||
|
|
|
||
|
|
## Example Usage Flow
|
||
|
|
|
||
|
|
1. Process documents for knowledge base
|
||
|
|
2. Create a new chat session
|
||
|
|
3. Send messages with or without RAG
|
||
|
|
4. Optionally add team members for collaborative chats
|
||
|
|
5. Switch models as needed for different capabilities
|
||
|
|
|
||
|
|
This architecture provides a flexible, scalable foundation for building AI-powered chat applications with document training capabilities.
|