6.7 KiB
AI Service Workflow and Architecture
Overview
The AI Service is a modular, API-driven system that provides document processing, embedding, and chat functionality with multiple AI models. It's designed to support a chatbot application with document training, private/team chat options, and model switching capabilities.
System Architecture
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ │ │ │ │ │
│ Client Apps │────▶│ AI Service API │────▶│ Vector Store │
│ │ │ │ │ (Pinecone) │
└─────────────────┘ └────────┬────────┘ └─────────────────┘
│
▼
┌─────────────────┐ ┌─────────────────┐
│ │ │ │
│ AI Models │────▶│ Local Storage │
│ │ │ │
└─────────────────┘ └─────────────────┘
Core Components
- Document Service: Processes documents, splits them into chunks, and stores embeddings
- Embedding Service: Generates vector embeddings for text using sentence transformers
- Model Service: Manages different AI models and generates responses
- Chat Service: Handles chat creation, message history, and team chat functionality
API Endpoints Workflow
Health Check
- Endpoint:
GET /health - Purpose: Simple health check to verify the service is running
- Response:
{"status": "healthy"}
Document Management Workflow
-
Process Document
- Endpoint:
POST /documents - Purpose: Process a document for embedding
- Workflow:
- Client submits document content, title, and optional metadata
- Document is split into chunks
- Embeddings are generated for each chunk
- Embeddings are stored in Pinecone
- Document metadata is stored locally
- Response: Document metadata including ID and chunk count
- Endpoint:
-
Get All Documents
- Endpoint:
GET /documents - Purpose: Retrieve all processed documents
- Response: List of document metadata
- Endpoint:
-
Get Document by ID
- Endpoint:
GET /documents/{doc_id} - Purpose: Retrieve a specific document's metadata
- Response: Document metadata
- Endpoint:
-
Delete Document
- Endpoint:
DELETE /documents/{doc_id} - Purpose: Remove a document and its embeddings
- Workflow:
- Document chunks are deleted from Pinecone
- Document metadata is removed from local storage
- Response: Success status
- Endpoint:
-
Search Documents
- Endpoint:
POST /documents/search - Purpose: Semantic search across document embeddings
- Workflow:
- Query text is converted to an embedding
- Similar embeddings are found in Pinecone
- Results are returned with metadata and similarity scores
- Response: List of search results with metadata
- Endpoint:
Model Management Workflow
-
Get Available Models
- Endpoint:
GET /models - Purpose: List all available AI models
- Response: List of model information (ID, name, description, etc.)
- Endpoint:
-
Get Model Information
- Endpoint:
GET /models/{model_id} - Purpose: Get details about a specific model
- Response: Model information
- Endpoint:
Chat Workflow
-
Create Chat
- Endpoint:
POST /chats - Purpose: Create a new chat session
- Workflow:
- Client provides user ID, optional title, and model ID
- System generates a unique chat ID
- Chat metadata is stored locally
- Response: Created chat information
- Endpoint:
-
Get User Chats
- Endpoint:
GET /chats/user/{user_id} - Purpose: Get all chats for a specific user
- Response: List of chat information
- Endpoint:
-
Get Chat by ID
- Endpoint:
GET /chats/{chat_id} - Purpose: Get a specific chat's information and messages
- Response: Chat information including message history
- Endpoint:
-
Send Message
- Endpoint:
POST /chats/{chat_id}/messages - Purpose: Send a message and get AI response
- Workflow:
- Client sends message with user ID and optional model parameters
- User message is added to chat history
- If RAG is enabled, relevant documents are retrieved
- AI model generates a response based on chat history and context
- Bot response is added to chat history
- Response: Bot response message
- Endpoint:
-
Team Chat Management
- Add Team Member:
POST /chats/{chat_id}/members/{user_id} - Remove Team Member:
DELETE /chats/{chat_id}/members/{user_id} - Purpose: Manage team chat participants
- Response: Success status
- Add Team Member:
-
Delete Chat
- Endpoint:
DELETE /chats/{chat_id} - Purpose: Remove a chat and its messages
- Response: Success status
- Endpoint:
Retrieval-Augmented Generation (RAG) Workflow
When RAG is enabled in a chat message request:
- User message is processed
- Message is converted to an embedding
- Similar document chunks are retrieved from Pinecone
- Retrieved chunks are added as context to the prompt
- AI model generates a response using both the chat history and document context
- Response is returned to the user
Model Parameters
The API supports customizing AI model behavior through parameters:
temperature: Controls randomness (0.0-2.0)max_tokens: Maximum response lengthtop_p: Nucleus sampling parameter (0.0-1.0)frequency_penalty: Penalizes repeated tokens (-2.0-2.0)presence_penalty: Penalizes repeated topics (-2.0-2.0)stop_sequences: Sequences where generation stopssystem_prompt: Custom system prompt to guide the model
Deployment
The service is deployed using uvicorn:
nohup uvicorn ai_service.run:app --host 0.0.0.0 --port 5251 &
Example Usage Flow
- Process documents for knowledge base
- Create a new chat session
- Send messages with or without RAG
- Optionally add team members for collaborative chats
- Switch models as needed for different capabilities
This architecture provides a flexible, scalable foundation for building AI-powered chat applications with document training capabilities.