AI Service Workflow and Architecture

Overview

The AI Service is a modular, API-driven system that provides document processing, embedding, and chat functionality with multiple AI models. It's designed to support a chatbot application with document training, private/team chat options, and model switching capabilities.

System Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│                 │     │                 │     │                 │
│  Client Apps    │────▶│  AI Service API │────▶│  Vector Store   │
│                 │     │                 │     │   (Pinecone)    │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │                 │     │                 │
                        │   AI Models     │────▶│  Local Storage  │
                        │                 │     │                 │
                        └─────────────────┘     └─────────────────┘

Core Components

Document Service: Processes documents, splits them into chunks, and stores embeddings
Embedding Service: Generates vector embeddings for text using sentence transformers
Model Service: Manages different AI models and generates responses
Chat Service: Handles chat creation, message history, and team chat functionality

API Endpoints Workflow

Health Check

Endpoint: GET /health
Purpose: Simple health check to verify the service is running
Response: {"status": "healthy"}

Document Management Workflow

Process Document
- Endpoint: POST /documents
- Purpose: Process a document for embedding
- Workflow:
  - Client submits document content, title, and optional metadata
  - Document is split into chunks
  - Embeddings are generated for each chunk
  - Embeddings are stored in Pinecone
  - Document metadata is stored locally
- Response: Document metadata including ID and chunk count
Get All Documents
- Endpoint: GET /documents
- Purpose: Retrieve all processed documents
- Response: List of document metadata
Get Document by ID
- Endpoint: GET /documents/{doc_id}
- Purpose: Retrieve a specific document's metadata
- Response: Document metadata
Delete Document
- Endpoint: DELETE /documents/{doc_id}
- Purpose: Remove a document and its embeddings
- Workflow:
  - Document chunks are deleted from Pinecone
  - Document metadata is removed from local storage
- Response: Success status
Search Documents
- Endpoint: POST /documents/search
- Purpose: Semantic search across document embeddings
- Workflow:
  - Query text is converted to an embedding
  - Similar embeddings are found in Pinecone
  - Results are returned with metadata and similarity scores
- Response: List of search results with metadata

Model Management Workflow

Get Available Models
- Endpoint: GET /models
- Purpose: List all available AI models
- Response: List of model information (ID, name, description, etc.)
Get Model Information
- Endpoint: GET /models/{model_id}
- Purpose: Get details about a specific model
- Response: Model information

Chat Workflow

Create Chat
- Endpoint: POST /chats
- Purpose: Create a new chat session
- Workflow:
  - Client provides user ID, optional title, and model ID
  - System generates a unique chat ID
  - Chat metadata is stored locally
- Response: Created chat information
Get User Chats
- Endpoint: GET /chats/user/{user_id}
- Purpose: Get all chats for a specific user
- Response: List of chat information
Get Chat by ID
- Endpoint: GET /chats/{chat_id}
- Purpose: Get a specific chat's information and messages
- Response: Chat information including message history
Send Message
- Endpoint: POST /chats/{chat_id}/messages
- Purpose: Send a message and get AI response
- Workflow:
  - Client sends message with user ID and optional model parameters
  - User message is added to chat history
  - If RAG is enabled, relevant documents are retrieved
  - AI model generates a response based on chat history and context
  - Bot response is added to chat history
- Response: Bot response message
Team Chat Management
- Add Team Member: POST /chats/{chat_id}/members/{user_id}
- Remove Team Member: DELETE /chats/{chat_id}/members/{user_id}
- Purpose: Manage team chat participants
- Response: Success status
Delete Chat
- Endpoint: DELETE /chats/{chat_id}
- Purpose: Remove a chat and its messages
- Response: Success status

Retrieval-Augmented Generation (RAG) Workflow

When RAG is enabled in a chat message request:

User message is processed
Message is converted to an embedding
Similar document chunks are retrieved from Pinecone
Retrieved chunks are added as context to the prompt
AI model generates a response using both the chat history and document context
Response is returned to the user

Model Parameters

The API supports customizing AI model behavior through parameters:

temperature: Controls randomness (0.0-2.0)
max_tokens: Maximum response length
top_p: Nucleus sampling parameter (0.0-1.0)
frequency_penalty: Penalizes repeated tokens (-2.0-2.0)
presence_penalty: Penalizes repeated topics (-2.0-2.0)
stop_sequences: Sequences where generation stops
system_prompt: Custom system prompt to guide the model

Deployment

The service is deployed using uvicorn:

nohup uvicorn ai_service.run:app --host 0.0.0.0 --port 5251 &

Example Usage Flow

Process documents for knowledge base
Create a new chat session
Send messages with or without RAG
Optionally add team members for collaborative chats
Switch models as needed for different capabilities

This architecture provides a flexible, scalable foundation for building AI-powered chat applications with document training capabilities.

6.7 KiB Raw Blame History