Files
ds_zagres_ai/ai_service_workflow.md
T
2025-05-09 15:41:16 +01:00

6.7 KiB

AI Service Workflow and Architecture

Overview

The AI Service is a modular, API-driven system that provides document processing, embedding, and chat functionality with multiple AI models. It's designed to support a chatbot application with document training, private/team chat options, and model switching capabilities.

System Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│                 │     │                 │     │                 │
│  Client Apps    │────▶│  AI Service API │────▶│  Vector Store   │
│                 │     │                 │     │   (Pinecone)    │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │                 │     │                 │
                        │   AI Models     │────▶│  Local Storage  │
                        │                 │     │                 │
                        └─────────────────┘     └─────────────────┘

Core Components

  1. Document Service: Processes documents, splits them into chunks, and stores embeddings
  2. Embedding Service: Generates vector embeddings for text using sentence transformers
  3. Model Service: Manages different AI models and generates responses
  4. Chat Service: Handles chat creation, message history, and team chat functionality

API Endpoints Workflow

Health Check

  • Endpoint: GET /health
  • Purpose: Simple health check to verify the service is running
  • Response: {"status": "healthy"}

Document Management Workflow

  1. Process Document

    • Endpoint: POST /documents
    • Purpose: Process a document for embedding
    • Workflow:
      • Client submits document content, title, and optional metadata
      • Document is split into chunks
      • Embeddings are generated for each chunk
      • Embeddings are stored in Pinecone
      • Document metadata is stored locally
    • Response: Document metadata including ID and chunk count
  2. Get All Documents

    • Endpoint: GET /documents
    • Purpose: Retrieve all processed documents
    • Response: List of document metadata
  3. Get Document by ID

    • Endpoint: GET /documents/{doc_id}
    • Purpose: Retrieve a specific document's metadata
    • Response: Document metadata
  4. Delete Document

    • Endpoint: DELETE /documents/{doc_id}
    • Purpose: Remove a document and its embeddings
    • Workflow:
      • Document chunks are deleted from Pinecone
      • Document metadata is removed from local storage
    • Response: Success status
  5. Search Documents

    • Endpoint: POST /documents/search
    • Purpose: Semantic search across document embeddings
    • Workflow:
      • Query text is converted to an embedding
      • Similar embeddings are found in Pinecone
      • Results are returned with metadata and similarity scores
    • Response: List of search results with metadata

Model Management Workflow

  1. Get Available Models

    • Endpoint: GET /models
    • Purpose: List all available AI models
    • Response: List of model information (ID, name, description, etc.)
  2. Get Model Information

    • Endpoint: GET /models/{model_id}
    • Purpose: Get details about a specific model
    • Response: Model information

Chat Workflow

  1. Create Chat

    • Endpoint: POST /chats
    • Purpose: Create a new chat session
    • Workflow:
      • Client provides user ID, optional title, and model ID
      • System generates a unique chat ID
      • Chat metadata is stored locally
    • Response: Created chat information
  2. Get User Chats

    • Endpoint: GET /chats/user/{user_id}
    • Purpose: Get all chats for a specific user
    • Response: List of chat information
  3. Get Chat by ID

    • Endpoint: GET /chats/{chat_id}
    • Purpose: Get a specific chat's information and messages
    • Response: Chat information including message history
  4. Send Message

    • Endpoint: POST /chats/{chat_id}/messages
    • Purpose: Send a message and get AI response
    • Workflow:
      • Client sends message with user ID and optional model parameters
      • User message is added to chat history
      • If RAG is enabled, relevant documents are retrieved
      • AI model generates a response based on chat history and context
      • Bot response is added to chat history
    • Response: Bot response message
  5. Team Chat Management

    • Add Team Member: POST /chats/{chat_id}/members/{user_id}
    • Remove Team Member: DELETE /chats/{chat_id}/members/{user_id}
    • Purpose: Manage team chat participants
    • Response: Success status
  6. Delete Chat

    • Endpoint: DELETE /chats/{chat_id}
    • Purpose: Remove a chat and its messages
    • Response: Success status

Retrieval-Augmented Generation (RAG) Workflow

When RAG is enabled in a chat message request:

  1. User message is processed
  2. Message is converted to an embedding
  3. Similar document chunks are retrieved from Pinecone
  4. Retrieved chunks are added as context to the prompt
  5. AI model generates a response using both the chat history and document context
  6. Response is returned to the user

Model Parameters

The API supports customizing AI model behavior through parameters:

  • temperature: Controls randomness (0.0-2.0)
  • max_tokens: Maximum response length
  • top_p: Nucleus sampling parameter (0.0-1.0)
  • frequency_penalty: Penalizes repeated tokens (-2.0-2.0)
  • presence_penalty: Penalizes repeated topics (-2.0-2.0)
  • stop_sequences: Sequences where generation stops
  • system_prompt: Custom system prompt to guide the model

Deployment

The service is deployed using uvicorn:

nohup uvicorn ai_service.run:app --host 0.0.0.0 --port 5251 &

Example Usage Flow

  1. Process documents for knowledge base
  2. Create a new chat session
  3. Send messages with or without RAG
  4. Optionally add team members for collaborative chats
  5. Switch models as needed for different capabilities

This architecture provides a flexible, scalable foundation for building AI-powered chat applications with document training capabilities.