enrichment of questions

2025-07-16 12:46:14 +01:00
parent d654707751
commit ec8ec6f190
5 changed files with 628 additions and 1 deletions
@@ -0,0 +1,301 @@
 # Enrich Questions API Documentation
 ## Overview
 The `enrich-questions` endpoint is a reverse API that takes existing questions and assigns them to specific areas and members. This endpoint returns the exact same response structure as `generate_questions_from_sop_v3`. Each question is intelligently assigned to the most relevant area_tag and member using OpenAI analysis.
 ## Endpoint
 ```
 POST /api/v1/common/enrich-questions
 ```
 ## Authentication
 Requires Bearer token authentication:
 ```
 Authorization: Bearer <your-api-key>
 ```
 ## Request Format
 ### Headers
 ```
 Content-Type: application/json
 Authorization: Bearer <your-api-key>
 ```
 ### Request Body
 The request body should be a JSON array of question objects. Each question object must contain:
 - `question` (string): The question text
 - `role` (string): The role associated with the question
 - `position_id` (integer): The position ID (used as role ID in response)
 - `area_tags` (array): Array of area tag objects with `name` and `id` (OpenAI selects the most relevant one)
 - `members` (array): Array of member objects with `id` (algorithm selects the most appropriate one)
 ### Example Request
 ```json
 [
  {
    "question": "Is the system monitoring working properly?",
    "role": "IT Expert",
    "position_id": 522,
    "area_tags": [
      {
        "name": "IT Operations",
        "id": 1276
      },
      {
        "name": "Communication & Coordination",
        "id": 1426
      },
      {
        "name": "Quality Assurance",
        "id": 1427
      }
    ],
    "members": [
      {
        "id": 159
      }
    ]
  },
  {
    "question": "Are safety protocols being followed?",
    "role": "IT Expert",
    "position_id": 522,
    "area_tags": [
      {
        "name": "IT Operations",
        "id": 1276
      },
      {
        "name": "Safety Protocols",
        "id": 1436
      }
    ],
    "members": [
      {
        "id": 159
      }
    ]
  }
 ]
 ```
 ## Response Format
 ### Success Response (200 OK)
 The response structure is identical to `generate_questions_from_sop_v3`. Each question is assigned to ONE area_tag and ONE member:
 ```json
 {
  "questions": {
    "items": [
      {
        "area_tag": 1276,
        "area_name": "IT Operations",
        "assigned_to": 159,
        "questions": "Is the system monitoring working properly?",
        "role": 522
      },
      {
        "area_tag": 1436,
        "area_name": "Safety Protocols",
        "assigned_to": 159,
        "questions": "Are safety protocols being followed?",
        "role": 522
      }
    ]
  }
 }
 ```
 ### Response Structure Explanation
 - Each question creates exactly ONE item in the response
 - OpenAI analyzes the question content and selects the most relevant `area_tag` from available options
 - The algorithm selects the most appropriate `member` from the available members
 - `area_tag`: The OpenAI-selected area tag ID
 - `area_name`: The OpenAI-selected area tag name
 - `assigned_to`: The selected member ID
 - `questions`: The question text
 - `role`: The position_id from the request (used as role identifier)
 ## AI-Powered Assignment Algorithm
 ### OpenAI Area Tag Selection
 The system uses OpenAI's GPT-4o-mini model to intelligently analyze each question and select the most relevant area tag:
 1. **Content Analysis**: OpenAI analyzes the question content, context, and meaning
 2. **Domain Matching**: Determines which area/domain the question is actually testing or assessing
 3. **Relevance Scoring**: Considers the purpose and intent of the question
 4. **Smart Selection**: Chooses the most specific and primary area tag from available options
 5. **Fallback**: If OpenAI analysis fails, defaults to the first available area tag
 **OpenAI Prompt Guidelines:**
 - Analyze question content and context
 - Match questions to appropriate area tags based on meaning and purpose
 - Consider what domain/area the question is actually testing
 - Choose only ONE area tag per question - the most relevant one
 - If multiple areas seem relevant, choose the most specific or primary one
 ### Member Selection
 Currently uses a simple selection algorithm (first member), but can be enhanced to consider:
 - Member skills and expertise
 - Current workload distribution
 - Availability and capacity
 - Historical performance
 ### Error Responses
 #### 400 Bad Request - Invalid Input Format
 ```json
 {
  "error": "Invalid input",
  "message": "Input data must be in JSON format."
 }
 ```
 #### 400 Bad Request - Missing Required Fields
 ```json
 {
  "error": "Invalid data",
  "message": "Question object at index 0 is missing required field 'question'."
 }
 ```
 #### 400 Bad Request - Invalid Array Structure
 ```json
 {
  "error": "Invalid input",
  "message": "Input data must be an array of question objects."
 }
 ```
 #### 401 Unauthorized
 ```json
 {
  "error": "Unauthorized",
  "message": "API key is missing or invalid."
 }
 ```
 #### 500 Internal Server Error
 ```json
 {
  "error": "Internal Server Error",
  "message": "An unexpected error occurred."
 }
 ```
 ## Usage Examples
 ### Basic Usage
 ```bash
 curl -X POST "http://localhost:5402/api/v1/common/enrich-questions" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '[
    {
      "question": "Is the system performance being monitored?",
      "role": "Developer",
      "position_id": 123,
      "area_tags": [
        {"name": "Development", "id": 1},
        {"name": "Performance Monitoring", "id": 2}
      ],
      "members": [
        {"id": 456}
      ]
    }
  ]'
 ```
 ### Python Example
 ```python
 import requests
 import json
 url = "http://localhost:5402/api/v1/common/enrich-questions"
 headers = {
    "Authorization": "Bearer your-api-key",
    "Content-Type": "application/json"
 }
 payload = [
    {
        "question": "Is the system performance being monitored?",
        "role": "Developer",
        "position_id": 123,
        "area_tags": [
            {"name": "Development", "id": 1},
            {"name": "Performance Monitoring", "id": 2}
        ],
        "members": [
            {"id": 456}
        ]
    }
 ]
 response = requests.post(url, json=payload, headers=headers)
 result = response.json()
 print(result)
 ```
 ## Validation Rules
 1. **Input must be a JSON array** of question objects
 2. **Each question object must contain all required fields**:
   - `question`: Non-empty string
   - `role`: Non-empty string
   - `position_id`: Integer
   - `area_tags`: Array of objects with `name` and `id`
   - `members`: Array of objects with `id`
 3. **Area tags must be valid objects** with both `name` (string) and `id` (integer/string)
 4. **Members must be valid objects** with `id` (integer/string)
 5. **Arrays can be empty** but must be present
 ## Response Logic
 The endpoint uses AI to intelligently assign each question to the most relevant area and member:
 - **Input**: 2 questions with multiple area_tags and members each
 - **Output**: 2 items (one per question) with the best area_tag and member selected for each
 - **AI Analysis**: OpenAI analyzes question content and meaning to find the most relevant area_tag
 - **Smart Assignment**: Uses natural language understanding to make intelligent assignments
 - **No Cartesian Product**: Each question gets exactly one area assignment and one member assignment
 ## Performance Considerations
 - **Batch Processing**: OpenAI analysis is performed in batches for efficiency
 - **Caching**: Consider implementing caching for frequently assigned questions
 - **Fallback**: Robust fallback mechanisms ensure the endpoint always returns valid assignments
 - **Error Handling**: Comprehensive error handling for OpenAI API failures
 ## Integration with Existing System
 This endpoint complements the existing question generation APIs:
 - `POST /api/v1/qs/generate_questions_from_sop` - Generates questions from SOPs
 - `POST /api/v1/qs/generate_questions_from_sop-latest` - Enhanced question generation
 - `POST /api/v1/common/enrich-questions` - Enriches existing questions (NEW)
 The enrich-questions endpoint returns the **exact same structure** as `generate_questions_from_sop_v3`, with AI-powered intelligent assignment of questions to the most relevant areas and members, making it seamlessly interchangeable in your application workflow.
@@ -3,6 +3,7 @@ from flask import Flask
 from src.api.routes.sops import sops_bp
 from src.api.routes.questions import qs_b
 from src.api.routes.chatbot import bot
 from src.api.routes.common import common_bp
 def create_app():
    app = Flask(__name__)
@@ -11,6 +12,7 @@ def create_app():
    app.register_blueprint(sops_bp, url_prefix='/api/v1/sop')
    app.register_blueprint(qs_b,url_prefix='/api/v1/qs')
    app.register_blueprint(bot,url_prefix='/api/v1/bot')
    app.register_blueprint(common_bp, url_prefix='/api/v1/common')
    # Set up the upload folder configuration inside the src directory
    UPLOAD_FOLDER = os.path.join(os.path.dirname(os.path.abspath(__file__)), '../../uploads')
@@ -0,0 +1,53 @@
 import os
 from flask import Blueprint, request, jsonify
 from src.utils.auth import auth_check
 from src.services.question_enrichment import QuestionEnrichmentService
 import json
 # Initialize the Blueprint
 common_bp = Blueprint('common', __name__)
@common_bp.route('/enrich-questions', methods=['POST'])
@auth_check
 def enrich_questions():
    """
    Reverse API endpoint that takes questions and assigns them to areas and members.
    Returns the exact same structure as generate_questions_from_sop_v3.
    Expected payload: Array of question objects with question, role, position_id, area_tags, and members.
    Example payload:
    [
        {
            "question": "Minor",
            "role": "IT Expert",
            "position_id": 522,
            "area_tags": [
                {"name": "IT Operations", "id": 1276},
                {"name": "Communication & Coordination", "id": 1426}
            ],
            "members": [
                {"id": 159}
            ]
        }
    ]
    """
    if not request.is_json:
        return jsonify({"error": "Invalid input", "message": "Input data must be in JSON format."}), 400
    input_data = request.get_json()
    try:
        # Initialize the question enrichment service
        enrichment_service = QuestionEnrichmentService()
        # Enrich the questions
        result = enrichment_service.enrich_questions(input_data)
        if not result['success']:
            return jsonify({"error": "Invalid data", "message": result['error']}), 400
        # Return the exact same structure as generate_questions_from_sop_v3
        return jsonify({"questions": result['questions']}), 200
    except Exception as e:
        return jsonify({"error": "Internal Server Error", "message": str(e)}), 500 
@@ -0,0 +1,271 @@
 import os
 from typing import List, Dict, Any
 from datetime import datetime
 import json
 import random
 from openai import OpenAI
 from dotenv import load_dotenv
 load_dotenv()
 class QuestionEnrichmentService:
    """
    Service class to handle question enrichment with area and member assignments.
    This is the reverse of question generation - it takes existing questions and assigns them to areas and members.
    """
    def __init__(self):
        self.api_key = os.getenv("OPENAI_API_KEY")
        self.client = OpenAI(api_key=self.api_key)
        self.model = "gpt-4o-mini"
    def validate_question_object(self, question_obj: Dict[str, Any], index: int) -> Dict[str, str]:
        """
        Validate a single question object structure.
        Args:
            question_obj: The question object to validate
            index: The index of the question object in the array (for error messages)
        Returns:
            Dict with 'valid' boolean and 'error' message if invalid
        """
        required_fields = ['question', 'role', 'position_id', 'area_tags', 'members']
        for field in required_fields:
            if field not in question_obj:
                return {
                    'valid': False,
                    'error': f"Question object at index {index} is missing required field '{field}'."
                }
        # Validate area_tags structure
        if not isinstance(question_obj['area_tags'], list):
            return {
                'valid': False,
                'error': f"Question object at index {index}: 'area_tags' must be an array."
            }
        for area_idx, area_tag in enumerate(question_obj['area_tags']):
            if not isinstance(area_tag, dict) or 'name' not in area_tag or 'id' not in area_tag:
                return {
                    'valid': False,
                    'error': f"Question object at index {index}: area_tag at index {area_idx} must have 'name' and 'id' fields."
                }
        # Validate members structure
        if not isinstance(question_obj['members'], list):
            return {
                'valid': False,
                'error': f"Question object at index {index}: 'members' must be an array."
            }
        for member_idx, member in enumerate(question_obj['members']):
            if not isinstance(member, dict) or 'id' not in member:
                return {
                    'valid': False,
                    'error': f"Question object at index {index}: member at index {member_idx} must have 'id' field."
                }
        return {'valid': True}
    def _get_question_area_assignment_prompt(self):
        """
        Get the prompt for OpenAI to assign questions to the most relevant area tags.
        """
        return """
        You are an expert at analyzing questions and determining which area/domain they belong to.
        Your task is to analyze each question and assign it to the most relevant area tag from the provided list.
        Guidelines:
        1. Analyze the question content and context
        2. Match the question to the most appropriate area tag based on its meaning and purpose
        3. Consider what domain/area the question is actually testing or assessing
        4. Choose only ONE area tag per question - the most relevant one
        5. If multiple areas seem relevant, choose the most specific or primary one
        Return your response as a JSON object with the question text as key and the selected area tag ID as value.
        Example format:
        {
            "Is the system monitoring working properly?": 1276,
            "Are safety protocols being followed?": 1436
        }
        """
    def _use_openai_for_area_assignment(self, questions_data: List[Dict[str, Any]]) -> Dict[str, int]:
        """
        Use OpenAI to intelligently assign questions to the most relevant area tags.
        Args:
            questions_data: List of question objects
        Returns:
            Dict mapping question text to selected area tag ID
        """
        try:
            # Prepare the data for OpenAI
            questions_info = []
            all_area_tags = {}
            for question_obj in questions_data:
                question_text = question_obj['question']
                area_tags = question_obj['area_tags']
                questions_info.append({
                    "question": question_text,
                    "available_area_tags": area_tags
                })
                # Collect all unique area tags
                for area_tag in area_tags:
                    all_area_tags[area_tag['id']] = area_tag['name']
            # Create the prompt content
            prompt_content = f"""
            Questions to analyze and assign:
            {json.dumps(questions_info, indent=2)}
            Available area tags:
            {json.dumps(all_area_tags, indent=2)}
            For each question, select the most relevant area tag ID from its available_area_tags list.
            """
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": self._get_question_area_assignment_prompt()},
                    {"role": "user", "content": prompt_content}
                ],
                temperature=0.1,
                max_tokens=1000
            )
            # Parse the response
            response_content = response.choices[0].message.content
            # Try to extract JSON from the response
            try:
                # Look for JSON in the response
                start_idx = response_content.find('{')
                end_idx = response_content.rfind('}') + 1
                if start_idx != -1 and end_idx != -1:
                    json_str = response_content[start_idx:end_idx]
                    assignments = json.loads(json_str)
                    return assignments
            except:
                pass
            # Fallback: return empty dict if parsing fails
            return {}
        except Exception as e:
            print(f"Error in OpenAI area assignment: {e}")
            return {}
    def _find_best_area_tag_for_question(self, question_text: str, area_tags: List[Dict], openai_assignments: Dict[str, int]) -> Dict:
        """
        Find the most relevant area tag for a given question using OpenAI assignments.
        Args:
            question_text: The question text to match
            area_tags: List of available area tags
            openai_assignments: OpenAI assignments from batch processing
        Returns:
            The most relevant area tag
        """
        # First try to use OpenAI assignment
        if question_text in openai_assignments:
            selected_area_id = openai_assignments[question_text]
            for area_tag in area_tags:
                if area_tag['id'] == selected_area_id:
                    return area_tag
        # Fallback to first area tag if OpenAI assignment not found
        return area_tags[0] if area_tags else None
    def _select_member_for_question(self, question_text: str, members: List[Dict]) -> Dict:
        """
        Select the most appropriate member for a given question.
        For now, this is a simple selection, but could be enhanced with more logic.
        Args:
            question_text: The question text
            members: List of available members
        Returns:
            Selected member
        """
        # For now, just select the first member
        # In a real implementation, this could consider member skills, workload, etc.
        return members[0] if members else None
    def enrich_questions(self, questions_data: List[Dict[str, Any]]) -> Dict[str, Any]:
        """
        Enrich multiple questions with area and member assignments.
        Each question gets assigned to ONE area_tag and ONE member based on OpenAI analysis.
        Returns the exact same structure as generate_questions_from_sop_v3.
        Args:
            questions_data: List of question objects to enrich
        Returns:
            Dict in the same format as AllQuestions model
        """
        # Validate input is a list
        if not isinstance(questions_data, list):
            return {
                'success': False,
                'error': "Input data must be an array of question objects."
            }
        # Validate each question object
        for idx, question_obj in enumerate(questions_data):
            validation_result = self.validate_question_object(question_obj, idx)
            if not validation_result['valid']:
                return {
                    'success': False,
                    'error': validation_result['error']
                }
        # Use OpenAI to get intelligent area assignments
        openai_assignments = self._use_openai_for_area_assignment(questions_data)
        # Process the enriched questions - each question gets ONE area_tag and ONE member
        enriched_items = []
        for question_obj in questions_data:
            # Find the best area tag for this question using OpenAI
            best_area_tag = self._find_best_area_tag_for_question(
                question_obj['question'], 
                question_obj['area_tags'],
                openai_assignments
            )
            # Select the best member for this question
            selected_member = self._select_member_for_question(
                question_obj['question'],
                question_obj['members']
            )
            # Create a single item for this question
            if best_area_tag and selected_member:
                item = {
                    "area_tag": best_area_tag['id'],
                    "area_name": best_area_tag['name'],
                    "assigned_to": selected_member['id'],
                    "questions": question_obj['question'],
                    "role": question_obj['position_id']  # Using position_id as role ID
                }
                enriched_items.append(item)
        # Return in the exact same format as generate_questions_from_sop_v3
        return {
            'success': True,
            'questions': {
                'items': enriched_items
            }
        }