enrichment of questions

teslim
2025-07-16 12:46:14 +01:00
parent d654707751
commit ec8ec6f190
5 changed files with 628 additions and 1 deletions
+301
@@ -0,0 +1,301 @@
# Enrich Questions API Documentation
## Overview
The `enrich-questions` endpoint is a reverse API: it takes existing questions and assigns each one to a single area tag and a single member. It returns the same response structure as `generate_questions_from_sop_v3`. The area tag is chosen by OpenAI analysis of the question text; the member is currently the first entry in the question's `members` array.
## Endpoint
```
POST /api/v1/common/enrich-questions
```
## Authentication
Requires Bearer token authentication:
```
Authorization: Bearer <your-api-key>
```
## Request Format
### Headers
```
Content-Type: application/json
Authorization: Bearer <your-api-key>
```
### Request Body
The request body should be a JSON array of question objects. Each question object must contain:
- `question` (string): The question text
- `role` (string): The role associated with the question
- `position_id` (integer): The position ID (used as role ID in response)
- `area_tags` (array): Array of area tag objects with `name` and `id` (OpenAI selects the most relevant one)
- `members` (array): Array of member objects with `id` (the first member is currently selected)
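The field list above can be turned into a small payload builder. This is a minimal sketch: `build_question` is a hypothetical helper, not part of the API, and it only assembles the five documented fields.

```python
from typing import Any, Dict, List


def build_question(
    question: str,
    role: str,
    position_id: int,
    area_tags: List[Dict[str, Any]],
    member_ids: List[int],
) -> Dict[str, Any]:
    """Assemble one request item containing the five documented fields."""
    return {
        "question": question,
        "role": role,
        "position_id": position_id,
        # Keep only the two keys the endpoint documents for an area tag.
        "area_tags": [{"name": tag["name"], "id": tag["id"]} for tag in area_tags],
        # Members are documented as objects carrying just an id.
        "members": [{"id": member_id} for member_id in member_ids],
    }


payload = [
    build_question(
        "Is the system monitoring working properly?",
        "IT Expert",
        522,
        [{"name": "IT Operations", "id": 1276}],
        [159],
    )
]
```

The resulting `payload` list is what would be sent as the JSON request body.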
### Example Request
```json
[
{
"question": "Is the system monitoring working properly?",
"role": "IT Expert",
"position_id": 522,
"area_tags": [
{
"name": "IT Operations",
"id": 1276
},
{
"name": "Communication & Coordination",
"id": 1426
},
{
"name": "Quality Assurance",
"id": 1427
}
],
"members": [
{
"id": 159
}
]
},
{
"question": "Are safety protocols being followed?",
"role": "IT Expert",
"position_id": 522,
"area_tags": [
{
"name": "IT Operations",
"id": 1276
},
{
"name": "Safety Protocols",
"id": 1436
}
],
"members": [
{
"id": 159
}
]
}
]
```
## Response Format
### Success Response (200 OK)
The response structure is identical to `generate_questions_from_sop_v3`. Each question is assigned to ONE area_tag and ONE member:
```json
{
"questions": {
"items": [
{
"area_tag": 1276,
"area_name": "IT Operations",
"assigned_to": 159,
"questions": "Is the system monitoring working properly?",
"role": 522
},
{
"area_tag": 1436,
"area_name": "Safety Protocols",
"assigned_to": 159,
"questions": "Are safety protocols being followed?",
"role": 522
}
]
}
}
```
### Response Structure Explanation
- Each question that has at least one area tag and one member creates exactly ONE item in the response; a question with an empty `area_tags` or `members` array produces no item
- OpenAI analyzes the question content and selects the most relevant `area_tag` from available options
- The first entry in `members` is currently selected as the assignee
- `area_tag`: The OpenAI-selected area tag ID
- `area_name`: The OpenAI-selected area tag name
- `assigned_to`: The selected member ID
- `questions`: The question text
- `role`: The position_id from the request (used as role identifier)
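As an illustration of consuming this structure, the sketch below groups the returned question texts by assignee. `questions_by_member` is a hypothetical client-side helper; it assumes only the field names documented above.

```python
from collections import defaultdict
from typing import Any, Dict, List


def questions_by_member(response_body: Dict[str, Any]) -> Dict[Any, List[str]]:
    """Group question texts by the member ID found in `assigned_to`."""
    grouped = defaultdict(list)
    for item in response_body["questions"]["items"]:
        # Note the plural key: the question text is stored under "questions".
        grouped[item["assigned_to"]].append(item["questions"])
    return dict(grouped)


sample_response = {
    "questions": {
        "items": [
            {"area_tag": 1276, "area_name": "IT Operations", "assigned_to": 159,
             "questions": "Is the system monitoring working properly?", "role": 522},
            {"area_tag": 1436, "area_name": "Safety Protocols", "assigned_to": 159,
             "questions": "Are safety protocols being followed?", "role": 522},
        ]
    }
}

grouped = questions_by_member(sample_response)
```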
## AI-Powered Assignment Algorithm
### OpenAI Area Tag Selection
The system uses OpenAI's GPT-4o-mini model to intelligently analyze each question and select the most relevant area tag:
1. **Content Analysis**: OpenAI analyzes the question content, context, and meaning
2. **Domain Matching**: Determines which area/domain the question is actually testing or assessing
3. **Relevance Scoring**: Considers the purpose and intent of the question
4. **Smart Selection**: Chooses the most specific and primary area tag from available options
5. **Fallback**: If OpenAI analysis fails, or returns an ID that is not among the question's `area_tags`, defaults to the first available area tag
**OpenAI Prompt Guidelines:**
- Analyze question content and context
- Match questions to appropriate area tags based on meaning and purpose
- Consider what domain/area the question is actually testing
- Choose only ONE area tag per question - the most relevant one
- If multiple areas seem relevant, choose the most specific or primary one
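The selection-plus-fallback behaviour can be sketched without calling OpenAI at all. The function below is an illustrative stand-in, not the service's own code: it assumes the model's answer has already been parsed into a dict mapping question text to area tag ID, and it compares IDs as strings so that `1276` and `"1276"` match.

```python
from typing import Any, Dict, List, Optional


def resolve_area_tag(
    question: str,
    area_tags: List[Dict[str, Any]],
    assignments: Dict[str, Any],
) -> Optional[Dict[str, Any]]:
    """Return the tag the model picked, or the first tag if the pick is unusable."""
    picked = assignments.get(question)
    if picked is not None:
        for tag in area_tags:
            # Normalise both sides: JSON from the model may quote numeric IDs.
            if str(tag["id"]) == str(picked):
                return tag
    # Fallback: first available tag, or None when the list is empty.
    return area_tags[0] if area_tags else None


tags = [{"name": "IT Operations", "id": 1276}, {"name": "Safety Protocols", "id": 1436}]
chosen = resolve_area_tag("Are safety protocols being followed?", tags,
                          {"Are safety protocols being followed?": "1436"})
fallback = resolve_area_tag("Unseen question", tags, {})
```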
### Member Selection
Currently uses a simple selection algorithm (first member), but can be enhanced to consider:
- Member skills and expertise
- Current workload distribution
- Availability and capacity
- Historical performance
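One of the possible enhancements, workload distribution, might look like the sketch below. Everything here is hypothetical: the `workload` mapping (member ID to number of open questions) is not part of the current request format and would have to come from elsewhere in the application.

```python
from typing import Any, Dict, List, Optional


def pick_least_loaded(
    members: List[Dict[str, Any]],
    workload: Dict[Any, int],
) -> Optional[Dict[str, Any]]:
    """Pick the member with the fewest open questions; ties go to the earliest member."""
    if not members:
        return None
    # Members missing from the workload mapping are treated as having no open work.
    return min(members, key=lambda member: workload.get(member["id"], 0))


members = [{"id": 159}, {"id": 160}, {"id": 161}]
selected = pick_least_loaded(members, {159: 4, 160: 1, 161: 1})
```

Because `min` returns the first minimal element, this degrades to the current first-member behaviour when no workload data is available.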
### Error Responses
#### 400 Bad Request - Invalid Input Format
```json
{
"error": "Invalid input",
"message": "Input data must be in JSON format."
}
```
#### 400 Bad Request - Missing Required Fields
```json
{
"error": "Invalid data",
"message": "Question object at index 0 is missing required field 'question'."
}
```
#### 400 Bad Request - Invalid Array Structure
```json
{
"error": "Invalid input",
"message": "Input data must be an array of question objects."
}
```
#### 401 Unauthorized
```json
{
"error": "Unauthorized",
"message": "API key is missing or invalid."
}
```
#### 500 Internal Server Error
```json
{
"error": "Internal Server Error",
"message": "An unexpected error occurred."
}
```
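A client can map these error bodies onto a simple decision about whether to fix the request, fix the credentials, or retry. The helper below is a sketch under the assumption that every error body carries the `error` and `message` keys shown above; `summarize_error` and its return shape are illustrative, not part of the API.

```python
from typing import Any, Dict, Optional


def summarize_error(status_code: int, body: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return None on success, otherwise a short summary of the documented error."""
    if status_code == 200:
        return None
    if status_code == 400:
        action = "fix the request payload"
    elif status_code == 401:
        action = "check the API key"
    else:
        action = "retry later"
    return {
        "error": body.get("error", "Unknown error"),
        "message": body.get("message", ""),
        "action": action,
        # Only server-side failures are worth retrying unchanged.
        "retryable": status_code >= 500,
    }


summary = summarize_error(
    400,
    {"error": "Invalid data",
     "message": "Question object at index 0 is missing required field 'question'."},
)
```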
## Usage Examples
### Basic Usage
```bash
curl -X POST "http://localhost:5402/api/v1/common/enrich-questions" \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '[
{
"question": "Is the system performance being monitored?",
"role": "Developer",
"position_id": 123,
"area_tags": [
{"name": "Development", "id": 1},
{"name": "Performance Monitoring", "id": 2}
],
"members": [
{"id": 456}
]
}
]'
```
### Python Example
```python
import requests
import json
url = "http://localhost:5402/api/v1/common/enrich-questions"
headers = {
"Authorization": "Bearer your-api-key",
"Content-Type": "application/json"
}
payload = [
{
"question": "Is the system performance being monitored?",
"role": "Developer",
"position_id": 123,
"area_tags": [
{"name": "Development", "id": 1},
{"name": "Performance Monitoring", "id": 2}
],
"members": [
{"id": 456}
]
}
]
response = requests.post(url, json=payload, headers=headers)
result = response.json()
print(result)
```
## Validation Rules
1. **Input must be a JSON array** of question objects
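2. **Each question object must contain all required fields**:
   - `question`: Non-empty string
   - `role`: Non-empty string
   - `position_id`: Integer
   - `area_tags`: Array of objects with `name` and `id`
   - `members`: Array of objects with `id`
3. **Area tags must be valid objects** with both `name` (string) and `id` (integer/string)
4. **Members must be valid objects** with `id` (integer/string)
5. **Arrays can be empty** but must be present; a question whose `area_tags` or `members` array is empty produces no item in the response

The validator in this commit enforces field presence and the array/object shapes above. It does not check that `question` and `role` are non-empty or that `position_id` is an integer, so callers should ensure those themselves.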
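The rules above can also be checked on the client before sending a request. This is a minimal sketch of such a pre-flight check; `first_validation_error` and its messages are hypothetical and do not reproduce the server's exact error strings.

```python
from typing import Any, Optional

REQUIRED_FIELDS = ("question", "role", "position_id", "area_tags", "members")


def first_validation_error(payload: Any) -> Optional[str]:
    """Return a description of the first rule the payload breaks, or None if it passes."""
    if not isinstance(payload, list):
        return "payload must be an array of question objects"
    for index, item in enumerate(payload):
        if not isinstance(item, dict):
            return f"item {index} must be an object"
        for field in REQUIRED_FIELDS:
            if field not in item:
                return f"item {index} is missing '{field}'"
        if not isinstance(item["area_tags"], list) or not isinstance(item["members"], list):
            return f"item {index}: 'area_tags' and 'members' must be arrays"
        for tag in item["area_tags"]:
            if not isinstance(tag, dict) or "name" not in tag or "id" not in tag:
                return f"item {index}: every area tag needs 'name' and 'id'"
        for member in item["members"]:
            if not isinstance(member, dict) or "id" not in member:
                return f"item {index}: every member needs 'id'"
    return None


valid_item = {
    "question": "Is the system performance being monitored?",
    "role": "Developer",
    "position_id": 123,
    "area_tags": [{"name": "Development", "id": 1}],
    "members": [{"id": 456}],
}
```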
## Response Logic
The endpoint assigns each question to one area tag (chosen by AI) and one member (currently the first listed):
- **Input**: 2 questions with multiple area_tags and members each
- **Output**: 2 items (one per question), each with the selected area_tag and member
- **AI Analysis**: OpenAI analyzes question content and meaning to find the most relevant area_tag
- **Smart Assignment**: Uses natural language understanding to make intelligent assignments
- **No Cartesian Product**: Each question gets exactly one area assignment and one member assignment
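To make the "no Cartesian product" point concrete, the sketch below compares the number of items a full area-by-member expansion would produce with the one-item-per-question count. Both helper names are illustrative. The rule that a question with an empty `area_tags` or `members` array yields no item is an assumption read from the service code in this commit.

```python
from typing import Any, Dict, List


def cartesian_item_count(payload: List[Dict[str, Any]]) -> int:
    """Items a naive expansion would emit: every tag paired with every member."""
    return sum(len(q["area_tags"]) * len(q["members"]) for q in payload)


def expected_item_count(payload: List[Dict[str, Any]]) -> int:
    """Items this endpoint emits: one per question that has a tag and a member."""
    return sum(1 for q in payload if q["area_tags"] and q["members"])


payload = [
    {"question": "Is the system monitoring working properly?",
     "area_tags": [{"id": 1276}, {"id": 1426}, {"id": 1427}],
     "members": [{"id": 159}]},
    {"question": "Are safety protocols being followed?",
     "area_tags": [{"id": 1276}, {"id": 1436}],
     "members": [{"id": 159}]},
]

naive = cartesian_item_count(payload)
actual = expected_item_count(payload)
```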
## Performance Considerations
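- **Batch Processing**: All questions in a request are sent to OpenAI in a single call, with the completion capped at 1000 tokens; very large requests may produce truncated output, in which case the first-area-tag fallback applies
- **Caching**: Consider implementing caching for frequently assigned questions
- **Fallback**: Fallback mechanisms ensure that every question with at least one area tag and one member still receives an assignment when OpenAI analysis fails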
- **Error Handling**: Comprehensive error handling for OpenAI API failures
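The caching suggestion could be prototyped as an in-memory cache keyed on the question text and its candidate tag IDs. This is a sketch only: `AssignmentCache` does not exist in the codebase, and `assign_fn` stands in for whatever function performs the real area assignment.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


class AssignmentCache:
    """Remember area assignments for repeated (question, candidate tags) pairs."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    @staticmethod
    def make_key(question: str, area_tags: List[Dict[str, Any]]) -> str:
        # Sort tag IDs so the key does not depend on the order tags were listed in.
        tag_ids = sorted(str(tag["id"]) for tag in area_tags)
        raw = json.dumps([question.strip().lower(), tag_ids])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_or_assign(
        self,
        question: str,
        area_tags: List[Dict[str, Any]],
        assign_fn: Callable[[str, List[Dict[str, Any]]], Any],
    ) -> Any:
        key = self.make_key(question, area_tags)
        if key not in self._store:
            # Cache miss: delegate to the (possibly expensive) assignment function.
            self._store[key] = assign_fn(question, area_tags)
        return self._store[key]


calls = []


def fake_assign(question: str, area_tags: List[Dict[str, Any]]) -> Any:
    """Stand-in for the real assignment call; records each invocation."""
    calls.append(question)
    return area_tags[0]["id"]


cache = AssignmentCache()
tags = [{"name": "Development", "id": 1}, {"name": "Performance Monitoring", "id": 2}]
first = cache.get_or_assign("Is the system performance being monitored?", tags, fake_assign)
second = cache.get_or_assign("is the system performance being monitored? ",
                             list(reversed(tags)), fake_assign)
```

A production cache would also need eviction and invalidation when area tags are renamed or removed, which this sketch leaves out.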
## Integration with Existing System
This endpoint complements the existing question generation APIs:
- `POST /api/v1/qs/generate_questions_from_sop` - Generates questions from SOPs
- `POST /api/v1/qs/generate_questions_from_sop-latest` - Enhanced question generation
- `POST /api/v1/common/enrich-questions` - Enriches existing questions (NEW)
The enrich-questions endpoint returns the **exact same structure** as `generate_questions_from_sop_v3`, with AI-powered intelligent assignment of questions to the most relevant areas and members, making it seamlessly interchangeable in your application workflow.
+2
@@ -3,6 +3,7 @@ from flask import Flask
 from src.api.routes.sops import sops_bp
 from src.api.routes.questions import qs_b
 from src.api.routes.chatbot import bot
+from src.api.routes.common import common_bp
 def create_app():
     app = Flask(__name__)
@@ -11,6 +12,7 @@ def create_app():
     app.register_blueprint(sops_bp, url_prefix='/api/v1/sop')
     app.register_blueprint(qs_b,url_prefix='/api/v1/qs')
     app.register_blueprint(bot,url_prefix='/api/v1/bot')
+    app.register_blueprint(common_bp, url_prefix='/api/v1/common')
     # Set up the upload folder configuration inside the src directory
     UPLOAD_FOLDER = os.path.join(os.path.dirname(os.path.abspath(__file__)), '../../uploads')
+53
@@ -0,0 +1,53 @@
from flask import Blueprint, request, jsonify
from src.utils.auth import auth_check
from src.services.question_enrichment import QuestionEnrichmentService

# Initialize the Blueprint
common_bp = Blueprint('common', __name__)


@common_bp.route('/enrich-questions', methods=['POST'])
@auth_check
def enrich_questions():
    """
    Reverse API endpoint that takes questions and assigns them to areas and members.
    Returns the exact same structure as generate_questions_from_sop_v3.

    Expected payload: Array of question objects with question, role, position_id, area_tags, and members.

    Example payload:
    [
        {
            "question": "Minor",
            "role": "IT Expert",
            "position_id": 522,
            "area_tags": [
                {"name": "IT Operations", "id": 1276},
                {"name": "Communication & Coordination", "id": 1426}
            ],
            "members": [
                {"id": 159}
            ]
        }
    ]
    """
    if not request.is_json:
        return jsonify({"error": "Invalid input", "message": "Input data must be in JSON format."}), 400

    input_data = request.get_json()

    try:
        # Initialize the question enrichment service
        enrichment_service = QuestionEnrichmentService()

        # Enrich the questions
        result = enrichment_service.enrich_questions(input_data)

        if not result['success']:
            return jsonify({"error": "Invalid data", "message": result['error']}), 400

        # Return the exact same structure as generate_questions_from_sop_v3
        return jsonify({"questions": result['questions']}), 200
    except Exception as e:
        return jsonify({"error": "Internal Server Error", "message": str(e)}), 500
+271
@@ -0,0 +1,271 @@
import os
from typing import List, Dict, Any
import json
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()


class QuestionEnrichmentService:
    """
    Service class to handle question enrichment with area and member assignments.
    This is the reverse of question generation - it takes existing questions and assigns them to areas and members.
    """

    def __init__(self):
        self.api_key = os.getenv("OPENAI_API_KEY")
        self.client = OpenAI(api_key=self.api_key)
        self.model = "gpt-4o-mini"
    def validate_question_object(self, question_obj: Dict[str, Any], index: int) -> Dict[str, Any]:
        """
        Validate a single question object structure.

        Args:
            question_obj: The question object to validate
            index: The index of the question object in the array (for error messages)

        Returns:
            Dict with 'valid' boolean and 'error' message if invalid
        """
        # Guard against non-object array entries, which would otherwise raise on the membership test
        if not isinstance(question_obj, dict):
            return {
                'valid': False,
                'error': f"Question object at index {index} must be a JSON object."
            }

        required_fields = ['question', 'role', 'position_id', 'area_tags', 'members']
        for field in required_fields:
            if field not in question_obj:
                return {
                    'valid': False,
                    'error': f"Question object at index {index} is missing required field '{field}'."
                }

        # Validate area_tags structure
        if not isinstance(question_obj['area_tags'], list):
            return {
                'valid': False,
                'error': f"Question object at index {index}: 'area_tags' must be an array."
            }
        for area_idx, area_tag in enumerate(question_obj['area_tags']):
            if not isinstance(area_tag, dict) or 'name' not in area_tag or 'id' not in area_tag:
                return {
                    'valid': False,
                    'error': f"Question object at index {index}: area_tag at index {area_idx} must have 'name' and 'id' fields."
                }

        # Validate members structure
        if not isinstance(question_obj['members'], list):
            return {
                'valid': False,
                'error': f"Question object at index {index}: 'members' must be an array."
            }
        for member_idx, member in enumerate(question_obj['members']):
            if not isinstance(member, dict) or 'id' not in member:
                return {
                    'valid': False,
                    'error': f"Question object at index {index}: member at index {member_idx} must have 'id' field."
                }

        return {'valid': True}
    def _get_question_area_assignment_prompt(self):
        """
        Get the prompt for OpenAI to assign questions to the most relevant area tags.
        """
        return """
        You are an expert at analyzing questions and determining which area/domain they belong to.
        Your task is to analyze each question and assign it to the most relevant area tag from the provided list.

        Guidelines:
        1. Analyze the question content and context
        2. Match the question to the most appropriate area tag based on its meaning and purpose
        3. Consider what domain/area the question is actually testing or assessing
        4. Choose only ONE area tag per question - the most relevant one
        5. If multiple areas seem relevant, choose the most specific or primary one

        Return your response as a JSON object with the question text as key and the selected area tag ID as value.

        Example format:
        {
            "Is the system monitoring working properly?": 1276,
            "Are safety protocols being followed?": 1436
        }
        """
    def _use_openai_for_area_assignment(self, questions_data: List[Dict[str, Any]]) -> Dict[str, int]:
        """
        Use OpenAI to intelligently assign questions to the most relevant area tags.

        Args:
            questions_data: List of question objects

        Returns:
            Dict mapping question text to selected area tag ID
        """
        try:
            # Prepare the data for OpenAI
            questions_info = []
            all_area_tags = {}
            for question_obj in questions_data:
                question_text = question_obj['question']
                area_tags = question_obj['area_tags']
                questions_info.append({
                    "question": question_text,
                    "available_area_tags": area_tags
                })
                # Collect all unique area tags
                for area_tag in area_tags:
                    all_area_tags[area_tag['id']] = area_tag['name']

            # Create the prompt content
            prompt_content = f"""
            Questions to analyze and assign:
            {json.dumps(questions_info, indent=2)}
            Available area tags:
            {json.dumps(all_area_tags, indent=2)}
            For each question, select the most relevant area tag ID from its available_area_tags list.
            """

            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": self._get_question_area_assignment_prompt()},
                    {"role": "user", "content": prompt_content}
                ],
                temperature=0.1,
                max_tokens=1000
            )

            # Parse the response
            response_content = response.choices[0].message.content

            # Try to extract JSON from the response
            try:
                # Look for the outermost JSON object in the response
                start_idx = response_content.find('{')
                end_idx = response_content.rfind('}')
                # find/rfind return -1 when the brace is absent, so test before adjusting the index
                if start_idx != -1 and end_idx > start_idx:
                    json_str = response_content[start_idx:end_idx + 1]
                    assignments = json.loads(json_str)
                    return assignments
            except (json.JSONDecodeError, AttributeError):
                # Malformed JSON or empty content: fall through to the empty-dict fallback
                pass

            # Fallback: return empty dict if parsing fails
            return {}
        except Exception as e:
            print(f"Error in OpenAI area assignment: {e}")
            return {}
    def _find_best_area_tag_for_question(self, question_text: str, area_tags: List[Dict], openai_assignments: Dict[str, int]) -> Dict:
        """
        Find the most relevant area tag for a given question using OpenAI assignments.

        Args:
            question_text: The question text to match
            area_tags: List of available area tags
            openai_assignments: OpenAI assignments from batch processing

        Returns:
            The most relevant area tag, or None if no area tags are available
        """
        # First try to use OpenAI assignment
        if question_text in openai_assignments:
            selected_area_id = openai_assignments[question_text]
            for area_tag in area_tags:
                # Compare as strings: IDs may be int or str in the request, and the model may quote them
                if str(area_tag['id']) == str(selected_area_id):
                    return area_tag

        # Fallback to first area tag if OpenAI assignment not found
        return area_tags[0] if area_tags else None
    def _select_member_for_question(self, question_text: str, members: List[Dict]) -> Dict:
        """
        Select the most appropriate member for a given question.
        For now, this is a simple selection, but could be enhanced with more logic.

        Args:
            question_text: The question text
            members: List of available members

        Returns:
            Selected member
        """
        # For now, just select the first member
        # In a real implementation, this could consider member skills, workload, etc.
        return members[0] if members else None
    def enrich_questions(self, questions_data: List[Dict[str, Any]]) -> Dict[str, Any]:
        """
        Enrich multiple questions with area and member assignments.
        Each question gets assigned to ONE area_tag and ONE member based on OpenAI analysis.
        Returns the exact same structure as generate_questions_from_sop_v3.

        Args:
            questions_data: List of question objects to enrich

        Returns:
            Dict in the same format as AllQuestions model
        """
        # Validate input is a list
        if not isinstance(questions_data, list):
            return {
                'success': False,
                'error': "Input data must be an array of question objects."
            }

        # Validate each question object
        for idx, question_obj in enumerate(questions_data):
            validation_result = self.validate_question_object(question_obj, idx)
            if not validation_result['valid']:
                return {
                    'success': False,
                    'error': validation_result['error']
                }

        # Use OpenAI to get intelligent area assignments
        openai_assignments = self._use_openai_for_area_assignment(questions_data)

        # Process the enriched questions - each question gets ONE area_tag and ONE member
        enriched_items = []
        for question_obj in questions_data:
            # Find the best area tag for this question using OpenAI
            best_area_tag = self._find_best_area_tag_for_question(
                question_obj['question'],
                question_obj['area_tags'],
                openai_assignments
            )

            # Select the best member for this question
            selected_member = self._select_member_for_question(
                question_obj['question'],
                question_obj['members']
            )

            # Create a single item for this question
            if best_area_tag and selected_member:
                item = {
                    "area_tag": best_area_tag['id'],
                    "area_name": best_area_tag['name'],
                    "assigned_to": selected_member['id'],
                    "questions": question_obj['question'],
                    "role": question_obj['position_id']  # Using position_id as role ID
                }
                enriched_items.append(item)

        # Return in the exact same format as generate_questions_from_sop_v3
        return {
            'success': True,
            'questions': {
                'items': enriched_items
            }
        }