bolade 75a0a3fde7 feat: Implement async AI analysis for email threads
- Added `get_latest_email_date()` function in `database.py` to retrieve the most recent email date for a given account and folder.
- Enhanced `fetch_folder_emails()` in `zoho_client.py` to intelligently determine the start date for fetching emails based on the latest email date in the database.
- Introduced `analyze_and_update_threads_async()` for asynchronous analysis of email threads, allowing concurrent processing.
- Created a synchronous wrapper `analyze_and_update_threads()` for easier integration.
- Updated `fetch_emails()` to support database session and account email parameters.
- Added comprehensive documentation in `AI_ANALYSIS_GUIDE.md` detailing the new AI analysis functionality.
- Implemented tests for the new features, including `test_fetch_with_db.py`, `test_ai_analysis.py`, and `test_single_analysis.py`.
- Added error handling and logging improvements throughout the codebase.
2025-08-11 23:20:20 +01:00

# AI Thread Analysis with Asyncio
This document explains how to use the new async AI analysis functionality for email threads.
## Overview
The new functionality adds AI-powered analysis to email threads. It determines whether each thread requires attention (is "actionable") and generates a concise summary. It uses asyncio to analyze multiple threads concurrently, so slow AI calls for different threads overlap instead of running one after another.
## Key Functions
### `analyze_and_update_threads()`
This is the main function you'll use to analyze threads.
```python
from src.database import analyze_and_update_threads

# Analyze all unanalyzed threads for an account
analyze_and_update_threads(
    account_email="user@company.com",
    max_concurrent=5,
    only_unanalyzed=True
)

# Analyze specific threads
analyze_and_update_threads(
    account_email="user@company.com",
    thread_ids=[1, 2, 3],
    max_concurrent=3
)
```
**Parameters:**
- `account_email`: The email account to process
- `thread_ids`: Optional list of specific thread IDs to analyze
- `max_concurrent`: Maximum number of concurrent AI analysis tasks (default: 5)
- `only_unanalyzed`: If True, only analyze threads that haven't been analyzed yet (default: True)
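The commit that added this guide describes `analyze_and_update_threads()` as a synchronous wrapper around `analyze_and_update_threads_async()`. The sketch below shows that general pattern using a stand-in coroutine. The names `analyze_one`, `analyze_threads_async` and `analyze_threads` are hypothetical and are not the project's code.

```python
import asyncio


async def analyze_one(thread_id: int) -> dict:
    # Hypothetical stand-in for a single AI analysis call.
    await asyncio.sleep(0)
    return {"thread_id": thread_id, "actionable": thread_id % 2 == 0}


async def analyze_threads_async(thread_ids: list[int]) -> list[dict]:
    # Run all analyses concurrently; gather() returns results in input order.
    return await asyncio.gather(*(analyze_one(t) for t in thread_ids))


def analyze_threads(thread_ids: list[int]) -> list[dict]:
    # Synchronous entry point: asyncio.run() creates an event loop, runs the
    # coroutine to completion and closes the loop. It raises RuntimeError if
    # an event loop is already running in the current thread.
    return asyncio.run(analyze_threads_async(thread_ids))


results = analyze_threads([1, 2, 3])
print(results)
```

Because a wrapper like this starts its own event loop, call it only from ordinary synchronous code. Inside async code, such as an async web handler, await the async variant directly instead.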
### `get_threads_needing_analysis()`
Check which threads need analysis:
```python
from src.database import get_threads_needing_analysis, SessionLocal

db = SessionLocal()
try:
    threads = get_threads_needing_analysis(db, "user@company.com")
    print(f"Found {len(threads)} threads needing analysis")
finally:
    db.close()  # always release the session, even if the query fails
```
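This guide does not spell out how threads are selected. The in-memory sketch below assumes the simplest rule consistent with the `only_unanalyzed` description: a thread "needs analysis" when its `last_analyzed_at` is still empty. It also shows one plausible way that rule could combine with `thread_ids`. `ThreadRow` and `select_threads` are hypothetical, and the real query may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ThreadRow:
    # Hypothetical in-memory stand-in for a Thread row.
    id: int
    account_email: str
    last_analyzed_at: Optional[datetime] = None


def select_threads(
    threads: list[ThreadRow],
    account_email: str,
    thread_ids: Optional[list[int]] = None,
    only_unanalyzed: bool = True,
) -> list[ThreadRow]:
    # Restrict to the requested account.
    selected = [t for t in threads if t.account_email == account_email]
    # Optionally restrict to an explicit list of thread IDs.
    if thread_ids is not None:
        wanted = set(thread_ids)
        selected = [t for t in selected if t.id in wanted]
    # Assumed rule: "unanalyzed" means no analysis timestamp yet.
    if only_unanalyzed:
        selected = [t for t in selected if t.last_analyzed_at is None]
    return selected


analyzed_at = datetime(2025, 8, 11, tzinfo=timezone.utc)
threads = [
    ThreadRow(1, "user@company.com"),
    ThreadRow(2, "user@company.com", last_analyzed_at=analyzed_at),
    ThreadRow(3, "user@company.com"),
    ThreadRow(4, "other@company.com"),
]
pending = select_threads(threads, "user@company.com")
print([t.id for t in pending])
```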
## Database Schema Updates
`analyze_and_update_threads()` writes its results to the following fields of the `Thread` model:
- `actionable`: Boolean indicating whether the thread requires action
- `ai_summary`: Text summary of the thread content
- `ai_confidence`: Float confidence score between 0.0 and 1.0
- `last_analyzed_at`: Timestamp of when the analysis was performed
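The sketch below shows how one analysis result might map onto these four fields. It uses plain dataclasses in place of the ORM model. `AnalysisResult`, `ThreadFields` and `apply_analysis` are hypothetical. The range check and the UTC timestamp are assumptions made for illustration, not documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AnalysisResult:
    # Hypothetical shape of one AI analysis result.
    actionable: bool
    summary: str
    confidence: float


@dataclass
class ThreadFields:
    # Mirrors the four Thread columns listed in this guide.
    actionable: Optional[bool] = None
    ai_summary: Optional[str] = None
    ai_confidence: Optional[float] = None
    last_analyzed_at: Optional[datetime] = None


def apply_analysis(thread: ThreadFields, result: AnalysisResult) -> None:
    # Reject scores outside the documented 0.0-1.0 range.
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {result.confidence}")
    thread.actionable = result.actionable
    thread.ai_summary = result.summary
    thread.ai_confidence = result.confidence
    # Record when the analysis ran, as a timezone-aware UTC timestamp.
    thread.last_analyzed_at = datetime.now(timezone.utc)


thread = ThreadFields()
apply_analysis(thread, AnalysisResult(True, "Client asks for a revised quote by Friday.", 0.85))
print(thread)
```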
## Complete Workflow Example
Here's a complete workflow from email ingestion to AI analysis:
```python
from src.database import (
    SessionLocal,
    ingest_emails,
    analyze_and_update_threads,
    get_threads_requiring_reply
)

db = SessionLocal()
try:
    # 1. Ingest emails (using your existing email fetching logic)
    # Assuming you have fetched emails from your email provider
    emails = [...]  # placeholder: replace with your fetched email data
    ingest_emails(db, "user@company.com", emails)

    # 2. Run AI analysis on new threads
    analyze_and_update_threads(
        account_email="user@company.com",
        max_concurrent=5,
        only_unanalyzed=True
    )

    # 3. Get threads that need replies and are actionable.
    # analyze_and_update_threads() takes no session argument, so it presumably
    # uses its own; expire cached objects so this query sees its updates.
    db.expire_all()
    reply_threads = get_threads_requiring_reply(db, "user@company.com")
    actionable_threads = [t for t in reply_threads if t.actionable]
    print(f"Found {len(actionable_threads)} actionable threads requiring replies")
finally:
    db.close()
```
## AI Analysis Details
The AI analysis:
- Uses the Groq API if the `GROQ_API_KEY` environment variable is set
- Falls back to a heuristic analysis if Groq is unavailable
- Analyzes the last 4 messages in each thread by default
- Generates summaries of at most 80 words
- Identifies questions, requests, and other actionable items
- Ignores automated and newsletter emails
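The behaviors listed above can be sketched as small helper functions. Everything here is hypothetical and illustrative: the sender hints, the request phrases, the message dict shape (`"from"` and `"body"` keys, ordered oldest first) and the keyword heuristic itself. The project's actual heuristic is not described in this guide.

```python
import os

# Hypothetical markers; the real heuristic may use different signals.
AUTOMATED_SENDER_HINTS = ("noreply", "no-reply", "newsletter", "mailer-daemon")
REQUEST_PHRASES = ("please", "could you", "can you", "let me know")


def choose_backend() -> str:
    # Use Groq only when an API key is configured; otherwise fall back.
    return "groq" if os.environ.get("GROQ_API_KEY") else "heuristic"


def last_messages(messages: list[dict], n: int = 4) -> list[dict]:
    # Keep the n most recent messages (list assumed ordered oldest first).
    return messages[-n:]


def limit_words(text: str, max_words: int = 80) -> str:
    # Truncate a summary to at most max_words words.
    return " ".join(text.split()[:max_words])


def heuristic_actionable(messages: list[dict]) -> bool:
    # Look at the latest message only: skip automated senders,
    # then flag questions or request phrases.
    if not messages:
        return False
    latest = messages[-1]
    if any(hint in latest["from"].lower() for hint in AUTOMATED_SENDER_HINTS):
        return False
    body = latest["body"].lower()
    return "?" in body or any(phrase in body for phrase in REQUEST_PHRASES)


thread = [
    {"from": "bob@company.com", "body": "Here is the draft."},
    {"from": "alice@client.com", "body": "Thanks, looks good so far."},
    {"from": "bob@company.com", "body": "Great, I will finish it tomorrow."},
    {"from": "alice@client.com", "body": "Noted."},
    {"from": "alice@client.com", "body": "Could you send the signed contract by Friday?"},
]
print(choose_backend(), len(last_messages(thread)), heuristic_actionable(thread))
```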
## Performance
- Uses asyncio to analyze threads concurrently
- Caps concurrency with `max_concurrent` (default: 5 concurrent analyses)
- Runs each AI call in a thread pool so that blocking network I/O does not stall the event loop
- Writes results with a single database commit per batch
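The concurrency pattern described above can be sketched as an `asyncio.Semaphore` that caps in-flight work, combined with `asyncio.to_thread` (Python 3.9+). The latter pushes a blocking call into the default thread pool. The project may use `loop.run_in_executor` instead. `blocking_analysis` is a hypothetical stand-in that sleeps to simulate a network call and records the peak number of concurrent calls.

```python
import asyncio
import threading
import time

_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0


def blocking_analysis(thread_id: int) -> str:
    # Hypothetical stand-in for a blocking AI API call.
    global _in_flight, peak_in_flight
    with _lock:
        _in_flight += 1
        peak_in_flight = max(peak_in_flight, _in_flight)
    time.sleep(0.05)  # simulate network latency
    with _lock:
        _in_flight -= 1
    return f"summary for thread {thread_id}"


async def analyze_all(thread_ids: list[int], max_concurrent: int = 5) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrent)

    async def analyze_one(thread_id: int) -> str:
        # At most max_concurrent tasks get past this point at once.
        async with semaphore:
            # Run the blocking call in a worker thread so the event loop stays free.
            return await asyncio.to_thread(blocking_analysis, thread_id)

    return await asyncio.gather(*(analyze_one(t) for t in thread_ids))


summaries = asyncio.run(analyze_all(list(range(12)), max_concurrent=3))
print(len(summaries), "summaries, peak concurrency:", peak_in_flight)
```

With 12 threads and a limit of 3, the calls run in roughly four waves rather than twelve sequential steps. Raising the limit shortens the run but puts more load on the AI service at once.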
## Error Handling
- Isolates failures: if the analysis of one thread fails, the remaining threads are still processed
- Logs a detailed error for each failed thread
- Rolls back the pending database changes if saving the batch fails
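The error-handling behavior above can be illustrated with a self-contained sketch. It uses the standard library's `sqlite3` module in place of the project's database. `asyncio.gather(..., return_exceptions=True)` returns failures as values, so one failure does not cancel the other tasks. Successful results are then written in one transaction that is rolled back if the write fails. The `threads` table layout and the `analyze` stub are hypothetical.

```python
import asyncio
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("thread_analysis")


async def analyze(thread_id: int) -> tuple[int, bool, str]:
    # Hypothetical analysis call; thread 2 simulates an AI service failure.
    await asyncio.sleep(0)
    if thread_id == 2:
        raise RuntimeError("AI service timeout")
    return thread_id, True, f"summary {thread_id}"


async def analyze_batch(conn: sqlite3.Connection, thread_ids: list[int]) -> int:
    # return_exceptions=True returns failures as values instead of raising.
    results = await asyncio.gather(*(analyze(t) for t in thread_ids), return_exceptions=True)
    rows = []
    for thread_id, result in zip(thread_ids, results):
        if isinstance(result, BaseException):
            log.error("analysis failed for thread %s: %s", thread_id, result)
            continue  # skip this thread, keep the others
        _, actionable, summary = result
        rows.append((actionable, summary, thread_id))
    try:
        conn.executemany("UPDATE threads SET actionable = ?, ai_summary = ? WHERE id = ?", rows)
        conn.commit()  # one commit for the whole batch
    except sqlite3.Error:
        conn.rollback()  # leave the table unchanged if the write fails
        raise
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE threads (id INTEGER PRIMARY KEY, actionable BOOLEAN, ai_summary TEXT)")
conn.executemany("INSERT INTO threads (id) VALUES (?)", [(1,), (2,), (3,)])
conn.commit()
updated = asyncio.run(analyze_batch(conn, [1, 2, 3]))
print(updated, conn.execute("SELECT id, actionable, ai_summary FROM threads ORDER BY id").fetchall())
```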
## Usage Tips
1. **Start with low concurrency**: Use `max_concurrent=3` initially to avoid overwhelming the AI service
2. **Analyze regularly**: Run analysis after each email ingestion cycle
3. **Focus on actionable threads**: Prioritize threads that are both `requires_reply=True` and `actionable=True`
4. **Monitor confidence scores**: A low `ai_confidence` suggests the analysis was uncertain, so those threads are worth checking manually
5. **Set up the environment**: Set `GROQ_API_KEY` for higher-quality analysis; without it, the heuristic fallback is used
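Tips 3 and 4 can be combined into a simple triage step. In this sketch, threads that need a reply are split into an "act now" list and a "review manually" list. `ThreadView`, `triage` and the 0.6 cutoff are hypothetical choices for illustration, not values from the project.

```python
from dataclasses import dataclass
from typing import Optional

LOW_CONFIDENCE = 0.6  # hypothetical cutoff; tune it for your own data


@dataclass
class ThreadView:
    # Hypothetical read-only view of the relevant Thread fields.
    id: int
    subject: str
    requires_reply: bool
    actionable: Optional[bool]
    ai_confidence: Optional[float]


def triage(threads: list[ThreadView]) -> tuple[list[ThreadView], list[ThreadView]]:
    """Split reply-needing threads into (act now, review manually)."""
    act_now, review = [], []
    for t in threads:
        if not t.requires_reply:
            continue
        if t.ai_confidence is None or t.ai_confidence < LOW_CONFIDENCE:
            review.append(t)  # uncertain or missing analysis: check by hand
        elif t.actionable:
            act_now.append(t)
    # Most confident actionable threads first.
    act_now.sort(key=lambda t: t.ai_confidence, reverse=True)
    return act_now, review


threads = [
    ThreadView(1, "Quote request", True, True, 0.9),
    ThreadView(2, "Lunch?", True, True, 0.4),
    ThreadView(3, "FYI: new policy", True, False, 0.95),
    ThreadView(4, "Weekly newsletter", False, False, 0.99),
    ThreadView(5, "Contract signature", True, True, 0.7),
]
act_now, review = triage(threads)
print([t.id for t in act_now], [t.id for t in review])
```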
## Testing
Use the provided test scripts:
```bash
# Test the complete workflow
python3 example_workflow.py
# Test single thread analysis
python3 test_single_analysis.py
# Reset analysis data for testing
python3 reset_analysis.py
```
## Integration with Existing Code
To integrate with your existing email processing, call `analyze_and_update_threads()` right after ingestion:
```python
from src.database import analyze_and_update_threads


def process_emails(account_email: str):
    # Your existing email fetching and ingestion code
    # (fetch_and_ingest_emails stands in for whatever function you already use)
    fetch_and_ingest_emails(account_email)

    # Add AI analysis for any threads the sync just created
    analyze_and_update_threads(
        account_email=account_email,
        only_unanalyzed=True
    )
```
This ensures that new threads are automatically analyzed for actionability after each email sync.