75a0a3fde7
- Added `get_latest_email_date()` function in `database.py` to retrieve the most recent email date for a given account and folder.
- Enhanced `fetch_folder_emails()` in `zoho_client.py` to intelligently determine the start date for fetching emails based on the latest email date in the database.
- Introduced `analyze_and_update_threads_async()` for asynchronous analysis of email threads, allowing concurrent processing.
- Created a synchronous wrapper `analyze_and_update_threads()` for easier integration.
- Updated `fetch_emails()` to support database session and account email parameters.
- Added comprehensive documentation in `AI_ANALYSIS_GUIDE.md` detailing the new AI analysis functionality.
- Implemented tests for the new features, including `test_fetch_with_db.py`, `test_ai_analysis.py`, and `test_single_analysis.py`.
- Added error handling and logging improvements throughout the codebase.
165 lines
4.7 KiB
Markdown
# AI Thread Analysis with Asyncio

This document explains how to use the new async AI analysis functionality for email threads.

## Overview

The new functionality adds AI-powered analysis to email threads: it determines whether each thread requires attention (is "actionable") and generates a concise summary. It uses asyncio to process multiple threads concurrently, so a batch of threads does not have to be analyzed one at a time.

## Key Functions

### `analyze_and_update_threads()`

This is the main function you'll use to analyze threads.

```python
from src.database import analyze_and_update_threads

# Analyze all unanalyzed threads for an account
analyze_and_update_threads(
    account_email="user@company.com",
    max_concurrent=5,
    only_unanalyzed=True
)

# Analyze specific threads
analyze_and_update_threads(
    account_email="user@company.com",
    thread_ids=[1, 2, 3],
    max_concurrent=3
)
```

**Parameters:**
- `account_email`: The email account to process
- `thread_ids`: Optional list of specific thread IDs to analyze
- `max_concurrent`: Maximum number of concurrent AI analysis tasks (default: 5)
- `only_unanalyzed`: If True, only analyze threads that haven't been analyzed yet (default: True)

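
A synchronous wrapper over an async function is typically a thin call to `asyncio.run()`. The sketch below shows that pattern under stated assumptions: `analyze_threads_async_sketch` and `analyze_threads_sketch` are hypothetical stand-ins, not the actual `src.database` functions, and the async body is a placeholder that only counts the requested thread IDs.

```python
import asyncio
from typing import List, Optional


async def analyze_threads_async_sketch(
    account_email: str,
    thread_ids: Optional[List[int]] = None,
    max_concurrent: int = 5,
    only_unanalyzed: bool = True,
) -> int:
    """Hypothetical async entry point; the real analysis work is omitted."""
    ids = thread_ids or []
    await asyncio.sleep(0)  # placeholder for the awaited analysis work
    return len(ids)


def analyze_threads_sketch(
    account_email: str,
    thread_ids: Optional[List[int]] = None,
    max_concurrent: int = 5,
    only_unanalyzed: bool = True,
) -> int:
    """Hypothetical sync wrapper: starts an event loop and runs the coroutine."""
    return asyncio.run(
        analyze_threads_async_sketch(
            account_email,
            thread_ids=thread_ids,
            max_concurrent=max_concurrent,
            only_unanalyzed=only_unanalyzed,
        )
    )
```

Note that `asyncio.run()` raises `RuntimeError` when called from a thread that already has a running event loop, so code that is already async would presumably await the async variant directly instead of calling the wrapper.
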
### `get_threads_needing_analysis()`

Check which threads need analysis:

```python
from src.database import get_threads_needing_analysis, SessionLocal

db = SessionLocal()
try:
    threads = get_threads_needing_analysis(db, "user@company.com")
    print(f"Found {len(threads)} threads needing analysis")
finally:
    # Always release the session, even if the query raises
    db.close()
```

## Database Schema Updates

`analyze_and_update_threads()` updates the following fields on the `Thread` model:

- `actionable`: Boolean indicating if the thread requires action
- `ai_summary`: Text summary of the thread content
- `ai_confidence`: Float (0.0-1.0) confidence score
- `last_analyzed_at`: Timestamp of when analysis was performed

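
To make the shape of these fields concrete, here is a minimal sketch that models them as a plain dataclass. It is an illustration only: `ThreadAnalysisFields` and `apply_analysis` are hypothetical names, the real `Thread` model is presumably an ORM class with more columns, and the use of a UTC timestamp is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ThreadAnalysisFields:
    """Hypothetical stand-in for the analysis-related Thread columns."""
    actionable: Optional[bool] = None
    ai_summary: Optional[str] = None
    ai_confidence: Optional[float] = None
    last_analyzed_at: Optional[datetime] = None


def apply_analysis(
    thread: ThreadAnalysisFields,
    actionable: bool,
    summary: str,
    confidence: float,
) -> ThreadAnalysisFields:
    """Write one analysis result onto a thread, validating the score range."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("ai_confidence must be between 0.0 and 1.0")
    thread.actionable = actionable
    thread.ai_summary = summary
    thread.ai_confidence = confidence
    # Assumption: timestamps are stored in UTC
    thread.last_analyzed_at = datetime.now(timezone.utc)
    return thread
```

A thread whose `last_analyzed_at` is still `None` has never been analyzed, which is one plausible way an "unanalyzed" filter could be defined.
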
## Complete Workflow Example

Here's a complete workflow from email ingestion to AI analysis:

```python
from src.database import (
    SessionLocal,
    ingest_emails,
    analyze_and_update_threads,
    get_threads_requiring_reply
)

# 1. Ingest emails (using your existing email fetching logic)
db = SessionLocal()
try:
    # Assuming you have fetched emails from your email provider
    emails = [...]  # Your email data
    ingest_emails(db, "user@company.com", emails)

    # 2. Run AI analysis on new threads
    analyze_and_update_threads(
        account_email="user@company.com",
        max_concurrent=5,
        only_unanalyzed=True
    )

    # 3. Get threads that need replies and are actionable
    reply_threads = get_threads_requiring_reply(db, "user@company.com")
    actionable_threads = [t for t in reply_threads if t.actionable]

    print(f"Found {len(actionable_threads)} actionable threads requiring replies")

finally:
    db.close()
```

## AI Analysis Details

The AI analysis:

- Uses the Groq API if the `GROQ_API_KEY` environment variable is set
- Falls back to heuristic analysis if Groq is unavailable
- Analyzes the last 4 messages in each thread by default
- Generates summaries of ≤80 words
- Identifies questions, requests, and actionable items
- Ignores automated/newsletter emails

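
The behavior listed above could be sketched roughly as follows. This is not the project's implementation: `choose_backend` and `heuristic_analyze` are hypothetical names, the keyword pattern is an invented example of a heuristic, and the Groq call itself is left out entirely.

```python
import os
import re
from typing import Dict, List, Mapping, Optional

MAX_MESSAGES = 4        # analyze only the most recent messages
MAX_SUMMARY_WORDS = 80  # cap on summary length

# Assumption: a few request-like phrases stand in for the real heuristic
REQUEST_PATTERN = re.compile(r"\b(please|could you|can you|need)\b", re.IGNORECASE)


def choose_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick 'groq' when an API key is configured, else the heuristic fallback."""
    env = os.environ if env is None else env
    return "groq" if env.get("GROQ_API_KEY") else "heuristic"


def heuristic_analyze(messages: List[str]) -> Dict[str, object]:
    """Flag a thread as actionable if recent messages contain a question or request."""
    recent = messages[-MAX_MESSAGES:]
    text = " ".join(recent)
    actionable = "?" in text or REQUEST_PATTERN.search(text) is not None
    # Truncate to the word limit rather than generating a real summary
    summary = " ".join(text.split()[:MAX_SUMMARY_WORDS])
    return {"actionable": actionable, "summary": summary}
```

A truncation like this is only a placeholder for summarization; it does show how the 4-message window and the 80-word cap bound the amount of text handled per thread.
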
## Performance

- Uses asyncio for concurrent processing
- Configurable concurrency limit (default: 5 concurrent analyses)
- AI analysis runs in a thread pool to avoid blocking the event loop
- Efficient database operations with a single commit per batch

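
A common way to combine a concurrency limit with blocking work is an `asyncio.Semaphore` around `asyncio.to_thread()` (Python 3.9+). The sketch below shows that pattern; `blocking_analysis` and `analyze_all` are hypothetical names, and the sleep stands in for a blocking AI call.

```python
import asyncio
import time
from typing import Dict, List


def blocking_analysis(thread_id: int) -> Dict[str, object]:
    """Hypothetical blocking call, standing in for a synchronous AI request."""
    time.sleep(0.01)
    return {"thread_id": thread_id, "actionable": thread_id % 2 == 0}


async def analyze_all(thread_ids: List[int], max_concurrent: int = 5) -> List[Dict[str, object]]:
    """Run the blocking analysis in worker threads, at most max_concurrent at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def analyze_one(thread_id: int) -> Dict[str, object]:
        async with semaphore:
            # to_thread keeps the event loop free while the call blocks
            return await asyncio.to_thread(blocking_analysis, thread_id)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(analyze_one(t) for t in thread_ids))
```

With this structure, collecting all results before touching the database is what makes a single commit per batch possible.
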
## Error Handling

- Gracefully handles individual thread analysis failures
- Continues processing other threads if one fails
- Provides detailed error logging
- Automatically rolls back database changes on failure

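
One way to isolate per-thread failures is `asyncio.gather(..., return_exceptions=True)`, which returns exceptions as values instead of cancelling the batch. The sketch below illustrates only that part (failure isolation and logging, not the database rollback); `analyze_one`, `analyze_batch`, and the simulated failure are hypothetical.

```python
import asyncio
import logging
from typing import Dict, List, Tuple

logger = logging.getLogger("thread_analysis_sketch")


async def analyze_one(thread_id: int) -> Dict[str, object]:
    """Hypothetical analysis task; thread 2 fails to simulate an AI error."""
    await asyncio.sleep(0)
    if thread_id == 2:
        raise RuntimeError("simulated AI failure")
    return {"thread_id": thread_id, "actionable": True}


async def analyze_batch(thread_ids: List[int]) -> Tuple[List[Dict[str, object]], List[int]]:
    """Analyze every thread, collecting successes and the IDs that failed."""
    outcomes = await asyncio.gather(
        *(analyze_one(t) for t in thread_ids),
        return_exceptions=True,  # one failure does not abort the others
    )
    successes: List[Dict[str, object]] = []
    failed_ids: List[int] = []
    for thread_id, outcome in zip(thread_ids, outcomes):
        if isinstance(outcome, Exception):
            logger.error("Analysis failed for thread %s: %s", thread_id, outcome)
            failed_ids.append(thread_id)
        else:
            successes.append(outcome)
    return successes, failed_ids
```

Returning the failed IDs alongside the successes lets a caller retry just those threads on the next run.
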
## Usage Tips

1. **Start with small batches**: Use `max_concurrent=3` initially to avoid overwhelming the AI service
2. **Regular analysis**: Run analysis after each email ingestion cycle
3. **Focus on actionable threads**: Prioritize threads that are both `requires_reply=True` and `actionable=True`
4. **Monitor confidence scores**: Lower confidence may indicate uncertain analysis
5. **Environment setup**: Set `GROQ_API_KEY` for better AI analysis quality

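
Tips 3 and 4 can be combined into a small triage step. The sketch below is one possible approach, not part of the codebase: `ThreadView` and `prioritize` are hypothetical, and the `0.6` threshold is an arbitrary example value.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ThreadView:
    """Hypothetical read-only view of the fields needed for triage."""
    id: int
    requires_reply: bool
    actionable: bool
    ai_confidence: Optional[float]


def prioritize(
    threads: List[ThreadView], min_confidence: float = 0.6
) -> Tuple[List[ThreadView], List[ThreadView]]:
    """Split urgent threads into confident results and ones needing manual review."""
    urgent = [t for t in threads if t.requires_reply and t.actionable]
    confident = sorted(
        (t for t in urgent if (t.ai_confidence or 0.0) >= min_confidence),
        key=lambda t: t.ai_confidence or 0.0,
        reverse=True,  # highest confidence first
    )
    needs_review = [t for t in urgent if (t.ai_confidence or 0.0) < min_confidence]
    return confident, needs_review
```

Treating a missing confidence score as `0.0` sends those threads to manual review, which is a conservative choice.
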
## Testing

Use the provided test scripts:

```bash
# Test the complete workflow
python3 example_workflow.py

# Test single thread analysis
python3 test_single_analysis.py

# Reset analysis data for testing
python3 reset_analysis.py
```

## Integration with Existing Code

To integrate with your existing email processing:

```python
# After your existing email ingestion
from src.database import analyze_and_update_threads

def process_emails(account_email: str):
    # Your existing email fetching and ingestion code
    fetch_and_ingest_emails(account_email)

    # Add AI analysis
    analyze_and_update_threads(
        account_email=account_email,
        only_unanalyzed=True
    )
```

This ensures that new threads are automatically analyzed for actionability after each email sync.