bolade 75a0a3fde7 feat: Implement async AI analysis for email threads

- Added `get_latest_email_date()` function in `database.py` to retrieve the most recent email date for a given account and folder.
- Enhanced `fetch_folder_emails()` in `zoho_client.py` to intelligently determine the start date for fetching emails based on the latest email date in the database.
- Introduced `analyze_and_update_threads_async()` for asynchronous analysis of email threads, allowing concurrent processing.
- Created a synchronous wrapper `analyze_and_update_threads()` for easier integration.
- Updated `fetch_emails()` to support database session and account email parameters.
- Added comprehensive documentation in `AI_ANALYSIS_GUIDE.md` detailing the new AI analysis functionality.
- Implemented tests for the new features, including `test_fetch_with_db.py`, `test_ai_analysis.py`, and `test_single_analysis.py`.
- Added error handling and logging improvements throughout the codebase.

2025-08-11 23:20:20 +01:00

4.7 KiB


AI Thread Analysis with Asyncio

This document explains how to use the new async AI analysis functionality for email threads.

Overview

The new functionality adds AI-powered analysis to email threads: it determines whether each thread requires attention (is "actionable") and generates a concise summary. It uses asyncio to process multiple threads concurrently, so a batch of threads finishes sooner than it would if analyzed one at a time.

Key Functions

`analyze_and_update_threads()`

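This is the main function you'll use to analyze threads. It is a synchronous wrapper around `analyze_and_update_threads_async()`, so it can be called from ordinary (non-async) code.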

from src.database import analyze_and_update_threads

# Analyze all unanalyzed threads for an account
analyze_and_update_threads(
    account_email="user@company.com",
    max_concurrent=5,
    only_unanalyzed=True
)

# Analyze specific threads
analyze_and_update_threads(
    account_email="user@company.com", 
    thread_ids=[1, 2, 3],
    max_concurrent=3
)

Parameters:

account_email: The email account to process
thread_ids: Optional list of specific thread IDs to analyze
max_concurrent: Maximum number of concurrent AI analysis tasks (default: 5)
only_unanalyzed: If True, only analyze threads that haven't been analyzed yet (default: True)
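
The relationship between the synchronous wrapper and its async counterpart can be sketched as follows. This is a minimal illustration, not the project's implementation: `analyze_threads_async` and `analyze_threads` are hypothetical stand-ins, and the AI call is replaced by a placeholder `await`.

```python
import asyncio
from typing import Optional


async def analyze_threads_async(
    account_email: str,
    thread_ids: Optional[list[int]] = None,
    max_concurrent: int = 5,
) -> int:
    """Hypothetical stand-in for the async analysis coroutine."""
    ids = thread_ids or []
    # The semaphore caps how many analyses are in flight at once.
    limit = asyncio.Semaphore(max_concurrent)

    async def analyze_one(thread_id: int) -> int:
        async with limit:
            await asyncio.sleep(0)  # placeholder for the real AI call
            return thread_id

    results = await asyncio.gather(*(analyze_one(t) for t in ids))
    return len(results)


def analyze_threads(
    account_email: str,
    thread_ids: Optional[list[int]] = None,
    max_concurrent: int = 5,
) -> int:
    """Synchronous wrapper: runs the coroutine to completion on a new event loop."""
    return asyncio.run(
        analyze_threads_async(account_email, thread_ids, max_concurrent)
    )


processed = analyze_threads("user@company.com", thread_ids=[1, 2, 3], max_concurrent=3)
print(processed)  # prints 3
```

Note that `asyncio.run()` raises `RuntimeError` when called from inside a running event loop, so code that is already async would need to await the async variant directly.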

`get_threads_needing_analysis()`

Check which threads need analysis:

from src.database import get_threads_needing_analysis, SessionLocal

db = SessionLocal()
try:
    threads = get_threads_needing_analysis(db, "user@company.com")
    print(f"Found {len(threads)} threads needing analysis")
finally:
    # Always release the session, even if the query raises
    db.close()

Database Schema Updates

`analyze_and_update_threads()` updates the following fields on the Thread model:

actionable: Boolean indicating if the thread requires action
ai_summary: Text summary of the thread content
ai_confidence: Float (0.0-1.0) confidence score
last_analyzed_at: Timestamp of when analysis was performed
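
To make the four fields concrete, here is a minimal sketch that models them with a plain dataclass. The class name `ThreadAnalysis`, the `record()` helper and the `needs_analysis` property are hypothetical. The real Thread model is a database model whose exact column definitions are not shown in this guide. The sketch also assumes that a thread with no `last_analyzed_at` timestamp counts as unanalyzed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ThreadAnalysis:
    """Hypothetical in-memory mirror of the analysis fields on Thread."""

    actionable: Optional[bool] = None
    ai_summary: Optional[str] = None
    ai_confidence: Optional[float] = None
    last_analyzed_at: Optional[datetime] = None

    @property
    def needs_analysis(self) -> bool:
        # Assumption: an unset timestamp means the thread was never analyzed.
        return self.last_analyzed_at is None

    def record(self, actionable: bool, summary: str, confidence: float) -> None:
        """Store one analysis result and stamp the time it was recorded."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        self.actionable = actionable
        self.ai_summary = summary
        self.ai_confidence = confidence
        self.last_analyzed_at = datetime.now(timezone.utc)


thread = ThreadAnalysis()
thread.record(actionable=True, summary="Customer asks for a quote.", confidence=0.9)
```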

Complete Workflow Example

Here's a complete workflow from email ingestion to AI analysis:

from src.database import (
    SessionLocal, 
    ingest_emails,
    analyze_and_update_threads,
    get_threads_requiring_reply
)

# 1. Ingest emails (using your existing email fetching logic)
db = SessionLocal()
try:
    # Assuming you have fetched emails from your email provider
    emails = [...] # Your email data
    ingest_emails(db, "user@company.com", emails)
    
    # 2. Run AI analysis on new threads
    analyze_and_update_threads(
        account_email="user@company.com",
        max_concurrent=5,
        only_unanalyzed=True
    )
    
    # 3. Get threads that need replies and are actionable
    reply_threads = get_threads_requiring_reply(db, "user@company.com")
    actionable_threads = [t for t in reply_threads if t.actionable]
    
    print(f"Found {len(actionable_threads)} actionable threads requiring replies")
    
finally:
    db.close()

AI Analysis Details

The AI analysis:

Uses the Groq API if the GROQ_API_KEY environment variable is set
Falls back to heuristic analysis if Groq is unavailable
Analyzes the last 4 messages in each thread by default
Generates summaries of ≤80 words
Identifies questions, requests, and actionable items
Ignores automated/newsletter emails
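
The behavior listed above can be sketched as follows. This is an illustrative approximation only: the keyword patterns, function names and returned dictionary shape are assumptions, and the project's actual heuristics and Groq prompt are not shown in this guide.

```python
import os
import re

MAX_MESSAGES = 4        # analyze only the most recent messages
MAX_SUMMARY_WORDS = 80  # cap on summary length

# Hypothetical patterns; the real heuristics may differ.
REQUEST_PATTERN = re.compile(
    r"\?|\b(please|could you|can you|let me know)\b", re.IGNORECASE
)
AUTOMATED_PATTERN = re.compile(
    r"\b(unsubscribe|no-?reply|newsletter)\b", re.IGNORECASE
)


def choose_backend() -> str:
    """Pick Groq when an API key is configured, otherwise the heuristic."""
    return "groq" if os.environ.get("GROQ_API_KEY") else "heuristic"


def heuristic_analysis(messages: list[str]) -> dict:
    """Classify a thread from its last few messages without calling an AI service."""
    recent = messages[-MAX_MESSAGES:]
    text = " ".join(recent)
    automated = bool(AUTOMATED_PATTERN.search(text))
    # Automated mail is never actionable, even if it contains a question.
    actionable = not automated and bool(REQUEST_PATTERN.search(text))
    summary = " ".join(text.split()[:MAX_SUMMARY_WORDS])
    return {"actionable": actionable, "ai_summary": summary}


result = heuristic_analysis(["Hi team,", "Could you send the Q3 report by Friday?"])
print(result["actionable"])  # prints True
```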

Performance

Uses asyncio for concurrent processing
Configurable concurrency limit (default: 5 concurrent analyses)
AI analysis runs in a thread pool to avoid blocking the event loop
Efficient database operations with a single commit per batch
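
The concurrency pattern described above, a semaphore-limited set of tasks that each hand blocking work to a thread pool, can be sketched as follows. `blocking_analysis` is a hypothetical stand-in for the real AI call, and the project's actual implementation may differ in detail.

```python
import asyncio
import time


def blocking_analysis(thread_id: int) -> dict:
    """Hypothetical blocking AI call, simulated with a short sleep."""
    time.sleep(0.01)
    return {"thread_id": thread_id, "actionable": thread_id % 2 == 0}


async def analyze_all(thread_ids, max_concurrent: int = 5) -> list[dict]:
    # At most max_concurrent analyses run at any moment.
    limit = asyncio.Semaphore(max_concurrent)

    async def analyze_one(thread_id: int) -> dict:
        async with limit:
            # to_thread runs the blocking call in the default thread pool,
            # leaving the event loop free to schedule other tasks.
            return await asyncio.to_thread(blocking_analysis, thread_id)

    # gather returns results in the same order as the input IDs.
    return await asyncio.gather(*(analyze_one(t) for t in thread_ids))


results = asyncio.run(analyze_all(range(10), max_concurrent=3))
print(len(results))  # prints 10
```

Collecting all results first and writing them to the database afterwards is one way to achieve a single commit per batch.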

Error Handling

Gracefully handles individual thread analysis failures
Continues processing other threads if one fails
Provides detailed error logging
Automatically rolls back database changes on failure
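
A minimal sketch of this error-handling strategy is shown below. It uses the standard-library `sqlite3` module as a stand-in for the project's database session, and `analyze` is a hypothetical function that fails for one thread to simulate an AI service error. The table layout is likewise an assumption made for illustration.

```python
import logging
import sqlite3

logger = logging.getLogger("ai_analysis")


def analyze(thread_id: int) -> str:
    """Hypothetical analysis step; fails for thread 2 to simulate an error."""
    if thread_id == 2:
        raise RuntimeError("AI service timeout")
    return f"summary for thread {thread_id}"


def analyze_batch(conn: sqlite3.Connection, thread_ids: list[int]) -> list[int]:
    """Analyze each thread, skip failures, and commit all updates at once."""
    updates, failed = [], []
    for thread_id in thread_ids:
        try:
            updates.append((analyze(thread_id), thread_id))
        except Exception:
            # Log with traceback and keep going with the remaining threads.
            logger.exception("Analysis failed for thread %s", thread_id)
            failed.append(thread_id)
    try:
        conn.executemany("UPDATE threads SET ai_summary = ? WHERE id = ?", updates)
        conn.commit()  # single commit for the whole batch
    except sqlite3.Error:
        conn.rollback()  # discard partial writes if the batch cannot be saved
        raise
    return failed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE threads (id INTEGER PRIMARY KEY, ai_summary TEXT)")
conn.executemany("INSERT INTO threads (id) VALUES (?)", [(1,), (2,), (3,)])
conn.commit()

failed = analyze_batch(conn, [1, 2, 3])
print(failed)  # prints [2]
```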

Usage Tips

Start with low concurrency: Use max_concurrent=3 initially to avoid overwhelming the AI service
Regular analysis: Run analysis after each email ingestion cycle
Focus on actionable threads: Prioritize threads where both requires_reply and actionable are True
Monitor confidence scores: A low ai_confidence value may indicate an uncertain analysis
Environment setup: Set GROQ_API_KEY for better AI analysis quality
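
The prioritization and confidence tips can be combined in a small helper, sketched below. `ThreadView`, `prioritize` and the 0.6 threshold are hypothetical choices made for illustration. Only the field names `requires_reply`, `actionable` and `ai_confidence` come from this guide.

```python
from dataclasses import dataclass


@dataclass
class ThreadView:
    """Hypothetical lightweight view of the fields used for triage."""

    id: int
    requires_reply: bool
    actionable: bool
    ai_confidence: float


def prioritize(threads: list[ThreadView], min_confidence: float = 0.6):
    """Split urgent threads into confident results and ones worth a manual look."""
    urgent = [t for t in threads if t.requires_reply and t.actionable]
    confident = sorted(
        (t for t in urgent if t.ai_confidence >= min_confidence),
        key=lambda t: t.ai_confidence,
        reverse=True,  # most confident first
    )
    uncertain = [t for t in urgent if t.ai_confidence < min_confidence]
    return confident, uncertain


threads = [
    ThreadView(1, requires_reply=True, actionable=True, ai_confidence=0.9),
    ThreadView(2, requires_reply=True, actionable=False, ai_confidence=0.8),
    ThreadView(3, requires_reply=True, actionable=True, ai_confidence=0.4),
    ThreadView(4, requires_reply=True, actionable=True, ai_confidence=0.7),
]
confident, uncertain = prioritize(threads)
print([t.id for t in confident], [t.id for t in uncertain])  # prints [1, 4] [3]
```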

Testing

Use the provided test scripts:

# Test the complete workflow
python3 example_workflow.py

# Test single thread analysis
python3 test_single_analysis.py

# Reset analysis data for testing
python3 reset_analysis.py

Integration with Existing Code

To integrate with your existing email processing:

# After your existing email ingestion
from src.database import analyze_and_update_threads

def process_emails(account_email: str):
    # Placeholder for your existing email fetching and ingestion code
    fetch_and_ingest_emails(account_email)
    
    # Add AI analysis
    analyze_and_update_threads(
        account_email=account_email,
        only_unanalyzed=True
    )

This ensures that new threads are automatically analyzed for actionability after each email sync.
