# AI Bookkeeper - Data Science Engine

AI-powered receipt-to-transaction matching engine using Groq LLM. This is a **Data Science Engine** that provides intelligent matching capabilities for backend applications.

## 🎯 Purpose

This Data Science Engine receives QuickBooks transaction data from backend applications and provides:

- **AI-powered receipt processing** (OCR and data extraction)
- **Intelligent receipt-transaction matching** with confidence scores
- **Google Drive integration** for batch receipt processing
- **Configurable AI rules** for business logic
- **Feedback logging** for continuous improvement

## 🚀 Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Configure API Keys

The Groq API key is already configured in `config.py`.

### 3. Start the DS Engine

```bash
python main.py
```

### 4. Access API Documentation

- **Swagger UI**: http://localhost:8343/docs
- **ReDoc**: http://localhost:8343/redoc

## 📋 API Endpoints

### QuickBooks Data Import

- `POST /transactions/import/quickbooks` - Import and convert QuickBooks transactions

### Receipt Processing

- `POST /upload` - Upload receipt documents (PDF/images)
- `POST /process/{file_id}` - Extract data from an uploaded document
- `GET /documents` - List all processed documents

### Google Drive Integration

- `POST /drive/sync` - Sync and process receipts from Google Drive
- `GET /drive/folders` - List accessible Google Drive folders
- `GET /drive/folder/{folder_id}` - Get folder information

### AI Matching Engine

- `POST /match` - Match receipts to transactions using AI
- `POST /approve` - Approve or reject AI matches

### AI Rules Management

- `POST /rules` - Add new AI rules
- `GET /rules` - List all active rules
- `DELETE /rules/{rule_name}` - Delete a rule

### System Monitoring

- `GET /stats` - Get system statistics and performance metrics

## 🔧 Core Components

### **AIMatcher** (`ai_matcher.py`)

- Uses Groq LLM to compare receipts and transactions
- Provides confidence scores
  and reasoning
- Configurable matching criteria (amount, date, vendor)

### **AIRulesEngine** (`ai_rules.py`)

- Applies business rules for auto-approval and categorization
- Configurable rule conditions and actions
- Supports system and user-generated rules

### **DocumentProcessor** (`document_processor.py`)

- AI-powered receipt data extraction
- Supports PDF and image formats
- Uses a Groq vision model for OCR

### **MatchingEngine** (`matching_engine.py`)

- Main orchestrator combining all components
- Handles the complete matching workflow
- Provides statistics and feedback logging

### **FeedbackLogger** (`feedback_logger.py`)

- Tracks manual overrides for AI training
- Maintains an audit trail of user decisions
- Enables continuous model improvement

## 📊 Configuration

Edit `config.py` to adjust:

- **Confidence threshold** (default: 0.8)
- **Date tolerance days** (default: 7)
- **Amount tolerance percent** (default: 5%)
- **Groq API key** (already configured)

## 🔄 Integration Workflow

### 1. Backend Sends QuickBooks Data

```python
import requests

# Backend sends QuickBooks transactions to the DS Engine
response = requests.post(
    "http://localhost:8343/transactions/import/quickbooks",
    json={
        "transactions": [
            {
                "id": "QB_TXN_123",
                "txn_date": "2024-01-15",
                "amount": 12.50,
                "payee_name": "Starbucks",
                "memo": "Coffee purchase"
            }
        ]
    }
)
response.raise_for_status()
```

### 2. Process Receipts

```python
# Sync from Google Drive
response = requests.post(
    "http://localhost:8343/drive/sync",
    json={"folder_id": "your_folder_id"}
)

# Or upload directly (PDF or image; the path is a placeholder).
# Uploaded documents are extracted via POST /process/{file_id}.
with open("receipt.pdf", "rb") as receipt_file:
    response = requests.post(
        "http://localhost:8343/upload",
        files={"file": receipt_file}
    )
```

### 3. AI Matching

```python
# Match receipts to transactions.
# processed_receipts and converted_transactions stand for the receipt
# and transaction data produced by the previous steps.
response = requests.post(
    "http://localhost:8343/match",
    json={
        "receipts": processed_receipts,
        "transactions": converted_transactions
    }
)
```

### 4. User Feedback

```python
# Approve or reject matches
response = requests.post(
    "http://localhost:8343/approve",
    json={
        "match_id": "match_123",
        "user_id": "user_456",
        "action": "approve"
    }
)
```

## 🎯 Key Features

- **AI-powered matching** with confidence scores
- **Rule-based auto-approval** and categorization
- **Feedback logging** for continuous improvement
- **Configurable matching parameters**
- **Google Drive integration** for batch processing
- **JSON API** for easy backend integration
- **Comprehensive error handling**

## 📝 Data Formats

### QuickBooks Transaction Input

```json
{
  "id": "string",
  "txn_date": "YYYY-MM-DD",
  "amount": 0.00,
  "payee_name": "string",
  "memo": "string (optional)",
  "account_name": "string (optional)",
  "txn_type": "string (optional)"
}
```

### Match Result Output

```json
{
  "receipt_id": "string",
  "transaction_id": "string",
  "confidence_score": 0.95,
  "match_reason": "string",
  "receipt_vendor": "string",
  "receipt_amount": 0.00,
  "transaction_vendor": "string",
  "transaction_amount": 0.00
}
```

## 🔍 AI Matching Criteria

The engine uses three primary criteria for matching:

1. **Amount Similarity** - Compares receipt and transaction amounts (default: 5% tolerance)
2. **Date Proximity** - Checks how close the dates are (default: 7-day tolerance)
3. **Vendor Matching** - AI-powered vendor name comparison

## 🚀 Production Deployment

For production deployment:

- Replace in-memory storage with a database
- Configure proper authentication
- Set up monitoring and logging
- Use environment variables for configuration
- Implement proper error handling and retries

## 📞 Support

This Data Science Engine is designed to be integrated with backend applications that handle:

- QuickBooks API connections
- User interface and workflows
- Data persistence and management
- External integrations

The engine focuses purely on AI/ML capabilities and provides a clean JSON API for backend integration.
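
Because the engine is consumed purely over its JSON API, a backend might wrap the documented endpoints in a thin client. The following is a minimal, hypothetical sketch that uses only the Python standard library. The `DS_ENGINE_URL` environment variable, the helper names, and the `"reject"` action value are assumptions and are not part of the engine. Building requests separately from sending them keeps payload construction easy to unit test without a running engine.

```python
import json
import os
import urllib.request
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional

# Hypothetical env var; falls back to the local address used in this README.
BASE_URL = os.environ.get("DS_ENGINE_URL", "http://localhost:8343")


@dataclass
class QuickBooksTransaction:
    """Mirrors the "QuickBooks Transaction Input" format."""
    id: str
    txn_date: str  # YYYY-MM-DD
    amount: float
    payee_name: str
    memo: Optional[str] = None
    account_name: Optional[str] = None
    txn_type: Optional[str] = None


def _post_json(path: str, payload: Dict[str, Any],
               base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an engine endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def import_quickbooks(txns: List[QuickBooksTransaction]) -> urllib.request.Request:
    # Drop unset optional fields so only populated keys are sent.
    items = [{k: v for k, v in asdict(t).items() if v is not None} for t in txns]
    return _post_json("/transactions/import/quickbooks", {"transactions": items})


def approve(match_id: str, user_id: str,
            action: str = "approve") -> urllib.request.Request:
    # Assumption: "reject" is the counterpart of the documented "approve" action.
    if action not in ("approve", "reject"):
        raise ValueError(f"unsupported action: {action!r}")
    return _post_json("/approve",
                      {"match_id": match_id, "user_id": user_id, "action": action})


def send(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = import_quickbooks([
        QuickBooksTransaction("QB_TXN_123", "2024-01-15", 12.50, "Starbucks",
                              memo="Coffee purchase"),
    ])
    print(req.get_method(), req.full_url)
    # send(req) only works while the engine is running (python main.py).
```

Keeping `send` separate also leaves room to add the retries mentioned under Production Deployment without touching payload construction.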