# AI Bookkeeper - Data Science Engine

AI-powered receipt-to-transaction matching engine using Groq LLM. This is a **Data Science Engine** that provides intelligent matching capabilities for backend applications.

## 🎯 Purpose

This Data Science Engine receives QuickBooks transaction data from backend applications and provides:

- **AI-powered receipt processing** (OCR and data extraction)
- **Intelligent receipt-transaction matching** with confidence scores
- **Configurable AI rules** for business logic
- **Feedback logging** for continuous improvement
- **RESTful API** for easy integration

## 🚀 Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Configure API Keys

Create a `.env` file in the project root with your Groq API key:

```bash
# Create .env file
echo "GROQ_API_KEY=your_actual_groq_api_key_here" > .env
```

**Important**: Get your API key from [Groq Console](https://console.groq.com/)

### 3. Start the Server

```bash
# Option 1: Using the main script
python main.py

# Option 2: Using uvicorn directly
uvicorn main:app --host 0.0.0.0 --port 8343 --reload
```

### 4. Access API Documentation

- **Swagger UI**: http://localhost:8343/docs
- **ReDoc**: http://localhost:8343/redoc

## 📋 API Endpoints

### Transaction Import

- `POST /transactions/import/csv` - Import transactions from CSV file
- `POST /transactions/import/image` - Import transactions from image/PDF

### Receipt Processing

- `POST /upload-multiple` - Upload multiple receipt documents
- `POST /process/{file_id}` - Extract data from uploaded documents

### AI Matching Engine

- `POST /match-specific` - Match specific receipts to transactions using AI

### AI Rules Management

- `POST /rules` - Add new AI rules
- `GET /rules` - List all active rules
- `DELETE /rules/{rule_name}` - Delete rules

### System Monitoring

- `GET /stats` - Get system statistics and performance metrics
- `GET /` - Health check endpoint

## 🔧 Core Components

### **AIMatcher** (`ai_matcher.py`)

- Uses Groq LLM to compare receipts and transactions
- Provides confidence scores and reasoning
- Configurable matching criteria (amount, date, vendor)
- Rate limiting to prevent API quota exhaustion

### **AIRulesEngine** (`ai_rules.py`)

- Applies business rules for auto-approval and categorization
- Configurable rule conditions and actions
- Supports system and user-generated rules
- Safe condition evaluation with proper error handling

### **DocumentProcessor** (`document_processor.py`)

- AI-powered receipt data extraction using Groq vision model
- Supports PDF and image formats
- Robust JSON parsing with error handling
- Extracts vendor, amount, date, tax, and category information

### **MatchingEngine** (`matching_engine.py`)

- Main orchestrator combining all components
- Handles the complete matching workflow
- Provides statistics and feedback logging
- Configurable confidence thresholds

### **FeedbackLogger** (`feedback_logger.py`)

- Tracks manual overrides for AI training
- Maintains audit trail of user decisions
- Enables continuous model improvement

## 📊 Configuration

Edit `config.py` to adjust:

- **Confidence threshold** (default: 0.3)
- **Date tolerance days** (default: 7)
- **Amount tolerance percent** (default: 5%)
- **Groq API key** (from environment variable)

## 🔄 Integration Workflow

### 1. Import Transactions

```bash
# Import from CSV
curl -X POST -F "file=@transactions.csv" http://localhost:8343/transactions/import/csv

# Import from image
curl -X POST -F "file=@statement.jpg" http://localhost:8343/transactions/import/image
```

### 2. Upload and Process Receipts

```bash
# Upload receipts
curl -X POST -F "files=@receipt1.jpg" -F "files=@receipt2.jpg" http://localhost:8343/upload-multiple

# Process a specific receipt
curl -X POST http://localhost:8343/process/{file_id}
```

### 3. AI Matching

```bash
# Match specific receipts
curl -X POST -H "Content-Type: application/json" \
  -d '["file_id_1", "file_id_2"]' \
  http://localhost:8343/match-specific
```

### 4. Check Results

```bash
# Get system stats
curl http://localhost:8343/stats

# View AI rules
curl http://localhost:8343/rules
```

## 🎯 Key Features

- **AI-powered matching** with confidence scores
- **Rule-based auto-approval** and categorization
- **Feedback logging** for continuous improvement
- **Configurable matching parameters**
- **RESTful JSON API** for easy backend integration
- **Comprehensive error handling**
- **Rate limiting** to prevent API quota exhaustion
- **Robust JSON parsing** for AI responses

## 📝 Data Formats

### Transaction Input (CSV)

```csv
Date,Description,Amount,Category
2024-01-15,Starbucks Coffee,12.50,Food & Dining
2024-01-16,Office Supplies,45.99,Office
```

### Receipt Processing Output

```json
{
  "vendor": "Starbucks",
  "total_amount": 12.50,
  "tax_amount": 1.25,
  "date": "2024-01-15",
  "category": "Food & Dining",
  "confidence": 0.95,
  "extraction_success": true
}
```

### Match Result Output

```json
{
  "receipt_id": "uuid",
  "transaction_id": "transaction_123",
  "confidence_score": 0.95,
  "match_reason": "Same vendor, minor date difference (Auto-approved by rules)",
  "receipt_vendor": "Starbucks",
  "receipt_amount": 12.50,
  "transaction_vendor": "STARBUCKS",
  "transaction_amount": 12.50
}
```

## 🔍 AI Matching Criteria

The engine uses multiple criteria for matching:

1. **Amount Similarity** - Compares receipt and transaction amounts (5% tolerance)
2. **Date Proximity** - Checks date closeness (7-day tolerance)
3. **Vendor Matching** - AI-powered vendor name comparison using Groq LLM
4. **Rule-based Auto-approval** - Automatic approval for exact matches and high-confidence matches

## 🛠️ Development

### Project Structure

```
├── main.py                 # FastAPI application entry point
├── ai_matcher.py           # AI-powered matching logic
├── ai_rules.py             # Business rules engine
├── document_processor.py   # Receipt data extraction
├── matching_engine.py      # Main matching orchestrator
├── feedback_logger.py      # User feedback tracking
├── models.py               # Pydantic data models
├── api_models.py           # API request/response models
├── config.py               # Configuration settings
├── requirements.txt        # Python dependencies
└── test_images/            # Test image files
```

### Running Tests

```bash
# Test the server
curl http://localhost:8343/

# Test stats endpoint
curl http://localhost:8343/stats

# Test rules endpoint
curl http://localhost:8343/rules
```

## 🚀 Production Deployment

For production deployment:

- Replace in-memory storage with a database (PostgreSQL recommended)
- Configure proper authentication and authorization
- Set up monitoring and logging (ELK stack recommended)
- Use environment variables for all configuration
- Implement proper error handling and retries
- Set up rate limiting and API quotas
- Configure CORS for frontend integration
- Use HTTPS in production

## 📞 Support

This Data Science Engine is designed to be integrated with backend applications that handle:

- QuickBooks API connections
- User interface and workflows
- Data persistence and management
- External integrations

The engine focuses purely on AI/ML capabilities and provides a clean JSON API for backend integration.
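
When debugging why a receipt was not matched, it can help to check which transactions are even eligible under the documented amount and date tolerances. The Python sketch below reproduces only those two checks from **AI Matching Criteria**, using the defaults listed under **Configuration** and the CSV and receipt shapes from **Data Formats**. It is not the engine's matching code: the Groq-based vendor comparison and the rules engine are omitted, the function names are hypothetical, and measuring the 5% tolerance against the transaction amount is an assumption.

```python
"""Hypothetical local check mirroring the documented amount/date tolerances.

Not the engine's AIMatcher: no Groq vendor comparison, no rules engine.
"""
import csv
import io
from datetime import date

# Defaults copied from the Configuration section; adjust if config.py differs.
DATE_TOLERANCE_DAYS = 7
AMOUNT_TOLERANCE_PERCENT = 5.0


def parse_transactions(csv_text: str) -> list[dict]:
    """Parse rows in the documented Date,Description,Amount,Category layout."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "date": date.fromisoformat(row["Date"]),
            "description": row["Description"],
            "amount": float(row["Amount"]),
            "category": row["Category"],
        }
        for row in rows
    ]


def within_tolerance(receipt: dict, txn: dict) -> bool:
    """True if both the amount and the date fall inside the tolerances.

    Assumption: the percentage is measured against the transaction amount.
    """
    allowed = abs(txn["amount"]) * AMOUNT_TOLERANCE_PERCENT / 100
    amount_ok = abs(receipt["total_amount"] - txn["amount"]) <= allowed
    day_gap = abs((date.fromisoformat(receipt["date"]) - txn["date"]).days)
    return amount_ok and day_gap <= DATE_TOLERANCE_DAYS


def candidate_transactions(receipt: dict, transactions: list[dict]) -> list[dict]:
    """Keep only the transactions that pass both tolerance checks."""
    return [t for t in transactions if within_tolerance(receipt, t)]


if __name__ == "__main__":
    sample_csv = (
        "Date,Description,Amount,Category\n"
        "2024-01-15,Starbucks Coffee,12.50,Food & Dining\n"
        "2024-01-16,Office Supplies,45.99,Office\n"
    )
    receipt = {"vendor": "Starbucks", "total_amount": 12.50, "date": "2024-01-15"}
    for txn in candidate_transactions(receipt, parse_transactions(sample_csv)):
        print(txn["description"], txn["amount"])  # prints: Starbucks Coffee 12.5
```

With the sample data, only the Starbucks transaction survives: the office-supplies row differs by far more than 5% of its amount. A transaction that passes these checks can still be rejected by the engine if the vendor comparison or confidence threshold fails.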
## 🔧 Troubleshooting

### Common Issues

1. **API Key Error**: Ensure `GROQ_API_KEY` is set in your `.env` file
2. **Port Already in Use**: Kill the existing process with `pkill -f "python main.py"` (or `pkill -f "uvicorn main:app"` if you started the server with uvicorn)
3. **Import Errors**: Install dependencies with `pip install -r requirements.txt`
4. **Rate Limiting**: The system includes built-in rate limiting to prevent API quota exhaustion

### Logs

Check the application logs for detailed error information:

```bash
tail -f app.log
```
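
The first two Common Issues can also be checked before starting the server. The Python sketch below is a hypothetical helper, not part of the repository. It looks for `GROQ_API_KEY` in the environment or as a plain `KEY=value` line in `.env` (it does not handle quotes or `export` prefixes, and it does not validate the key with Groq), then tests whether anything is already listening on port 8343.

```python
"""Hypothetical preflight check for the API-key and port issues.

Stdlib only; it does not import the engine or contact Groq.
"""
import os
import socket
from pathlib import Path


def groq_key_configured(env_file: str = ".env") -> bool:
    """True if GROQ_API_KEY is set in the environment or in env_file.

    Only checks that a non-empty value exists; the key itself is not validated.
    """
    if os.environ.get("GROQ_API_KEY"):
        return True
    path = Path(env_file)
    if not path.is_file():
        return False
    for line in path.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "GROQ_API_KEY" and value.strip():
            return True
    return False


def port_in_use(port: int = 8343, host: str = "127.0.0.1") -> bool:
    """True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if not groq_key_configured():
        print("GROQ_API_KEY not found: add it to .env")
    if port_in_use():
        print("Port 8343 is busy: stop the existing server first")
```

Running it from the project root prints nothing when both checks pass.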