2025-07-02 16:52:49 +01:00

AI Bookkeeper - Data Science Engine

An AI-powered receipt-to-transaction matching engine built on a Groq LLM. It is a Data Science (DS) Engine that provides matching capabilities to backend applications.

🎯 Purpose

This Data Science Engine receives QuickBooks transaction data from backend applications and provides:

  • AI-powered receipt processing (OCR and data extraction)
  • Intelligent receipt-transaction matching with confidence scores
  • Google Drive integration for batch receipt processing
  • Configurable AI rules for business logic
  • Feedback logging for continuous improvement

🚀 Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Configure API Keys

The Groq API key is already configured in config.py

3. Start the DS Engine

python main.py

4. Access API Documentation

📋 API Endpoints

QuickBooks Data Import

  • POST /transactions/import/quickbooks - Import and convert QuickBooks transactions

Receipt Processing

  • POST /upload - Upload receipt documents (PDF/images)
  • POST /process/{file_id} - Extract data from uploaded documents
  • GET /documents - List all processed documents

Google Drive Integration

  • POST /drive/sync - Sync and process receipts from Google Drive
  • GET /drive/folders - List accessible Google Drive folders
  • GET /drive/folder/{folder_id} - Get folder information

AI Matching Engine

  • POST /match - Match receipts to transactions using AI
  • POST /approve - Approve or reject AI matches

AI Rules Management

  • POST /rules - Add new AI rules
  • GET /rules - List all active rules
  • DELETE /rules/{rule_name} - Delete rules

System Monitoring

  • GET /stats - Get system statistics and performance metrics

🔧 Core Components

AIMatcher (ai_matcher.py)

  • Uses Groq LLM to compare receipts and transactions
  • Provides confidence scores and reasoning
  • Configurable matching criteria (amount, date, vendor)

AIRulesEngine (ai_rules.py)

  • Applies business rules for auto-approval and categorization
  • Configurable rule conditions and actions
  • Supports system and user-generated rules
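
The condition/action structure described above could be sketched roughly as follows. This is a minimal illustration, not the code in ai_rules.py: the `Rule` class, `apply_rules` function, rule names, action strings, and the 0.95 cutoff are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    # Hypothetical rule shape: a named condition paired with an action label.
    name: str
    condition: Callable[[dict], bool]
    action: str
    source: str = "system"  # "system" or "user"


def apply_rules(match: dict, rules: List[Rule]) -> List[str]:
    """Return the actions of every rule whose condition holds for this match."""
    return [rule.action for rule in rules if rule.condition(match)]


# Example rules (assumed values, for illustration only).
EXAMPLE_RULES = [
    Rule("auto_approve_high_confidence",
         lambda m: m["confidence_score"] >= 0.95,
         "auto_approve"),
    Rule("coffee_category",
         lambda m: m["transaction_vendor"] == "Starbucks",
         "categorize:Meals",
         source="user"),
]
```

Keeping conditions as plain callables makes system and user-generated rules interchangeable; a real engine would more likely store conditions as data so they can be added through the /rules endpoint.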

DocumentProcessor (document_processor.py)

  • AI-powered receipt data extraction
  • Supports PDF and image formats
  • Uses Groq vision model for OCR

MatchingEngine (matching_engine.py)

  • Main orchestrator combining all components
  • Handles the complete matching workflow
  • Provides statistics and feedback logging

FeedbackLogger (feedback_logger.py)

  • Tracks manual overrides for AI training
  • Maintains audit trail of user decisions
  • Enables continuous model improvement
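
One simple way to keep such an audit trail is an append-only JSON Lines file. The sketch below assumes hypothetical function names and record fields (taken from the /approve example payload); "reject" as an action value is also an assumption. It is not the actual contents of feedback_logger.py.

```python
import json
from datetime import datetime, timezone

VALID_ACTIONS = ("approve", "reject")  # "reject" is an assumed value


def build_feedback_entry(match_id, user_id, action, timestamp=None):
    """Return one audit-trail record as a JSON line."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    entry = {
        "match_id": match_id,
        "user_id": user_id,
        "action": action,
        # Use the supplied timestamp, or the current UTC time if none is given.
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(entry, sort_keys=True)


def append_feedback(path, line):
    """Append a record to a JSON Lines file so earlier decisions are never overwritten."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
```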

📊 Configuration

Edit config.py to adjust:

  • Confidence threshold (default: 0.8)
  • Date tolerance days (default: 7)
  • Amount tolerance percent (default: 5%)
  • Groq API key (already configured)
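
As a rough sketch of how these settings could be grouped, the example below uses the defaults listed above and lets environment variables override them. The class, field, and environment variable names are assumptions; the actual identifiers in config.py may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchingConfig:
    # Hypothetical field names; the defaults mirror the values listed above.
    confidence_threshold: float = 0.8
    date_tolerance_days: int = 7
    amount_tolerance_percent: float = 5.0
    groq_api_key: str = ""


def load_config(env=None) -> MatchingConfig:
    """Build a config from environment variables, falling back to the defaults."""
    env = os.environ if env is None else env
    return MatchingConfig(
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", 0.8)),
        date_tolerance_days=int(env.get("DATE_TOLERANCE_DAYS", 7)),
        amount_tolerance_percent=float(env.get("AMOUNT_TOLERANCE_PERCENT", 5.0)),
        groq_api_key=env.get("GROQ_API_KEY", ""),
    )
```

Calling `load_config()` with no arguments reads from the process environment; passing a plain dict makes the function easy to exercise in tests.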

🔄 Integration Workflow

1. Backend Sends QuickBooks Data

import requests

# Backend sends QuickBooks transactions
response = requests.post(
    "http://localhost:8343/transactions/import/quickbooks",
    json={
        "transactions": [
            {
                "id": "QB_TXN_123",
                "txn_date": "2024-01-15",
                "amount": 12.50,
                "payee_name": "Starbucks",
                "memo": "Coffee purchase"
            }
        ]
    }
)

2. Process Receipts

# Sync from Google Drive
response = requests.post(
    "http://localhost:8343/drive/sync",
    json={"folder_id": "your_folder_id"}
)

# Or upload directly ("receipt.pdf" is a placeholder path)
with open("receipt.pdf", "rb") as receipt_file:
    response = requests.post(
        "http://localhost:8343/upload",
        files={"file": receipt_file}
    )

3. AI Matching

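# Match receipts to transactions
# (processed_receipts and converted_transactions are placeholders for the
# outputs of the previous steps)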
response = requests.post(
    "http://localhost:8343/match",
    json={
        "receipts": processed_receipts,
        "transactions": converted_transactions
    }
)

4. User Feedback

# Approve or reject matches
response = requests.post(
    "http://localhost:8343/approve",
    json={
        "match_id": "match_123",
        "user_id": "user_456",
        "action": "approve"
    }
)

🎯 Key Features

  • AI-powered matching with confidence scores
  • Rule-based auto-approval and categorization
  • Feedback logging for continuous improvement
  • Configurable matching parameters
  • Google Drive integration for batch processing
  • JSON API for easy backend integration
  • Comprehensive error handling

📝 Data Formats

QuickBooks Transaction Input

{
  "id": "string",
  "txn_date": "YYYY-MM-DD",
  "amount": 0.00,
  "payee_name": "string",
  "memo": "string (optional)",
  "account_name": "string (optional)",
  "txn_type": "string (optional)"
}
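
A backend may want to check records against this shape before sending them. The sketch below is one possible pre-flight check; it assumes that every field not marked optional is required, and the function name and error messages are hypothetical.

```python
from datetime import date

REQUIRED_FIELDS = ("id", "txn_date", "amount", "payee_name")
OPTIONAL_FIELDS = ("memo", "account_name", "txn_type")


def validate_transaction(txn: dict) -> list:
    """Return a list of problems; an empty list means the record looks valid."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in txn]
    if "txn_date" in txn:
        try:
            # Dates are expected as ISO strings such as "2024-01-15".
            date.fromisoformat(txn["txn_date"])
        except (TypeError, ValueError):
            errors.append("txn_date must be YYYY-MM-DD")
    if "amount" in txn:
        amount = txn["amount"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            errors.append("amount must be a number")
    return errors
```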

Match Result Output

{
  "receipt_id": "string",
  "transaction_id": "string",
  "confidence_score": 0.95,
  "match_reason": "string",
  "receipt_vendor": "string",
  "receipt_amount": 0.00,
  "transaction_vendor": "string",
  "transaction_amount": 0.00
}
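
On the consuming side, a result in this shape can be loaded into a typed object and routed by confidence. The sketch below assumes that matches under the configured threshold (0.8 by default) go to manual review; that routing policy and the names `MatchResult` and `split_by_confidence` are illustrative, not confirmed by the engine's code.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    # Fields mirror the Match Result Output shape shown above.
    receipt_id: str
    transaction_id: str
    confidence_score: float
    match_reason: str
    receipt_vendor: str
    receipt_amount: float
    transaction_vendor: str
    transaction_amount: float


def split_by_confidence(results, threshold=0.8):
    """Separate matches that meet the threshold from those needing manual review."""
    confident = [r for r in results if r.confidence_score >= threshold]
    needs_review = [r for r in results if r.confidence_score < threshold]
    return confident, needs_review
```

A decoded JSON object can be turned into a `MatchResult` with `MatchResult(**payload)`, which raises a `TypeError` if the payload has missing or unexpected keys.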

🔍 AI Matching Criteria

The engine uses three primary criteria for matching:

  1. Amount Similarity - Compares receipt and transaction amounts (default tolerance: 5%)
  2. Date Proximity - Checks how close the receipt and transaction dates are (default tolerance: 7 days)
  3. Vendor Matching - AI-powered vendor name comparison
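
The two numeric criteria can be expressed as simple checks, sketched below. This is an illustration under assumptions: the tolerance is measured relative to the transaction amount, the receipt field names `amount` and `date` are hypothetical, and vendor comparison is left out because it is delegated to the LLM.

```python
from datetime import date


def amount_within_tolerance(receipt_amount: float, txn_amount: float,
                            tolerance_percent: float = 5.0) -> bool:
    """True if the amounts differ by at most tolerance_percent of the transaction amount."""
    if txn_amount == 0:
        return receipt_amount == 0
    diff_percent = abs(receipt_amount - txn_amount) / abs(txn_amount) * 100
    return diff_percent <= tolerance_percent


def date_within_tolerance(receipt_date: date, txn_date: date,
                          tolerance_days: int = 7) -> bool:
    """True if the two dates are at most tolerance_days apart, in either direction."""
    return abs((receipt_date - txn_date).days) <= tolerance_days


def is_candidate(receipt: dict, txn: dict) -> bool:
    """Cheap pre-filter that could run before the AI-powered vendor comparison."""
    return (
        amount_within_tolerance(receipt["amount"], txn["amount"])
        and date_within_tolerance(
            date.fromisoformat(receipt["date"]),
            date.fromisoformat(txn["txn_date"]),
        )
    )
```

Filtering on amount and date first keeps the number of receipt-transaction pairs sent to the LLM small, which matters when matching large batches.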

🚀 Production Deployment

For production deployment:

  • Replace in-memory storage with a database
  • Configure proper authentication
  • Set up monitoring and logging
  • Use environment variables for configuration
  • Implement proper error handling and retries
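
For the retry item, a minimal sketch of retrying with exponential backoff is shown below. The helper name `call_with_retries` and its parameters are hypothetical; in practice the wrapped function would be a call to the Groq API or to one of the engine's endpoints, and only transient errors should be retried.

```python
import time


def call_with_retries(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying on any exception with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so tests can replace the real delay with a stub.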

📞 Support

This Data Science Engine is designed to be integrated with backend applications that handle:

  • QuickBooks API connections
  • User interface and workflows
  • Data persistence and management
  • External integrations

The engine focuses purely on AI/ML capabilities and provides a clean JSON API for backend integration.
