AI Bookkeeper - Data Science Engine

An AI-powered receipt-to-transaction matching engine built on the Groq LLM API. It runs as a Data Science Engine that backend applications call over HTTP for matching.

🎯 Purpose

This Data Science Engine receives QuickBooks transaction data from backend applications and provides:

  • AI-powered receipt processing (OCR and data extraction)
  • Intelligent receipt-transaction matching with confidence scores
  • Configurable AI rules for business logic
  • Feedback logging for continuous improvement
  • RESTful API for easy integration

🚀 Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Configure API Keys

Create a .env file in the project root with your Groq API key:

# Create .env file
echo "GROQ_API_KEY=your_actual_groq_api_key_here" > .env

Important: Get your API key from the Groq Console before starting the server.

3. Start the Server

# Option 1: Using the main script
python main.py

# Option 2: Using uvicorn directly
uvicorn main:app --host 0.0.0.0 --port 8343 --reload

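4. Access API Documentation

With the server running, FastAPI's default interactive documentation should be available at http://localhost:8343/docs (Swagger UI) and http://localhost:8343/redoc, unless main.py overrides those paths.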

📋 API Endpoints

Transaction Import

  • POST /transactions/import/csv - Import transactions from CSV file
  • POST /transactions/import/image - Import transactions from image/PDF

Receipt Processing

  • POST /upload-multiple - Upload multiple receipt documents
  • POST /process/{file_id} - Extract data from uploaded documents

AI Matching Engine

  • POST /match-specific - Match specific receipts to transactions using AI

AI Rules Management

  • POST /rules - Add new AI rules
  • GET /rules - List all active rules
  • DELETE /rules/{rule_name} - Delete rules

System Monitoring

  • GET /stats - Get system statistics and performance metrics
  • GET / - Health check endpoint

🔧 Core Components

AIMatcher (ai_matcher.py)

  • Uses Groq LLM to compare receipts and transactions
  • Provides confidence scores and reasoning
  • Configurable matching criteria (amount, date, vendor)
  • Rate limiting to prevent API quota exhaustion
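
The rate limiting mentioned above could take many forms. The sketch below shows one simple option, a minimum interval between consecutive LLM calls. The class name MinIntervalLimiter and its parameters are hypothetical and not taken from ai_matcher.py; the clock and sleep functions are injectable only so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Enforce a minimum gap (in seconds) between consecutive calls."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            # Time still owed before the minimum interval has elapsed.
            delay = max(0.0, self.min_interval - (now - self._last_call))
            if delay > 0:
                self._sleep(delay)
        self._last_call = now + delay
        return delay
```

A caller would invoke limiter.wait() immediately before each request to the LLM API.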

AIRulesEngine (ai_rules.py)

  • Applies business rules for auto-approval and categorization
  • Configurable rule conditions and actions
  • Supports system and user-generated rules
  • Safe condition evaluation with proper error handling
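
One way to evaluate rule conditions safely is to avoid eval entirely and look operators up in a whitelist. The following is a minimal sketch under that assumption; the rule shape (field, op, value) and the name evaluate_condition are hypothetical and may differ from what ai_rules.py actually does.

```python
import operator
from typing import Any, Mapping

# Whitelisted comparison operators; anything else is rejected.
_OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def evaluate_condition(condition: Mapping[str, Any], record: Mapping[str, Any]) -> bool:
    """Return True if the record satisfies the condition, False on any error."""
    try:
        compare = _OPERATORS[condition["op"]]
        return bool(compare(record[condition["field"]], condition["value"]))
    except (KeyError, TypeError):
        # Unknown operator, missing field, or incomparable types: no match.
        return False


# Hypothetical rule: auto-approve when the match confidence is at least 0.9.
rule = {"field": "confidence_score", "op": ">=", "value": 0.9}
match = {"confidence_score": 0.95, "receipt_vendor": "Starbucks"}
auto_approve = evaluate_condition(rule, match)
```

Treating every failure as "no match" keeps a malformed rule from approving anything by accident.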

DocumentProcessor (document_processor.py)

  • AI-powered receipt data extraction using Groq vision model
  • Supports PDF and image formats
  • Robust JSON parsing with error handling
  • Extracts vendor, amount, date, tax, and category information
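
Model replies often wrap JSON in extra prose, so robust parsing usually means locating the JSON object rather than parsing the whole reply. The sketch below illustrates one such approach with the standard library only. The function name parse_llm_json is hypothetical, and the fallback value reuses the extraction_success field from the receipt output format; the real document_processor.py may handle this differently.

```python
import json
from typing import Any, Dict


def parse_llm_json(text: str) -> Dict[str, Any]:
    """Return the first JSON object found in a model reply, or a failure marker."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        start = text.find("{", start + 1)
    return {"extraction_success": False}


reply = 'Sure! {"vendor": "Starbucks", "total_amount": 12.5} Hope this helps.'
receipt = parse_llm_json(reply)
```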

MatchingEngine (matching_engine.py)

  • Main orchestrator combining all components
  • Handles the complete matching workflow
  • Provides statistics and feedback logging
  • Configurable confidence thresholds

FeedbackLogger (feedback_logger.py)

  • Tracks manual overrides for AI training
  • Maintains audit trail of user decisions
  • Enables continuous model improvement
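
An audit trail of manual overrides can be as simple as an append-only JSON Lines file. The sketch below assumes that storage format and uses hypothetical names (log_override, read_overrides, and the entry fields); it is not the format feedback_logger.py necessarily uses.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def log_override(log_path: Path, receipt_id: str,
                 ai_transaction_id: str, user_transaction_id: str) -> Dict[str, str]:
    """Append one manual-override record as a JSON line and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "receipt_id": receipt_id,
        "ai_transaction_id": ai_transaction_id,      # what the engine suggested
        "user_transaction_id": user_transaction_id,  # what the user chose
    }
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def read_overrides(log_path: Path) -> List[Dict[str, str]]:
    """Load all recorded overrides, skipping blank lines."""
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Appending one line per decision keeps earlier records untouched, which suits an audit trail.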

📊 Configuration

Edit config.py to adjust:

  • Confidence threshold (default: 0.3)
  • Date tolerance days (default: 7)
  • Amount tolerance percent (default: 5%)
  • Groq API key (from environment variable)
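
For orientation, the settings above could be grouped as in the following sketch. Only the default values and the GROQ_API_KEY variable name come from this README; the class name MatchingConfig and the field names are assumptions, so check config.py for the real identifiers.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MatchingConfig:
    """Hypothetical grouping of the documented settings."""
    confidence_threshold: float = 0.3      # documented default
    date_tolerance_days: int = 7           # documented default
    amount_tolerance_percent: float = 5.0  # documented default (5%)
    # Read from the environment at construction time; hidden from repr
    # so the key does not leak into logs.
    groq_api_key: str = field(
        default_factory=lambda: os.environ.get("GROQ_API_KEY", ""),
        repr=False,
    )


config = MatchingConfig()
```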

🔄 Integration Workflow

1. Import Transactions

# Import from CSV
curl -X POST -F "file=@transactions.csv" http://localhost:8343/transactions/import/csv

# Import from image
curl -X POST -F "file=@statement.jpg" http://localhost:8343/transactions/import/image

2. Upload and Process Receipts

# Upload receipts
curl -X POST -F "files=@receipt1.jpg" -F "files=@receipt2.jpg" http://localhost:8343/upload-multiple

# Process a specific receipt (replace {file_id} with the ID of an uploaded file)
curl -X POST http://localhost:8343/process/{file_id}

3. AI Matching

# Match specific receipts
curl -X POST -H "Content-Type: application/json" \
  -d '["file_id_1", "file_id_2"]' \
  http://localhost:8343/match-specific
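
A backend written in Python can issue the same request with the standard library. The sketch below mirrors the curl call above: a JSON array of file IDs posted to /match-specific. The helper names build_match_request and match_receipts are hypothetical, and the shape of the reply is assumed to be JSON.

```python
import json
import urllib.request
from typing import Any, List


def build_match_request(file_ids: List[str],
                        base_url: str = "http://localhost:8343") -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps(file_ids).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/match-specific",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def match_receipts(file_ids: List[str]) -> Any:
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_match_request(file_ids), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, match_receipts(["file_id_1", "file_id_2"]) would return the decoded response.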

4. Check Results

# Get system stats
curl http://localhost:8343/stats

# View AI rules
curl http://localhost:8343/rules

🎯 Key Features

  • AI-powered matching with confidence scores
  • Rule-based auto-approval and categorization
  • Feedback logging for continuous improvement
  • Configurable matching parameters
  • RESTful JSON API for easy backend integration
  • Comprehensive error handling
  • Rate limiting to prevent API quota exhaustion
  • Robust JSON parsing for AI responses

📝 Data Formats

Transaction Input (CSV)

Date,Description,Amount,Category
2024-01-15,Starbucks Coffee,12.50,Food & Dining
2024-01-16,Office Supplies,45.99,Office
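
To check a CSV file against this layout before importing it, a short standard-library parser is enough. The sketch below assumes exactly the four columns shown above and ISO dates; the function name parse_transactions and the output keys are hypothetical.

```python
import csv
import io
from datetime import date
from decimal import Decimal
from typing import Dict, List

SAMPLE = """Date,Description,Amount,Category
2024-01-15,Starbucks Coffee,12.50,Food & Dining
2024-01-16,Office Supplies,45.99,Office
"""


def parse_transactions(csv_text: str) -> List[Dict[str, object]]:
    """Parse CSV text into typed rows; raises on malformed dates or amounts."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "date": date.fromisoformat(row["Date"]),
            "description": row["Description"],
            "amount": Decimal(row["Amount"]),  # Decimal avoids float rounding
            "category": row["Category"],
        })
    return rows


transactions = parse_transactions(SAMPLE)
```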

Receipt Processing Output

{
  "vendor": "Starbucks",
  "total_amount": 12.50,
  "tax_amount": 1.25,
  "date": "2024-01-15",
  "category": "Food & Dining",
  "confidence": 0.95,
  "extraction_success": true
}

Match Result Output

{
  "receipt_id": "uuid",
  "transaction_id": "transaction_123",
  "confidence_score": 0.95,
  "match_reason": "Same vendor, minor date difference (Auto-approved by rules)",
  "receipt_vendor": "Starbucks",
  "receipt_amount": 12.50,
  "transaction_vendor": "STARBUCKS",
  "transaction_amount": 12.50
}

🔍 AI Matching Criteria

The engine uses multiple criteria for matching:

  1. Amount Similarity - Compares receipt and transaction amounts (5% tolerance)
  2. Date Proximity - Checks date closeness (7-day tolerance)
  3. Vendor Matching - AI-powered vendor name comparison using Groq LLM
  4. Rule-based Auto-approval - Automatic approval for exact matches and high-confidence matches
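
The amount and date criteria can be illustrated with two small checks. This is a sketch under stated assumptions: the tolerance is taken relative to the transaction amount, the date window is inclusive, and the function names are hypothetical. The engine's actual comparison logic lives in ai_matcher.py and may differ.

```python
from datetime import date


def amounts_match(receipt: float, transaction: float,
                  tolerance_percent: float = 5.0) -> bool:
    """True if the amounts differ by at most tolerance_percent of the transaction."""
    if transaction == 0:
        return receipt == 0
    difference_percent = abs(receipt - transaction) / abs(transaction) * 100
    return difference_percent <= tolerance_percent


def dates_match(receipt: date, transaction: date, tolerance_days: int = 7) -> bool:
    """True if the two dates are at most tolerance_days apart (inclusive)."""
    return abs((receipt - transaction).days) <= tolerance_days


# A 12.50 receipt against a 13.00 transaction differs by about 3.8%.
within_amount = amounts_match(12.50, 13.00)
within_dates = dates_match(date(2024, 1, 15), date(2024, 1, 22))
```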

🛠️ Development

Project Structure

├── main.py                 # FastAPI application entry point
├── ai_matcher.py           # AI-powered matching logic
├── ai_rules.py            # Business rules engine
├── document_processor.py   # Receipt data extraction
├── matching_engine.py      # Main matching orchestrator
├── feedback_logger.py      # User feedback tracking
├── models.py              # Pydantic data models
├── api_models.py          # API request/response models
├── config.py              # Configuration settings
├── requirements.txt       # Python dependencies
└── test_images/           # Test image files

Running Tests

# Health check (the server must already be running)
curl http://localhost:8343/

# Test stats endpoint
curl http://localhost:8343/stats

# Test rules endpoint
curl http://localhost:8343/rules

🚀 Production Deployment

For production deployment:

  • Replace in-memory storage with a database (PostgreSQL recommended)
  • Configure proper authentication and authorization
  • Set up monitoring and logging (ELK stack recommended)
  • Use environment variables for all configuration
  • Implement proper error handling and retries
  • Set up rate limiting and API quotas
  • Configure CORS for frontend integration
  • Use HTTPS in production

📞 Support

This Data Science Engine is designed to be integrated with backend applications that handle:

  • QuickBooks API connections
  • User interface and workflows
  • Data persistence and management
  • External integrations

The engine focuses purely on AI/ML capabilities and provides a clean JSON API for backend integration.

🔧 Troubleshooting

Common Issues

  1. API Key Error: Ensure GROQ_API_KEY is set in your .env file
  2. Port Already in Use: Kill existing process with pkill -f "python main.py"
  3. Import Errors: Install dependencies with pip install -r requirements.txt
  4. Rate Limiting: The system includes built-in rate limiting to prevent API quota exhaustion

Logs

Check the application logs for detailed error information:

tail -f app.log