AI-powered receipt-to-transaction matching engine that uses a Groq LLM. It is a Data Science Engine that exposes receipt processing and matching to backend applications through a REST API.

🎯 Purpose

This Data Science Engine receives QuickBooks transaction data from backend applications and provides:

AI-powered receipt processing (OCR and data extraction)
Intelligent receipt-transaction matching with confidence scores
Configurable AI rules for business logic
Feedback logging for continuous improvement
RESTful API for easy integration

🚀 Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Configure API Keys

Create a .env file in the project root with your Groq API key:

# Create .env file (this overwrites any existing .env; use >> to append instead)
echo "GROQ_API_KEY=your_actual_groq_api_key_here" > .env

Important: Get your API key from Groq Console
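
A startup check can make a missing key fail fast with a clear message. The sketch below is only an illustration: the helper name require_groq_api_key is hypothetical, and it assumes the application has already loaded .env into the process environment, which this README does not spell out.

```python
import os


def require_groq_api_key(env=None):
    """Return the Groq API key, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    # Treat the placeholder from the setup step as "not configured".
    if not key or key == "your_actual_groq_api_key_here":
        raise RuntimeError("GROQ_API_KEY is not set; add it to .env or the environment")
    return key
```

Calling this once at startup surfaces a configuration problem before the first request reaches the Groq API.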

3. Start the Server

# Option 1: Using the main script
python main.py

# Option 2: Using uvicorn directly (--reload restarts on code changes; drop it outside development)
uvicorn main:app --host 0.0.0.0 --port 8343 --reload

4. Access API Documentation

Swagger UI: http://localhost:8343/docs
ReDoc: http://localhost:8343/redoc

📋 API Endpoints

Transaction Import

POST /transactions/import/csv - Import transactions from CSV file
POST /transactions/import/image - Import transactions from image/PDF

Receipt Processing

POST /upload-multiple - Upload multiple receipt documents
POST /process/{file_id} - Extract data from uploaded documents

AI Matching Engine

POST /match-specific - Match specific receipts to transactions using AI

AI Rules Management

POST /rules - Add new AI rules
GET /rules - List all active rules
DELETE /rules/{rule_name} - Delete a rule by name

System Monitoring

GET /stats - Get system statistics and performance metrics
GET / - Health check endpoint

🔧 Core Components

AIMatcher (`ai_matcher.py`)

Uses Groq LLM to compare receipts and transactions
Provides confidence scores and reasoning
Configurable matching criteria (amount, date, vendor)
Rate limiting to prevent API quota exhaustion
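
The rate limiting mentioned above can be pictured as a minimum gap between consecutive LLM calls. This is a minimal sketch of that idea, not the code in ai_matcher.py; the class name MinIntervalLimiter and its parameters are assumptions made for illustration.

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum number of seconds between consecutive calls (hypothetical helper)."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock    # injectable so the limiter can be tested without real waiting
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until the interval since the previous call has elapsed; return seconds waited."""
        now = self._clock()
        waited = 0.0
        if self._last_call is not None:
            remaining = self.min_interval_s - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
                now = self._clock()
        self._last_call = now
        return waited
```

A caller would invoke wait() immediately before each request to the LLM, so bursts of matches are spread out instead of exhausting the quota.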

AIRulesEngine (`ai_rules.py`)

Applies business rules for auto-approval and categorization
Configurable rule conditions and actions
Supports system and user-generated rules
Safe condition evaluation with proper error handling
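
One common way to evaluate rule conditions safely is to look operators up in a fixed table instead of passing rule text to eval. The sketch below illustrates that approach under an assumed rule shape ({"field", "op", "value"}); the actual rule format in ai_rules.py may differ.

```python
import operator

# Only these comparison operators are allowed; anything else is rejected.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def evaluate_condition(condition, facts):
    """Evaluate a {"field", "op", "value"} condition against a dict of facts.

    Malformed conditions, unknown operators, missing fields, and type
    mismatches all evaluate to False instead of raising.
    """
    try:
        compare = _OPS[condition["op"]]
        return bool(compare(facts[condition["field"]], condition["value"]))
    except (KeyError, TypeError):
        return False
```

For example, evaluate_condition({"field": "confidence", "op": ">=", "value": 0.9}, {"confidence": 0.95}) is True, while a rule with an unsupported operator simply does not fire.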

DocumentProcessor (`document_processor.py`)

AI-powered receipt data extraction using Groq vision model
Supports PDF and image formats
Robust JSON parsing with error handling
Extracts vendor, amount, date, tax, and category information
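
LLM replies often wrap JSON in a code fence or surround it with extra text, which is why tolerant parsing matters here. The following is a sketch of one tolerant approach, not the logic in document_processor.py; the function name parse_llm_json is hypothetical.

```python
import json
import re

# Matches a fenced block (three backticks, optional "json" tag) and captures its body.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_llm_json(text):
    """Best-effort extraction of one JSON object from a model reply; None on failure."""
    match = _FENCE.search(text)
    candidate = match.group(1) if match else text
    # Keep only the span from the first "{" to the last "}".
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(candidate[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```

Returning None instead of raising lets the caller record a failed extraction (for instance with extraction_success set to false) and continue with the next document.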

MatchingEngine (`matching_engine.py`)

Main orchestrator combining all components
Handles the complete matching workflow
Provides statistics and feedback logging
Configurable confidence thresholds

FeedbackLogger (`feedback_logger.py`)

Tracks manual overrides for AI training
Maintains audit trail of user decisions
Enables continuous model improvement
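
An append-only JSON Lines file is one simple way to keep such an audit trail. The sketch below assumes a record layout (receipt_id, ai_transaction_id, user_transaction_id) chosen for illustration; the fields used by feedback_logger.py are not documented here.

```python
import json
from datetime import datetime, timezone


def log_override(path, receipt_id, ai_transaction_id, user_transaction_id):
    """Append one manual-override record as a JSON line and return the record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "receipt_id": receipt_id,
        "ai_transaction_id": ai_transaction_id,      # what the engine proposed
        "user_transaction_id": user_transaction_id,  # what the user chose instead
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Because each line is an independent JSON object, the log can be replayed later to measure how often the engine's proposal was overridden.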

📊 Configuration

Edit config.py to adjust:

Confidence threshold (default: 0.3)
Date tolerance days (default: 7)
Amount tolerance percent (default: 5%)
Groq API key (from environment variable)
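
For orientation, the settings listed above could be grouped as follows. This is a hedged sketch, not the contents of config.py: the class name MatchConfig and the environment variable names other than GROQ_API_KEY are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchConfig:
    """Matching settings with the defaults documented in this README."""
    confidence_threshold: float = 0.3
    date_tolerance_days: int = 7
    amount_tolerance_percent: float = 5.0


def load_config(env=None):
    """Build a MatchConfig, letting (hypothetical) environment variables override defaults."""
    env = os.environ if env is None else env
    return MatchConfig(
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", "0.3")),
        date_tolerance_days=int(env.get("DATE_TOLERANCE_DAYS", "7")),
        amount_tolerance_percent=float(env.get("AMOUNT_TOLERANCE_PERCENT", "5")),
    )
```

Reading overrides from the environment keeps deployment-specific values out of the source file, in line with the production advice to use environment variables for configuration.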

🔄 Integration Workflow

1. Import Transactions

# Import from CSV
curl -X POST -F "file=@transactions.csv" http://localhost:8343/transactions/import/csv

# Import from image
curl -X POST -F "file=@statement.jpg" http://localhost:8343/transactions/import/image

2. Upload and Process Receipts

# Upload receipts
curl -X POST -F "files=@receipt1.jpg" -F "files=@receipt2.jpg" http://localhost:8343/upload-multiple

# Process a specific receipt (replace {file_id} with the ID of an uploaded file)
curl -X POST http://localhost:8343/process/{file_id}

3. AI Matching

# Match specific receipts
curl -X POST -H "Content-Type: application/json" \
  -d '["file_id_1", "file_id_2"]' \
  http://localhost:8343/match-specific
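
A backend written in Python can issue the same request with the standard library. The sketch below mirrors the curl call above; the function names are hypothetical, and the shape of the response is assumed to be JSON without further guarantees.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8343"  # same host and port as the examples in this README


def build_match_request(file_ids, base_url=BASE_URL):
    """Build the POST /match-specific request with a JSON array of file IDs as the body."""
    body = json.dumps(list(file_ids)).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/match-specific",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def match_specific(file_ids, base_url=BASE_URL, timeout=30):
    """Send the request and decode the JSON response (requires a running server)."""
    request = build_match_request(file_ids, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending makes the payload easy to inspect in tests without a live server.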

4. Check Results

# Get system stats
curl http://localhost:8343/stats

# View AI rules
curl http://localhost:8343/rules

🎯 Key Features

AI-powered matching with confidence scores
Rule-based auto-approval and categorization
Feedback logging for continuous improvement
Configurable matching parameters
RESTful JSON API for easy backend integration
Comprehensive error handling
Rate limiting to prevent API quota exhaustion
Robust JSON parsing for AI responses

📝 Data Formats

Transaction Input (CSV)

Date,Description,Amount,Category
2024-01-15,Starbucks Coffee,12.50,Food & Dining
2024-01-16,Office Supplies,45.99,Office
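
To prepare or validate a file in this layout before importing it, the columns can be parsed with the standard csv module. This is an illustrative sketch; the function name parse_transactions is hypothetical and the engine's own importer may normalize fields differently.

```python
import csv
import io
from datetime import date
from decimal import Decimal


def parse_transactions(csv_text):
    """Parse CSV text with Date, Description, Amount, Category columns into typed rows."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "date": date.fromisoformat(row["Date"]),   # expects YYYY-MM-DD
            "description": row["Description"].strip(),
            "amount": Decimal(row["Amount"]),           # Decimal avoids float rounding on money
            "category": row["Category"].strip(),
        })
    return rows
```

A malformed date or amount raises ValueError or decimal.InvalidOperation, so bad rows are caught before the file is uploaded.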

Receipt Processing Output

{
  "vendor": "Starbucks",
  "total_amount": 12.50,
  "tax_amount": 1.25,
  "date": "2024-01-15",
  "category": "Food & Dining",
  "confidence": 0.95,
  "extraction_success": true
}

Match Result Output

{
  "receipt_id": "uuid",
  "transaction_id": "transaction_123",
  "confidence_score": 0.95,
  "match_reason": "Same vendor, minor date difference (Auto-approved by rules)",
  "receipt_vendor": "Starbucks",
  "receipt_amount": 12.50,
  "transaction_vendor": "STARBUCKS",
  "transaction_amount": 12.50
}

🔍 AI Matching Criteria

The engine uses multiple criteria for matching:

Amount Similarity - Compares receipt and transaction amounts (default tolerance: 5%)
Date Proximity - Checks date closeness (default tolerance: 7 days)
Vendor Matching - AI-powered vendor name comparison using Groq LLM
Rule-based Auto-approval - Automatic approval for exact matches and high-confidence matches
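
The amount and date criteria can be expressed as two small checks. The sketch below is an interpretation, not the engine's code: it assumes the percentage is measured relative to the transaction amount and that the date window is symmetric, neither of which is stated above.

```python
from datetime import date
from decimal import Decimal


def amount_within_tolerance(receipt_amount, txn_amount, tolerance_percent=Decimal("5")):
    """True when the amounts differ by at most tolerance_percent of the transaction amount."""
    receipt_amount = Decimal(str(receipt_amount))
    txn_amount = Decimal(str(txn_amount))
    if txn_amount == 0:
        return receipt_amount == 0
    diff_percent = abs(receipt_amount - txn_amount) / abs(txn_amount) * 100
    return diff_percent <= tolerance_percent


def date_within_tolerance(receipt_date, txn_date, tolerance_days=7):
    """True when the two dates are at most tolerance_days apart, in either direction."""
    return abs((receipt_date - txn_date).days) <= tolerance_days
```

With the defaults, a 12.50 receipt matches a 12.50 transaction dated up to a week earlier or later, while a 100.00 receipt against a 94.00 transaction falls outside the 5% window.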

🛠️ Development

Project Structure

├── main.py                 # FastAPI application entry point
├── ai_matcher.py           # AI-powered matching logic
├── ai_rules.py            # Business rules engine
├── document_processor.py   # Receipt data extraction
├── matching_engine.py      # Main matching orchestrator
├── feedback_logger.py      # User feedback tracking
├── models.py              # Pydantic data models
├── api_models.py          # API request/response models
├── config.py              # Configuration settings
├── requirements.txt       # Python dependencies
└── test_images/           # Test image files

Running Tests

# Smoke-test the health check endpoint (the server must already be running)
curl http://localhost:8343/

# Test stats endpoint
curl http://localhost:8343/stats

# Test rules endpoint
curl http://localhost:8343/rules

🚀 Production Deployment

For production deployment:

Replace in-memory storage with a database (PostgreSQL recommended)
Configure proper authentication and authorization
Set up monitoring and logging (ELK stack recommended)
Use environment variables for all configuration
Implement proper error handling and retries
Set up rate limiting and API quotas
Configure CORS for frontend integration
Use HTTPS in production
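
For the retry item in this checklist, exponential backoff is a common pattern. The sketch below shows one generic version; the helper name call_with_retries and its defaults are assumptions, and it is not taken from this codebase.

```python
import time


def call_with_retries(func, attempts=3, base_delay_s=1.0, sleep=time.sleep, retry_on=(Exception,)):
    """Call func(); on a listed exception wait base_delay_s * 2**n, then retry.

    The exception from the final attempt is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay_s * (2 ** attempt))  # delays of 1s, 2s, 4s, ... by default
```

In practice retry_on would be narrowed to transient failures such as timeouts, so that programming errors still surface immediately.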

📞 Support

This Data Science Engine is designed to be integrated with backend applications that handle:

QuickBooks API connections
User interface and workflows
Data persistence and management
External integrations

The engine focuses purely on AI/ML capabilities and provides a clean JSON API for backend integration.

🔧 Troubleshooting

Common Issues

API Key Error: Ensure GROQ_API_KEY is set in your .env file
Port Already in Use: Kill the existing process with pkill -f "python main.py" (or pkill -f "uvicorn main:app" if the server was started with uvicorn)
Import Errors: Install dependencies with pip install -r requirements.txt
Rate Limiting: The system includes built-in rate limiting to prevent API quota exhaustion

Logs

Check the application logs for detailed error information:

tail -f app.log

AI Bookkeeper - Data Science Engine