AI Bookkeeper - Data Science Engine
An AI-powered receipt-to-transaction matching engine built on the Groq LLM. It is a Data Science Engine that provides intelligent matching capabilities for backend applications.
🎯 Purpose
This Data Science Engine receives QuickBooks transaction data from backend applications and provides:
- AI-powered receipt processing (OCR and data extraction)
- Intelligent receipt-transaction matching with confidence scores
- Configurable AI rules for business logic
- Feedback logging for continuous improvement
- RESTful API for easy integration
🚀 Quick Start
1. Install Dependencies
pip install -r requirements.txt
2. Configure API Keys
Create a .env file in the project root with your Groq API key:
# Create .env file (this overwrites any existing .env)
echo "GROQ_API_KEY=your_actual_groq_api_key_here" > .env
Important: Get your API key from the Groq Console.
3. Start the Server
# Option 1: Using the main script
python main.py
# Option 2: Using uvicorn directly
uvicorn main:app --host 0.0.0.0 --port 8343 --reload
4. Access API Documentation
- Swagger UI: http://localhost:8343/docs
- ReDoc: http://localhost:8343/redoc
📋 API Endpoints
Transaction Import
- POST /transactions/import/csv - Import transactions from a CSV file
- POST /transactions/import/image - Import transactions from an image or PDF
Receipt Processing
- POST /upload-multiple - Upload multiple receipt documents
- POST /process/{file_id} - Extract data from uploaded documents
AI Matching Engine
POST /match-specific- Match specific receipts to transactions using AI
AI Rules Management
- POST /rules - Add new AI rules
- GET /rules - List all active rules
- DELETE /rules/{rule_name} - Delete rules
System Monitoring
- GET /stats - Get system statistics and performance metrics
- GET / - Health check endpoint
🔧 Core Components
AIMatcher (ai_matcher.py)
- Uses Groq LLM to compare receipts and transactions
- Provides confidence scores and reasoning
- Configurable matching criteria (amount, date, vendor)
- Rate limiting to prevent API quota exhaustion
AIRulesEngine (ai_rules.py)
- Applies business rules for auto-approval and categorization
- Configurable rule conditions and actions
- Supports system and user-generated rules
- Safe condition evaluation with proper error handling
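One way to read "safe condition evaluation" is to compare fields through a fixed set of operators instead of calling eval on rule text. The sketch below illustrates that idea only; the rule shape (field, op, value) and the name evaluate_condition are hypothetical and are not taken from ai_rules.py.

```python
import operator

# Whitelisted comparison operators; any other operator name is rejected.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def evaluate_condition(condition: dict, record: dict) -> bool:
    """Return True if `record` satisfies `condition`; False on any problem."""
    try:
        compare = _OPS[condition["op"]]
        return bool(compare(record[condition["field"]], condition["value"]))
    except (KeyError, TypeError):
        # Unknown operator, missing field, or incomparable types: fail closed.
        return False


# Hypothetical rule: treat matches with confidence >= 0.9 as auto-approvable.
rule = {"field": "confidence_score", "op": ">=", "value": 0.9}
print(evaluate_condition(rule, {"confidence_score": 0.95}))
print(evaluate_condition(rule, {"vendor": "Starbucks"}))
```

Failing closed means a malformed rule never approves a match by accident, which seems the safer default for auto-approval.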
DocumentProcessor (document_processor.py)
- AI-powered receipt data extraction using Groq vision model
- Supports PDF and image formats
- Robust JSON parsing with error handling
- Extracts vendor, amount, date, tax, and category information
MatchingEngine (matching_engine.py)
- Main orchestrator combining all components
- Handles the complete matching workflow
- Provides statistics and feedback logging
- Configurable confidence thresholds
FeedbackLogger (feedback_logger.py)
- Tracks manual overrides for AI training
- Maintains audit trail of user decisions
- Enables continuous model improvement
📊 Configuration
Edit config.py to adjust:
- Confidence threshold (default: 0.3)
- Date tolerance days (default: 7)
- Amount tolerance percent (default: 5%)
- Groq API key (from environment variable)
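As an illustration of how these settings could be grouped, here is a minimal sketch that uses the defaults listed above. It is not the contents of config.py: the Settings class, load_settings, and every environment variable name except GROQ_API_KEY are assumptions made for the example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    confidence_threshold: float = 0.3      # default from the list above
    date_tolerance_days: int = 7           # default from the list above
    amount_tolerance_percent: float = 5.0  # default from the list above
    groq_api_key: str = ""


def load_settings(env=None) -> Settings:
    """Build Settings from an environment mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        # These three variable names are hypothetical, not from the project.
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", "0.3")),
        date_tolerance_days=int(env.get("DATE_TOLERANCE_DAYS", "7")),
        amount_tolerance_percent=float(env.get("AMOUNT_TOLERANCE_PERCENT", "5")),
        # GROQ_API_KEY is the variable named in the Quick Start section.
        groq_api_key=env.get("GROQ_API_KEY", ""),
    )


settings = load_settings({"GROQ_API_KEY": "example-key", "DATE_TOLERANCE_DAYS": "3"})
print(settings)
```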
🔄 Integration Workflow
1. Import Transactions
# Import from CSV
curl -X POST -F "file=@transactions.csv" http://localhost:8343/transactions/import/csv
# Import from image
curl -X POST -F "file=@statement.jpg" http://localhost:8343/transactions/import/image
2. Upload and Process Receipts
# Upload receipts
curl -X POST -F "files=@receipt1.jpg" -F "files=@receipt2.jpg" http://localhost:8343/upload-multiple
# Process a specific receipt (replace {file_id} with the ID of an uploaded file)
curl -X POST http://localhost:8343/process/{file_id}
3. AI Matching
# Match specific receipts
curl -X POST -H "Content-Type: application/json" \
-d '["file_id_1", "file_id_2"]' \
http://localhost:8343/match-specific
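The same call can be prepared from a Python backend. The sketch below builds the request with the standard library only and does not send it; the helper name build_match_request is hypothetical, and the base URL assumes the server is running locally on port 8343 as in the Quick Start.

```python
import json
import urllib.request


def build_match_request(file_ids, base_url="http://localhost:8343"):
    """Build (but do not send) a POST /match-specific request."""
    # The body is a JSON array of file IDs, mirroring the curl example.
    body = json.dumps(list(file_ids)).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/match-specific",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_match_request(["file_id_1", "file_id_2"])
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it, provided the server is reachable.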
4. Check Results
# Get system stats
curl http://localhost:8343/stats
# View AI rules
curl http://localhost:8343/rules
🎯 Key Features
- AI-powered matching with confidence scores
- Rule-based auto-approval and categorization
- Feedback logging for continuous improvement
- Configurable matching parameters
- RESTful JSON API for easy backend integration
- Comprehensive error handling
- Rate limiting to prevent API quota exhaustion
- Robust JSON parsing for AI responses
📝 Data Formats
Transaction Input (CSV)
Date,Description,Amount,Category
2024-01-15,Starbucks Coffee,12.50,Food & Dining
2024-01-16,Office Supplies,45.99,Office
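For backends that want to validate a file before uploading it, the CSV layout above can be read with the standard library. This is a sketch under the assumption that dates are ISO formatted and amounts are plain decimals, as in the sample rows; parse_transactions is a hypothetical helper, not part of the engine.

```python
import csv
import io
from datetime import date
from decimal import Decimal

SAMPLE = """Date,Description,Amount,Category
2024-01-15,Starbucks Coffee,12.50,Food & Dining
2024-01-16,Office Supplies,45.99,Office
"""


def parse_transactions(text: str) -> list:
    """Parse CSV text with the Date, Description, Amount, Category columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": date.fromisoformat(row["Date"]),
            "description": row["Description"].strip(),
            # Decimal avoids binary floating-point rounding on money values.
            "amount": Decimal(row["Amount"]),
            "category": row["Category"].strip(),
        })
    return rows


transactions = parse_transactions(SAMPLE)
print(len(transactions), transactions[0]["description"])
```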
Receipt Processing Output
{
"vendor": "Starbucks",
"total_amount": 12.50,
"tax_amount": 1.25,
"date": "2024-01-15",
"category": "Food & Dining",
"confidence": 0.95,
"extraction_success": true
}
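LLM replies sometimes wrap a JSON object like the one above in extra prose, which is presumably why robust parsing is listed as a feature. The sketch below shows one generic way to recover the first JSON object from such a reply; extract_json_object is a hypothetical helper and does not describe how document_processor.py actually parses responses.

```python
import json


def extract_json_object(text: str):
    """Return the first JSON object embedded in `text`, or None if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None


reply = 'Here is the result:\n{"vendor": "Starbucks", "total_amount": 12.50}\nLet me know.'
receipt = extract_json_object(reply)
print(receipt)
```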
Match Result Output
{
"receipt_id": "uuid",
"transaction_id": "transaction_123",
"confidence_score": 0.95,
"match_reason": "Same vendor, minor date difference (Auto-approved by rules)",
"receipt_vendor": "Starbucks",
"receipt_amount": 12.50,
"transaction_vendor": "STARBUCKS",
"transaction_amount": 12.50
}
🔍 AI Matching Criteria
The engine uses multiple criteria for matching:
- Amount Similarity - Compares receipt and transaction amounts (5% tolerance)
- Date Proximity - Checks date closeness (7-day tolerance)
- Vendor Matching - AI-powered vendor name comparison using Groq LLM
- Rule-based Auto-approval - Automatic approval for exact matches and high-confidence matches
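The amount and date criteria can be illustrated with two small checks. This is a sketch only: it assumes the 5% tolerance is measured relative to the transaction amount and that the 7-day window is inclusive, neither of which the list above specifies, and both function names are hypothetical.

```python
from datetime import date


def within_amount_tolerance(receipt_amount, transaction_amount, tolerance_percent=5.0):
    """True if the amounts differ by at most `tolerance_percent` of the transaction."""
    if transaction_amount == 0:
        # Avoid division by zero; only an exact zero match passes.
        return receipt_amount == 0
    diff = abs(receipt_amount - transaction_amount)
    return diff / abs(transaction_amount) * 100 <= tolerance_percent


def within_date_tolerance(receipt_date, transaction_date, tolerance_days=7):
    """True if the two dates are at most `tolerance_days` apart (inclusive)."""
    return abs((receipt_date - transaction_date).days) <= tolerance_days


print(within_amount_tolerance(12.50, 13.00))
print(within_date_tolerance(date(2024, 1, 15), date(2024, 1, 23)))
```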
🛠️ Development
Project Structure
├── main.py # FastAPI application entry point
├── ai_matcher.py # AI-powered matching logic
├── ai_rules.py # Business rules engine
├── document_processor.py # Receipt data extraction
├── matching_engine.py # Main matching orchestrator
├── feedback_logger.py # User feedback tracking
├── models.py # Pydantic data models
├── api_models.py # API request/response models
├── config.py # Configuration settings
├── requirements.txt # Python dependencies
└── test_images/ # Test image files
Running Tests
# Smoke-test the health check endpoint (the server must already be running)
curl http://localhost:8343/
# Test stats endpoint
curl http://localhost:8343/stats
# Test rules endpoint
curl http://localhost:8343/rules
🚀 Production Deployment
For production deployment:
- Replace in-memory storage with a database (PostgreSQL recommended)
- Configure proper authentication and authorization
- Set up monitoring and logging (ELK stack recommended)
- Use environment variables for all configuration
- Implement proper error handling and retries
- Set up rate limiting and API quotas
- Configure CORS for frontend integration
- Use HTTPS in production
📞 Support
This Data Science Engine is designed to be integrated with backend applications that handle:
- QuickBooks API connections
- User interface and workflows
- Data persistence and management
- External integrations
The engine focuses purely on AI/ML capabilities and provides a clean JSON API for backend integration.
🔧 Troubleshooting
Common Issues
- API Key Error: Ensure GROQ_API_KEY is set in your .env file
- Port Already in Use: Kill the existing process with pkill -f "python main.py"
- Import Errors: Install dependencies with pip install -r requirements.txt
- Rate Limiting: The system includes built-in rate limiting to prevent API quota exhaustion
Logs
Check the application logs for detailed error information:
tail -f app.log