# AI Bookkeeper - Data Science Engine

AI-powered receipt-to-transaction matching engine using LLMs served by Groq. This **Data Science Engine** provides intelligent matching capabilities to backend applications.
|
## 🎯 Purpose

This Data Science Engine receives QuickBooks transaction data from backend applications and provides:

- **AI-powered receipt processing** (OCR and data extraction)
- **Intelligent receipt-transaction matching** with confidence scores
- **Google Drive integration** for batch receipt processing
- **Configurable AI rules** for business logic
- **Feedback logging** for continuous improvement
|
## 🚀 Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Configure API Keys

The Groq API key is already configured in `config.py`.

### 3. Start the DS Engine

```bash
python main.py
```

### 4. Access API Documentation

Once the engine is running:

- **Swagger UI**: http://localhost:8343/docs
- **ReDoc**: http://localhost:8343/redoc
|
## 📋 API Endpoints

### QuickBooks Data Import

- `POST /transactions/import/quickbooks` - Import and convert QuickBooks transactions

### Receipt Processing

- `POST /upload` - Upload receipt documents (PDF/images)
- `POST /process/{file_id}` - Extract data from an uploaded document
- `GET /documents` - List all processed documents

### Google Drive Integration

- `POST /drive/sync` - Sync and process receipts from Google Drive
- `GET /drive/folders` - List accessible Google Drive folders
- `GET /drive/folder/{folder_id}` - Get information about a folder

### AI Matching Engine

- `POST /match` - Match receipts to transactions using AI
- `POST /approve` - Approve or reject AI matches

### AI Rules Management

- `POST /rules` - Add new AI rules
- `GET /rules` - List all active rules
- `DELETE /rules/{rule_name}` - Delete a rule by name

### System Monitoring

- `GET /stats` - Get system statistics and performance metrics
|
## 🔧 Core Components

### **AIMatcher** (`ai_matcher.py`)

- Uses a Groq-hosted LLM to compare receipts and transactions
- Provides confidence scores and reasoning
- Configurable matching criteria (amount, date, vendor)
|
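This README does not document the prompt or reply format that `ai_matcher.py` uses. As a hypothetical sketch, assume the model is asked to answer with a JSON object whose keys mirror the Match Result Output fields (`confidence_score`, `match_reason`). The illustrative helper `parse_match_reply` below turns such a reply into a score and a reason defensively, since LLM output is not guaranteed to be valid JSON or to stay within the requested range.

```python
import json
import math
from typing import Tuple


def parse_match_reply(reply_text: str) -> Tuple[float, str]:
    """Parse a hypothetical LLM reply such as
    {"confidence_score": 0.92, "match_reason": "same vendor and amount"}.

    Invalid JSON or a non-object reply yields a score of 0.0; the score is
    clamped into [0.0, 1.0] because the model may ignore the requested range.
    """
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        return 0.0, "unparseable model reply"
    if not isinstance(data, dict):
        return 0.0, "unexpected reply shape"
    try:
        score = float(data.get("confidence_score", 0.0))
    except (TypeError, ValueError):
        score = 0.0
    if math.isnan(score):
        score = 0.0
    score = min(max(score, 0.0), 1.0)
    return score, str(data.get("match_reason", ""))
```

Falling back to a zero score rather than raising keeps one malformed reply from aborting a whole batch; such pairs simply end up below any confidence threshold.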
|
### **AIRulesEngine** (`ai_rules.py`)

- Applies business rules for auto-approval and categorization
- Configurable rule conditions and actions
- Supports system and user-generated rules
|
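Rule storage and syntax are not described in this README, so the following is only a sketch of the condition/action idea. Each hypothetical `Rule` pairs a predicate over a match dictionary with a dict of fields to set, and `apply_rules` applies every rule whose condition holds, in list order. The rule names, the 0.95 cut-off, and the `Meals` category are made-up examples.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Match = Dict[str, Any]


@dataclass
class Rule:
    """Hypothetical rule shape: a named condition plus the fields it sets."""
    name: str
    condition: Callable[[Match], bool]
    action: Dict[str, Any]
    source: str = "system"  # "system" or "user"


def apply_rules(match: Match, rules: List[Rule]) -> Match:
    """Return a copy of `match` updated by every rule whose condition holds.

    Rules run in list order, so a later rule wins if two set the same field.
    """
    result = dict(match)
    applied: List[str] = []
    for rule in rules:
        if rule.condition(result):
            result.update(rule.action)
            applied.append(rule.name)
    result["applied_rules"] = applied
    return result


# Illustrative rules only; real rules would come from POST /rules.
rules = [
    Rule(
        name="auto_approve_high_confidence",
        condition=lambda m: m["confidence_score"] >= 0.95,
        action={"status": "approved"},
    ),
    Rule(
        name="coffee_is_meals",
        condition=lambda m: "starbucks" in m["receipt_vendor"].lower(),
        action={"category": "Meals"},
        source="user",
    ),
]
```

Evaluating in list order means a user rule placed after a system rule can override a field the system rule set; a production engine might use explicit priorities instead.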
|
### **DocumentProcessor** (`document_processor.py`)

- AI-powered receipt data extraction
- Supports PDF and image formats
- Uses a Groq vision model for OCR

### **MatchingEngine** (`matching_engine.py`)

- Main orchestrator combining all components
- Handles the complete matching workflow
- Provides statistics and feedback logging
|
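One step an orchestrator like `MatchingEngine` has to handle is turning per-pair confidence scores into a set of matches in which each receipt and each transaction is used at most once. The sketch below assumes the scores are already available as a dict keyed by `(receipt_id, transaction_id)` and uses a simple greedy strategy; the function name and input shape are hypothetical, not the engine's API.

```python
from typing import Dict, List, Set, Tuple


def assign_matches(
    scores: Dict[Tuple[str, str], float], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Greedy one-to-one assignment of receipts to transactions.

    Pairs are taken highest score first; a pair is skipped if its receipt or
    transaction was already used, and scores below `threshold` are ignored.
    """
    used_receipts: Set[str] = set()
    used_txns: Set[str] = set()
    matches: List[Tuple[str, str, float]] = []
    # Sort by descending score; ties are broken by ids for deterministic output.
    for (receipt_id, txn_id), score in sorted(
        scores.items(), key=lambda kv: (-kv[1], kv[0])
    ):
        if score < threshold:
            break  # all remaining pairs score even lower
        if receipt_id in used_receipts or txn_id in used_txns:
            continue
        used_receipts.add(receipt_id)
        used_txns.add(txn_id)
        matches.append((receipt_id, txn_id, score))
    return matches
```

Greedy assignment is easy to reason about but does not always maximize the total score; an optimal one-to-one assignment would use the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).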
|
### **FeedbackLogger** (`feedback_logger.py`)

- Tracks manual overrides for AI training
- Maintains audit trail of user decisions
- Enables continuous model improvement
|
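The log format `feedback_logger.py` uses is not specified here. A common append-only choice is JSON Lines (one JSON object per line), sketched below with the standard library. The file layout, the field names, the assumed `reject` action value, and the definition of an override (a user decision that disagrees with what the 0.8 confidence threshold alone would suggest) are all assumptions for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List


def log_feedback(
    log_path: Path,
    match_id: str,
    user_id: str,
    action: str,
    ai_confidence: float,
    threshold: float = 0.8,
) -> Dict[str, Any]:
    """Append one user decision as a single JSON line and return the entry."""
    if action not in {"approve", "reject"}:
        raise ValueError(f"unsupported action: {action!r}")
    ai_would_approve = ai_confidence >= threshold
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "match_id": match_id,
        "user_id": user_id,
        "action": action,
        "ai_confidence": ai_confidence,
        # Hypothetical definition: the user disagreed with the threshold.
        "is_override": (action == "approve") != ai_would_approve,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def read_overrides(log_path: Path) -> List[Dict[str, Any]]:
    """Return the logged entries flagged as overrides, in file order."""
    with log_path.open(encoding="utf-8") as fh:
        entries = [json.loads(line) for line in fh if line.strip()]
    return [e for e in entries if e["is_override"]]
```

Appending one line per decision keeps the audit trail immutable and easy to stream into a training or evaluation job later.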
|
## 📊 Configuration

Edit `config.py` to adjust:

- **Confidence threshold** (default: 0.8)
- **Date tolerance** in days (default: 7)
- **Amount tolerance** as a percentage (default: 5%)
- **Groq API key** (already configured)
|
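The Production Deployment notes recommend environment variables for configuration. A minimal sketch of a `config.py` that reads the four settings above from the environment and falls back to the documented defaults might look like this; the variable names (`GROQ_API_KEY`, `CONFIDENCE_THRESHOLD`, and so on) and the `EngineConfig` class are hypothetical and may not match the existing file.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineConfig:
    groq_api_key: str
    confidence_threshold: float = 0.8
    date_tolerance_days: int = 7
    amount_tolerance_percent: float = 5.0


def load_config() -> EngineConfig:
    """Build the config from (hypothetically named) environment variables,
    using the README's documented defaults when a variable is unset."""
    api_key = os.environ.get("GROQ_API_KEY", "")
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is not set")
    cfg = EngineConfig(
        groq_api_key=api_key,
        confidence_threshold=float(os.environ.get("CONFIDENCE_THRESHOLD", "0.8")),
        date_tolerance_days=int(os.environ.get("DATE_TOLERANCE_DAYS", "7")),
        amount_tolerance_percent=float(os.environ.get("AMOUNT_TOLERANCE_PERCENT", "5")),
    )
    if not 0.0 <= cfg.confidence_threshold <= 1.0:
        raise ValueError("CONFIDENCE_THRESHOLD must be between 0 and 1")
    return cfg
```

Failing fast on a missing key surfaces a misconfigured deployment at startup rather than on the first Groq request.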
|
## 🔄 Integration Workflow

### 1. Backend Sends QuickBooks Data

```python
import requests

# Backend sends QuickBooks transactions to the DS Engine
response = requests.post(
    "http://localhost:8343/transactions/import/quickbooks",
    json={
        "transactions": [
            {
                "id": "QB_TXN_123",
                "txn_date": "2024-01-15",
                "amount": 12.50,
                "payee_name": "Starbucks",
                "memo": "Coffee purchase",
            }
        ]
    },
)
response.raise_for_status()  # raise on 4xx/5xx instead of failing silently
```
|
### 2. Process Receipts

```python
import requests

# Sync from Google Drive
response = requests.post(
    "http://localhost:8343/drive/sync",
    json={"folder_id": "your_folder_id"},
)

# Or upload directly ("receipt.pdf" is a placeholder path; open in binary mode)
with open("receipt.pdf", "rb") as receipt_file:
    response = requests.post(
        "http://localhost:8343/upload",
        files={"file": receipt_file},
    )
```
|
### 3. AI Matching

```python
import requests

# Match receipts to transactions.
# processed_receipts and converted_transactions come from the previous steps.
response = requests.post(
    "http://localhost:8343/match",
    json={
        "receipts": processed_receipts,
        "transactions": converted_transactions,
    },
)
```
|
### 4. User Feedback

```python
import requests

# Approve or reject matches
response = requests.post(
    "http://localhost:8343/approve",
    json={
        "match_id": "match_123",
        "user_id": "user_456",
        "action": "approve",
    },
)
```
|
## 🎯 Key Features

- **AI-powered matching** with confidence scores
- **Rule-based auto-approval** and categorization
- **Feedback logging** for continuous improvement
- **Configurable matching parameters**
- **Google Drive integration** for batch processing
- **JSON API** for easy backend integration
- **Comprehensive error handling**
|
## 📝 Data Formats

### QuickBooks Transaction Input

```json
{
  "id": "string",
  "txn_date": "YYYY-MM-DD",
  "amount": 0.00,
  "payee_name": "string",
  "memo": "string (optional)",
  "account_name": "string (optional)",
  "txn_type": "string (optional)"
}
```
|
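To catch malformed input before matching, a backend or the engine can validate each transaction against this shape. The standard-library sketch below is illustrative only: the `QuickBooksTransaction` class is hypothetical, and the engine's own models are not shown in this README. Missing required fields raise `KeyError`, and `txn_date` must be an ISO `YYYY-MM-DD` string.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict, Optional


@dataclass
class QuickBooksTransaction:
    id: str
    txn_date: date
    amount: float
    payee_name: str
    memo: Optional[str] = None
    account_name: Optional[str] = None
    txn_type: Optional[str] = None

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> "QuickBooksTransaction":
        """Validate one transaction dict; raises KeyError or ValueError on bad input."""
        return cls(
            id=str(data["id"]),
            txn_date=date.fromisoformat(data["txn_date"]),  # expects YYYY-MM-DD
            amount=float(data["amount"]),
            payee_name=str(data["payee_name"]),
            memo=data.get("memo"),
            account_name=data.get("account_name"),
            txn_type=data.get("txn_type"),
        )
```

For example, `QuickBooksTransaction.from_json(payload["transactions"][0])` on the import payload from the Integration Workflow yields an object whose `txn_date` is a `datetime.date`, ready for date arithmetic.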
|
### Match Result Output

```json
{
  "receipt_id": "string",
  "transaction_id": "string",
  "confidence_score": 0.95,
  "match_reason": "string",
  "receipt_vendor": "string",
  "receipt_amount": 0.00,
  "transaction_vendor": "string",
  "transaction_amount": 0.00
}
```
|
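A consumer of these results will usually separate confident matches from ones that need a human decision before calling `POST /approve`. The sketch below shows one plausible way to do that with the documented default confidence threshold of 0.8; `triage_matches` is a hypothetical helper, and this README does not say whether the engine applies such a split itself.

```python
from typing import Any, Dict, List, Tuple

MatchResult = Dict[str, Any]


def triage_matches(
    matches: List[MatchResult], threshold: float = 0.8
) -> Tuple[List[MatchResult], List[MatchResult]]:
    """Split match results into (confident, needs_review).

    A result is confident when confidence_score >= threshold; needs_review
    is sorted so the least certain match comes first.
    """
    confident: List[MatchResult] = []
    needs_review: List[MatchResult] = []
    for match in matches:
        if match["confidence_score"] >= threshold:
            confident.append(match)
        else:
            needs_review.append(match)
    needs_review.sort(key=lambda m: m["confidence_score"])
    return confident, needs_review
```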
|
## 🔍 AI Matching Criteria

The engine uses three primary criteria for matching:

1. **Amount Similarity** - Compares receipt and transaction amounts (5% tolerance by default)
2. **Date Proximity** - Checks how close the receipt and transaction dates are (7-day tolerance by default)
3. **Vendor Matching** - AI-powered vendor name comparison
|
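The amount and date checks are deterministic and cheap. The sketch below applies the default tolerances, assuming the 5% is measured relative to the transaction amount and the 7 days apply in either direction; neither detail is specified in this README. Using these checks as a pre-filter, so that only candidate pairs reach the LLM for vendor comparison, is likewise a design assumption rather than documented behavior, and the function names are hypothetical.

```python
from datetime import date


def amount_within_tolerance(
    receipt_amount: float, txn_amount: float, tolerance_percent: float = 5.0
) -> bool:
    """True if the amounts differ by at most tolerance_percent of the
    transaction amount (assumed base for the percentage)."""
    return abs(receipt_amount - txn_amount) <= abs(txn_amount) * tolerance_percent / 100


def date_within_tolerance(
    receipt_date: date, txn_date: date, tolerance_days: int = 7
) -> bool:
    """True if the two dates are at most tolerance_days apart, either direction."""
    return abs((receipt_date - txn_date).days) <= tolerance_days


def is_candidate(receipt: dict, txn: dict) -> bool:
    """Hypothetical pre-filter: only pairs passing both checks would be sent
    to the LLM for vendor comparison."""
    return amount_within_tolerance(receipt["amount"], txn["amount"]) and (
        date_within_tolerance(receipt["date"], txn["date"])
    )
```

For the sample $12.50 Starbucks transaction, a 5% tolerance allows receipt totals between $11.875 and $13.125.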
|
## 🚀 Production Deployment

For production deployment:

- Replace in-memory storage with a database
- Configure proper authentication
- Set up monitoring and logging
- Use environment variables for configuration (including the Groq API key)
- Implement proper error handling and retries
|
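For the error-handling and retries item, a small standard-library helper with exponential backoff and jitter could wrap calls to the engine or to Groq. This is a generic sketch; `call_with_retries` and its defaults are illustrative and not part of the codebase.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on `retry_on` exceptions with exponential backoff.

    Waits roughly base_delay, 2 * base_delay, 4 * base_delay, ... between
    attempts, plus up to 50% random jitter so clients do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

When wrapping `requests` calls, pass `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`: those classes do not subclass Python's built-in `ConnectionError` or `TimeoutError`, so the defaults would not catch them.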
|
## 📞 Support

This Data Science Engine is designed to be integrated with backend applications that handle:

- QuickBooks API connections
- User interface and workflows
- Data persistence and management
- External integrations

The engine focuses purely on AI/ML capabilities and provides a clean JSON API for backend integration.