Update README and core files, remove test/debug scripts, improve documentation and robustness
@@ -7,9 +7,9 @@ AI-powered receipt-to-transaction matching engine using Groq LLM. This is a **Da
This Data Science Engine receives QuickBooks transaction data from backend applications and provides:
- **AI-powered receipt processing** (OCR and data extraction)
- **Intelligent receipt-transaction matching** with confidence scores
- **Google Drive integration** for batch receipt processing
- **Configurable AI rules** for business logic
- **Feedback logging** for continuous improvement
- **RESTful API** for easy integration

## 🚀 Quick Start

@@ -19,11 +19,22 @@ pip install -r requirements.txt
```

### 2. Configure API Keys
The Groq API key is already configured in `config.py`
Create a `.env` file in the project root with your Groq API key:

### 3. Start the DS Engine
```bash
# Create .env file
echo "GROQ_API_KEY=your_actual_groq_api_key_here" > .env
```

**Important**: Get your API key from [Groq Console](https://console.groq.com/)

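One way the key written to `.env` could be read at startup is sketched below. It uses only the standard library. The function names `load_env_file` and `get_groq_api_key` are hypothetical, and the project itself may rely on a package such as `python-dotenv` instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file; a missing file yields {}."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_groq_api_key(env_path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GROQ_API_KEY") or load_env_file(env_path).get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to .env or export it")
    return key
```

Failing fast with a clear error when the key is absent tends to be easier to debug than a later authentication failure from the Groq API.
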
### 3. Start the Server
```bash
# Option 1: Using the main script
python main.py

# Option 2: Using uvicorn directly
uvicorn main:app --host 0.0.0.0 --port 8343 --reload
```

### 4. Access API Documentation
@@ -32,22 +43,16 @@ python main.py

## 📋 API Endpoints

### QuickBooks Data Import
- `POST /transactions/import/quickbooks` - Import and convert QuickBooks transactions
### Transaction Import
- `POST /transactions/import/csv` - Import transactions from CSV file
- `POST /transactions/import/image` - Import transactions from image/PDF

### Receipt Processing
- `POST /upload` - Upload receipt documents (PDF/images)
- `POST /upload-multiple` - Upload multiple receipt documents
- `POST /process/{file_id}` - Extract data from uploaded documents
- `GET /documents` - List all processed documents

### Google Drive Integration
- `POST /drive/sync` - Sync and process receipts from Google Drive
- `GET /drive/folders` - List accessible Google Drive folders
- `GET /drive/folder/{folder_id}` - Get folder information

### AI Matching Engine
- `POST /match` - Match receipts to transactions using AI
- `POST /approve` - Approve or reject AI matches
- `POST /match-specific` - Match specific receipts to transactions using AI

### AI Rules Management
- `POST /rules` - Add new AI rules
@@ -56,6 +61,7 @@ python main.py

### System Monitoring
- `GET /stats` - Get system statistics and performance metrics
- `GET /` - Health check endpoint

## 🔧 Core Components

@@ -63,21 +69,25 @@ python main.py
- Uses Groq LLM to compare receipts and transactions
- Provides confidence scores and reasoning
- Configurable matching criteria (amount, date, vendor)
- Rate limiting to prevent API quota exhaustion

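The rate-limiting bullet above does not say how calls are throttled. A minimal sketch of one common approach, a fixed minimum interval between LLM calls, is shown here. The class name `MinIntervalRateLimiter` and its parameters are hypothetical and are not taken from `ai_matcher.py`.

```python
import time


class MinIntervalRateLimiter:
    """Allow at most one call per `min_interval` seconds (hypothetical helper)."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock          # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last_call = None       # timestamp of the previous permitted call

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self.min_interval - (now - self._last_call))
        if delay > 0:
            self._sleep(delay)
        self._last_call = self._clock()
        return delay
```

A caller would invoke `limiter.wait()` immediately before each Groq request. A token bucket would allow short bursts, which this simpler design does not.
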
### **AIRulesEngine** (`ai_rules.py`)
- Applies business rules for auto-approval and categorization
- Configurable rule conditions and actions
- Supports system and user-generated rules
- Safe condition evaluation with proper error handling

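"Safe condition evaluation" usually means that rule conditions are never passed to `eval`. The sketch below illustrates that idea with a whitelist of comparison operators. The rule shape (`field`, `op`, `value`) and the function name `evaluate_condition` are assumptions made for illustration, not the schema used by `ai_rules.py`.

```python
import operator

# Whitelisted comparison operators; anything else is rejected.
_OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def evaluate_condition(condition: dict, match: dict) -> bool:
    """Return True if `match` satisfies `condition`; never raise on bad input."""
    try:
        compare = _OPERATORS[condition["op"]]
        return bool(compare(match[condition["field"]], condition["value"]))
    except (KeyError, TypeError):
        # Unknown operator, missing field, or incomparable types: treat as "no match".
        return False
```

For example, `evaluate_condition({"field": "confidence_score", "op": ">=", "value": 0.9}, match)` would gate an auto-approval action on a high-confidence match.
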
### **DocumentProcessor** (`document_processor.py`)
- AI-powered receipt data extraction
- AI-powered receipt data extraction using Groq vision model
- Supports PDF and image formats
- Uses Groq vision model for OCR
- Robust JSON parsing with error handling
- Extracts vendor, amount, date, tax, and category information

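Model responses often wrap JSON in a Markdown fence or surround it with commentary, which is presumably what "robust JSON parsing" guards against. The following is a minimal sketch of such a parser. The function name `parse_llm_json` and the shape of the error dictionary are assumptions; only the `extraction_success` field name comes from the output format documented in this README.

```python
import json
import re

_FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this example readable


def parse_llm_json(text: str) -> dict:
    """Best-effort extraction of one JSON object from a model response."""
    # 1. If the model wrapped its answer in a code fence, look inside the fence.
    fenced = re.search(_FENCE + r"(?:json)?\s*(.*?)" + _FENCE, text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    # 2. Keep only the span from the first "{" to the last "}".
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        return {"extraction_success": False, "error": "no JSON object found"}
    try:
        return json.loads(candidate[start : end + 1])
    except json.JSONDecodeError as exc:
        return {"extraction_success": False, "error": str(exc)}
```

Returning a failure dictionary instead of raising lets a batch of receipts continue processing when a single response is malformed.
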
### **MatchingEngine** (`matching_engine.py`)
- Main orchestrator combining all components
- Handles the complete matching workflow
- Provides statistics and feedback logging
- Configurable confidence thresholds

### **FeedbackLogger** (`feedback_logger.py`)
- Tracks manual overrides for AI training
@@ -87,70 +97,46 @@ python main.py
## 📊 Configuration

Edit `config.py` to adjust:
- **Confidence threshold** (default: 0.8)
- **Confidence threshold** (default: 0.3)
- **Date tolerance days** (default: 7)
- **Amount tolerance percent** (default: 5%)
- **Groq API key** (already configured)
- **Groq API key** (from environment variable)

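The settings listed above could be grouped as in the sketch below. This is not the contents of `config.py`: the class name `MatchingConfig`, the function `load_config`, and every environment variable name other than `GROQ_API_KEY` are hypothetical. The sketch uses a threshold of 0.3, which appears to be the value this commit introduces in place of 0.8.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchingConfig:
    confidence_threshold: float = 0.3       # minimum score for a match to be accepted
    date_tolerance_days: int = 7            # allowed gap between receipt and transaction dates
    amount_tolerance_percent: float = 5.0   # allowed relative difference between amounts
    groq_api_key: str = ""                  # supplied via the environment, never hard-coded


def load_config(env=os.environ) -> MatchingConfig:
    """Build the config, letting environment variables override the defaults."""
    return MatchingConfig(
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", 0.3)),
        date_tolerance_days=int(env.get("DATE_TOLERANCE_DAYS", 7)),
        amount_tolerance_percent=float(env.get("AMOUNT_TOLERANCE_PERCENT", 5.0)),
        groq_api_key=env.get("GROQ_API_KEY", ""),
    )
```

Passing a plain dictionary as `env` keeps the loader easy to test without touching the real process environment.
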
## 🔄 Integration Workflow

### 1. Backend Sends QuickBooks Data
```python
# Backend sends QuickBooks transactions
response = requests.post(
    "http://localhost:8343/transactions/import/quickbooks",
    json={
        "transactions": [
            {
                "id": "QB_TXN_123",
                "txn_date": "2024-01-15",
                "amount": 12.50,
                "payee_name": "Starbucks",
                "memo": "Coffee purchase"
            }
        ]
    }
)
### 1. Import Transactions
```bash
# Import from CSV
curl -X POST -F "file=@transactions.csv" http://localhost:8343/transactions/import/csv

# Import from image
curl -X POST -F "file=@statement.jpg" http://localhost:8343/transactions/import/image
```

### 2. Process Receipts
```python
# Sync from Google Drive
response = requests.post(
    "http://localhost:8343/drive/sync",
    json={"folder_id": "your_folder_id"}
)
### 2. Upload and Process Receipts
```bash
# Upload receipts
curl -X POST -F "files=@receipt1.jpg" -F "files=@receipt2.jpg" http://localhost:8343/upload-multiple

# Or upload directly
response = requests.post(
    "http://localhost:8343/upload",
    files={"file": receipt_file}
)
# Process a specific receipt
curl -X POST http://localhost:8343/process/{file_id}
```

### 3. AI Matching
```python
# Match receipts to transactions
response = requests.post(
    "http://localhost:8343/match",
    json={
        "receipts": processed_receipts,
        "transactions": converted_transactions
    }
)
```bash
# Match specific receipts
curl -X POST -H "Content-Type: application/json" \
  -d '["file_id_1", "file_id_2"]' \
  http://localhost:8343/match-specific
```

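For callers that prefer Python to `curl`, the `/match-specific` request above might be built as follows using only the standard library. The helper name `build_match_specific_request` is hypothetical, and the sketch assumes the endpoint accepts a bare JSON array of file IDs, as the `curl` example suggests.

```python
import json
import urllib.request


def build_match_specific_request(file_ids, base_url="http://localhost:8343"):
    """Build (but do not send) the POST request for /match-specific."""
    body = json.dumps(list(file_ids)).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/match-specific",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request requires a running server, for example:
# with urllib.request.urlopen(build_match_specific_request(["file_id_1"])) as resp:
#     matches = json.load(resp)
```

Separating request construction from sending keeps the payload easy to inspect or unit-test without a live engine.
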
### 4. User Feedback
```python
# Approve or reject matches
response = requests.post(
    "http://localhost:8343/approve",
    json={
        "match_id": "match_123",
        "user_id": "user_456",
        "action": "approve"
    }
)
### 4. Check Results
```bash
# Get system stats
curl http://localhost:8343/stats

# View AI rules
curl http://localhost:8343/rules
```

## 🎯 Key Features
@@ -159,55 +145,96 @@ response = requests.post(
- **Rule-based auto-approval** and categorization
- **Feedback logging** for continuous improvement
- **Configurable matching parameters**
- **Google Drive integration** for batch processing
- **JSON API** for easy backend integration
- **RESTful JSON API** for easy backend integration
- **Comprehensive error handling**
- **Rate limiting** to prevent API quota exhaustion
- **Robust JSON parsing** for AI responses

## 📝 Data Formats

### QuickBooks Transaction Input
### Transaction Input (CSV)
```csv
Date,Description,Amount,Category
2024-01-15,Starbucks Coffee,12.50,Food & Dining
2024-01-16,Office Supplies,45.99,Office
```

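A CSV in the format above could be read with the standard `csv` module, as in this sketch. The function name `parse_transactions_csv` and the lower-case output keys are assumptions; the engine's own importer may normalize the columns differently.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal


def parse_transactions_csv(text: str) -> list[dict]:
    """Turn CSV text with Date/Description/Amount/Category columns into dicts."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                "date": datetime.strptime(row["Date"], "%Y-%m-%d").date(),
                "description": row["Description"].strip(),
                "amount": Decimal(row["Amount"]),  # Decimal avoids float rounding on money
                "category": row["Category"].strip(),
            }
        )
    return rows
```

Parsing dates and amounts up front means malformed rows fail at import time instead of during matching.
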
### Receipt Processing Output
```json
{
  "id": "string",
  "txn_date": "YYYY-MM-DD",
  "amount": 0.00,
  "payee_name": "string",
  "memo": "string (optional)",
  "account_name": "string (optional)",
  "txn_type": "string (optional)"
  "vendor": "Starbucks",
  "total_amount": 12.50,
  "tax_amount": 1.25,
  "date": "2024-01-15",
  "category": "Food & Dining",
  "confidence": 0.95,
  "extraction_success": true
}
```

### Match Result Output
```json
{
  "receipt_id": "string",
  "transaction_id": "string",
  "receipt_id": "uuid",
  "transaction_id": "transaction_123",
  "confidence_score": 0.95,
  "match_reason": "string",
  "receipt_vendor": "string",
  "receipt_amount": 0.00,
  "transaction_vendor": "string",
  "transaction_amount": 0.00
  "match_reason": "Same vendor, minor date difference (Auto-approved by rules)",
  "receipt_vendor": "Starbucks",
  "receipt_amount": 12.50,
  "transaction_vendor": "STARBUCKS",
  "transaction_amount": 12.50
}
```

## 🔍 AI Matching Criteria

The engine uses three primary criteria for matching:
The engine uses multiple criteria for matching:

1. **Amount Similarity** - Compares receipt and transaction amounts (5% tolerance)
2. **Date Proximity** - Checks date closeness (7-day tolerance)
3. **Vendor Matching** - AI-powered vendor name comparison
3. **Vendor Matching** - AI-powered vendor name comparison using Groq LLM
4. **Rule-based Auto-approval** - Automatic approval for exact matches and high-confidence matches

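The two numeric criteria above can be sketched as simple predicates. The function names are hypothetical, and measuring the percentage difference relative to the transaction amount is an assumption, since the README does not say which amount serves as the base.

```python
from datetime import date


def amounts_match(receipt_amount: float, txn_amount: float, tolerance_percent: float = 5.0) -> bool:
    """True when the amounts differ by at most `tolerance_percent` of the transaction amount."""
    if txn_amount == 0:
        return receipt_amount == 0
    return abs(receipt_amount - txn_amount) / abs(txn_amount) * 100 <= tolerance_percent


def dates_match(receipt_date: date, txn_date: date, tolerance_days: int = 7) -> bool:
    """True when the two dates are at most `tolerance_days` apart, in either direction."""
    return abs((receipt_date - txn_date).days) <= tolerance_days
```

Cheap checks like these could be used to discard implausible pairs before spending an LLM call on vendor comparison, although whether the engine orders its checks this way is not stated here.
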
## 🛠️ Development

### Project Structure
```
├── main.py                 # FastAPI application entry point
├── ai_matcher.py           # AI-powered matching logic
├── ai_rules.py             # Business rules engine
├── document_processor.py   # Receipt data extraction
├── matching_engine.py      # Main matching orchestrator
├── feedback_logger.py      # User feedback tracking
├── models.py               # Pydantic data models
├── api_models.py           # API request/response models
├── config.py               # Configuration settings
├── requirements.txt        # Python dependencies
└── test_images/            # Test image files
```

### Running Tests
```bash
# Test the server
curl http://localhost:8343/

# Test stats endpoint
curl http://localhost:8343/stats

# Test rules endpoint
curl http://localhost:8343/rules
```

## 🚀 Production Deployment

For production deployment:
- Replace in-memory storage with a database
- Configure proper authentication
- Set up monitoring and logging
- Use environment variables for configuration
- Replace in-memory storage with a database (PostgreSQL recommended)
- Configure proper authentication and authorization
- Set up monitoring and logging (ELK stack recommended)
- Use environment variables for all configuration
- Implement proper error handling and retries
- Set up rate limiting and API quotas
- Configure CORS for frontend integration
- Use HTTPS in production

## 📞 Support

@@ -217,4 +244,19 @@ This Data Science Engine is designed to be integrated with backend applications
- Data persistence and management
- External integrations

The engine focuses purely on AI/ML capabilities and provides a clean JSON API for backend integration.

## 🔧 Troubleshooting

### Common Issues

1. **API Key Error**: Ensure `GROQ_API_KEY` is set in your `.env` file
2. **Port Already in Use**: Kill existing process with `pkill -f "python main.py"`
3. **Import Errors**: Install dependencies with `pip install -r requirements.txt`
4. **Rate Limiting**: The system includes built-in rate limiting to prevent API quota exhaustion

### Logs
Check the application logs for detailed error information:
```bash
tail -f app.log
```