first commit

2025-08-05 22:25:51 +01:00
commit 5b3c066cea
14 changed files with 2670 additions and 0 deletions
@@ -0,0 +1,262 @@
+# AI Bookkeeper - Data Science Engine
+
+An AI-powered receipt-to-transaction matching engine built on the Groq LLM API. It is a **Data Science Engine**: backend applications call it over HTTP for receipt extraction and intelligent matching.
+
+## 🎯 Purpose
+
+This Data Science Engine receives QuickBooks transaction data from backend applications and provides:
+- **AI-powered receipt processing** (OCR and data extraction)
+- **Intelligent receipt-transaction matching** with confidence scores
+- **Configurable AI rules** for business logic
+- **Feedback logging** for continuous improvement
+- **RESTful API** for easy integration
+
+## 🚀 Quick Start
+
+### 1. Install Dependencies
+```bash
+pip install -r requirements.txt
+```
+
+### 2. Configure API Keys
+Create a `.env` file in the project root with your Groq API key:
+
+```bash
+# Create .env file (note: ">" overwrites any existing .env)
+echo "GROQ_API_KEY=your_actual_groq_api_key_here" > .env
+```
+
+**Important**: Get your API key from [Groq Console](https://console.groq.com/)
+
+### 3. Start the Server
+```bash
+# Option 1: Using the main script
+python main.py
+
+# Option 2: Using uvicorn directly (--reload restarts on code changes; intended for development)
+uvicorn main:app --host 0.0.0.0 --port 8343 --reload
+```
+
+### 4. Access API Documentation
+- **Swagger UI**: http://localhost:8343/docs
+- **ReDoc**: http://localhost:8343/redoc
+
+## 📋 API Endpoints
+
+### Transaction Import
+- `POST /transactions/import/csv` - Import transactions from CSV file
+- `POST /transactions/import/image` - Import transactions from image/PDF
+
+### Receipt Processing
+- `POST /upload-multiple` - Upload multiple receipt documents
+- `POST /process/{file_id}` - Extract data from uploaded documents
+
+### AI Matching Engine
+- `POST /match-specific` - Match specific receipts to transactions using AI
+
+### AI Rules Management
+- `POST /rules` - Add new AI rules
+- `GET /rules` - List all active rules
+- `DELETE /rules/{rule_name}` - Delete rules
+
+### System Monitoring
+- `GET /stats` - Get system statistics and performance metrics
+- `GET /` - Health check endpoint
+
+## 🔧 Core Components
+
+### **AIMatcher** (`ai_matcher.py`)
+- Uses Groq LLM to compare receipts and transactions
+- Provides confidence scores and reasoning
+- Configurable matching criteria (amount, date, vendor)
+- Rate limiting to prevent API quota exhaustion
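
The rate-limiting bullet above does not say how the limiter works. As a minimal sketch, assuming a simple minimum-interval strategy, it could look like the following. The class name `MinIntervalLimiter` and its injectable `clock`/`sleep` parameters are hypothetical and not taken from `ai_matcher.py`.

```python
import time


class MinIntervalLimiter:
    """Hypothetical limiter: enforce a minimum gap between outbound API calls."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time at which the previous call was released

    def wait(self):
        """Block until min_interval_s has passed since the previous call.

        Returns the number of seconds slept (0.0 if no wait was needed).
        """
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self.min_interval_s - (now - self._last_call))
            if delay > 0:
                self._sleep(delay)
        self._last_call = now + delay
        return delay
```

A caller would invoke `limiter.wait()` immediately before each LLM request. Injecting the clock and sleep functions keeps the limiter testable without real delays.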
+
+### **AIRulesEngine** (`ai_rules.py`)
+- Applies business rules for auto-approval and categorization
+- Configurable rule conditions and actions
+- Supports system and user-generated rules
+- Safe condition evaluation with proper error handling
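
One common way to get "safe condition evaluation" is to avoid `eval` and look operators up in a whitelist. The sketch below assumes a rule condition is a dict with `field`, `op`, and `value` keys. That shape and the function name `evaluate_condition` are illustrative assumptions, not the format used by `ai_rules.py`.

```python
import operator

# Whitelisted comparison operators; anything else is rejected.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def evaluate_condition(condition, match):
    """Return True if `match` satisfies `condition`, False on any problem.

    `condition` is a hypothetical dict such as
    {"field": "confidence_score", "op": ">=", "value": 0.9}.
    """
    try:
        compare = _OPS[condition["op"]]
        actual = match[condition["field"]]
        return bool(compare(actual, condition["value"]))
    except (KeyError, TypeError):
        # Unknown operator, missing field, or incomparable types: fail closed.
        return False
```

Failing closed means a malformed rule never auto-approves a match, which is usually the safer default for bookkeeping data.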
+
+### **DocumentProcessor** (`document_processor.py`)
+- AI-powered receipt data extraction using Groq vision model
+- Supports PDF and image formats
+- Robust JSON parsing with error handling
+- Extracts vendor, amount, date, tax, and category information
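
LLM replies often wrap the JSON payload in extra prose or code fences, which is presumably what "robust JSON parsing" guards against. A minimal sketch of one approach, using only the standard library, is shown below. The function name `parse_model_json` is hypothetical and the real `document_processor.py` may work differently.

```python
import json


def parse_model_json(text):
    """Return the first JSON object found in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON from this brace; try the next candidate.
        start = text.find("{", start + 1)
    return None
```

Returning `None` instead of raising lets the caller report an extraction failure rather than crash on a malformed reply.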
+
+### **MatchingEngine** (`matching_engine.py`)
+- Main orchestrator combining all components
+- Handles the complete matching workflow
+- Provides statistics and feedback logging
+- Configurable confidence thresholds
+
+### **FeedbackLogger** (`feedback_logger.py`)
+- Tracks manual overrides for AI training
+- Maintains audit trail of user decisions
+- Enables continuous model improvement
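
An audit trail of manual overrides can be kept as an append-only JSON Lines file. The sketch below is one possible shape. The field names and the functions `format_override` and `append_override` are assumptions for illustration, not the schema used by `feedback_logger.py`.

```python
import json
from datetime import datetime, timezone


def format_override(receipt_id, ai_transaction_id, user_transaction_id, timestamp=None):
    """Serialize one manual override as a single JSON line (hypothetical schema)."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc)
    record = {
        "timestamp": timestamp.isoformat(),
        "receipt_id": receipt_id,
        "ai_transaction_id": ai_transaction_id,      # what the AI proposed
        "user_transaction_id": user_transaction_id,  # what the user chose
    }
    return json.dumps(record, sort_keys=True)


def append_override(path, line):
    """Append a formatted record to an append-only JSON Lines audit file."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```

Keeping both the AI's proposal and the user's choice in each record is what makes the log usable later as labeled training or evaluation data.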
+
+## 📊 Configuration
+
+Edit `config.py` to adjust:
+- **Confidence threshold** (default: 0.3)
+- **Date tolerance days** (default: 7)
+- **Amount tolerance percent** (default: 5%)
+- **Groq API key** (from environment variable)
+
+## 🔄 Integration Workflow
+
+### 1. Import Transactions
+```bash
+# Import from CSV
+curl -X POST -F "file=@transactions.csv" http://localhost:8343/transactions/import/csv
+
+# Import from image
+curl -X POST -F "file=@statement.jpg" http://localhost:8343/transactions/import/image
+```
+
+### 2. Upload and Process Receipts
+```bash
+# Upload receipts
+curl -X POST -F "files=@receipt1.jpg" -F "files=@receipt2.jpg" http://localhost:8343/upload-multiple
+
+# Process a specific receipt (replace {file_id} with the ID of an uploaded file)
+curl -X POST http://localhost:8343/process/{file_id}
+```
+
+### 3. AI Matching
+```bash
+# Match specific receipts
+curl -X POST -H "Content-Type: application/json" \
+  -d '["file_id_1", "file_id_2"]' \
+  http://localhost:8343/match-specific
+```
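
The same call can be made from a Python backend with only the standard library. The sketch below mirrors the `curl` command above: a JSON array of file IDs posted to `/match-specific`. The function names are hypothetical, and because the response schema is not specified here, the parsed JSON is returned as-is.

```python
import json
import urllib.request


def build_match_request(file_ids, base_url="http://localhost:8343"):
    """Build (but do not send) the POST request for /match-specific."""
    body = json.dumps(list(file_ids)).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/match-specific",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def match_receipts(file_ids, base_url="http://localhost:8343", timeout=60):
    """Send the request to a running server and return the decoded JSON reply."""
    request = build_match_request(file_ids, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload logic testable without a live server.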
+
+### 4. Check Results
+```bash
+# Get system stats
+curl http://localhost:8343/stats
+
+# View AI rules
+curl http://localhost:8343/rules
+```
+
+## 🎯 Key Features
+
+- **AI-powered matching** with confidence scores
+- **Rule-based auto-approval** and categorization
+- **Feedback logging** for continuous improvement
+- **Configurable matching parameters**
+- **RESTful JSON API** for easy backend integration
+- **Comprehensive error handling**
+- **Rate limiting** to prevent API quota exhaustion
+- **Robust JSON parsing** for AI responses
+
+## 📝 Data Formats
+
+### Transaction Input (CSV)
+```csv
+Date,Description,Amount,Category
+2024-01-15,Starbucks Coffee,12.50,Food & Dining
+2024-01-16,Office Supplies,45.99,Office
+```
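
A backend preparing this CSV may want to validate it before upload. The sketch below parses the four columns shown above with the standard library. The function name `parse_transactions` and the choice of `Decimal` for amounts are assumptions, not a description of how the engine itself reads the file.

```python
import csv
import io
from datetime import date
from decimal import Decimal


def parse_transactions(csv_text):
    """Parse CSV text with Date, Description, Amount, Category columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "date": date.fromisoformat(row["Date"].strip()),  # expects YYYY-MM-DD
            "description": row["Description"].strip(),
            "amount": Decimal(row["Amount"].strip()),          # exact, no float rounding
            "category": row["Category"].strip(),
        })
    return rows
```

A malformed date or amount raises an exception at parse time, so bad rows surface before they reach the matching step.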
+
+### Receipt Processing Output
+```json
+{
+  "vendor": "Starbucks",
+  "total_amount": 12.50,
+  "tax_amount": 1.25,
+  "date": "2024-01-15",
+  "category": "Food & Dining",
+  "confidence": 0.95,
+  "extraction_success": true
+}
+```
+
+### Match Result Output
+```json
+{
+  "receipt_id": "uuid",
+  "transaction_id": "transaction_123",
+  "confidence_score": 0.95,
+  "match_reason": "Same vendor, minor date difference (Auto-approved by rules)",
+  "receipt_vendor": "Starbucks",
+  "receipt_amount": 12.50,
+  "transaction_vendor": "STARBUCKS",
+  "transaction_amount": 12.50
+}
+```
+
+## 🔍 AI Matching Criteria
+
+The engine uses multiple criteria for matching:
+
+1. **Amount Similarity** - Compares receipt and transaction amounts (5% tolerance by default)
+2. **Date Proximity** - Checks date closeness (7-day tolerance by default)
+3. **Vendor Matching** - AI-powered vendor name comparison using Groq LLM
+4. **Rule-based Auto-approval** - Automatic approval for exact matches and high-confidence matches
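
The two numeric criteria can be illustrated with a short sketch. It assumes the amount tolerance is measured relative to the transaction amount and that the date tolerance is inclusive. Both are assumptions, since the text above gives only the 5% and 7-day figures. The function names are hypothetical.

```python
from datetime import date


def amounts_match(receipt_amount, transaction_amount, tolerance_percent=5.0):
    """True if the amounts differ by at most tolerance_percent of the transaction amount."""
    if transaction_amount == 0:
        return receipt_amount == 0
    diff = abs(receipt_amount - transaction_amount)
    return diff <= abs(transaction_amount) * tolerance_percent / 100.0


def dates_match(receipt_date, transaction_date, tolerance_days=7):
    """True if the two dates are at most tolerance_days apart (inclusive)."""
    return abs((receipt_date - transaction_date).days) <= tolerance_days
```

For example, `amounts_match(104.0, 100.0)` passes while `amounts_match(106.0, 100.0)` does not, and a receipt dated 2024-01-15 matches a transaction dated up to 2024-01-22.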
+
+## 🛠️ Development
+
+### Project Structure
+```
+├── main.py                 # FastAPI application entry point
+├── ai_matcher.py           # AI-powered matching logic
+├── ai_rules.py             # Business rules engine
+├── document_processor.py   # Receipt data extraction
+├── matching_engine.py      # Main matching orchestrator
+├── feedback_logger.py      # User feedback tracking
+├── models.py               # Pydantic data models
+├── api_models.py           # API request/response models
+├── config.py               # Configuration settings
+├── requirements.txt        # Python dependencies
+└── test_images/            # Test image files
+```
+
+### Running Tests
+```bash
+# Smoke-test the health check endpoint (requires a running server)
+curl http://localhost:8343/
+
+# Test stats endpoint
+curl http://localhost:8343/stats
+
+# Test rules endpoint
+curl http://localhost:8343/rules
+```
+
+## 🚀 Production Deployment
+
+For production deployment:
+- Replace in-memory storage with a database (PostgreSQL recommended)
+- Configure proper authentication and authorization
+- Set up monitoring and logging (ELK stack recommended)
+- Use environment variables for all configuration
+- Implement proper error handling and retries
+- Set up rate limiting and API quotas
+- Configure CORS for frontend integration
+- Use HTTPS in production
+
+## 📞 Support
+
+This Data Science Engine is designed to be integrated with backend applications that handle:
+- QuickBooks API connections
+- User interface and workflows
+- Data persistence and management
+- External integrations
+
+The engine focuses purely on AI/ML capabilities and provides a clean JSON API for backend integration.
+
+## 🔧 Troubleshooting
+
+### Common Issues
+
+1. **API Key Error**: Ensure `GROQ_API_KEY` is set in your `.env` file
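+2. **Port Already in Use**: Stop the existing process, e.g. `pkill -f "python main.py"` if the server was started with `python main.py`; otherwise find the process holding the port with `lsof -i :8343`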
+3. **Import Errors**: Install dependencies with `pip install -r requirements.txt`
+4. **Rate Limiting**: Slow responses under load may be expected; the system includes built-in rate limiting to prevent API quota exhaustion
+
+### Logs
+Check the application logs for detailed error information:
+```bash
+tail -f app.log
+```