Compare commits

8 Commits

Author SHA1 Message Date
michael 823c05f78d Implement code changes to enhance functionality and improve performance 2025-10-05 23:38:03 +00:00
bolade c2a7c5a087 Add manual tax calculator for rule-based tax analysis and integrate with matching engine 2025-10-05 20:48:05 +01:00
bolade e3f610e01a Refine JSON response handling in batch analysis to exclude markdown code blocks and improve extraction logic 2025-10-05 20:36:47 +01:00
bolade 7c412bcf9e Enhance batch processing in LLMTaxAnalyzer with fallback to individual analysis on failure 2025-10-05 20:03:46 +01:00
bolade ae200bd30f Implement batch processing for LLM-based tax analysis and enhance match confidence scoring 2025-10-05 19:38:34 +01:00
bolade c45e3fa791 Add user location support and tax analysis enhancements
- Introduced user location extraction from user tax info for improved matching.
- Normalized user location to province codes for tax calculations.
- Updated MatchResponse schema to include tax analysis data.
- Enhanced LLMTaxAnalyzer to handle various location formats and provide fallback logic.
2025-10-05 18:34:35 +01:00
bolade c78c4c6fe9 Enhance receipt matching by adding user location support and implementing LLM-based tax analysis rules 2025-10-05 13:25:55 +01:00
michael 3d48cf0385 Add requirements.txt with essential dependencies for the project 2025-10-05 11:29:45 +00:00
23 changed files with 3364 additions and 1293 deletions
+7 -225
@@ -1,229 +1,11 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be added to the global gitignore or merged into this project gitignore. For a PyCharm
# project, it is recommended to include the following files:
# .idea/
# *.iml
# *.ipr
# *.iws
# VS Code
.vscode/
# macOS
.DS_Store
.AppleDouble
.LSOverride
# Windows
Thumbs.db
ehthumbs.db
Desktop.ini
# Linux
*~
# Temporary files
*.tmp
*.temp
*.swp
*.swo
*~
# Log files
*.log
# Database files
__pycache__/
*.pyc
*.pyo
*.pyd
*.db
*.sqlite
*.sqlite3
# Configuration files with sensitive data
config.ini
secrets.json
.env.local
.env.production
# Test files
test_*.py
*_test.py
tests/
# Documentation
docs/
*.md
!README.md
# IDE files
.idea/
.vscode/
*.sublime-*
.atom/
# OS generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
uploads/
chequing statement.csv
test_images/
.cursorrules.md
.env
*.log
/uploads
-262
@@ -1,262 +0,0 @@
# AI Bookkeeper - Data Science Engine
AI-powered receipt-to-transaction matching engine using Groq LLM. This is a **Data Science Engine** that provides intelligent matching capabilities for backend applications.
## 🎯 Purpose
This Data Science Engine receives QuickBooks transaction data from backend applications and provides:
- **AI-powered receipt processing** (OCR and data extraction)
- **Intelligent receipt-transaction matching** with confidence scores
- **Configurable AI rules** for business logic
- **Feedback logging** for continuous improvement
- **RESTful API** for easy integration
## 🚀 Quick Start
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Configure API Keys
Create a `.env` file in the project root with your Groq API key:
```bash
# Create .env file
echo "GROQ_API_KEY=your_actual_groq_api_key_here" > .env
```
**Important**: Get your API key from [Groq Console](https://console.groq.com/)
### 3. Start the Server
```bash
# Option 1: Using the main script
python main.py
# Option 2: Using uvicorn directly
uvicorn main:app --host 0.0.0.0 --port 8343 --reload
```
### 4. Access API Documentation
- **Swagger UI**: http://localhost:8343/docs
- **ReDoc**: http://localhost:8343/redoc
## 📋 API Endpoints
### Transaction Import
- `POST /transactions/import/csv` - Import transactions from CSV file
- `POST /transactions/import/image` - Import transactions from image/PDF
### Receipt Processing
- `POST /upload-multiple` - Upload multiple receipt documents
- `POST /process/{file_id}` - Extract data from uploaded documents
### AI Matching Engine
- `POST /match-specific` - Match specific receipts to transactions using AI
### AI Rules Management
- `POST /rules` - Add new AI rules
- `GET /rules` - List all active rules
- `DELETE /rules/{rule_name}` - Delete rules
### System Monitoring
- `GET /stats` - Get system statistics and performance metrics
- `GET /` - Health check endpoint
## 🔧 Core Components
### **AIMatcher** (`ai_matcher.py`)
- Uses Groq LLM to compare receipts and transactions
- Provides confidence scores and reasoning
- Configurable matching criteria (amount, date, vendor)
- Rate limiting to prevent API quota exhaustion
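
The README does not say how the rate limiting is implemented. One simple possibility is to enforce a minimum gap between consecutive LLM calls, as in the sketch below. `MinIntervalLimiter` and its parameters are hypothetical names used for illustration only; they are not taken from `ai_matcher.py`.

```python
import time


class MinIntervalLimiter:
    """Hypothetical limiter: enforce a minimum gap between consecutive calls."""

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_seconds
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last_call = None   # time of the previous call, None before the first one

    def wait(self):
        """Block until the minimum interval has elapsed; return the seconds slept."""
        delay = 0.0
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last_call = self._clock()
        return delay
```

Calling `limiter.wait()` immediately before each LLM request would then space the requests out. The right interval depends on the quota of the account in use.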
### **AIRulesEngine** (`ai_rules.py`)
- Applies business rules for auto-approval and categorization
- Configurable rule conditions and actions
- Supports system and user-generated rules
- Safe condition evaluation with proper error handling
### **DocumentProcessor** (`document_processor.py`)
- AI-powered receipt data extraction using Groq vision model
- Supports PDF and image formats
- Robust JSON parsing with error handling
- Extracts vendor, amount, date, tax, and category information
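
As a rough illustration of what robust JSON parsing can involve for LLM output, the sketch below strips an optional markdown code fence and then parses the span between the first `{` and the last `}`. `extract_json` is a hypothetical helper name; the actual logic in `document_processor.py` is not shown in this diff and may differ.

```python
import json
import re
from typing import Any, Optional

# Built from single backticks so the pattern does not clash with this code fence.
_FENCE = "`" * 3
_FENCE_RE = re.compile(
    re.escape(_FENCE) + r"(?:json)?\s*(.*?)" + re.escape(_FENCE), re.DOTALL
)


def extract_json(text: str) -> Optional[Any]:
    """Hypothetical helper: return the JSON object embedded in an LLM reply, or None."""
    match = _FENCE_RE.search(text)
    candidate = match.group(1) if match else text
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        return None  # no brace-delimited span to parse
    try:
        return json.loads(candidate[start : end + 1])
    except json.JSONDecodeError:
        return None  # malformed JSON: let the caller decide how to recover
```

Returning `None` instead of raising keeps the decision about retries or fallbacks with the caller.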
### **MatchingEngine** (`matching_engine.py`)
- Main orchestrator combining all components
- Handles the complete matching workflow
- Provides statistics and feedback logging
- Configurable confidence thresholds
### **FeedbackLogger** (`feedback_logger.py`)
- Tracks manual overrides for AI training
- Maintains audit trail of user decisions
- Enables continuous model improvement
## 📊 Configuration
Edit `config.py` to adjust:
- **Confidence threshold** (default: 0.3)
- **Date tolerance days** (default: 7)
- **Amount tolerance percent** (default: 5%)
- **Groq API key** (from environment variable)
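
For orientation, the defaults listed above could be expressed as a small settings object like the one below. This is a sketch only: the class and field names are assumptions, and just the default values and the `GROQ_API_KEY` variable name come from this README.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MatchingConfig:
    # Field names are illustrative; the defaults mirror the list above.
    confidence_threshold: float = 0.3
    date_tolerance_days: int = 7
    amount_tolerance_percent: float = 5.0
    # Read the key from the environment when the object is created.
    groq_api_key: str = field(
        default_factory=lambda: os.environ.get("GROQ_API_KEY", "")
    )


config = MatchingConfig()
```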
## 🔄 Integration Workflow
### 1. Import Transactions
```bash
# Import from CSV
curl -X POST -F "file=@transactions.csv" http://localhost:8343/transactions/import/csv
# Import from image
curl -X POST -F "file=@statement.jpg" http://localhost:8343/transactions/import/image
```
### 2. Upload and Process Receipts
```bash
# Upload receipts
curl -X POST -F "files=@receipt1.jpg" -F "files=@receipt2.jpg" http://localhost:8343/upload-multiple
# Process a specific receipt
curl -X POST http://localhost:8343/process/{file_id}
```
### 3. AI Matching
```bash
# Match specific receipts
curl -X POST -H "Content-Type: application/json" \
-d '["file_id_1", "file_id_2"]' \
http://localhost:8343/match-specific
```
### 4. Check Results
```bash
# Get system stats
curl http://localhost:8343/stats
# View AI rules
curl http://localhost:8343/rules
```
## 🎯 Key Features
- **AI-powered matching** with confidence scores
- **Rule-based auto-approval** and categorization
- **Feedback logging** for continuous improvement
- **Configurable matching parameters**
- **RESTful JSON API** for easy backend integration
- **Comprehensive error handling**
- **Rate limiting** to prevent API quota exhaustion
- **Robust JSON parsing** for AI responses
## 📝 Data Formats
### Transaction Input (CSV)
```csv
Date,Description,Amount,Category
2024-01-15,Starbucks Coffee,12.50,Food & Dining
2024-01-16,Office Supplies,45.99,Office
```
### Receipt Processing Output
```json
{
"vendor": "Starbucks",
"total_amount": 12.50,
"tax_amount": 1.25,
"date": "2024-01-15",
"category": "Food & Dining",
"confidence": 0.95,
"extraction_success": true
}
```
### Match Result Output
```json
{
"receipt_id": "uuid",
"transaction_id": "transaction_123",
"confidence_score": 0.95,
"match_reason": "Same vendor, minor date difference (Auto-approved by rules)",
"receipt_vendor": "Starbucks",
"receipt_amount": 12.50,
"transaction_vendor": "STARBUCKS",
"transaction_amount": 12.50
}
```
## 🔍 AI Matching Criteria
The engine uses multiple criteria for matching:
1. **Amount Similarity** - Compares receipt and transaction amounts (5% tolerance)
2. **Date Proximity** - Checks date closeness (7-day tolerance)
3. **Vendor Matching** - AI-powered vendor name comparison using Groq LLM
4. **Rule-based Auto-approval** - Automatic approval for exact matches and high-confidence matches
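
The first two criteria are simple numeric checks and can be sketched as below. The function names are hypothetical, and the README does not state what the 5% tolerance is measured against; this sketch assumes it is relative to the transaction amount.

```python
from datetime import date


def amount_within_tolerance(receipt_amount, transaction_amount, tolerance_percent=5.0):
    """True when the amounts differ by at most tolerance_percent of the transaction amount."""
    if transaction_amount == 0:
        return receipt_amount == 0
    diff = abs(receipt_amount - transaction_amount)
    return diff <= abs(transaction_amount) * tolerance_percent / 100.0


def date_within_tolerance(receipt_date, transaction_date, tolerance_days=7):
    """True when the two dates are at most tolerance_days apart, in either direction."""
    return abs((receipt_date - transaction_date).days) <= tolerance_days
```

For example, `amount_within_tolerance(12.00, 12.50)` holds because the difference of 0.50 is under 5% of 12.50, while a 10.00 receipt would fall outside the tolerance.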
## 🛠️ Development
### Project Structure
```
├── main.py # FastAPI application entry point
├── ai_matcher.py # AI-powered matching logic
├── ai_rules.py # Business rules engine
├── document_processor.py # Receipt data extraction
├── matching_engine.py # Main matching orchestrator
├── feedback_logger.py # User feedback tracking
├── models.py # Pydantic data models
├── api_models.py # API request/response models
├── config.py # Configuration settings
├── requirements.txt # Python dependencies
└── test_images/ # Test image files
```
### Running Tests
```bash
# Test the server
curl http://localhost:8343/
# Test stats endpoint
curl http://localhost:8343/stats
# Test rules endpoint
curl http://localhost:8343/rules
```
## 🚀 Production Deployment
For production deployment:
- Replace in-memory storage with a database (PostgreSQL recommended)
- Configure proper authentication and authorization
- Set up monitoring and logging (ELK stack recommended)
- Use environment variables for all configuration
- Implement proper error handling and retries
- Set up rate limiting and API quotas
- Configure CORS for frontend integration
- Use HTTPS in production
## 📞 Support
This Data Science Engine is designed to be integrated with backend applications that handle:
- QuickBooks API connections
- User interface and workflows
- Data persistence and management
- External integrations
The engine focuses purely on AI/ML capabilities and provides a clean JSON API for backend integration.
## 🔧 Troubleshooting
### Common Issues
1. **API Key Error**: Ensure `GROQ_API_KEY` is set in your `.env` file
2. **Port Already in Use**: Kill existing process with `pkill -f "python main.py"`
3. **Import Errors**: Install dependencies with `pip install -r requirements.txt`
4. **Rate Limiting**: The system includes built-in rate limiting to prevent API quota exhaustion
### Logs
Check the application logs for detailed error information:
```bash
tail -f app.log
```
+12
@@ -0,0 +1,12 @@
from pydantic_settings import BaseSettings
from typing import Optional
class Settings(BaseSettings):
database_url: Optional[str] = None
secret_key: Optional[str] = None
api_key: Optional[str] = None
GROQ_API_KEY: str
class Config:
env_file = ".env"
settings = Settings()
+20 -5
@@ -34,19 +34,20 @@ def clear_all_data():
"""Clear all data from the database (useful for testing)"""
db = SessionLocal()
try:
db.query(Transaction).delete()
db.query(Receipt).delete()
db.query(DBTransaction).delete()
db.query(DBReceipt).delete()
db.query(DBUploadedFile).delete()
db.commit()
finally:
db.close()
# Transactions table
class Transaction(Base):
class DBTransaction(Base):
__tablename__ = "transactions"
id = Column(Integer, primary_key=True, index=True)
transaction_id = Column(String, unique=True, index=True)
transaction_id = Column(String, index=True)
amount = Column(Float, nullable=False)
date = Column(DateTime, nullable=False)
vendor = Column(String, nullable=False)
@@ -57,8 +58,21 @@ class Transaction(Base):
user_id = Column(String, nullable=True)
# Uploaded Files table
class DBUploadedFile(Base):
__tablename__ = "uploaded_files"
id = Column(Integer, primary_key=True, index=True)
file_id = Column(String, unique=True, index=True)
filename = Column(String, nullable=False)
file_path = Column(String, nullable=False)
file_type = Column(String, nullable=False)
upload_date = Column(DateTime, nullable=False)
status = Column(String, nullable=False, default="uploaded")
# Receipts table
class Receipt(Base):
class DBReceipt(Base):
__tablename__ = "receipts"
id = Column(Integer, primary_key=True, index=True)
@@ -73,3 +87,4 @@ class Receipt(Base):
confidence = Column(Float, nullable=True)
extraction_success = Column(String, nullable=True)
error_message = Column(String, nullable=True)
receipt_currency = Column(String, nullable=True)
+134 -354
@@ -5,32 +5,36 @@ import uuid
from datetime import datetime
from typing import List
from database import (
DBReceipt,
DBTransaction,
DBUploadedFile,
create_db_tables,
db_dependency,
)
from fastapi import FastAPI, File, Form, HTTPException, UploadFile
from fastapi.middleware.cors import CORSMiddleware
from sqlalchemy.orm import Session
from ai_rules import AIRule
from api_models import (
from schemas import (
DocumentProcessResponse,
DocumentUploadResponse,
MatchingResponse,
MatchResponse,
MatchSpecificRequest,
Receipt,
RuleRequest,
Transaction,
)
from database import Receipt as DBReceipt
from database import Transaction as DBTransaction
from database import create_db_tables, db_dependency
from document_processor import DocumentProcessor
from matching_engine import MatchingEngine
from models import Receipt, Transaction
from services.ai_rules import AIRule
from services.document_processor import DocumentProcessor
from services.matching_engine import MatchingEngine
from sqlalchemy.orm import Session
create_db_tables()
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
handlers=[logging.FileHandler("app.log"), logging.StreamHandler()],
handlers=[logging.StreamHandler()],
)
logger = logging.getLogger(__name__)
@@ -53,9 +57,6 @@ app.add_middleware(
matching_engine = MatchingEngine()
document_processor = DocumentProcessor()
# In-memory storage for uploaded files (in production, use a database)
uploaded_files = {}
# Helper functions for database operations
def get_transactions_from_db(
@@ -80,7 +81,17 @@ def get_receipts_from_db(db: Session, file_ids: List[str]):
return db.query(DBReceipt).filter(DBReceipt.file_id.in_(file_ids)).all()
@app.get("/")
def get_uploaded_file_from_db(db: Session, file_id: str):
"""Retrieve uploaded file from database by file_id"""
return db.query(DBUploadedFile).filter(DBUploadedFile.file_id == file_id).first()
def get_uploaded_files_from_db(db: Session, file_ids: List[str]):
"""Retrieve multiple uploaded files from database by file_ids"""
return db.query(DBUploadedFile).filter(DBUploadedFile.file_id.in_(file_ids)).all()
@app.get("/", tags=["Health"])
async def root():
"""Health check endpoint"""
return {
@@ -93,75 +104,7 @@ async def root():
# ============================================================================
# TRANSACTION IMPORT ENDPOINTS
# ============================================================================
# @app.post("/transactions/import/csv")
# async def import_transactions_csv(file: UploadFile = File(...), user_id: str = "", categorization_id: str = ""):
# """
# Import transactions from a CSV file (custom bank export format).
# """
# try:
# content = await file.read()
# decoded = content.decode("utf-8")
# reader = csv.DictReader(io.StringIO(decoded))
# transactions = []
# errors = []
# for idx, row in enumerate(reader):
# try:
# # Use correct headers and strip whitespace
# account_number = row.get("Account Number") or row.get(
# "Account Number ".strip()
# )
# txn_date_raw = row.get("Transaction Date") or row.get(
# "Transaction Date ".strip()
# )
# amount_raw = row.get("Amount") or row.get("Amount ".strip())
# payee_name = row.get("Description 2") or row.get(
# "Description 2 ".strip()
# )
# memo = f"{row.get('Account Type', '').strip()} {row.get('Cheque Number', '').strip()} {row.get('Description 1', '').strip()}".strip()
# # Compose ID
# txn_id = f"{account_number}_{idx + 1}"
# # Parse date (try multiple formats)
# txn_date_str = txn_date_raw.strip()
# txn_date = None
# for fmt in ("%m/%d/%y", "%m/%d/%Y"):
# try:
# txn_date = datetime.strptime(txn_date_str, fmt).strftime(
# "%Y-%m-%d"
# )
# break
# except Exception:
# continue
# if not txn_date:
# raise ValueError(f"Could not parse date: {txn_date_str}")
# # Parse amount
# amount = float(amount_raw.replace(",", "").strip())
# transactions.append(
# {
# "id": txn_id,
# "txn_date": txn_date,
# "amount": amount,
# "payee_name": payee_name.strip(),
# "memo": memo,
# }
# )
# except Exception as e:
# errors.append(f"Row {idx + 1}: {str(e)}")
# # Store transactions globally for auto-matching
# global stored_transactions
# stored_transactions = transactions
# return {
# "imported_count": len(transactions),
# "converted_transactions": transactions,
# "errors": errors,
# }
# except Exception as e:
# raise HTTPException(status_code=500, detail=str(e))
@app.post("/transactions/import/csv")
@app.post("/transactions/import/csv", tags=["Transaction Import"])
async def import_transactions_csv(
db: db_dependency,
file: UploadFile = File(...),
@@ -184,7 +127,7 @@ async def import_transactions_csv(
"Account Number ".strip()
)
txn_date_raw = row.get("Transaction Date") or row.get(
"Transaction Date ".strip()
"Transaction Date ".strip()) or (row.get("Date")
)
amount_raw = row.get("Amount") or row.get("Amount ".strip())
payee_name = row.get("Description 2") or row.get(
@@ -252,7 +195,7 @@ async def import_transactions_csv(
raise HTTPException(status_code=500, detail=str(e))
@app.post("/transactions/import/image")
@app.post("/transactions/import/image", tags=["Transaction Import"])
async def import_transactions_from_image(
db: db_dependency,
file: UploadFile = File(...),
@@ -350,8 +293,14 @@ async def import_transactions_from_image(
# ============================================================================
@app.post("/upload-multiple", response_model=List[DocumentUploadResponse])
async def upload_multiple_documents(files: List[UploadFile] = File(...)):
@app.post(
"/upload-multiple",
response_model=List[DocumentUploadResponse],
tags=["Document Processing"],
)
async def upload_multiple_documents(
files: List[UploadFile] = File(...), db: db_dependency = None
):
"""
Upload multiple receipt images for processing.
@@ -375,13 +324,24 @@ async def upload_multiple_documents(files: List[UploadFile] = File(...)):
# Generate unique file ID
file_id = str(uuid.uuid4())
# Read and store file content
# Read file content and save to disk
content = await file.read()
uploaded_files[file_id] = {
"filename": file.filename,
"content": content,
"upload_date": datetime.now(),
}
file_path = await document_processor.save_uploaded_file(
content, file.filename
)
# Create database record for uploaded file
db_uploaded_file = DBUploadedFile(
file_id=file_id,
filename=file.filename,
file_path=file_path,
file_type=file_extension,
upload_date=datetime.now(),
status="uploaded",
)
# Add to database
db.add(db_uploaded_file)
responses.append(
DocumentUploadResponse(
@@ -393,6 +353,9 @@ async def upload_multiple_documents(files: List[UploadFile] = File(...)):
)
)
# Commit all uploaded files to database
db.commit()
return responses
except Exception as e:
@@ -400,7 +363,11 @@ async def upload_multiple_documents(files: List[UploadFile] = File(...)):
raise HTTPException(status_code=500, detail=str(e))
@app.post("/process/{file_id}", response_model=DocumentProcessResponse)
@app.post(
"/process/{file_id}",
response_model=DocumentProcessResponse,
tags=["Document Processing"],
)
async def process_document(file_id: str, db: db_dependency):
"""
Process a previously uploaded document to extract receipt information.
@@ -409,18 +376,15 @@ async def process_document(file_id: str, db: db_dependency):
including vendor, amount, date, and category information.
"""
try:
# Check if file exists
if file_id not in uploaded_files:
# Get file info from database
db_uploaded_file = get_uploaded_file_from_db(db, file_id)
if not db_uploaded_file:
raise HTTPException(status_code=404, detail=f"File {file_id} not found")
file_data = uploaded_files[file_id]
# Save file temporarily and process it
file_path = await document_processor.save_uploaded_file(
file_data["content"], file_data["filename"]
# Process the file using the stored file path
receipt_data = await document_processor.process_file(
db_uploaded_file.file_path, db_uploaded_file.file_type
)
file_type = file_data["filename"].split(".")[-1].lower()
receipt_data = await document_processor.process_file(file_path, file_type)
# Parse date for database storage
receipt_date = None
@@ -445,6 +409,7 @@ async def process_document(file_id: str, db: db_dependency):
confidence=receipt_data.get("confidence", 0.0),
extraction_success=str(receipt_data.get("extraction_success", False)),
error_message=receipt_data.get("error"),
receipt_currency=receipt_data.get("currency"),
)
# Add to database
@@ -453,6 +418,7 @@ async def process_document(file_id: str, db: db_dependency):
return DocumentProcessResponse(
file_id=file_id,
receipt_id=db_receipt.receipt_id,
extraction_success=receipt_data.get("extraction_success", False),
vendor=receipt_data.get("vendor", ""),
description=receipt_data.get("description", ""),
@@ -462,6 +428,7 @@ async def process_document(file_id: str, db: db_dependency):
category=receipt_data.get("category", ""),
confidence=receipt_data.get("confidence", 0.0),
error=receipt_data.get("error", None),
receipt_currency=receipt_data.get("currency"),
)
except Exception as e:
@@ -469,244 +436,7 @@ async def process_document(file_id: str, db: db_dependency):
raise HTTPException(status_code=500, detail=str(e))
# ============================================================================
# MATCHING ENDPOINTS
# ============================================================================
# @app.post("/match-specific", response_model=MatchingResponse)
# async def match_specific_receipts(file_ids: List[str]):
# """
# Match specific receipts against imported transactions.
# This endpoint takes a list of receipt file IDs and matches them against
# the currently imported transactions using AI-powered matching logic.
# """
# try:
# logger.info(f"Starting match-specific for file IDs: {file_ids}")
# # Check if transactions are imported
# if not stored_transactions:
# logger.warning("No transactions imported")
# raise HTTPException(
# status_code=400,
# detail="No transactions imported. Please upload CSV first.",
# )
# logger.info(f"Found {len(stored_transactions)} stored transactions")
# # Convert stored transactions to Transaction objects
# transactions = []
# for txn in stored_transactions:
# try:
# txn_date = datetime.strptime(txn["txn_date"], "%Y-%m-%d")
# transaction = Transaction(
# id=txn["id"],
# transaction_date=txn_date,
# amount=txn["amount"],
# vendor=txn["payee_name"],
# notes=txn["memo"],
# )
# transactions.append(transaction)
# except Exception as e:
# logger.warning(f"Error converting transaction {txn['id']}: {str(e)}")
# continue
# logger.info(f"Converted {len(transactions)} transactions")
# # Get receipts for the specified file IDs
# receipts = []
# missing_files = []
# for file_id in file_ids:
# if file_id in processed_receipts:
# receipt_data = processed_receipts[file_id]
# logger.info(f"DEBUG: receipt_data for {file_id}: {receipt_data}")
# logger.info(
# f"DEBUG: receipt_data keys for {file_id}: {list(receipt_data.keys())}"
# )
# try:
# # Handle missing date field
# if "date" not in receipt_data or not receipt_data["date"]:
# logger.warning(
# f"Missing date for receipt {file_id}, using current date"
# )
# receipt_date = datetime.now()
# else:
# receipt_date = datetime.strptime(
# receipt_data["date"], "%Y-%m-%d"
# )
# # Handle missing amount field - try multiple possible keys
# amount = receipt_data.get("amount")
# if amount is None:
# amount = receipt_data.get("total_amount")
# if amount is None:
# amount = receipt_data.get("amount_total")
# if amount is None:
# logger.warning(
# f"Missing amount for receipt {file_id}, using 0.0"
# )
# amount = 0.0
# # Ensure amount is a float
# try:
# amount = float(amount)
# except (ValueError, TypeError):
# logger.warning(
# f"Invalid amount '{amount}' for receipt {file_id}, using 0.0"
# )
# amount = 0.0
# logger.info(f"DEBUG: amount for {file_id}: {amount}")
# # Handle missing vendor field
# vendor = receipt_data.get("vendor", "")
# if not vendor:
# logger.warning(
# f"Missing vendor for receipt {file_id}, using 'Unknown'"
# )
# vendor = "Unknown"
# # Handle missing category field
# category = receipt_data.get("category", "Other")
# # Handle description field
# description = receipt_data.get("description", "")
# # Handle tax field
# tax = receipt_data.get("tax", receipt_data.get("tax_amount", 0.0))
# try:
# tax = float(tax)
# except (ValueError, TypeError):
# tax = 0.0
# receipt = Receipt(
# id=file_id,
# file_name=uploaded_files[file_id]["filename"],
# upload_date=uploaded_files[file_id]["upload_date"],
# receipt_date=receipt_date,
# amount=amount,
# tax=tax,
# vendor=vendor,
# category=category,
# description=description,
# )
# receipts.append(receipt)
# logger.info(f"Added receipt: {receipt.vendor} - ${receipt.amount}")
# except Exception as e:
# logger.warning(
# f"Error creating receipt object for {file_id}: {str(e)}"
# )
# missing_files.append(f"{file_id} (error: {str(e)})")
# else:
# logger.warning(f"Receipt {file_id} not found in processed_receipts")
# missing_files.append(f"{file_id} (not found)")
# if missing_files:
# logger.error(f"Missing files: {missing_files}")
# raise HTTPException(
# status_code=400, detail=f"Missing files: {missing_files}"
# )
# logger.info(
# f"Processing {len(receipts)} receipts against {len(transactions)} transactions"
# )
# # Perform matching
# try:
# logger.info("Starting direct matching call (without ThreadPoolExecutor)")
# logger.info(f"matching_engine type: {type(matching_engine)}")
# logger.info(
# f"matching_engine.process_matching type: {type(matching_engine.process_matching)}"
# )
# logger.info(f"receipts type: {type(receipts)}, length: {len(receipts)}")
# logger.info(
# f"transactions type: {type(transactions)}, length: {len(transactions)}"
# )
# matches = matching_engine.process_matching(receipts, transactions)
# logger.info(
# f"Matching completed successfully. Found {len(matches)} matches"
# )
# # Convert matches to response format
# match_responses = []
# for match in matches:
# logger.info(f"Raw match object: {match}")
# logger.info(f" receipt_id: {match.receipt.id}")
# logger.info(f" transaction_id: {match.transaction.id}")
# logger.info(f" confidence_score: {match.confidence_score}")
# logger.info(f" match_reason: {match.match_reason}")
# logger.info(f" receipt_vendor: {match.receipt.vendor}")
# logger.info(f" receipt_amount: {match.receipt.amount}")
# logger.info(f" transaction_vendor: {match.transaction.vendor}")
# logger.info(f" transaction_amount: {match.transaction.amount}")
# match_response = MatchResponse(
# receipt_id=match.receipt.id,
# transaction_id=match.transaction.id,
# confidence_score=match.confidence_score,
# match_reason=match.match_reason,
# receipt_vendor=match.receipt.vendor,
# receipt_amount=match.receipt.amount,
# receipt_description=match.receipt.description,
# receipt_category=match.receipt.category,
# receipt_tax_amount=match.receipt.tax,
# transaction_vendor=match.transaction.vendor,
# transaction_amount=match.transaction.amount,
# )
# match_responses.append(match_response)
# logger.info(
# f"Successfully created MatchResponse for {match.receipt.vendor} -> {match.transaction.vendor}"
# )
# logger.info(f"Formatted {len(match_responses)} match responses")
# # Calculate statistics
# if match_responses:
# high_confidence = sum(
# 1 for m in match_responses if m.confidence_score >= 0.8
# )
# low_confidence = len(match_responses) - high_confidence
# avg_score = sum(m.confidence_score for m in match_responses) / len(
# match_responses
# )
# else:
# high_confidence = low_confidence = avg_score = 0
# stats = {
# "total": len(match_responses),
# "high_confidence": high_confidence,
# "low_confidence": low_confidence,
# "avg_score": round(avg_score, 2),
# }
# logger.info(f"Generated stats: {stats}")
# logger.info(
# f"Match-specific completed successfully with {len(match_responses)} matches"
# )
# return MatchingResponse(matches=match_responses, stats=stats)
# except Exception as e:
# logger.error(f"Exception in matching section: {str(e)}")
# logger.error(f"Exception type: {type(e)}")
# logger.error(f"Exception args: {e.args}")
# logger.error(f"Traceback: {e.__traceback__}")
# raise HTTPException(
# status_code=500, detail=f"Unexpected matching error: {str(e)}"
# )
# except HTTPException:
# raise
# except Exception as e:
# logger.error(f"Unexpected error in match_specific_receipts: {str(e)}")
# raise HTTPException(status_code=500, detail=str(e))
@app.post("/match-specific", response_model=MatchingResponse)
@app.post("/match-specific", response_model=MatchingResponse, tags=["AI Matching"])
async def match_specific_receipts(request: MatchSpecificRequest, db: db_dependency):
"""
Match specific receipts against imported transactions.
@@ -805,31 +535,50 @@ async def match_specific_receipts(request: MatchSpecificRequest, db: db_dependen
f"Starting matching with {len(receipts)} receipts and {len(transactions)} transactions"
)
# Extract user location from user_tax_info if provided
user_location = request.user_location # Default/fallback
if request.user_tax_info:
# Use state_code from user_tax_info (e.g., "ON", "QC", "BC")
user_location = request.user_tax_info.state.state_code
logger.info(
f"Using location from user_tax_info: {user_location} ({request.user_tax_info.state.name}, {request.user_tax_info.country.name})"
)
else:
logger.info(f"Using default/provided user_location: {user_location}")
try:
matching_results = matching_engine.process_matching(receipts, transactions)
matching_results = matching_engine.process_matching(
receipts, transactions, user_location=user_location
)
logger.info(f"Matching completed, got {len(matching_results)} results")
# Convert matching results to response format
match_responses = []
for result in matching_results:
# Get final tax amount from LLM analysis if available, otherwise use receipt's stated tax
final_tax = result.receipt.tax
if result.tax_analysis and "final_tax_amount" in result.tax_analysis:
final_tax = result.tax_analysis["final_tax_amount"]
match_response = MatchResponse(
receipt_id=result.receipt.id,
transaction_id=result.transaction.id
if result.transaction
else "no_match",
confidence_score=result.confidence_score,
confidence_score=result.confidence_score * 100,
match_reason=result.match_reason,
receipt_vendor=result.receipt.vendor,
receipt_amount=result.receipt.amount,
receipt_description=result.receipt.description,
receipt_category=result.receipt.category,
receipt_tax_amount=result.receipt.tax,
receipt_tax_amount=final_tax,
transaction_vendor=result.transaction.vendor
if result.transaction
else "",
transaction_amount=result.transaction.amount
if result.transaction
else 0.0,
tax_analysis=result.tax_analysis,
)
match_responses.append(match_response)
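Reviewer sketch of the two response changes in this hunk: the analyzer's `final_tax_amount` overrides the receipt's stated tax when present, and the confidence score is scaled from 0-1 to 0-100. The helper names `pick_final_tax` and `to_percent` are hypothetical and do not appear in the commit; only the `final_tax_amount` key and the `* 100` scaling come from the diff.

```python
from typing import Any, Dict, Optional


def pick_final_tax(receipt_tax: float, tax_analysis: Optional[Dict[str, Any]]) -> float:
    """Prefer the analyzer's final_tax_amount; otherwise keep the receipt's stated tax."""
    if tax_analysis and "final_tax_amount" in tax_analysis:
        return tax_analysis["final_tax_amount"]
    return receipt_tax


def to_percent(confidence: float) -> float:
    """Scale a 0-1 confidence score to the 0-100 range used in the response."""
    return confidence * 100


print(pick_final_tax(1.30, {"final_tax_amount": 1.50}))  # analyzer value wins
print(pick_final_tax(1.30, None))                        # falls back to the receipt
```

One thing worth checking in review: if the analyzer ever returns `final_tax_amount` as `null`, the key is still present, so the override would pass `None` into a field declared as `float`.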
@@ -881,7 +630,7 @@ async def match_specific_receipts(request: MatchSpecificRequest, db: db_dependen
# ============================================================================
@app.get("/transactions")
@app.get("/transactions", tags=["Database Queries"])
async def get_transactions(
db: db_dependency,
user_id: str = None,
@@ -922,7 +671,7 @@ async def get_transactions(
raise HTTPException(status_code=500, detail=str(e))
@app.get("/receipts")
@app.get("/receipts", tags=["Database Queries"])
async def get_receipts(db: db_dependency, limit: int = 100):
"""
Get receipts from the database.
@@ -957,12 +706,42 @@ async def get_receipts(db: db_dependency, limit: int = 100):
raise HTTPException(status_code=500, detail=str(e))
@app.get("/uploaded-files", tags=["Database Queries"])
async def get_uploaded_files(db: db_dependency, limit: int = 100):
"""
Get uploaded files from the database.
"""
try:
uploaded_files = db.query(DBUploadedFile).limit(limit).all()
# Convert to response format
result = []
for file in uploaded_files:
result.append(
{
"file_id": file.file_id,
"filename": file.filename,
"file_path": file.file_path,
"file_type": file.file_type,
"upload_date": file.upload_date.strftime("%Y-%m-%d %H:%M:%S"),
"status": file.status,
}
)
return {
"uploaded_files": result,
"count": len(result),
}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
# ============================================================================
# RULES MANAGEMENT ENDPOINTS
# ============================================================================
@app.post("/rules")
@app.post("/rules", tags=["AI Rules Management"])
async def add_rule(request: RuleRequest):
"""
Add a new AI rule for transaction matching.
@@ -983,7 +762,7 @@ async def add_rule(request: RuleRequest):
raise HTTPException(status_code=500, detail=str(e))
@app.get("/rules")
@app.get("/rules", tags=["AI Rules Management"])
async def get_rules():
"""
Get all current AI rules.
@@ -1007,7 +786,7 @@ async def get_rules():
raise HTTPException(status_code=500, detail=str(e))
@app.delete("/rules/{rule_name}")
@app.delete("/rules/{rule_name}", tags=["AI Rules Management"])
async def delete_rule(rule_name: str):
"""
Delete an AI rule by name.
@@ -1032,20 +811,21 @@ async def delete_rule(rule_name: str):
# ============================================================================
@app.get("/stats")
@app.get("/stats", tags=["Statistics"])
async def get_stats(db: db_dependency):
"""
Get system statistics.
"""
try:
# Count transactions and receipts from database
# Count transactions, receipts, and uploaded files from database
total_transactions = db.query(DBTransaction).count()
total_receipts = db.query(DBReceipt).count()
total_uploaded_files = db.query(DBUploadedFile).count()
return {
"total_transactions": total_transactions,
"total_receipts": total_receipts,
"total_uploaded_files": len(uploaded_files),
"total_uploaded_files": total_uploaded_files,
"rules_count": len(matching_engine.rules_engine.rules),
}
@@ -1056,4 +836,4 @@ async def get_stats(db: db_dependency):
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8343)
uvicorn.run(app, host="0.0.0.0", port=8654)
+131 -1
@@ -1,8 +1,73 @@
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel
@dataclass
class Address:
"""Address information for tax calculations"""
province: str
city: str
postal_code: str
country: str = "Canada"
@dataclass
class Receipt:
id: str
file_name: str
upload_date: datetime
receipt_date: datetime
amount: float
tax: float
vendor: str
category: str
description: str
# Tax rule fields
billing_address: Optional[Address] = None
shipping_address: Optional[Address] = None
currency: str = "CAD"
is_meals_entertainment: bool = False
@dataclass
class Transaction:
id: str
transaction_date: datetime
amount: float
vendor: str
notes: str
# Tax rule fields
currency: str = "CAD"
fx_rate: Optional[float] = None
@dataclass
class Asset:
"""Asset for depreciation calculations"""
id: str
name: str
purchase_date: datetime
purchase_amount: float
useful_life_years: int
residual_value: float
cca_rate: float # Capital Cost Allowance rate
asset_class: str
@dataclass
class Match:
receipt: Receipt
transaction: Transaction
confidence_score: float
match_reason: str
tax_analysis: Optional[dict] = None
class AddressRequest(BaseModel):
province: str
city: str
@@ -66,6 +131,7 @@ class MatchResponse(BaseModel):
receipt_tax_amount: float
transaction_vendor: str
transaction_amount: float
tax_analysis: Optional[dict] = None
class MatchingResponse(BaseModel):
@@ -89,12 +155,14 @@ class RuleRequest(BaseModel):
class DocumentUploadResponse(BaseModel):
file_id: str
filename: str
file_type: str
upload_date: datetime
status: str
class DocumentProcessResponse(BaseModel):
file_id: str
receipt_id: str
extraction_success: bool
vendor: Optional[str] = None
description: Optional[str] = None
@@ -104,6 +172,7 @@ class DocumentProcessResponse(BaseModel):
category: Optional[str] = None
confidence: Optional[float] = None
error: Optional[str] = None
receipt_currency: Optional[str] = "CAD"
# New tax-related models
@@ -136,7 +205,68 @@ class DepreciationResponse(BaseModel):
success: bool
error: Optional[str] = None
class CityInfo(BaseModel):
"""City information from user tax info"""
id: int
name: str
state_id: int
state_code: str
country_id: int
country_code: str
latitude: Optional[str] = None
longitude: Optional[str] = None
class StateInfo(BaseModel):
"""State/Province information from user tax info"""
id: int
name: str
country_id: int
country_code: str
state_code: str
class CountryInfo(BaseModel):
"""Country information from user tax info"""
id: int
name: str
iso3: str
iso2: str
phone_code: str
capital: str
currency: str
native: Optional[str] = None
region: Optional[str] = None
subregion: Optional[str] = None
emoji: Optional[str] = None
emojiU: Optional[str] = None
class UserTaxInfo(BaseModel):
"""User tax information for location-based tax calculations"""
id: int
user_id: int
company_name: str
tax_id: Optional[str] = ""
tax_id_type: Optional[str] = "EIN"
address_line_1: Optional[str] = ""
address_line_2: Optional[str] = ""
city: CityInfo
state: StateInfo
zip_postal_code: Optional[str] = ""
country: CountryInfo
include_on_invoices: Optional[int] = 1
created_at: Optional[str] = None
updated_at: Optional[str] = None
class MatchSpecificRequest(BaseModel):
file_ids: List[str]
categorization_id: str
user_location: Optional[str] = "Canada" # Kept for backward compatibility
user_tax_info: Optional[UserTaxInfo] = None
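A minimal sketch of how the request fields above are meant to combine, written against plain dicts so it runs without pydantic. The function name `resolve_user_location` is hypothetical; the field names (`user_tax_info`, `state`, `state_code`, `user_location`) and the `"Canada"` default are taken from the schema in this diff.

```python
from typing import Any, Dict


def resolve_user_location(payload: Dict[str, Any]) -> str:
    """Use the province code from user_tax_info when present, else user_location."""
    user_tax_info = payload.get("user_tax_info")
    if user_tax_info:
        return user_tax_info["state"]["state_code"]
    # Backward-compatible path: the older free-form field, defaulting to "Canada".
    return payload.get("user_location", "Canada")


request_body = {
    "file_ids": ["f1"],
    "categorization_id": "c1",
    "user_tax_info": {"state": {"state_code": "QC", "name": "Quebec"}},
}
print(resolve_user_location(request_body))
```

Because the fallback value is the generic string `"Canada"`, downstream code still has to map it to a province before any rate lookup.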
+4 -4
@@ -4,8 +4,8 @@ from typing import List, Tuple
import groq
import config
from models import Match, Receipt, Transaction
from config import settings
from schemas import Match, Receipt, Transaction
# Set up logging
logging.basicConfig(level=logging.INFO)
@@ -14,8 +14,8 @@ logger = logging.getLogger(__name__)
class AIMatcher:
def __init__(self, use_batch_matching=True):
self.client = groq.Groq(api_key=config.GROQ_API_KEY)
self.model = "llama3-8b-8192"
self.client = groq.Groq(api_key=settings.GROQ_API_KEY)
self.model = "llama-3.1-8b-instant"
self.max_retries = 3
self.retry_delay = 2 # seconds - increased for rate limiting
self.rate_limit_delay = 1.0 # seconds between API calls
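The hunk only shows the `max_retries` and `retry_delay` settings, not the loop that consumes them. The following is a generic retry sketch under that assumption; `call_with_retries` and its injectable `sleep` parameter are hypothetical and are not the matcher's actual implementation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    retry_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to max_retries times, pausing retry_delay seconds between attempts."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:  # broad on purpose: this is only a sketch
            last_error = exc
            if attempt < max_retries - 1:
                sleep(retry_delay)
    assert last_error is not None
    raise last_error
```

Passing `sleep` as a parameter keeps the helper testable without real delays.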
+2 -2
@@ -1,8 +1,8 @@
from dataclasses import dataclass
from typing import Any, Dict, List
from models import Receipt, Transaction
from tax_rules_engine import TaxRulesEngine
from schemas import Receipt, Transaction
from services.tax_rules_engine import TaxRulesEngine
@dataclass
@@ -1,40 +1,41 @@
import groq
import base64
import io
from PIL import Image
import PyPDF2
from typing import Dict, Any, List, Optional
import config
import os
import aiofiles
from datetime import datetime
import logging
import os
from datetime import datetime
from typing import Any, Dict
import aiofiles
import groq
import PyPDF2
from config import settings
logger = logging.getLogger(__name__)
class DocumentProcessor:
def __init__(self):
self.client = groq.Groq(api_key=config.GROQ_API_KEY)
self.client = groq.Groq(api_key=settings.GROQ_API_KEY)
self.model = "meta-llama/llama-4-scout-17b-16e-instruct" # Vision model
async def process_file(self, file_path: str, file_type: str) -> Dict[str, Any]:
"""Process uploaded file and extract receipt data"""
try:
if file_type.lower() in ['jpg', 'jpeg', 'png', 'gif', 'bmp']:
if file_type.lower() in ["jpg", "jpeg", "png", "gif", "bmp"]:
return await self._process_image(file_path)
elif file_type.lower() == 'pdf':
elif file_type.lower() == "pdf":
return await self._process_pdf(file_path)
else:
raise ValueError(f"Unsupported file type: {file_type}")
except Exception as e:
return {"error": str(e)}
async def _process_image(self, image_path: str) -> Dict[str, Any]:
"""Extract data from image using Groq vision"""
try:
# Encode image to base64
base64_image = self._encode_image(image_path)
# Create Groq vision prompt
prompt = """
Analyze this receipt image and extract the following information in JSON format:
@@ -45,7 +46,8 @@ class DocumentProcessor:
"tax_amount": 0.00,
"date": "YYYY-MM-DD",
"category": "Food/Transport/Office/Other",
"confidence": 0.95
"confidence": 0.95,
"currency": "USD"
}
Rules:
@@ -56,10 +58,11 @@ class DocumentProcessor:
- Date should be the date on the receipt
- Categorize based on vendor type (Starbucks=Food, Shell=Transport, etc.)
- Confidence score 0-1 based on how clear the receipt is
- Currency should be the currency used on the receipt (e.g., "USD", "EUR")
Return only valid JSON.
"""
# Call Groq vision API with correct format
response = self.client.chat.completions.create(
messages=[
@@ -78,43 +81,43 @@ class DocumentProcessor:
],
model=self.model,
max_tokens=500,
temperature=0.1
temperature=0.1,
)
# Parse response
result_text = response.choices[0].message.content.strip()
return self._parse_extraction_result(result_text)
except Exception as e:
return {"error": f"Image processing error: {str(e)}"}
def _encode_image(self, image_path: str) -> str:
"""Encode image to base64 string"""
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')
return base64.b64encode(image_file.read()).decode("utf-8")
async def _process_pdf(self, pdf_path: str) -> Dict[str, Any]:
"""Extract data from a PDF by pulling out its text (no image conversion yet)"""
try:
# For now, extract text from PDF and process as text
text_content = self._extract_text_from_pdf(pdf_path)
return self._process_text_content(text_content)
except Exception as e:
return {"error": f"PDF processing error: {str(e)}"}
def _extract_text_from_pdf(self, pdf_path: str) -> str:
"""Extract text from PDF"""
try:
with open(pdf_path, 'rb') as file:
with open(pdf_path, "rb") as file:
pdf_reader = PyPDF2.PdfReader(file)
text = ""
for page in pdf_reader.pages:
text += page.extract_text() + "\n"
return text
except Exception as e:
except Exception:
return ""
def _process_text_content(self, text_content: str) -> Dict[str, Any]:
"""Process text content using Groq (fallback for PDFs)"""
try:
@@ -132,7 +135,8 @@ class DocumentProcessor:
"tax_amount": 0.00,
"date": "YYYY-MM-DD",
"category": "Food/Transport/Office/Other",
"confidence": 0.95
"confidence": 0.95,
"currency": "USD"
}}
Rules:
@@ -143,64 +147,91 @@ class DocumentProcessor:
- Date should be the date on the receipt
- Categorize based on vendor type
- Confidence score 0-1 based on clarity
- Currency should be the currency used on the receipt (e.g., "USD", "EUR")
Return only valid JSON.
"""
response = self.client.chat.completions.create(
model=self.model,
messages=[{"role": "user", "content": prompt}],
max_tokens=500,
temperature=0.1
temperature=0.1,
)
result_text = response.choices[0].message.content.strip()
return self._parse_extraction_result(result_text)
except Exception as e:
return {"error": f"Text processing error: {str(e)}"}
def _parse_extraction_result(self, result_text: str) -> Dict[str, Any]:
"""Parse Groq response and extract JSON data"""
try:
# Clean up response and extract JSON
import json
import re
# Find JSON in response - try multiple patterns
json_match = re.search(r'\{.*\}', result_text, re.DOTALL)
json_match = re.search(r"\{.*\}", result_text, re.DOTALL)
if json_match:
json_str = json_match.group()
# Clean up common JSON issues
json_str = re.sub(r',\s*([}\]])', r'\1', json_str) # Remove trailing commas
json_str = re.sub(r'([{,])\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*:', r'\1"\2":', json_str) # Quote unquoted keys
json_str = re.sub(
r",\s*([}\]])", r"\1", json_str
) # Remove trailing commas
json_str = re.sub(
r"([{,])\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*:", r'\1"\2":', json_str
) # Quote unquoted keys
try:
data = json.loads(json_str)
except json.JSONDecodeError as e:
# Try to fix common JSON issues
logger.warning(f"Initial JSON parsing failed: {e}")
# Try to extract individual fields using regex
vendor_match = re.search(r'"vendor"\s*:\s*"([^"]*)"', json_str)
description_match = re.search(r'"description"\s*:\s*"([^"]*)"', json_str)
total_amount_match = re.search(r'"total_amount"\s*:\s*([0-9.]+)', json_str)
tax_amount_match = re.search(r'"tax_amount"\s*:\s*([0-9.]+)', json_str)
description_match = re.search(
r'"description"\s*:\s*"([^"]*)"', json_str
)
total_amount_match = re.search(
r'"total_amount"\s*:\s*([0-9.]+)', json_str
)
tax_amount_match = re.search(
r'"tax_amount"\s*:\s*([0-9.]+)', json_str
)
date_match = re.search(r'"date"\s*:\s*"([^"]*)"', json_str)
category_match = re.search(r'"category"\s*:\s*"([^"]*)"', json_str)
confidence_match = re.search(r'"confidence"\s*:\s*([0-9.]+)', json_str)
confidence_match = re.search(
r'"confidence"\s*:\s*([0-9.]+)', json_str
)
currency_match = re.search(
r'"currency"\s*:\s*"([^"]*)"', json_str
)
data = {
"vendor": vendor_match.group(1) if vendor_match else "",
"description": description_match.group(1) if description_match else "",
"total_amount": float(total_amount_match.group(1)) if total_amount_match else 0.0,
"tax_amount": float(tax_amount_match.group(1)) if tax_amount_match else 0.0,
"description": description_match.group(1)
if description_match
else "",
"total_amount": float(total_amount_match.group(1))
if total_amount_match
else 0.0,
"tax_amount": float(tax_amount_match.group(1))
if tax_amount_match
else 0.0,
"date": date_match.group(1) if date_match else "",
"category": category_match.group(1) if category_match else "Other",
"confidence": float(confidence_match.group(1)) if confidence_match else 0.5
"category": category_match.group(1)
if category_match
else "Other",
"confidence": float(confidence_match.group(1))
if confidence_match
else 0.5,
"currency": currency_match.group(1) if currency_match else "CAD"
}
# Validate and clean data
return {
"vendor": str(data.get("vendor", "")).strip(),
@@ -210,65 +241,69 @@ class DocumentProcessor:
"date": str(data.get("date", "")).strip(),
"category": str(data.get("category", "Other")).strip(),
"confidence": float(data.get("confidence", 0.5)),
"extraction_success": True
"extraction_success": True,
"currency": str(data.get("currency") or "CAD").strip(),  # tolerate null/non-string currency
}
else:
# Try to extract fields from plain text
logger.warning("No JSON found in response, attempting text extraction")
return self._extract_from_plain_text(result_text)
except Exception as e:
logger.error(f"JSON parsing error: {str(e)}")
return {"error": f"JSON parsing error: {str(e)}", "extraction_success": False}
return {
"error": f"JSON parsing error: {str(e)}",
"extraction_success": False,
}
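The JSON clean-up in `_parse_extraction_result` can be isolated as a small function for testing. This is a sketch of the same two regex repairs (trailing commas, unquoted keys) using only the standard library; the name `parse_loose_json` is hypothetical, and the per-field regex fallback from the diff is left out.

```python
import json
import re
from typing import Any, Dict, Optional


def parse_loose_json(text: str) -> Optional[Dict[str, Any]]:
    """Find the outermost {...} in text, repair common LLM slips, and parse it."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return None
    candidate = match.group()
    # Drop trailing commas before a closing brace or bracket.
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
    # Quote bare keys such as {vendor: ...} or , total: ...
    candidate = re.sub(r"([{,])\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*:", r'\1"\2":', candidate)
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None


print(parse_loose_json('Here you go:\n{vendor: "Shell", "total_amount": 42.5,}'))
```

A known limitation of the key-quoting regex: it also fires inside string values that happen to contain `, word:`, so a description like `"Coffee, size: large"` would be corrupted. That is one reason a field-by-field fallback is still useful.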
def _extract_from_plain_text(self, text: str) -> Dict[str, Any]:
"""Extract receipt data from plain text when JSON parsing fails"""
try:
import re
# Extract vendor (look for common patterns)
vendor_patterns = [
r'(?:vendor|store|merchant|company)\s*[:\-]?\s*([A-Za-z0-9\s&.,]+)',
r'([A-Z][A-Za-z0-9\s&.,]{3,30})', # Capitalized words
r"(?:vendor|store|merchant|company)\s*[:\-]?\s*([A-Za-z0-9\s&.,]+)",
r"([A-Z][A-Za-z0-9\s&.,]{3,30})", # Capitalized words
]
vendor = ""
for pattern in vendor_patterns:
match = re.search(pattern, text, re.IGNORECASE)
if match:
vendor = match.group(1).strip()
break
# Extract amount (look for currency patterns)
amount_patterns = [
r'\$?\s*([0-9,]+\.?[0-9]*)',
r'(?:total|amount|sum)\s*[:\-]?\s*\$?\s*([0-9,]+\.?[0-9]*)',
r"\$?\s*([0-9,]+\.?[0-9]*)",
r"(?:total|amount|sum)\s*[:\-]?\s*\$?\s*([0-9,]+\.?[0-9]*)",
]
total_amount = 0.0
for pattern in amount_patterns:
match = re.search(pattern, text, re.IGNORECASE)
if match:
try:
total_amount = float(match.group(1).replace(',', ''))
total_amount = float(match.group(1).replace(",", ""))
break
except ValueError:
continue
# Extract date
date_patterns = [
r'(\d{4}-\d{2}-\d{2})',
r'(\d{1,2}/\d{1,2}/\d{2,4})',
r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},?\s+\d{4}',
r"(\d{4}-\d{2}-\d{2})",
r"(\d{1,2}/\d{1,2}/\d{2,4})",
r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},?\s+\d{4}",
]
date = ""
for pattern in date_patterns:
match = re.search(pattern, text, re.IGNORECASE)
if match:
date = match.group(0)
break
return {
"vendor": vendor or "Unknown",
"total_amount": total_amount,
@@ -276,9 +311,9 @@ class DocumentProcessor:
"date": date or "",
"category": "Other",
"confidence": 0.3, # Low confidence for text extraction
"extraction_success": True
"extraction_success": True,
}
except Exception as e:
logger.error(f"Text extraction error: {str(e)}")
return {
@@ -289,27 +324,27 @@ class DocumentProcessor:
"category": "Other",
"confidence": 0.1,
"extraction_success": False,
"error": f"Text extraction failed: {str(e)}"
"error": f"Text extraction failed: {str(e)}",
}
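In the plain-text fallback above, the generic number pattern is tried before the labelled `total|amount|sum` pattern, so the first number in the text wins. A sketch of the reverse order follows, assuming a labelled total should take priority; `extract_total` and the word-boundary tweak are illustrative choices, not part of the commit.

```python
import re

# Labelled amounts first; \b keeps "Subtotal" from matching as "total".
LABELLED_TOTAL = re.compile(
    r"\b(?:total|amount|sum)\s*[:\-]?\s*\$?\s*([0-9][0-9,]*\.?[0-9]*)",
    re.IGNORECASE,
)
# Last resort: the first number anywhere in the text.
ANY_NUMBER = re.compile(r"\$?\s*([0-9][0-9,]*\.?[0-9]*)")


def extract_total(text: str) -> float:
    """Return the labelled total if one exists, else the first number, else 0.0."""
    for pattern in (LABELLED_TOTAL, ANY_NUMBER):
        match = pattern.search(text)
        if match:
            try:
                return float(match.group(1).replace(",", ""))
            except ValueError:
                continue
    return 0.0


print(extract_total("Store 12\nSubtotal 10.00\nTotal: $11.30"))
```

With the original ordering, the same input would yield the store number `12` rather than the total.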
async def save_uploaded_file(self, file_content: bytes, filename: str) -> str:
"""Save uploaded file to temporary storage"""
try:
# Create uploads directory if it doesn't exist
upload_dir = "uploads"
os.makedirs(upload_dir, exist_ok=True)
# Generate unique filename
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
safe_filename = f"{timestamp}_{filename.replace(' ', '_')}"
file_path = os.path.join(upload_dir, safe_filename)
# Save file
async with aiofiles.open(file_path, 'wb') as f:
async with aiofiles.open(file_path, "wb") as f:
await f.write(file_content)
return file_path
except Exception as e:
raise Exception(f"Failed to save file: {str(e)}")
@@ -318,7 +353,7 @@ class DocumentProcessor:
try:
# Encode image to base64
base64_image = self._encode_image(image_path)
# Create Groq vision prompt for transaction extraction
prompt = """
Analyze this financial document image (bank statement, credit card statement, etc.) and extract ALL transactions in JSON format.
@@ -358,7 +393,7 @@ class DocumentProcessor:
Return only valid JSON.
"""
# Call Groq vision API
response = self.client.chat.completions.create(
messages=[
@@ -377,18 +412,18 @@ class DocumentProcessor:
],
model=self.model,
max_tokens=2000, # Higher token limit for multiple transactions
temperature=0.1
temperature=0.1,
)
# Parse response
result_text = response.choices[0].message.content.strip()
return self._parse_transaction_extraction_result(result_text)
except Exception as e:
return {
"extraction_success": False,
"error": f"Transaction extraction error: {str(e)}",
"transactions": []
"transactions": [],
}
def _parse_transaction_extraction_result(self, result_text: str) -> Dict[str, Any]:
@@ -398,29 +433,30 @@ class DocumentProcessor:
import re
# Find the first '{' and last '}'
start = result_text.find('{')
end = result_text.rfind('}')
start = result_text.find("{")
end = result_text.rfind("}")
if start == -1 or end == -1 or end <= start:
return {
"extraction_success": False,
"error": "Could not find JSON object in AI response",
"transactions": []
"transactions": [],
}
json_str = result_text[start:end+1]
json_str = result_text[start : end + 1]
# Remove trailing commas before } or ]
json_str = re.sub(r',\s*([}\]])', r'\1', json_str)
json_str = re.sub(r",\s*([}\]])", r"\1", json_str)
try:
data = json.loads(json_str)
except Exception as e:
import logging
logging.error(f"JSON parsing error: {str(e)}")
logging.error(f"Offending JSON string:\n{json_str}")
return {
"extraction_success": False,
"error": f"JSON parsing error: {str(e)}",
"transactions": []
"transactions": [],
}
# Validate and clean data
@@ -430,25 +466,28 @@ class DocumentProcessor:
try:
cleaned_txn = {
"date": str(txn.get("date", "")).strip(),
"amount": float(str(txn.get("amount", 0)).replace('$', '').replace(',', '')),
"amount": float(
str(txn.get("amount", 0)).replace("$", "").replace(",", "")
),
"vendor": str(txn.get("vendor", "")).strip(),
"memo": str(txn.get("memo", "")).strip()
"memo": str(txn.get("memo", "")).strip(),
}
cleaned_transactions.append(cleaned_txn)
except Exception as e:
except Exception:
continue
return {
"extraction_success": data.get("extraction_success", True),
"transactions": cleaned_transactions,
"total_transactions": len(cleaned_transactions)
"total_transactions": len(cleaned_transactions),
}
except Exception as e:
import logging
logging.error(f"JSON parsing error (outer): {str(e)}")
return {
"extraction_success": False,
"error": f"JSON parsing error: {str(e)}",
"transactions": []
"transactions": [],
}
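The transaction parser above slices from the first `{` to the last `}`, strips trailing commas, then cleans each row and skips rows that fail. A compact sketch of that flow is shown here; `clean_transactions` is a hypothetical name, and the top-level `"transactions"` key is assumed from the shape of the returned dict rather than shown in this hunk.

```python
import json
import re
from typing import Any, Dict, List


def clean_transactions(result_text: str) -> List[Dict[str, Any]]:
    """Parse an LLM reply and return only the rows that clean up successfully."""
    start, end = result_text.find("{"), result_text.rfind("}")
    if start == -1 or end <= start:
        return []
    json_str = re.sub(r",\s*([}\]])", r"\1", result_text[start : end + 1])
    try:
        data = json.loads(json_str)
    except json.JSONDecodeError:
        return []
    cleaned = []
    for txn in data.get("transactions", []):  # assumed key name
        try:
            amount = float(str(txn.get("amount", 0)).replace("$", "").replace(",", ""))
        except (ValueError, AttributeError):
            continue  # skip rows with unparseable amounts or non-dict entries
        cleaned.append(
            {
                "date": str(txn.get("date", "")).strip(),
                "amount": amount,
                "vendor": str(txn.get("vendor", "")).strip(),
                "memo": str(txn.get("memo", "")).strip(),
            }
        )
    return cleaned
```

Catching only `ValueError` and `AttributeError` keeps the skip behaviour while making unexpected failures visible, which the bare `except Exception: continue` in the diff would hide.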
def _parse_date_to_iso(self, date_str: str) -> str:
@@ -456,43 +495,53 @@ class DocumentProcessor:
try:
import re
from datetime import datetime
date_str = date_str.strip().upper()
# Handle formats like "MAY 22", "JUN 01", "MAY 22, 2024"
month_pattern = r'(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\s+(\d{1,2})(?:,\s*(\d{4}))?'
month_pattern = r"(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\s+(\d{1,2})(?:,\s*(\d{4}))?"
match = re.match(month_pattern, date_str)
if match:
month_abbr, day, year = match.groups()
month_map = {
'JAN': 1, 'FEB': 2, 'MAR': 3, 'APR': 4, 'MAY': 5, 'JUN': 6,
'JUL': 7, 'AUG': 8, 'SEP': 9, 'OCT': 10, 'NOV': 11, 'DEC': 12
"JAN": 1,
"FEB": 2,
"MAR": 3,
"APR": 4,
"MAY": 5,
"JUN": 6,
"JUL": 7,
"AUG": 8,
"SEP": 9,
"OCT": 10,
"NOV": 11,
"DEC": 12,
}
month = month_map[month_abbr]
day = int(day)
year = int(year) if year else datetime.now().year
# Handle 2-digit years
if year < 100:
year += 2000
return f"{year:04d}-{month:02d}-{day:02d}"
# Handle YYYY-MM-DD format
if re.match(r'\d{4}-\d{2}-\d{2}', date_str):
if re.match(r"\d{4}-\d{2}-\d{2}", date_str):
return date_str
# Handle MM/DD/YYYY format
if re.match(r'\d{1,2}/\d{1,2}/\d{4}', date_str):
return datetime.strptime(date_str, '%m/%d/%Y').strftime('%Y-%m-%d')
if re.match(r"\d{1,2}/\d{1,2}/\d{4}", date_str):
return datetime.strptime(date_str, "%m/%d/%Y").strftime("%Y-%m-%d")
# Handle MM/DD/YY format
if re.match(r'\d{1,2}/\d{1,2}/\d{2}', date_str):
return datetime.strptime(date_str, '%m/%d/%y').strftime('%Y-%m-%d')
if re.match(r"\d{1,2}/\d{1,2}/\d{2}", date_str):
return datetime.strptime(date_str, "%m/%d/%y").strftime("%Y-%m-%d")
return None
except Exception:
return None
return None
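The date normalisation in `_parse_date_to_iso` can be exercised on its own. The sketch below covers the same input shapes (month abbreviation plus day, ISO, and US-style slashes) but leans on `datetime.strptime` for the numeric formats, so impossible dates are rejected; `to_iso_date` and its `default_year` parameter are hypothetical additions for testability.

```python
import re
from datetime import datetime
from typing import Optional

MONTH_ABBRS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
               "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
MONTHS = {abbr: number for number, abbr in enumerate(MONTH_ABBRS, start=1)}
MONTH_DAY = re.compile(r"(" + "|".join(MONTH_ABBRS) + r")\s+(\d{1,2})(?:,\s*(\d{4}))?")


def to_iso_date(date_str: str, default_year: Optional[int] = None) -> Optional[str]:
    """Normalise a few common statement date formats to YYYY-MM-DD, else None."""
    text = date_str.strip().upper()
    match = MONTH_DAY.fullmatch(text)
    if match:
        month_abbr, day, year = match.groups()
        # Statements often omit the year; fall back to the given or current year.
        year_num = int(year) if year else (default_year or datetime.now().year)
        return f"{year_num:04d}-{MONTHS[month_abbr]:02d}-{int(day):02d}"
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


print(to_iso_date("Jun 01, 2023"))
print(to_iso_date("05/22/24"))
```

Defaulting a missing year to the current year is the same choice the diff makes; it will misdate December statements processed in January.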
+992
@@ -0,0 +1,992 @@
import json
import logging
from typing import Any, Dict
import groq
from config import settings
from schemas import Receipt, Transaction
logger = logging.getLogger(__name__)
class LLMTaxAnalyzer:
"""
Uses LLM to intelligently apply tax rules based on context.
Implements four core tax rules:
1. Sales Tax Rule - Based on receipt location (shipping/billing address)
2. Foreign Exchange Rule - Handles currency mismatches
3. Depreciation Rule - Capital assets (based on user location)
4. Meals & Entertainment Rule - 50% tax deduction, 100% accounting deduction
"""
# Provincial tax rates for reference (combined GST/HST/PST/QST; verify against current CRA rates)
PROVINCIAL_TAX_RATES = {
"ON": {"rate": 0.13, "name": "HST", "type": "Harmonized"},
"QC": {"rate": 0.14975, "name": "QST + GST", "type": "Combined"},
"BC": {"rate": 0.12, "name": "PST + GST", "type": "Combined"},
"AB": {"rate": 0.05, "name": "GST", "type": "Federal only"},
"SK": {"rate": 0.11, "name": "PST + GST", "type": "Combined"},
"MB": {"rate": 0.12, "name": "PST + GST", "type": "Combined"},
"NS": {"rate": 0.14, "name": "HST", "type": "Harmonized"},  # 14% HST since April 1, 2025 (was 15%)
"NB": {"rate": 0.15, "name": "HST", "type": "Harmonized"},
"NL": {"rate": 0.15, "name": "HST", "type": "Harmonized"},
"PE": {"rate": 0.15, "name": "HST", "type": "Harmonized"},
"NT": {"rate": 0.05, "name": "GST", "type": "Federal only"},
"NU": {"rate": 0.05, "name": "GST", "type": "Federal only"},
"YT": {"rate": 0.05, "name": "GST", "type": "Federal only"},
}
# CCA rates by asset class (simplified)
CCA_RATES = {
"vehicles": 0.30, # Class 10
"computer_equipment": 0.55, # Class 50
"furniture": 0.20, # Class 8
"buildings": 0.04, # Class 1
"machinery": 0.20, # Class 8
}
def __init__(self):
self.client = groq.Groq(api_key=settings.GROQ_API_KEY)
self.model = "llama-3.1-8b-instant"
self.max_retries = 3
def analyze_and_apply_tax_rules_batch(
self,
matches: list, # List of Match objects
user_location: str = "ON",
) -> list:
"""
Batch process all matches in a SINGLE LLM call to reduce costs.
Analyzes all receipt-transaction pairs together and applies tax rules.
Falls back to individual processing if batch fails.
"""
if not matches:
return matches
logger.info(f"Starting batch tax analysis for {len(matches)} matches")
# Build batch context for all matches
try:
batch_context = self._build_batch_analysis_context(matches, user_location)
except Exception as e:
logger.error(f"Error building batch context: {str(e)}")
# If we can't even build the context, return matches as-is
for match in matches:
match.match_reason += " (Batch analysis setup failed)"
return matches
# Get LLM analysis for ALL matches at once
llm_batch_analysis = self._get_llm_tax_analysis_batch(
batch_context, len(matches)
)
# Check if we got any analysis back
if not llm_batch_analysis:
logger.warning("Batch LLM analysis returned empty results")
# Fallback: Try processing each match individually if batch size is small
if (
len(matches) <= 5
): # Only fallback for small batches to avoid excessive API calls
logger.info(
f"Attempting individual processing fallback for {len(matches)} matches"
)
return self._process_matches_individually(matches, user_location)
else:
logger.warning(
f"Batch too large ({len(matches)} matches) for individual fallback - returning matches without enhanced tax analysis"
)
for match in matches:
match.match_reason += " (Batch tax analysis unavailable)"
return matches
logger.info(f"Received batch analysis for {len(llm_batch_analysis)} matches")
# Apply results to each match
enhanced_matches = []
for i, match in enumerate(matches):
try:
# Get the analysis for this specific match from the batch results
match_key = f"match_{i}"
match_analysis = llm_batch_analysis.get(match_key, {})
if match_analysis and isinstance(match_analysis, dict):
# Apply the tax analysis to this match
enhanced_match = self._apply_tax_analysis_to_match(
match, match_analysis
)
enhanced_matches.append(enhanced_match)
else:
# No analysis available for this match, use as-is
logger.warning(
f"No analysis found for match {i} (key: {match_key})"
)
match.match_reason += " (Tax analysis incomplete)"
enhanced_matches.append(match)
except Exception as e:
logger.error(f"Error applying tax analysis to match {i}: {str(e)}")
match.match_reason += " (Tax analysis error)"
enhanced_matches.append(match)
logger.info(
f"Completed batch tax analysis, enhanced {len(enhanced_matches)} matches"
)
# logger.info(
# f"\n\n\nFinal batch enhanced matches: {enhanced_matches}"
# )
return enhanced_matches
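The control flow of the batch method reduces to: one batch call, then per-item fallback only when the batch returns nothing and the batch is small. A sketch with the LLM calls replaced by injected callables follows; `analyze_with_fallback`, `batch_fn`, and `single_fn` are hypothetical names, while the `match_{i}` keys and the threshold of 5 come from the diff.

```python
from typing import Any, Callable, Dict, List


def analyze_with_fallback(
    items: List[Any],
    batch_fn: Callable[[List[Any]], Dict[str, dict]],
    single_fn: Callable[[Any], dict],
    max_fallback: int = 5,
) -> List[dict]:
    """Return one analysis dict per item, preferring a single batch call."""
    if not items:
        return []
    batch = batch_fn(items)
    if batch:
        # Batch results are keyed "match_0", "match_1", ...; missing keys map to {}.
        return [batch.get(f"match_{i}", {}) for i in range(len(items))]
    if len(items) <= max_fallback:
        # Small batch: retry item by item.
        return [single_fn(item) for item in items]
    # Large batch: give up rather than issue many extra API calls.
    return [{} for _ in items]
```

Note that a partially filled batch result does not trigger the fallback; items without a `match_{i}` entry simply stay unanalysed, which matches the "Tax analysis incomplete" path in the diff.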
def _process_matches_individually(self, matches: list, user_location: str) -> list:
"""
Fallback method: Process matches one at a time using the legacy method.
Only used when batch processing fails and batch size is small.
"""
logger.info(f"Processing {len(matches)} matches individually as fallback")
enhanced_matches = []
for i, match in enumerate(matches):
try:
# Use the legacy single-match analysis method
tax_analysis = self.analyze_and_apply_tax_rules(
match.receipt, match.transaction, user_location
)
# Apply the analysis to the match
enhanced_match = self._apply_tax_analysis_to_match(match, tax_analysis)
enhanced_matches.append(enhanced_match)
logger.info(
f"Successfully processed match {i + 1}/{len(matches)} individually"
)
except Exception as e:
logger.error(f"Error in individual processing for match {i}: {str(e)}")
match.match_reason += " (Individual tax analysis failed)"
enhanced_matches.append(match)
return enhanced_matches
def analyze_and_apply_tax_rules(
self,
receipt: Receipt,
transaction: Transaction,
user_location: str = "ON", # Default to Ontario
) -> Dict[str, Any]:
"""
Legacy single-match analysis method (kept for backward compatibility).
Use analyze_and_apply_tax_rules_batch() for better performance.
Use LLM to intelligently analyze and apply all tax rules:
1. Sales tax based on receipt location (shipping/billing address priority)
2. Foreign exchange rules for currency mismatches
3. Depreciation rules for capital assets (based on user location)
4. Meals & Entertainment deduction rules
"""
# Prepare context for LLM
analysis_context = self._build_analysis_context(
receipt, transaction, user_location
)
# Get LLM analysis
llm_analysis = self._get_llm_tax_analysis(analysis_context)
# Parse and structure the results
structured_results = self._structure_analysis_results(
llm_analysis, receipt, transaction, user_location
)
return structured_results
def _build_analysis_context(
self, receipt: Receipt, transaction: Transaction, user_location: str
) -> str:
"""Build comprehensive context for LLM analysis"""
# Extract location information
receipt_location = self._extract_receipt_location(receipt)
# Normalize user_location to province code (handle "Canada", "Ontario", "ON", etc.)
user_province = self._normalize_location_to_province(user_location)
logger.info(
f"Building tax analysis context - User Location: {user_location} → Province Code: {user_province}"
)
# Build tax rates reference
tax_rates_info = json.dumps(self.PROVINCIAL_TAX_RATES, indent=2)
cca_rates_info = json.dumps(self.CCA_RATES, indent=2)
context = f"""
RECEIPT DETAILS:
- Vendor: {receipt.vendor}
- Amount: ${receipt.amount:.2f}
- Currency: {receipt.currency}
- Date: {receipt.receipt_date.strftime("%Y-%m-%d")}
- Category: {receipt.category}
- Description: {receipt.description}
- Billing Address: {self._format_address(receipt.billing_address)}
- Shipping Address: {self._format_address(receipt.shipping_address)}
- Is Meals & Entertainment: {receipt.is_meals_entertainment}
TRANSACTION DETAILS:
- Vendor: {transaction.vendor}
- Amount: ${transaction.amount:.2f}
- Currency: {transaction.currency}
- Date: {transaction.transaction_date.strftime("%Y-%m-%d")}
- Notes: {transaction.notes}
- FX Rate: {transaction.fx_rate if transaction.fx_rate else "N/A"}
USER CONTEXT:
- User Location (Province): {user_province}
- User Province Tax Rate: {self.PROVINCIAL_TAX_RATES.get(user_province, {}).get("rate", 0.13) * 100}%
- User Tax Type: {self.PROVINCIAL_TAX_RATES.get(user_province, {}).get("name", "HST")}
RECEIPT LOCATION DETECTED:
{receipt_location}
PROVINCIAL TAX RATES REFERENCE:
{tax_rates_info}
CCA DEPRECIATION RATES BY ASSET CLASS:
{cca_rates_info}
"""
return context
def _normalize_location_to_province(self, location: str) -> str:
"""
Normalize various location formats to province code.
Handles: "ON", "Ontario", "Canada", etc.
"""
        # Guard against None so a missing user location falls through to the Ontario default
        location_upper = (location or "").upper().strip()
# Direct province code match
if location_upper in self.PROVINCIAL_TAX_RATES:
return location_upper
# Map full province names to codes
province_name_map = {
"ONTARIO": "ON",
"QUEBEC": "QC",
"BRITISH COLUMBIA": "BC",
"ALBERTA": "AB",
"SASKATCHEWAN": "SK",
"MANITOBA": "MB",
"NOVA SCOTIA": "NS",
"NEW BRUNSWICK": "NB",
"NEWFOUNDLAND AND LABRADOR": "NL",
"NEWFOUNDLAND": "NL",
"PRINCE EDWARD ISLAND": "PE",
"NORTHWEST TERRITORIES": "NT",
"NUNAVUT": "NU",
"YUKON": "YT",
}
if location_upper in province_name_map:
return province_name_map[location_upper]
# Default to Ontario if country is Canada or unspecified
if location_upper in ["CANADA", "CAN", "CA", ""]:
logger.warning(f"Location '{location}' is too generic, defaulting to ON")
return "ON"
# If nothing matches, default to Ontario
logger.warning(f"Could not parse location '{location}', defaulting to ON")
return "ON"
def _extract_receipt_location(self, receipt: Receipt) -> str:
"""Extract and format receipt location information"""
# Priority: Use shipping address if available, then billing
location = (
receipt.shipping_address
if receipt.shipping_address
else receipt.billing_address
)
if location:
return f"""
- Province: {location.province}
- City: {location.city}
- Country: {location.country}
- Postal Code: {location.postal_code}
"""
else:
return "- No address information available (will use user location)"
def _format_address(self, address) -> str:
"""Format address for display"""
if address:
return f"{address.city}, {address.province}, {address.country} ({address.postal_code})"
return "Not provided"
def _get_llm_tax_analysis(self, context: str) -> str:
"""Get tax rule analysis from LLM"""
prompt = f"""
You are a tax expert analyzing a receipt-transaction match. Calculate the tax for the receipt from the context provided below, then apply the four core tax rules that follow it.
{context}
=== FOUR CORE TAX RULES ===
### 1. SALES TAX RULE
**Purpose**: Calculate and apply correct sales tax based on shipping and billing addresses.
**Key Principles**:
- When billing and shipping addresses are THE SAME: Apply sales tax based on that address location.
- When billing and shipping addresses are DIFFERENT: Apply sales tax based on the SHIPPING address.
- Tax rate is determined by the RECEIPT'S location, NOT the user's location (unless no receipt location).
**Scenario Examples**:
a) User in Ontario, Receipt from Quebec:
- Apply Quebec's tax rate (14.975% QST+GST), not Ontario's 13% HST
- The user's location is only for depreciation purposes
b) User in Ontario, Receipt from USA (New York):
- DO NOT apply Canadian sales tax
- This is an international transaction
- Flag for FX review instead
c) User in USA (New York), Receipt from California:
- Apply California's sales tax rate (receipt location)
- Not New York's rate (user location)
d) User in Ontario, Receipt has NO address information:
- DEFAULT to user's location (Ontario 13% HST)
- This is the fallback when receipt location is unknown
**Tax Calculation**:
- Compare calculated tax vs stated tax on receipt
- Flag discrepancies for review
### 2. FOREIGN EXCHANGE (FX) RULE
**Purpose**: Handle currency mismatches between receipts and transactions.
**Actions**:
- Identify when receipt currency ≠ transaction currency (e.g., USD vs CAD)
- Calculate the absolute discrepancy: |receipt_amount - transaction_amount|
- ALWAYS flag for manual review - DO NOT fetch exchange rates automatically
- If FX rate is provided in transaction data, note it but still require manual review
**Examples**:
- Transaction: USD $100, Receipt: CAD $125 → Discrepancy: $25, Flag for review
- The user must manually approve or adjust the FX difference
### 3. DEPRECIATION RULE
**Purpose**: Calculate depreciation for assets using two methods.
**Key Principle**: Depreciation is ALWAYS based on USER'S location, NOT receipt location.
**Asset Identification**:
- Only applies to capital assets: vehicles, equipment, furniture, buildings, machinery
- Identify from receipt category and description
- Typical threshold: Assets generally > $500
**Two Methods Required**:
a) **Straight-Line Depreciation** (for accounting purposes):
Formula: (Cost - Residual Value) / Useful Life
Example: Asset $10,000, 5-year life, $1,000 residual = $1,800/year
b) **CCA Depreciation** (for tax purposes - Canada):
Method: Declining Balance
Formula: Book Value × CCA Rate each year
Example: Truck $20,000, 30% CCA:
- Year 1: $20,000 × 30% = $6,000
- Year 2: ($20,000 - $6,000) × 30% = $4,200
- Continues declining each year
**CCA Classes** (Canada):
- Vehicles: 30% (Class 10)
- Computer Equipment: 55% (Class 50)
- Furniture/Machinery: 20% (Class 8)
- Buildings: 4% (Class 1)
### 4. MEALS & ENTERTAINMENT TAX DEDUCTION RULE
**Purpose**: Apply correct deductions for meals and entertainment expenses.
**Deduction Rules**:
- **For Tax Purposes**: Only 50% of the pre-tax meal amount is deductible
- **For Accounting Purposes**: 100% of the pre-tax meal amount is deductible
- **Sales Tax**: The full sales tax amount is deductible for both tax and accounting purposes
**Example**:
- Receipt: $100 meal + $12 sales tax = $112 total
- **Tax Deduction**: $50 (50% of meal) + $12 (full tax) = $62
- **Accounting Deduction**: $100 (full meal) + $12 (full tax) = $112
=== LOCATION-BASED SCENARIO HANDLING ===
**When Receipt Location ≠ User Location**:
1. **Sales Tax**: Use RECEIPT's location for tax calculation
- Exception: If international (different country), no Canadian sales tax + flag FX
- Exception: If no location on receipt, use user's location as default
2. **Depreciation**: ALWAYS use USER's location for depreciation rules
- Receipt location is irrelevant for depreciation
- Apply user's country/province depreciation methods
3. **FX Handling**:
- If receipt currency ≠ transaction currency: Flag for manual review
- Do NOT automatically fetch or apply exchange rates
4. **Missing Location**:
- If receipt has no address: Default to user's location for sales tax
- Still apply user's location for depreciation
=== ANALYSIS REQUIRED ===
Provide a structured JSON response with the following format:
**CRITICAL INSTRUCTION FOR final_tax_amount:**
- This field MUST contain ONLY the calculated sales tax amount in dollars
- This is NOT the total amount including tax
- This is ONLY the tax portion (HST/GST/PST/QST)
- Example: If receipt total is $100 and calculated tax is $13, return 13.00 (not 113.00)
- For meals & entertainment: Return the FULL calculated tax amount (not the 50% adjusted amount)
{{
"final_tax_amount": XX.XX,
"sales_tax": {{
"applicable_province": "XX",
"applicable_rate": 0.XX,
"tax_name": "HST/GST/PST/QST",
"calculated_tax": XX.XX,
"stated_tax": XX.XX,
"discrepancy": XX.XX,
"reason": "Detailed explanation",
"requires_review": true/false
}},
"foreign_exchange": {{
"currency_mismatch": true/false,
"receipt_currency": "XXX",
"transaction_currency": "XXX",
"receipt_amount": XX.XX,
"transaction_amount": XX.XX,
"discrepancy": XX.XX,
"requires_manual_review": true/false,
"reason": "Explanation of FX situation"
}},
"depreciation": {{
"is_capital_asset": true/false,
"asset_class": "category name or N/A",
"suggested_cca_rate": 0.XX,
"straight_line_applicable": true/false,
"cca_applicable": true/false,
"straight_line_example": "Brief calculation example if applicable",
"cca_example": "Brief calculation example if applicable",
"reason": "Why this is/isn't a capital asset, which CCA class, and why depreciation based on user's location"
}},
"meals_entertainment": {{
"is_meals_entertainment": true/false,
"tax_deduction_amount": XX.XX,
"accounting_deduction_amount": XX.XX,
"sales_tax_included": XX.XX,
"reason": "Explanation of M&E rule application"
}},
"confidence_adjustment": {{
"boost": 0.XX,
"reduce": 0.XX,
"reason": "Why confidence should be adjusted based on tax analysis"
}},
"overall_assessment": "Comprehensive summary: which rules applied, why, what location used for what purpose, and any required actions"
}}
**IMPORTANT**: The "final_tax_amount" field at the top level must contain the final calculated tax amount. This should be the calculated_tax from sales_tax analysis. If this is a meals & entertainment expense, ensure you return the FULL tax amount here (not the 50% adjusted amount).
**Critical Reminders**:
- Sales tax uses RECEIPT location (or user location if receipt has none)
- Depreciation ALWAYS uses USER location
- For different addresses, use SHIPPING address for sales tax
- International transactions: no Canadian tax + FX flag
- Be precise with all calculations
- Always explain your reasoning clearly
"""
try:
response = self.client.chat.completions.create(
model=self.model,
messages=[
{
"role": "system",
"content": "You are a Canadian tax expert. Analyze transactions and apply tax rules accurately. Always return valid JSON.",
},
{"role": "user", "content": prompt},
],
temperature=0.1, # Low temperature for consistent, factual responses
max_tokens=2000,
)
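            # message.content can be None, so normalise before stripping
            content = (response.choices[0].message.content or "").strip()
            if not content:
                logger.error("LLM returned an empty tax analysis response")
                return self._get_fallback_analysis()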
logger.info(f"LLM tax analysis received: {len(content)} characters")
return content
except Exception as e:
logger.error(f"Error getting LLM tax analysis: {str(e)}")
return self._get_fallback_analysis()
def _get_fallback_analysis(self) -> str:
"""Return fallback analysis if LLM fails"""
return json.dumps(
{
"final_tax_amount": 0.0,
"sales_tax": {
"applicable_province": "ON",
"applicable_rate": 0.13,
"tax_name": "HST",
"calculated_tax": 0.0,
"stated_tax": 0.0,
"discrepancy": 0.0,
"reason": "LLM analysis failed - using defaults",
"requires_review": True,
},
"foreign_exchange": {
"currency_mismatch": False,
"requires_manual_review": False,
"reason": "Analysis not available",
},
"depreciation": {
"is_capital_asset": False,
"reason": "Analysis not available",
},
"meals_entertainment": {
"is_meals_entertainment": False,
"reason": "Analysis not available",
},
"confidence_adjustment": {
"boost": 0.0,
"reduce": 0.1,
"reason": "LLM analysis failed - recommend manual review",
},
"overall_assessment": "Automatic analysis failed. Manual review recommended.",
}
)
def _structure_analysis_results(
self,
llm_response: str,
receipt: Receipt,
transaction: Transaction,
user_location: str,
) -> Dict[str, Any]:
"""Parse LLM response and structure it for application"""
try:
# Extract JSON from LLM response (may have markdown code blocks)
json_str = llm_response
if "```json" in llm_response:
json_str = llm_response.split("```json")[1].split("```")[0].strip()
elif "```" in llm_response:
json_str = llm_response.split("```")[1].split("```")[0].strip()
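            analysis = json.loads(json_str)
            # A JSON array or scalar cannot carry the metadata added below; route it to the fallback
            if not isinstance(analysis, dict):
                raise json.JSONDecodeError("Expected a JSON object", json_str, 0)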
# Add metadata
analysis["metadata"] = {
"user_location": user_location,
"receipt_id": receipt.id,
"transaction_id": transaction.id,
"analysis_method": "LLM-based",
"model": self.model,
}
return analysis
except json.JSONDecodeError as e:
logger.error(f"Failed to parse LLM response as JSON: {str(e)}")
logger.error(f"LLM response was: {llm_response}")
# Return structured fallback
return {
"final_tax_amount": receipt.tax if receipt.tax else 0.0,
"sales_tax": {
"requires_review": True,
"reason": "Failed to parse LLM response",
},
"foreign_exchange": {
"requires_manual_review": receipt.currency != transaction.currency
},
"depreciation": {"is_capital_asset": False},
"confidence_adjustment": {
"boost": 0.0,
"reduce": 0.15,
"reason": "Analysis parsing failed",
},
"overall_assessment": "Analysis failed. Manual review required.",
"error": str(e),
"metadata": {
"user_location": user_location,
"analysis_method": "fallback",
},
}
def _build_batch_analysis_context(self, matches: list, user_location: str) -> str:
"""Build comprehensive context for batch LLM analysis of all matches"""
# Normalize user_location to province code
user_province = self._normalize_location_to_province(user_location)
logger.info(
f"Building batch tax analysis context for {len(matches)} matches - User Location: {user_location} → Province Code: {user_province}"
)
# Build tax rates and CCA references once
tax_rates_info = json.dumps(self.PROVINCIAL_TAX_RATES, indent=2)
cca_rates_info = json.dumps(self.CCA_RATES, indent=2)
# Build match entries
matches_info = []
for i, match in enumerate(matches):
receipt = match.receipt
transaction = match.transaction
receipt_location = self._extract_receipt_location(receipt)
match_info = f"""
MATCH {i} (ID: match_{i}):
Receipt Details:
- Vendor: {receipt.vendor}
- Amount: ${receipt.amount:.2f}
- Currency: {receipt.currency}
- Date: {receipt.receipt_date.strftime("%Y-%m-%d")}
- Category: {receipt.category}
- Description: {receipt.description}
- Billing Address: {self._format_address(receipt.billing_address)}
- Shipping Address: {self._format_address(receipt.shipping_address)}
- Is Meals & Entertainment: {receipt.is_meals_entertainment}
Transaction Details:
- Vendor: {transaction.vendor}
- Amount: ${transaction.amount:.2f}
- Currency: {transaction.currency}
- Date: {transaction.transaction_date.strftime("%Y-%m-%d")}
- Notes: {transaction.notes}
- FX Rate: {transaction.fx_rate if transaction.fx_rate else "N/A"}
Receipt Location Detected:
{receipt_location}
"""
matches_info.append(match_info)
matches_section = "\n".join(matches_info)
context = f"""
USER CONTEXT:
- User Location (Province): {user_province}
- User Province Tax Rate: {self.PROVINCIAL_TAX_RATES.get(user_province, {}).get("rate", 0.13) * 100:g}%
- User Tax Type: {self.PROVINCIAL_TAX_RATES.get(user_province, {}).get("name", "HST")}
PROVINCIAL TAX RATES REFERENCE:
{tax_rates_info}
CCA DEPRECIATION RATES BY ASSET CLASS:
{cca_rates_info}
=== MATCHES TO ANALYZE ({len(matches)} total) ===
{matches_section}
"""
return context
def _get_llm_tax_analysis_batch(self, context: str, num_matches: int) -> Dict[str, Any]:
"""Get tax rule analysis from LLM for ALL matches in a single call"""
prompt = f"""
You are a Canadian tax expert analyzing MULTIPLE receipt-transaction matches.
{context}
=== FOUR CORE TAX RULES ===
### 1. SALES TAX RULE
**Purpose**: Calculate and apply correct sales tax based on shipping and billing addresses.
**Key Principles**:
- When billing and shipping addresses are THE SAME: Apply sales tax based on that address location.
- When billing and shipping addresses are DIFFERENT: Apply sales tax based on the SHIPPING address.
- Tax rate is determined by the RECEIPT'S location, NOT the user's location (unless no receipt location).
**Scenario Examples**:
a) User in Ontario, Receipt from Quebec:
- Apply Quebec's tax rate (14.975% QST+GST), not Ontario's 13% HST
b) User in Ontario, Receipt from USA (New York):
- DO NOT apply Canadian sales tax
- This is an international transaction
- Flag for FX review instead
c) User in Ontario, Receipt has NO address information:
- DEFAULT to user's location (Ontario 13% HST)
**Tax Calculation**:
- Compare calculated tax vs stated tax on receipt
- Flag discrepancies for review
### 2. FOREIGN EXCHANGE (FX) RULE
**Purpose**: Handle currency mismatches between receipts and transactions.
**Actions**:
- Identify when receipt currency ≠ transaction currency (e.g., USD vs CAD)
- Calculate expected transaction amount using FX rate if available
- Flag discrepancies > $5 or 5% for manual review
- If FX rate missing but currencies differ, flag for review
### 3. DEPRECIATION RULE
**Purpose**: Identify capital assets requiring depreciation based on USER'S location.
**Critical**: Depreciation is ALWAYS based on the USER'S location (for Canadian tax filing), NOT the receipt location.
**Capital Asset Criteria**:
- Cost > $500 typically
- Useful life > 1 year
- Examples: computers, vehicles, furniture, machinery, buildings
**CCA Classes**: Assign appropriate class and rate based on asset type and user's jurisdiction
### 4. MEALS & ENTERTAINMENT RULE
**Purpose**: Apply 50% tax deduction limit for M&E expenses.
**Actions**:
- Identify M&E expenses (meals, entertainment, client dinners, etc.)
- Tax Deduction: 50% of the pre-tax meal amount plus 100% of the sales tax
- Accounting Deduction: 100% of the pre-tax meal amount plus 100% of the sales tax
- Always include the full sales tax in both calculations
=== YOUR TASK ===
Analyze EACH match and return a JSON object where each key is the match ID and the value is the complete tax analysis.
**CRITICAL INSTRUCTION FOR final_tax_amount:**
- This field MUST contain ONLY the calculated sales tax amount in dollars
- This is NOT the total amount including tax
- This is ONLY the tax portion (HST/GST/PST/QST)
- Example: If receipt total is $100 and calculated tax is $13, return 13.00 (not 113.00)
- For meals & entertainment: Return the FULL calculated tax amount (not the 50% adjusted amount)
- VERIFY: final_tax_amount should equal sales_tax.calculated_tax
Return your response as a SINGLE JSON object in this format:
{{
"match_0": {{
"final_tax_amount": XX.XX,
"sales_tax": {{
"applicable_province": "XX",
"applicable_rate": 0.XX,
"tax_name": "HST/GST/PST/QST",
"calculated_tax": XX.XX,
"stated_tax": XX.XX,
"discrepancy": XX.XX,
"reason": "Detailed explanation",
"requires_review": true/false
}},
"foreign_exchange": {{
"currency_mismatch": true/false,
"receipt_currency": "XXX",
"transaction_currency": "XXX",
"expected_transaction_amount": XX.XX,
"actual_transaction_amount": XX.XX,
"discrepancy": XX.XX,
"requires_manual_review": true/false,
"reason": "Explanation"
}},
"depreciation": {{
"is_capital_asset": true/false,
"asset_class": "class_XX",
"cca_rate": 0.XX,
"applicable_jurisdiction": "XX",
"reason": "Explanation"
}},
"meals_entertainment": {{
"is_meals_entertainment": true/false,
"tax_deduction_amount": XX.XX,
"accounting_deduction_amount": XX.XX,
"sales_tax_included": XX.XX,
"reason": "Explanation"
}},
"confidence_adjustment": {{
"boost": 0.XX,
"reduce": 0.XX,
"reason": "Why confidence should be adjusted"
}},
"overall_assessment": "Summary for this match"
}},
"match_1": {{
... same structure ...
}},
... for all {num_matches} matches ...
}}
"""
try:
response = self.client.chat.completions.create(
model=self.model,
messages=[
{
"role": "system",
"content": "You are a Canadian tax expert. Analyze multiple transactions in batch and apply tax rules accurately. Return ONLY valid JSON - no markdown code blocks, no explanations, just pure JSON.",
},
{"role": "user", "content": prompt},
],
temperature=0.1, # Low temperature for consistent, factual responses
max_tokens=8000, # Higher limit for batch processing
)
content = response.choices[0].message.content
# Validate that we got content
if not content:
logger.error("LLM returned empty response")
return {}
content = content.strip()
# Check if content is empty after stripping
if not content:
logger.error("LLM returned whitespace-only response")
return {}
logger.info(
f"LLM batch tax analysis received: {len(content)} characters for {num_matches} matches"
)
logger.debug(f"Raw LLM response: {content[:500]}...") # Log first 500 chars
# Parse the JSON response - handle various markdown code block formats
json_str = content
# Check for markdown code blocks with various language identifiers
if "```json" in content:
json_str = content.split("```json")[1].split("```")[0].strip()
elif "```javascript" in content:
json_str = content.split("```javascript")[1].split("```")[0].strip()
elif "```js" in content:
json_str = content.split("```js")[1].split("```")[0].strip()
elif "```" in content:
# Generic code block - extract content between first ``` and last ```
parts = content.split("```")
if len(parts) >= 3:
# Take the second part (index 1), which is between first and second ```
json_str = parts[1].strip()
# Remove language identifier if it's on the first line
lines = json_str.split("\n", 1)
if len(lines) > 1 and lines[0].strip() in [
"json",
"javascript",
"js",
"",
]:
json_str = lines[1].strip()
# Validate JSON string is not empty
if not json_str:
logger.error("Extracted JSON string is empty")
logger.error(f"Original content was: {content[:500]}")
return {}
batch_analysis = json.loads(json_str)
# Validate we got a dictionary back
if not isinstance(batch_analysis, dict):
logger.error(f"LLM returned non-dict type: {type(batch_analysis)}")
return {}
logger.info(
f"Successfully parsed batch analysis with {len(batch_analysis)} matches"
)
return batch_analysis
except json.JSONDecodeError as e:
logger.error(f"JSON decode error in batch LLM tax analysis: {str(e)}")
logger.error(
f"Failed to parse: {json_str[:500] if 'json_str' in locals() else 'N/A'}"
)
return {}
except Exception as e:
logger.error(f"Error getting batch LLM tax analysis: {str(e)}")
logger.error(f"Exception type: {type(e).__name__}")
# Return empty dict so each match can handle fallback individually
return {}
def _apply_tax_analysis_to_match(self, match, tax_analysis: Dict[str, Any]):
"""Apply tax analysis results to a match object"""
        # Reconcile final_tax_amount with sales_tax.calculated_tax.
        # The LLM may omit either field or return null, so coerce defensively.
        sales_tax_section = tax_analysis.get("sales_tax") or {}
        final_tax = float(tax_analysis.get("final_tax_amount") or 0.0)
        calculated_raw = sales_tax_section.get("calculated_tax")
        # Always store a numeric value so later lookups of final_tax_amount cannot fail
        tax_analysis["final_tax_amount"] = final_tax
        # Only reconcile when a calculated tax exists; the parse-failure fallback has none,
        # and its final_tax_amount (taken from the receipt) must not be overwritten with 0
        if calculated_raw is not None:
            calculated_tax = float(calculated_raw)
            if abs(final_tax - calculated_tax) > 0.01:
                logger.warning(
                    f"Correcting final_tax_amount mismatch for {match.receipt.vendor}: "
                    f"LLM returned final_tax_amount={final_tax}, but calculated_tax={calculated_tax}. "
                    f"Using calculated_tax as final value."
                )
                tax_analysis["final_tax_amount"] = calculated_tax
                # A zero final amount next to a non-zero calculated tax warrants manual review
                if final_tax == 0.0 and calculated_tax > 0.0:
                    sales_tax_section["requires_review"] = True
# Apply the corrected tax analysis
match.tax_analysis = tax_analysis
logger.debug(
f"Applied tax analysis to match: {match.receipt.vendor} -> "
f"final_tax_amount={tax_analysis['final_tax_amount']}"
)
# Apply confidence adjustments based on tax analysis
confidence_adj = tax_analysis.get("confidence_adjustment", {})
# Boost confidence if tax rules validate the match
boost = confidence_adj.get("boost", 0.0)
if boost > 0:
match.confidence_score = min(1.0, match.confidence_score + boost)
match.match_reason += f" (Tax analysis confidence boost: +{boost:.2f})"
# Reduce confidence if tax issues detected
reduce = confidence_adj.get("reduce", 0.0)
if reduce > 0:
match.confidence_score = max(0.0, match.confidence_score - reduce)
match.match_reason += f" (Tax issues detected: -{reduce:.2f})"
# Add flags for manual review if needed
review_flags = []
# Check sales tax issues
sales_tax = tax_analysis.get("sales_tax", {})
if sales_tax.get("requires_review", False):
review_flags.append("Sales Tax Review Required")
# Check FX issues
fx_analysis = tax_analysis.get("foreign_exchange", {})
if fx_analysis.get("requires_manual_review", False):
review_flags.append(
                f"FX Review Required (Discrepancy: ${(fx_analysis.get('discrepancy') or 0):.2f})"
)
# Check depreciation
depreciation = tax_analysis.get("depreciation", {})
if depreciation.get("is_capital_asset", False):
review_flags.append(
f"Capital Asset - Depreciation Applicable ({depreciation.get('asset_class', 'Unknown')})"
)
# Check meals & entertainment
meals_ent = tax_analysis.get("meals_entertainment", {})
if meals_ent.get("is_meals_entertainment", False):
            # The LLM may return null for these amounts; fall back to 0 so formatting cannot fail
            tax_deduction = meals_ent.get("tax_deduction_amount") or 0
            accounting_deduction = meals_ent.get("accounting_deduction_amount") or 0
review_flags.append(
f"M&E Expense - Tax Deduction: ${tax_deduction:.2f} (50%), Accounting: ${accounting_deduction:.2f} (100%)"
)
# Add review flags to match reason
if review_flags:
match.match_reason += " | REVIEW: " + "; ".join(review_flags)
return match
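
Review note (illustrative, not part of the diff): `_structure_analysis_results` and `_get_llm_tax_analysis_batch` each strip markdown fences from the model reply with their own chain of `split` calls. One shared helper could serve both paths. The sketch below is a minimal version under stated assumptions: the names `extract_json_object` and `FENCE` are hypothetical, and returning `{}` on any failure mirrors what the batch method already does rather than anything the authors specified.

```python
import json
import re
from typing import Any, Dict, Optional

# Hypothetical helper, not part of the diff.
# The fence marker is built at runtime so this snippet contains no literal fence.
FENCE = "`" * 3

# Opening fence, optional language tag (json, js, ...), payload, closing fence.
_FENCED_BLOCK = re.compile(FENCE + r"[A-Za-z]*\s*(.*?)" + FENCE, re.DOTALL)


def extract_json_object(raw: Optional[str]) -> Dict[str, Any]:
    """Return the JSON object embedded in an LLM reply, or {} if none parses."""
    text = (raw or "").strip()
    fenced = _FENCED_BLOCK.search(text)
    if fenced:
        # Prefer the first fenced block when the model wrapped its answer
        text = fenced.group(1).strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return {}
    # Arrays and scalars are valid JSON but not a usable analysis
    return parsed if isinstance(parsed, dict) else {}


if __name__ == "__main__":
    reply = "Here you go:\n" + FENCE + 'json\n{"match_0": {"final_tax_amount": 13.0}}\n' + FENCE
    print(extract_json_object(reply))
```

With a helper of this shape, the single-match path could treat an empty dict as the signal to build its structured fallback, and the batch path could keep returning `{}` so each match falls back individually.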
+583
@@ -0,0 +1,583 @@
"""
Manual Tax Calculator - Rule-based tax calculations without LLM
Implements the four core tax rules based on rules.py specifications
"""
import logging
from typing import Any, Dict, Optional, Tuple
from schemas import Receipt, Transaction
logger = logging.getLogger(__name__)
class ManualTaxCalculator:
"""
Deterministic tax calculator based on explicit rules from rules.py
No LLM calls - pure business logic for accurate, consistent tax calculations
"""
# Provincial tax rates for Canada
PROVINCIAL_TAX_RATES = {
"ON": {"rate": 0.13, "name": "HST", "type": "Harmonized"},
"QC": {"rate": 0.14975, "name": "QST + GST", "type": "Combined"},
"BC": {"rate": 0.12, "name": "PST + GST", "type": "Combined"},
"AB": {"rate": 0.05, "name": "GST", "type": "Federal only"},
"SK": {"rate": 0.11, "name": "PST + GST", "type": "Combined"},
"MB": {"rate": 0.12, "name": "PST + GST", "type": "Combined"},
"NS": {"rate": 0.15, "name": "HST", "type": "Harmonized"},
"NB": {"rate": 0.15, "name": "HST", "type": "Harmonized"},
"NL": {"rate": 0.15, "name": "HST", "type": "Harmonized"},
"PE": {"rate": 0.15, "name": "HST", "type": "Harmonized"},
"NT": {"rate": 0.05, "name": "GST", "type": "Federal only"},
"NU": {"rate": 0.05, "name": "GST", "type": "Federal only"},
"YT": {"rate": 0.05, "name": "GST", "type": "Federal only"},
}
# CCA rates by asset class (Canada Revenue Agency rates)
CCA_RATES = {
"vehicles": {"rate": 0.30, "class": "Class 10", "description": "Vehicles"},
"computer_equipment": {
"rate": 0.55,
"class": "Class 50",
"description": "Computer Equipment",
},
"furniture": {
"rate": 0.20,
"class": "Class 8",
"description": "Furniture & Fixtures",
},
"buildings": {"rate": 0.04, "class": "Class 1", "description": "Buildings"},
"machinery": {
"rate": 0.20,
"class": "Class 8",
"description": "Machinery & Equipment",
},
}
# Capital asset threshold
CAPITAL_ASSET_THRESHOLD = 500.00
# Meals & Entertainment categories
MEALS_ENTERTAINMENT_KEYWORDS = [
"restaurant",
"cafe",
"coffee",
"dining",
"food",
"meal",
"catering",
"entertainment",
"bar",
"pub",
"bistro",
"eatery",
]
# Capital asset keywords
CAPITAL_ASSET_KEYWORDS = {
"vehicles": ["vehicle", "car", "truck", "van", "automobile", "suv"],
"computer_equipment": [
"computer",
"laptop",
"desktop",
"server",
"tablet",
"monitor",
"printer",
"scanner",
],
"furniture": [
"furniture",
"desk",
"chair",
"table",
"cabinet",
"bookshelf",
"sofa",
],
"buildings": ["building", "property", "real estate", "office space"],
"machinery": ["machinery", "equipment", "tool", "industrial"],
}
def calculate_tax_analysis(
self, receipt: Receipt, transaction: Transaction, user_location: str = "ON"
) -> Dict[str, Any]:
"""
Calculate comprehensive tax analysis for a receipt-transaction match
Returns:
Dict containing:
- sales_tax: Sales tax calculation and validation
- foreign_exchange: FX analysis and discrepancies
- depreciation: Capital asset depreciation details
- meals_entertainment: M&E deduction calculations
                - confidence_adjustment: Confidence boost/reduction
                - final_tax_amount: Calculated sales tax (same value as sales_tax["calculated_tax"])
"""
analysis = {}
# 1. Sales Tax Rule
analysis["sales_tax"] = self._calculate_sales_tax(
receipt, transaction, user_location
)
# 2. Foreign Exchange Rule
analysis["foreign_exchange"] = self._calculate_foreign_exchange(
receipt, transaction
)
# 3. Depreciation Rule
analysis["depreciation"] = self._calculate_depreciation(receipt, user_location)
# 4. Meals & Entertainment Rule
analysis["meals_entertainment"] = self._calculate_meals_entertainment(receipt)
# Calculate confidence adjustments
analysis["confidence_adjustment"] = self._calculate_confidence_adjustment(
analysis
)
# Calculate final tax amount
analysis["final_tax_amount"] = analysis["sales_tax"]["calculated_tax"]
return analysis
def _calculate_sales_tax(
self, receipt: Receipt, transaction: Transaction, user_location: str
) -> Dict[str, Any]:
"""
Rule 1: Sales Tax Calculation
- Priority: shipping address > billing address > user location
- Different country: no Canadian tax
- Missing location: default to user location
"""
# Determine the applicable location for tax
receipt_location, location_source = self._determine_receipt_location(
receipt, user_location
)
# Check if international transaction
is_international = self._is_international_transaction(
receipt_location, user_location
)
        # receipt.tax may be missing; treat it as zero so the arithmetic below cannot fail
        stated_tax = receipt.tax or 0.0
        if is_international:
            return {
                "applicable_province": None,
                "applicable_rate": 0.0,
                "tax_name": "N/A",
                "calculated_tax": 0.0,
                "stated_tax": stated_tax,
                "discrepancy": abs(stated_tax),
                "reason": f"International transaction - no Canadian tax applied. Receipt location: {receipt_location}",
                "requires_review": True,
                "location_source": location_source,
                "is_international": True,
            }
        # Get tax rate for the applicable province, falling back to the user's
        # province and then to Ontario (the default used elsewhere in this module)
        tax_info = (
            self.PROVINCIAL_TAX_RATES.get(receipt_location)
            or self.PROVINCIAL_TAX_RATES.get(user_location)
            or self.PROVINCIAL_TAX_RATES["ON"]
        )
        # Calculate expected tax based on receipt amount
        # Tax should be calculated on pre-tax amount
        pre_tax_amount = receipt.amount - stated_tax
        calculated_tax = round(pre_tax_amount * tax_info["rate"], 2)
        # Calculate discrepancy
        discrepancy = abs(stated_tax - calculated_tax)
        if stated_tax > 0:
            discrepancy_percentage = discrepancy / stated_tax * 100
        elif calculated_tax > 0:
            # No tax stated although tax was expected: treat as a full discrepancy
            discrepancy_percentage = 100.0
        else:
            discrepancy_percentage = 0.0
        # Determine if review is needed (>5% discrepancy)
        requires_review = discrepancy_percentage > 5.0
return {
"applicable_province": receipt_location,
"applicable_rate": tax_info["rate"],
"tax_name": tax_info["name"],
"calculated_tax": calculated_tax,
"stated_tax": receipt.tax,
"discrepancy": discrepancy,
"discrepancy_percentage": round(discrepancy_percentage, 2),
"reason": f"Tax calculated for {receipt_location} ({tax_info['name']}) - {location_source}",
"requires_review": requires_review,
"location_source": location_source,
"is_international": False,
}
def _calculate_foreign_exchange(
self, receipt: Receipt, transaction: Transaction
) -> Dict[str, Any]:
"""
Rule 2: Foreign Exchange Handling
- Flag currency mismatches
- Don't auto-fetch rates
- Manual review required
"""
currency_mismatch = receipt.currency != transaction.currency
if not currency_mismatch:
return {
"currency_mismatch": False,
"receipt_currency": receipt.currency,
"transaction_currency": transaction.currency,
"requires_manual_review": False,
"reason": "Currencies match - no FX adjustment needed",
}
# Calculate discrepancy
discrepancy = abs(receipt.amount - transaction.amount)
# Check if transaction has FX rate
has_fx_rate = transaction.fx_rate is not None and transaction.fx_rate > 0
if has_fx_rate:
expected_amount = round(receipt.amount * transaction.fx_rate, 2)
calculated_discrepancy = abs(transaction.amount - expected_amount)
else:
expected_amount = None
calculated_discrepancy = None
return {
"currency_mismatch": True,
"receipt_currency": receipt.currency,
"transaction_currency": transaction.currency,
"receipt_amount": receipt.amount,
"transaction_amount": transaction.amount,
"discrepancy": discrepancy,
"fx_rate": transaction.fx_rate,
"expected_amount": expected_amount,
"calculated_discrepancy": calculated_discrepancy,
"requires_manual_review": True,
            "reason": f"Currency mismatch detected: {receipt.currency} vs {transaction.currency}. Manual review required.",
}
def _calculate_depreciation(
self, receipt: Receipt, user_location: str
) -> Dict[str, Any]:
"""
Rule 3: Depreciation Calculation
- Always based on USER location (not receipt location)
- Threshold: $500+
- Two methods: Straight-Line (accounting) and CCA (tax)
"""
# Check if this is a capital asset
is_capital_asset = receipt.amount >= self.CAPITAL_ASSET_THRESHOLD
asset_class = None
cca_info = None
if is_capital_asset:
# Identify asset class from category and description
asset_class = self._identify_asset_class(receipt)
if asset_class:
cca_info = self.CCA_RATES.get(asset_class)
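        if not is_capital_asset:
            return {
                "is_capital_asset": False,
                "reason": f"Not a capital asset (Amount: ${receipt.amount:.2f}, Threshold: ${self.CAPITAL_ASSET_THRESHOLD:.2f})",
            }
        if not asset_class:
            # The amount qualifies, but no keyword matched a known CCA class
            return {
                "is_capital_asset": False,
                "reason": f"Amount ${receipt.amount:.2f} meets the threshold, but no capital asset class matched the receipt category or description",
            }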
# Calculate straight-line depreciation (accounting)
# Default: 5-year useful life, 10% residual value
useful_life_years = 5
residual_percentage = 0.10
residual_value = receipt.amount * residual_percentage
annual_straight_line = (receipt.amount - residual_value) / useful_life_years
# Calculate CCA depreciation (tax - declining balance)
cca_rate = cca_info["rate"]
year1_cca = receipt.amount * cca_rate
year2_cca = (receipt.amount - year1_cca) * cca_rate
return {
"is_capital_asset": True,
"asset_class": asset_class,
"cca_class": cca_info["class"],
"cca_description": cca_info["description"],
"asset_cost": receipt.amount,
"user_location": user_location,
"straight_line_depreciation": {
"method": "Straight-Line (Accounting)",
"useful_life_years": useful_life_years,
"residual_value": round(residual_value, 2),
"annual_depreciation": round(annual_straight_line, 2),
},
"cca_depreciation": {
"method": "CCA Declining Balance (Tax)",
"cca_rate": cca_rate,
"year_1_depreciation": round(year1_cca, 2),
"year_2_depreciation": round(year2_cca, 2),
},
"reason": f"Capital asset identified: {cca_info['description']} - Depreciation calculated based on user location ({user_location})",
}
def _calculate_meals_entertainment(self, receipt: Receipt) -> Dict[str, Any]:
"""
Rule 4: Meals & Entertainment Deductions
- Tax: 50% of meal cost + 100% of sales tax
- Accounting: 100% of meal cost + 100% of sales tax
"""
# Check if this is meals & entertainment
is_meals_entertainment = self._is_meals_entertainment(receipt)
if not is_meals_entertainment:
return {
"is_meals_entertainment": False,
"reason": "Not classified as meals & entertainment",
}
# Calculate pre-tax meal amount
meal_amount = receipt.amount - receipt.tax
sales_tax = receipt.tax
# Tax deduction: 50% of meal + 100% of tax
tax_deduction = (meal_amount * 0.50) + sales_tax
# Accounting deduction: 100% of meal + 100% of tax
accounting_deduction = meal_amount + sales_tax
return {
"is_meals_entertainment": True,
"meal_amount": round(meal_amount, 2),
"sales_tax": round(sales_tax, 2),
"total_receipt": round(receipt.amount, 2),
"tax_deduction_amount": round(tax_deduction, 2),
"tax_deduction_percentage": 50.0,
"accounting_deduction_amount": round(accounting_deduction, 2),
"accounting_deduction_percentage": 100.0,
"reason": "Meals & Entertainment: 50% deductible for tax purposes, 100% for accounting",
"breakdown": {
"meal_cost": round(meal_amount, 2),
"tax_50_percent": round(meal_amount * 0.50, 2),
"full_sales_tax": round(sales_tax, 2),
},
}
def _calculate_confidence_adjustment(
self, analysis: Dict[str, Any]
) -> Dict[str, float]:
"""
Calculate confidence boost/reduction based on tax analysis
"""
boost = 0.0
reduce = 0.0
# Sales tax analysis
sales_tax = analysis.get("sales_tax", {})
if sales_tax.get("requires_review"):
reduce += 0.05
else:
# Small discrepancy is good, but only when a discrepancy was actually computed
discrepancy_pct = sales_tax.get("discrepancy_percentage")
if discrepancy_pct is not None and discrepancy_pct < 2.0:
boost += 0.05
# Foreign exchange
fx = analysis.get("foreign_exchange", {})
if fx.get("currency_mismatch"):
reduce += 0.10 # FX always requires review
# Depreciation - capital assets need review
depreciation = analysis.get("depreciation", {})
if depreciation.get("is_capital_asset"):
reduce += 0.05
return {"boost": round(boost, 2), "reduce": round(reduce, 2)}
def _determine_receipt_location(
self, receipt: Receipt, user_location: str
) -> Tuple[str, str]:
"""
Determine the applicable location for tax calculation
Priority: shipping address > billing address > user location
Returns: (province_code, source_description)
"""
# Check shipping address first
if receipt.shipping_address:
province = self._extract_province_from_address(receipt.shipping_address)
if province:
return province, "shipping address"
# Check billing address
if receipt.billing_address:
province = self._extract_province_from_address(receipt.billing_address)
if province:
return province, "billing address"
# Default to user location
return user_location, "user location (default)"
def _extract_province_from_address(self, address: str) -> Optional[str]:
"""
Extract Canadian province code from address string
"""
if not address:
return None
address_upper = address.upper()
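# Check for province codes as whole tokens, so "ON" does not match "TORONTO" or "LONDON"
address_tokens = set(address_upper.replace(",", " ").replace(".", " ").split())
for province_code in self.PROVINCIAL_TAX_RATES.keys():
if province_code in address_tokens:
return province_code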
# Check for full province names
province_names = {
"ONTARIO": "ON",
"QUEBEC": "QC",
"BRITISH COLUMBIA": "BC",
"ALBERTA": "AB",
"SASKATCHEWAN": "SK",
"MANITOBA": "MB",
"NOVA SCOTIA": "NS",
"NEW BRUNSWICK": "NB",
"NEWFOUNDLAND": "NL",
"PRINCE EDWARD ISLAND": "PE",
"NORTHWEST TERRITORIES": "NT",
"NUNAVUT": "NU",
"YUKON": "YT",
}
for full_name, code in province_names.items():
if full_name in address_upper:
return code
return None
def _is_international_transaction(
self, receipt_location: str, user_location: str
) -> bool:
"""
Check if this is an international transaction
(receipt from outside Canada when user is in Canada, or vice versa)
"""
# If receipt location is not a Canadian province, it's international
is_canadian = receipt_location in self.PROVINCIAL_TAX_RATES
# For now, assume user_location is always Canadian
# In future, add support for other countries
return not is_canadian
def _identify_asset_class(self, receipt: Receipt) -> Optional[str]:
"""
Identify the asset class from receipt category and description
"""
search_text = (
f"{receipt.category} {receipt.description} {receipt.vendor}".lower()
)
for asset_class, keywords in self.CAPITAL_ASSET_KEYWORDS.items():
for keyword in keywords:
if keyword in search_text:
return asset_class
return None
def _is_meals_entertainment(self, receipt: Receipt) -> bool:
"""
Check if receipt is for meals & entertainment
"""
# Check explicit flag first
if (
hasattr(receipt, "is_meals_entertainment")
and receipt.is_meals_entertainment
):
return True
# Check category and description
search_text = (
f"{receipt.category} {receipt.description} {receipt.vendor}".lower()
)
for keyword in self.MEALS_ENTERTAINMENT_KEYWORDS:
if keyword in search_text:
return True
return False
def format_analysis_summary(self, analysis: Dict[str, Any]) -> str:
"""
Format the tax analysis into a human-readable summary
"""
lines = ["=== Tax Analysis Summary ===", ""]
# Sales Tax
st = analysis.get("sales_tax", {})
lines.append("1. SALES TAX:")
if st.get("is_international"):
lines.append(f" - {st['reason']}")
lines.append(" - ⚠️ Review Required: International Transaction")
else:
lines.append(f" - Province: {st.get('applicable_province', 'N/A')}")
lines.append(
f" - Tax Rate: {st.get('applicable_rate', 0) * 100:.2f}% ({st.get('tax_name', 'N/A')})"
)
lines.append(f" - Calculated Tax: ${st.get('calculated_tax', 0):.2f}")
lines.append(f" - Stated Tax: ${st.get('stated_tax', 0):.2f}")
lines.append(
f" - Discrepancy: ${st.get('discrepancy', 0):.2f} ({st.get('discrepancy_percentage', 0):.1f}%)"
)
if st.get("requires_review"):
lines.append(" - ⚠️ Review Required: Tax discrepancy > 5%")
lines.append("")
# Foreign Exchange
fx = analysis.get("foreign_exchange", {})
lines.append("2. FOREIGN EXCHANGE:")
if fx.get("currency_mismatch"):
lines.append(
f" - Currency Mismatch: {fx['receipt_currency']} vs {fx['transaction_currency']}"
)
lines.append(f" - Receipt Amount: ${fx['receipt_amount']:.2f}")
lines.append(f" - Transaction Amount: ${fx['transaction_amount']:.2f}")
lines.append(f" - Discrepancy: ${fx['discrepancy']:.2f}")
lines.append(" - ⚠️ Manual Review Required")
else:
lines.append(" - No currency mismatch")
lines.append("")
# Depreciation
dep = analysis.get("depreciation", {})
lines.append("3. DEPRECIATION:")
if dep.get("is_capital_asset"):
lines.append(f" - Capital Asset: Yes ({dep['cca_description']})")
lines.append(f" - Asset Cost: ${dep['asset_cost']:.2f}")
lines.append(
f" - CCA Class: {dep['cca_class']} ({dep['cca_depreciation']['cca_rate'] * 100:.0f}%)"
)
lines.append(
f" - Year 1 CCA: ${dep['cca_depreciation']['year_1_depreciation']:.2f}"
)
lines.append(
f" - Annual Straight-Line: ${dep['straight_line_depreciation']['annual_depreciation']:.2f}"
)
else:
lines.append(" - Not a capital asset")
lines.append("")
# Meals & Entertainment
me = analysis.get("meals_entertainment", {})
lines.append("4. MEALS & ENTERTAINMENT:")
if me.get("is_meals_entertainment"):
lines.append(" - Type: Meals & Entertainment Expense")
lines.append(f" - Meal Amount: ${me['meal_amount']:.2f}")
lines.append(f" - Sales Tax: ${me['sales_tax']:.2f}")
lines.append(f" - Tax Deduction (50%): ${me['tax_deduction_amount']:.2f}")
lines.append(
f" - Accounting Deduction (100%): ${me['accounting_deduction_amount']:.2f}"
)
else:
lines.append(" - Not a meals & entertainment expense")
lines.append("")
# Confidence Adjustment
conf = analysis.get("confidence_adjustment", {})
lines.append("CONFIDENCE ADJUSTMENT:")
lines.append(f" - Boost: +{conf.get('boost', 0):.2f}")
lines.append(f" - Reduce: -{conf.get('reduce', 0):.2f}")
return "\n".join(lines)
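The depreciation arithmetic in `_calculate_depreciation` can be checked in isolation. The following is a minimal sketch, not the calculator's actual API: `straight_line_annual` and `cca_schedule` are hypothetical helper names, and the declining-balance schedule mirrors the simplification used above (a flat rate on the remaining balance, with no first-year adjustment of any kind).

```python
from typing import List


def straight_line_annual(
    cost: float, useful_life_years: int, residual_pct: float = 0.10
) -> float:
    """Annual straight-line depreciation: (cost - residual value) / useful life."""
    if useful_life_years <= 0:
        raise ValueError("useful_life_years must be positive")
    residual_value = cost * residual_pct
    return round((cost - residual_value) / useful_life_years, 2)


def cca_schedule(cost: float, rate: float, years: int) -> List[float]:
    """Declining-balance amounts: each year takes `rate` of the remaining balance."""
    balance = cost
    schedule = []
    for _ in range(years):
        amount = round(balance * rate, 2)
        schedule.append(amount)
        balance -= amount  # next year's base is the undepreciated balance
    return schedule


if __name__ == "__main__":
    # Figures taken from the examples in the rule text
    print(straight_line_annual(10_000, 5))
    print(cca_schedule(20_000, 0.30, 2))
```

With the figures from the rule text (a $10,000 asset over 5 years with a 10% residual, and a $20,000 truck at 30%), this gives 1,800 per year for straight-line and 6,000 then 4,200 for the first two CCA years.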
+310
@@ -0,0 +1,310 @@
from typing import Any, Dict, List
from schemas import Match, Receipt, Transaction
from services.ai_matcher import AIMatcher
from services.ai_rules import AIRulesEngine
from services.feedback_logger import FeedbackLogger
from services.llm_tax_analyzer import LLMTaxAnalyzer
from services.manual_tax_calculator import ManualTaxCalculator
class MatchingEngine:
def __init__(self, use_manual_tax_calculator: bool = False):
self.ai_matcher = AIMatcher()
self.rules_engine = AIRulesEngine()
self.feedback_logger = FeedbackLogger()
self.llm_tax_analyzer = LLMTaxAnalyzer()
self.manual_tax_calculator = ManualTaxCalculator()
self.use_manual_tax_calculator = use_manual_tax_calculator
def process_matching(
self,
receipts: List[Receipt],
transactions: List[Transaction],
user_location: str = "ON",
) -> List[Match]:
# Get AI matches
ai_matches = self.ai_matcher.match_receipts_to_transactions(
receipts, transactions
)
# Apply traditional rules first (lightweight, no API calls)
for match in ai_matches:
rule_results = self.rules_engine.apply_rules(
match.receipt, match.transaction
)
# Apply confidence boost from traditional rules
if rule_results["confidence_boost"] > 0:
match.confidence_score = min(
1.0, match.confidence_score + rule_results["confidence_boost"]
)
# Auto-approve if rules say so
if rule_results["auto_approve"]:
match.confidence_score = 1.0
match.match_reason += " (Auto-approved by rules)"
# Apply tax analysis - use manual calculator or LLM based on configuration
if self.use_manual_tax_calculator:
# Use deterministic rule-based calculator
enhanced_matches = self._apply_manual_tax_analysis(
ai_matches, user_location
)
else:
# Use LLM-based tax analysis in a SINGLE batch call
try:
enhanced_matches = (
self.llm_tax_analyzer.analyze_and_apply_tax_rules_batch(
ai_matches, user_location
)
)
except Exception as e:
# If batch LLM analysis fails, log it and continue with matches as-is
import logging
logging.error(f"Batch LLM tax analysis failed: {str(e)}")
for match in ai_matches:
match.match_reason += " (Note: Advanced tax analysis unavailable)"
enhanced_matches = ai_matches
return enhanced_matches
def _enhance_match_with_rules(
self, match: Match, user_location: str = "ON"
) -> Match:
"""
Enhanced version using LLM to intelligently apply tax rules:
1. Sales tax based on receipt location (shipping/billing address priority)
2. Foreign exchange rules for currency mismatches
3. Depreciation rules for capital assets (based on user location)
4. Meals & Entertainment tax deduction rules (50% for tax, 100% for accounting)
"""
# First, apply traditional rule-based checks for basic matching quality
rule_results = self.rules_engine.apply_rules(match.receipt, match.transaction)
# Apply confidence boost from traditional rules
if rule_results["confidence_boost"] > 0:
match.confidence_score = min(
1.0, match.confidence_score + rule_results["confidence_boost"]
)
# Auto-approve if rules say so
if rule_results["auto_approve"]:
match.confidence_score = 1.0
match.match_reason += " (Auto-approved by rules)"
# Now apply LLM-based tax analysis
try:
llm_tax_analysis = self.llm_tax_analyzer.analyze_and_apply_tax_rules(
match.receipt, match.transaction, user_location
)
# Store the complete tax analysis
match.tax_analysis = llm_tax_analysis
# Apply confidence adjustments based on tax analysis
confidence_adj = llm_tax_analysis.get("confidence_adjustment", {})
# Boost confidence if tax rules validate the match
boost = confidence_adj.get("boost", 0.0)
if boost > 0:
match.confidence_score = min(1.0, match.confidence_score + boost)
match.match_reason += f" (Tax analysis confidence boost: +{boost:.2f})"
# Reduce confidence if tax issues detected
reduce = confidence_adj.get("reduce", 0.0)
if reduce > 0:
match.confidence_score = max(0.0, match.confidence_score - reduce)
match.match_reason += f" (Tax issues detected: -{reduce:.2f})"
# Add flags for manual review if needed
review_flags = []
# Check sales tax issues
sales_tax = llm_tax_analysis.get("sales_tax", {})
if sales_tax.get("requires_review", False):
review_flags.append("Sales Tax Review Required")
# Check FX issues
fx_analysis = llm_tax_analysis.get("foreign_exchange", {})
if fx_analysis.get("requires_manual_review", False):
review_flags.append(
f"FX Review Required (Discrepancy: ${fx_analysis.get('discrepancy', 0):.2f})"
)
# Check depreciation
depreciation = llm_tax_analysis.get("depreciation", {})
if depreciation.get("is_capital_asset", False):
review_flags.append(
f"Capital Asset - Depreciation Applicable ({depreciation.get('asset_class', 'Unknown')})"
)
# Check meals & entertainment
meals_ent = llm_tax_analysis.get("meals_entertainment", {})
if meals_ent.get("is_meals_entertainment", False):
tax_deduction = meals_ent.get("tax_deduction_amount", 0)
accounting_deduction = meals_ent.get("accounting_deduction_amount", 0)
review_flags.append(
f"M&E Expense - Tax Deduction: ${tax_deduction:.2f} (50%), Accounting: ${accounting_deduction:.2f} (100%)"
)
# Add review flags to match reason
if review_flags:
match.match_reason += " | REVIEW: " + "; ".join(review_flags)
except Exception as e:
# If LLM analysis fails, log it and continue with traditional rules
import logging
logging.error(f"LLM tax analysis failed: {str(e)}")
match.match_reason += " (Note: Advanced tax analysis unavailable)"
# Fall back to traditional tax rules if available
if rule_results.get("tax_analysis"):
match.tax_analysis = rule_results["tax_analysis"]
return match
def _apply_manual_tax_analysis(
self, matches: List[Match], user_location: str = "ON"
) -> List[Match]:
"""
Apply deterministic rule-based tax analysis to all matches
No LLM calls - pure business logic for consistent results
"""
import logging
logger = logging.getLogger(__name__)
logger.info(
f"Applying manual tax analysis to {len(matches)} matches using rule-based calculator"
)
enhanced_matches = []
for match in matches:
try:
# Get comprehensive tax analysis from manual calculator
tax_analysis = self.manual_tax_calculator.calculate_tax_analysis(
match.receipt, match.transaction, user_location
)
# Store the complete tax analysis
match.tax_analysis = tax_analysis
# Apply confidence adjustments
confidence_adj = tax_analysis.get("confidence_adjustment", {})
# Boost confidence if tax rules validate the match
boost = confidence_adj.get("boost", 0.0)
if boost > 0:
match.confidence_score = min(1.0, match.confidence_score + boost)
match.match_reason += f" (Tax validated: +{boost:.2f})"
# Reduce confidence if tax issues detected
reduce = confidence_adj.get("reduce", 0.0)
if reduce > 0:
match.confidence_score = max(0.0, match.confidence_score - reduce)
match.match_reason += f" (Tax issues: -{reduce:.2f})"
# Add flags for manual review
review_flags = []
# Sales tax issues
sales_tax = tax_analysis.get("sales_tax", {})
if sales_tax.get("requires_review"):
if sales_tax.get("is_international"):
review_flags.append("International Transaction - FX Review")
else:
discrepancy_pct = sales_tax.get("discrepancy_percentage", 0)
review_flags.append(
f"Sales Tax Discrepancy: {discrepancy_pct:.1f}%"
)
# FX issues
fx = tax_analysis.get("foreign_exchange", {})
if fx.get("currency_mismatch"):
review_flags.append(
f"FX: {fx['receipt_currency']} vs {fx['transaction_currency']} (${fx['discrepancy']:.2f})"
)
# Capital asset depreciation
depreciation = tax_analysis.get("depreciation", {})
if depreciation.get("is_capital_asset"):
cca_class = depreciation.get("cca_class", "Unknown")
year1_cca = depreciation.get("cca_depreciation", {}).get(
"year_1_depreciation", 0
)
review_flags.append(
f"Capital Asset ({cca_class}) - Year 1 CCA: ${year1_cca:.2f}"
)
# Meals & entertainment
meals_ent = tax_analysis.get("meals_entertainment", {})
if meals_ent.get("is_meals_entertainment"):
tax_deduction = meals_ent.get("tax_deduction_amount", 0)
accounting_deduction = meals_ent.get(
"accounting_deduction_amount", 0
)
review_flags.append(
f"M&E: Tax ${tax_deduction:.2f} (50%), Accounting ${accounting_deduction:.2f} (100%)"
)
# Add review flags to match reason
if review_flags:
match.match_reason += " | " + "; ".join(review_flags)
enhanced_matches.append(match)
except Exception as e:
logger.error(
f"Manual tax analysis failed for match: {str(e)}", exc_info=True
)
match.match_reason += " (Tax analysis failed)"
enhanced_matches.append(match)
logger.info(
f"Manual tax analysis completed for {len(enhanced_matches)} matches"
)
return enhanced_matches
def approve_match(self, match: Match, user_id: str):
# Log the approval
self.feedback_logger.log_override(
transaction_id=match.transaction.id,
original_match=f"AI Score: {match.confidence_score}",
correction="Approved",
reason="User approved match",
user_id=user_id,
)
def reject_match(self, match: Match, reason: str, user_id: str):
# Log the rejection
self.feedback_logger.log_override(
transaction_id=match.transaction.id,
original_match=f"AI Score: {match.confidence_score}",
correction="Rejected",
reason=reason,
user_id=user_id,
)
def get_matching_stats(self, matches: List[Match]) -> Dict[str, Any]:
if not matches:
return {
"total": 0,
"high_confidence": 0,
"low_confidence": 0,
"avg_score": 0,
}
high_confidence = len([m for m in matches if m.confidence_score >= 0.8])
low_confidence = len([m for m in matches if m.confidence_score < 0.8])
avg_score = sum(m.confidence_score for m in matches) / len(matches)
return {
"total": len(matches),
"high_confidence": high_confidence,
"low_confidence": low_confidence,
"avg_score": round(avg_score, 3),
}
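Both `_enhance_match_with_rules` and `_apply_manual_tax_analysis` repeat the same boost/reduce logic. One possible way to factor it out is sketched below; `apply_confidence_adjustment` is a hypothetical helper that does not exist in the repository, and the note wording is borrowed from the manual path.

```python
from typing import Dict, Tuple


def apply_confidence_adjustment(
    score: float, adjustment: Dict[str, float]
) -> Tuple[float, str]:
    """Apply the boost, then the reduction, clamping the score to [0.0, 1.0].

    Returns the new score and the text to append to match_reason.
    """
    boost = adjustment.get("boost", 0.0)
    reduce = adjustment.get("reduce", 0.0)
    notes = []
    if boost > 0:
        score = min(1.0, score + boost)
        notes.append(f" (Tax validated: +{boost:.2f})")
    if reduce > 0:
        score = max(0.0, score - reduce)
        notes.append(f" (Tax issues: -{reduce:.2f})")
    return score, "".join(notes)


if __name__ == "__main__":
    new_score, note = apply_confidence_adjustment(0.98, {"boost": 0.05})
    print(new_score, note)
```

A caller would assign the returned score to `match.confidence_score` and append the note to `match.match_reason`, which keeps the clamping rules in one place for both the LLM and the manual path.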
+106
@@ -0,0 +1,106 @@
rule = '''
### Rule Scenarios
**Impact of Signup Fields (Country and Province/State) on Tax Calculation and Matching**
**Scenario 1:** User Location (Canada, Ontario) but Receipt from Another Location (e.g., Quebec)
User's Location: Canada, Ontario (for tax and depreciation purposes).
Receipt Location: The receipt comes from Quebec (the tax rules in Quebec are different from Ontario).
What Happens:
The sales tax rate should be applied based on the location of the receipt, not the user's profile location.
**For example:**
The user in Ontario will have 13% HST applied to their purchases.
If the receipt is from Quebec, 5% GST plus 9.975% QST (14.975% combined) applies instead.
**Scenario 2:** User Location (Canada, Ontario) and Receipt Location is Different Country (e.g., USA)
User's Location: Canada, Ontario.
Receipt Location: The receipt is from a business in the USA (e.g., New York).
**What Happens:**
Sales Tax should not be applied for international transactions (USA in this case) unless the user is importing or there is a customs duty involved.
The system will not apply a Canadian sales tax to the receipt from the USA, but the foreign exchange (FX) rule will apply because there is a mismatch between currencies (USD vs. CAD).
**Scenario 3:** User Location (USA, New York) but Receipt from Another Location in the Same Country (e.g., California)
User's Location: USA, New York (for tax purposes).
Receipt Location: The receipt is from California (still in the USA, but the sales tax rate is different).
**What Happens:**
Sales tax should be applied based on the location of the receipt, not the user's location, since the receipt was issued in California.
California may have a different sales tax rate than New York.
**Scenario 4:** User Location (Canada, Ontario) and Receipt Location with No Address Information
User's Location: Canada, Ontario.
Receipt Location: The receipt contains no clear shipping or billing address.
**What Happens:**
If the receipt does not have a clear location, the system will default to the user's location for sales tax and depreciation.
Action:
Sales Tax: Apply the sales tax rate based on the user's location (Ontario). For example, 13% HST will be applied.
Depreciation: Apply the depreciation rules based on the user's location (Ontario), even if the receipt doesn't have address information.
**Summary of Actions in These Scenarios:**
**Sales Tax:** If the receipt is from a different location (same country or foreign), use the location from the receipt for sales tax calculation.
If the receipt is from a different country, don't apply sales tax from the user's country but flag the FX discrepancy.
If the location is missing, apply the sales tax of the user's location by default.
**Depreciation:** Always apply depreciation rules based on the user's location, regardless of where the receipt is from.
**FX (Foreign Exchange):** If the receipt is in a different currency, flag the FX difference for manual review but don't fetch exchange rates.
### Tax Rules:
Four Rules for Tax and Depreciation Handling
### 1. **Sales Tax Rule**
**Purpose**: To calculate and apply the correct sales tax based on the shipping and billing addresses.
- **When Billing and Shipping Address are the Same**: Apply the sales tax rate based on the billing address.
- **When Billing and Shipping Address are Different**: Apply the sales tax rate based on the shipping address.
**Example**:
1. If the billing and shipping address are in Ontario, the system will apply the 13% HST tax rate based on Ontario's tax rate.
2. If the billing address is in Ontario but the shipping address is in Quebec, the system will apply Quebec's combined rate of 14.975% (5% GST plus 9.975% QST) based on the shipping address.
### 2. **Foreign Exchange (FX) Rule**
**Purpose**: To handle discrepancies when transactions and receipts are in different currencies (e.g., USD vs. CAD).
- **Action**: Identify the currency mismatch, but do not automatically fetch the exchange rate. Flag the FX difference for manual review, allowing the user to approve or adjust the balance.
**Example**:
1. A transaction in USD for $100, matched to a receipt in CAD for $125, results in an FX discrepancy of $25.
2. The system flags the discrepancy for manual review by the user. The user can then approve the difference or adjust the amounts manually.
### 3. **Depreciation Rule**
**Purpose**: To calculate the depreciation for assets based on the Straight-Line Method (for accounting) or CCA Depreciation (Declining Balance) for tax purposes.
**Action**:
- Apply Straight-Line Depreciation (for accounting) across the asset's useful life.
- Apply CCA Depreciation (for tax purposes) using a declining balance method.
**Example**:
1. Straight-Line Depreciation: An asset purchased for $10,000, with a 5-year useful life and a residual value of $1,000, will have an annual depreciation of:
- (10,000 - 1,000)/5 = 1,800 per year for 5 years.
2. CCA Depreciation: A truck purchased for $20,000, eligible for 30% CCA per year. The depreciation will be:
- Year 1: 20,000 x 30% = $6,000
- Year 2: (20,000 - 6,000) x 30% = $4,200
- The depreciation will decline each year as the book value reduces.
### 4. **Meals & Entertainment Tax Deduction Rule**
**Purpose**: To apply the correct tax deduction for Meals & Entertainment expenses.
**Action**:
- For Tax Purposes: Only 50% of the total receipt amount is deductible.
- For Accounting Purposes: 100% of the total receipt amount is deductible.
- Sales Tax: The full sales tax will be deducted for accounting purposes.
**Example**:
1. A $100 meal receipt for a business dinner:
- **Tax Purposes**: Only $50 of the total amount is deductible.
- **Accounting Purposes**: The full $100 is deductible.
2. If the sales tax on the meal is $12, the entire $12 is included in the accounting deduction, but for tax purposes, the $50 deduction will reflect the adjusted amount after the 50% rule is applied.
### **When Location on Receipt is Different from User's Location**
**1. Sales Tax**:
- **Scenario 1**: If the **receipt's location** is different (e.g., receipt from Quebec for a user in Ontario), the **sales tax** is applied based on the **receipt's location** (Quebec sales tax).
- **Scenario 2**: If the **receipt** is from a different **country** (e.g., USA), the **system flags** the **currency mismatch** but does not apply **Canadian sales tax**.
**2. Depreciation**:
- Depreciation is always calculated based on the **user's location**, not the receipt's location.
- **Depreciation Method** for **Canada (Ontario)**: **CCA method** will apply, regardless of where the receipt comes from.
**3. FX Handling**:
- If the receipt is in a different **currency** (e.g., USD for a CAD-based user), the system will **flag FX differences** for manual review but won't fetch exchange rates.
**4. General Process**:
- When the **receipt location** is different from the **user's location**, ensure that **sales tax** is applied based on the **receipt's location** and **depreciation** based on the **user's location**.
- For **foreign transactions**, ensure that **FX differences** are flagged for user review.
- For **missing location information**, apply the **user's location** by default for tax and depreciation.
'''
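Rule 4 in the prompt above does not say whether the 50% limit applies to the receipt total or to the pre-tax meal cost. The sketch below follows the interpretation used by `_calculate_meals_entertainment` in the manual calculator (50% of the pre-tax meal cost plus 100% of the sales tax); the function name and the `deductible_fraction` parameter are hypothetical and only serve to make the arithmetic explicit.

```python
from typing import Dict


def meals_entertainment_deductions(
    total: float, sales_tax: float, deductible_fraction: float = 0.50
) -> Dict[str, float]:
    """Split a meal receipt into tax and accounting deductions.

    Assumes `total` includes `sales_tax`, as receipt.amount does in the calculator.
    """
    if sales_tax < 0 or sales_tax > total:
        raise ValueError("sales_tax must be between 0 and total")
    meal_amount = total - sales_tax
    return {
        "meal_amount": round(meal_amount, 2),
        # Tax books: limited share of the meal, full sales tax
        "tax_deduction": round(meal_amount * deductible_fraction + sales_tax, 2),
        # Accounting books: the full receipt
        "accounting_deduction": round(total, 2),
    }


if __name__ == "__main__":
    # $100 meal plus $12 sales tax, as in the rule's example
    print(meals_entertainment_deductions(112.00, 12.00))
```

For the $100 meal with $12 sales tax from the example, this interpretation gives a tax deduction of $62 and an accounting deduction of $112. If the intended rule is 50% of the full receipt, the tax figure would be $56 instead, so the prompt and the calculator should agree on one reading.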
@@ -1,7 +1,7 @@
import logging
from typing import Any, Dict, Optional
-from models import Address, Asset, Receipt, Transaction
+from schemas import Address, Asset, Receipt, Transaction
logger = logging.getLogger(__name__)
-15
@@ -1,15 +0,0 @@
import os
from dotenv import load_dotenv
load_dotenv()
# Get API key from environment variable (no hard-coded fallback: secrets must not live in source)
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
# Validate API key
if not GROQ_API_KEY or GROQ_API_KEY == "your_api_key_here":
raise ValueError("GROQ_API_KEY environment variable is not set or invalid. Please set it in your .env file.")
CONFIDENCE_THRESHOLD = 0.3
DATE_TOLERANCE_DAYS = 7
AMOUNT_TOLERANCE_PERCENT = 0.05
-157
@@ -1,157 +0,0 @@
import os
from datetime import datetime, timedelta
from typing import Any, Dict, List
class GoogleDriveSync:
def __init__(self):
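self.service = None
# Initialise credentials so authenticate() can test them when token.json is absent
self.creds = None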
self.processed_files = set()
def authenticate(self):
"""Authenticate with Google Drive API"""
try:
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]
# Load existing credentials
if os.path.exists("token.json"):
self.creds = Credentials.from_authorized_user_file("token.json", SCOPES)
# If no valid credentials available, let user log in
if not self.creds or not self.creds.valid:
if self.creds and self.creds.expired and self.creds.refresh_token:
self.creds.refresh(Request())
else:
if not os.path.exists("credentials.json"):
raise Exception(
"credentials.json not found. Please download from Google Cloud Console."
)
flow = InstalledAppFlow.from_client_secrets_file(
"credentials.json", SCOPES
)
self.creds = flow.run_local_server(port=0)
# Save credentials for next run
with open("token.json", "w") as token:
token.write(self.creds.to_json())
# Build the Drive service
self.service = build("drive", "v3", credentials=self.creds)
return True
except Exception as e:
print(f"Authentication error: {e}")
return False
def list_folders(self) -> List[Dict[str, Any]]:
"""List all folders in Google Drive"""
if not self.service:
if not self.authenticate():
return []
try:
results = (
self.service.files()
.list(
q="mimeType='application/vnd.google-apps.folder'",
pageSize=100,
fields="nextPageToken, files(id, name, createdTime, modifiedTime)",
)
.execute()
)
return results.get("files", [])
except Exception as e:
print(f"Error listing folders: {e}")
return []
def get_folder_info(self, folder_id: str) -> Dict[str, Any]:
"""Get information about a Google Drive folder"""
if not self.service:
if not self.authenticate():
return {}
try:
folder = (
self.service.files()
.get(fileId=folder_id, fields="id, name, createdTime, modifiedTime")
.execute()
)
return folder
except Exception as e:
print(f"Error getting folder info: {e}")
return {}
async def process_drive_files(self, folder_id: str = None) -> List[Dict[str, Any]]:
"""Process all receipt files from Google Drive"""
if not self.service:
if not self.authenticate():
return []
results = []
try:
# File types to look for
file_types = [
"'application/pdf'",
"'image/jpeg'",
"'image/png'",
"'image/gif'",
"'image/bmp'",
]
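# Each MIME type needs its own comparison clause
mime_types = " or ".join(f"mimeType = {mime_type}" for mime_type in file_types)
# Build query; parentheses keep the "or" group separate from the "and" filters added below
query = f"({mime_types})"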
if folder_id:
query += f" and '{folder_id}' in parents"
# Add date filter (last 30 days)
thirty_days_ago = (datetime.now() - timedelta(days=30)).isoformat() + "Z"
query += f" and modifiedTime > '{thirty_days_ago}'"
results_files = (
self.service.files()
.list(
q=query,
pageSize=100,
fields="nextPageToken, files(id, name, mimeType, modifiedTime, size)",
)
.execute()
)
files = results_files.get("files", [])
files = [file for file in files if file["id"] not in self.processed_files]
# For demo purposes, return mock results
for file in files[:3]: # Process first 3 files
mock_result = {
"file_id": file["id"],
"filename": file["name"],
"drive_modified": file["modifiedTime"],
"file_size": file.get("size", 0),
"extraction_success": True,
"vendor": "Demo Vendor",
"description": "Coffee and sandwich",
"total_amount": 25.50,
"tax_amount": 2.04,
"date": "2024-01-15",
"category": "Food",
"confidence": 0.95,
}
results.append(mock_result)
self.processed_files.add(file["id"])
except Exception as e:
print(f"Error processing Drive files: {e}")
return results
-89
@@ -1,89 +0,0 @@
from typing import Any, Dict, List
from ai_matcher import AIMatcher
from ai_rules import AIRulesEngine
from feedback_logger import FeedbackLogger
from models import Match, Receipt, Transaction
class MatchingEngine:
def __init__(self):
        self.ai_matcher = AIMatcher()
        self.rules_engine = AIRulesEngine()
        self.feedback_logger = FeedbackLogger()

    def process_matching(
        self, receipts: List[Receipt], transactions: List[Transaction]
    ) -> List[Match]:
        # Get AI matches
        ai_matches = self.ai_matcher.match_receipts_to_transactions(
            receipts, transactions
        )
        # Apply rules and enhance matches
        enhanced_matches = []
        for match in ai_matches:
            enhanced_match = self._enhance_match_with_rules(match)
            enhanced_matches.append(enhanced_match)
        return enhanced_matches

    def _enhance_match_with_rules(self, match: Match) -> Match:
        rule_results = self.rules_engine.apply_rules(match.receipt, match.transaction)
        # Apply confidence boost from rules
        if rule_results["confidence_boost"] > 0:
            match.confidence_score = min(
                1.0, match.confidence_score + rule_results["confidence_boost"]
            )
        # Auto-approve if rules say so
        if rule_results["auto_approve"]:
            match.confidence_score = 1.0
            match.match_reason += " (Auto-approved by rules)"
        # Add tax analysis to match
        if rule_results.get("tax_analysis"):
            match.tax_analysis = rule_results["tax_analysis"]
        return match

    def approve_match(self, match: Match, user_id: str):
        # Log the approval
        self.feedback_logger.log_override(
            transaction_id=match.transaction.id,
            original_match=f"AI Score: {match.confidence_score}",
            correction="Approved",
            reason="User approved match",
            user_id=user_id,
        )

    def reject_match(self, match: Match, reason: str, user_id: str):
        # Log the rejection
        self.feedback_logger.log_override(
            transaction_id=match.transaction.id,
            original_match=f"AI Score: {match.confidence_score}",
            correction="Rejected",
            reason=reason,
            user_id=user_id,
        )

    def get_matching_stats(self, matches: List[Match]) -> Dict[str, Any]:
        if not matches:
            return {
                "total": 0,
                "high_confidence": 0,
                "low_confidence": 0,
                "avg_score": 0,
            }
        high_confidence = len([m for m in matches if m.confidence_score >= 0.8])
        low_confidence = len([m for m in matches if m.confidence_score < 0.8])
        avg_score = sum(m.confidence_score for m in matches) / len(matches)
        return {
            "total": len(matches),
            "high_confidence": high_confidence,
            "low_confidence": low_confidence,
            "avg_score": round(avg_score, 3),
        }
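Reviewer note: the confidence handling in `_enhance_match_with_rules` and `get_matching_stats` can be exercised without the AI matcher or rules engine. The sketch below is a minimal standalone restatement, assuming plain float scores instead of `Match` objects; `apply_boost` and `matching_stats` are hypothetical helper names used only for illustration, not functions from this repository.

```python
from typing import Any, Dict, List

HIGH_CONFIDENCE_THRESHOLD = 0.8  # same cutoff as get_matching_stats


def apply_boost(score: float, boost: float, auto_approve: bool) -> float:
    """Hypothetical helper: mirror the score update in _enhance_match_with_rules."""
    if boost > 0:
        # A positive boost is added, but the score is capped at 1.0.
        score = min(1.0, score + boost)
    if auto_approve:
        # Auto-approval overrides whatever score the matcher produced.
        score = 1.0
    return score


def matching_stats(scores: List[float]) -> Dict[str, Any]:
    """Hypothetical helper: mirror get_matching_stats on a list of raw scores."""
    if not scores:
        return {"total": 0, "high_confidence": 0, "low_confidence": 0, "avg_score": 0}
    high = sum(1 for s in scores if s >= HIGH_CONFIDENCE_THRESHOLD)
    return {
        "total": len(scores),
        "high_confidence": high,
        "low_confidence": len(scores) - high,
        "avg_score": round(sum(scores) / len(scores), 3),
    }


if __name__ == "__main__":
    print(apply_boost(0.9, 0.3, auto_approve=False))
    print(matching_stats([0.29, 0.14, 0.9, 0.29]))
```

Because the two buckets partition the scores at 0.8, `low_confidence` can be derived from `total - high_confidence` instead of a second pass over the list.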
-59
View File
@@ -1,59 +0,0 @@
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Address:
    """Address information for tax calculations"""

    province: str
    city: str
    postal_code: str
    country: str = "Canada"


@dataclass
class Receipt:
    id: str
    file_name: str
    upload_date: datetime
    receipt_date: datetime
    amount: float
    tax: float
    vendor: str
    category: str
    description: str
    # Tax rule fields
    billing_address: Optional[Address] = None
    shipping_address: Optional[Address] = None
    currency: str = "CAD"
    is_meals_entertainment: bool = False


@dataclass
class Transaction:
    id: str
    transaction_date: datetime
    amount: float
    vendor: str
    notes: str
    # Tax rule fields
    currency: str = "CAD"
    fx_rate: Optional[float] = None


@dataclass
class Asset:
    """Asset for depreciation calculations"""

    id: str
    name: str
    purchase_date: datetime
    purchase_amount: float
    useful_life_years: int
    residual_value: float
    cca_rate: float  # Capital Cost Allowance rate
    asset_class: str


@dataclass
class Match:
    receipt: Receipt
    transaction: Transaction
    confidence_score: float
    match_reason: str
    tax_analysis: Optional[dict] = None
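Reviewer note: the optional tax-rule fields in these dataclasses fall back to their defaults when omitted, and nested dataclasses serialize recursively. The sketch below shows this with `dataclasses.asdict`, assuming a trimmed copy of `Receipt`; `ReceiptSketch` is a hypothetical name, and the address values are invented for illustration only.

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Address:
    province: str
    city: str
    postal_code: str
    country: str = "Canada"


@dataclass
class ReceiptSketch:
    """Hypothetical, trimmed copy of Receipt (illustration only)."""

    id: str
    receipt_date: datetime
    amount: float
    vendor: str
    billing_address: Optional[Address] = None
    shipping_address: Optional[Address] = None
    currency: str = "CAD"


receipt = ReceiptSketch(
    id="r-1",
    receipt_date=datetime(2025, 6, 10),
    amount=111.87,
    vendor="Eleven Labs Inc.",
    # Illustrative address; not taken from the repository or its logs.
    billing_address=Address(province="ON", city="Toronto", postal_code="M5V 2T6"),
)

# asdict() converts nested dataclass instances into nested dicts.
payload = asdict(receipt)

if __name__ == "__main__":
    print(payload["billing_address"], payload["shipping_address"], payload["currency"])
```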
+892
View File
@@ -0,0 +1,892 @@
INFO: Started server process [18995]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8765 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [18995]
INFO: Started server process [19157]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8654 (Press CTRL+C to quit)
INFO: 102.89.45.216:11636 - "POST /transactions/import/csv HTTP/1.1" 200 OK
INFO: 102.89.45.216:14600 - "POST /transactions/import/csv HTTP/1.1" 200 OK
INFO:__main__:Starting match-specific for file IDs: ['0b3d64a4-c558-43cb-bf57-a6561205f1e6', 'e96d57f5-2070-43d6-8044-1d68106a3c27', 'bae25e20-2425-4db3-a3fc-adcb09c7d431', 'bfb36530-62f6-489a-b0b9-970ab8e7c20c', '0b4db1d9-670b-4dd7-bd3a-dfa39897acbb', '8fbf46d7-5f7b-4b01-a5d1-173adcb55748', 'e779f8ce-9f9a-4575-af8c-4558c6405977', 'ee595b47-e9b8-4c82-82e6-7490d716baa7'], categorization_id: cat_mgchkov1_x8jntm
INFO:__main__:Found 7 transactions in database
INFO:__main__:Converted 7 transactions
INFO:__main__:Successfully loaded receipt for file_id: 0b3d64a4-c558-43cb-bf57-a6561205f1e6
INFO:__main__:Successfully loaded receipt for file_id: e96d57f5-2070-43d6-8044-1d68106a3c27
INFO:__main__:Successfully loaded receipt for file_id: bae25e20-2425-4db3-a3fc-adcb09c7d431
INFO:__main__:Successfully loaded receipt for file_id: bfb36530-62f6-489a-b0b9-970ab8e7c20c
INFO:__main__:Successfully loaded receipt for file_id: 0b4db1d9-670b-4dd7-bd3a-dfa39897acbb
INFO:__main__:Successfully loaded receipt for file_id: 8fbf46d7-5f7b-4b01-a5d1-173adcb55748
INFO:__main__:Successfully loaded receipt for file_id: e779f8ce-9f9a-4575-af8c-4558c6405977
INFO:__main__:Successfully loaded receipt for file_id: ee595b47-e9b8-4c82-82e6-7490d716baa7
INFO:__main__:Found 8 receipts, 0 missing
INFO:__main__:Starting matching with 8 receipts and 7 transactions
INFO:services.ai_matcher:Starting AI matching for 8 receipts against 7 transactions
INFO:services.ai_matcher:Processing receipt 1/8: PAYPAL *BZA BAWSKYJ - $37.55
INFO:services.ai_matcher:Found 1 candidates for receipt: PAYPAL *BZA BAWSKYJ
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: BOOKS BY BESSIE (score: 0.000)
INFO:services.ai_matcher:Found match: 0.000 - No meaningful similarity
The reason for this low score is that none of the candidate transactions have a perfect match with the receipt. The closest candidate is Candidate 1, but it has significant differences in vendor name, amount, and date, resulting in a very low confidence score.
INFO:services.ai_matcher:Processing receipt 2/8: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 1 candidates for receipt: Figma, Inc.
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: BOOKS BY BESSIE (score: 0.000)
INFO:services.ai_matcher:Found match: 0.000 - No meaningful similarity
However, since I'm not allowed to return "NONE" and must return the best match, I'll provide the next best option:
1|0.0|No meaningful similarity
Since there are no perfect matches, I'll consider the next best option.
Candidate 1 has a vendor name difference, amount difference, and date difference. However, it's the closest option available.
1|0.0|No meaningful similarity
However, I can provide a more detailed explanation of why it's the best option available.
The vendor name difference is significant, with "Figma, Inc." and "BOOKS BY BESSIE" being unrelated. The amount difference is also significant, with $27.0 and $55.0 being 103.7% apart. The date difference is 136 days, which is a significant difference.
However, since I
INFO:services.ai_matcher:Processing receipt 3/8: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Found 3 candidates for receipt: Eleven Labs Inc.
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.ai_matcher:Could not parse single match response: To determine the best match, I will evaluate each candidate based on the scoring criteria.
Candidate 1:
- Vendor similarity: 0.0 (A1 RENTAL BACKHOE DEPOSIT REFUND vs Eleven Labs Inc.)
- Amount difference: 88.13 (78.8%)
- Date difference: 115 days
- Description/notes relevance: 0.0 (no relevance)
- Total score: 0.0
Candidate 2:
- Vendor similarity: 0.0 (BOOKS BY BESSIE vs Eleven Labs Inc.)
- Amount difference: 56.87 (50.8%)
- Date difference: 145 days
- Description/notes relevance: 0.0 (no relevance)
- Total score: 0.0
Candidate 3:
- Vendor similarity: 0.0 (No Vendor vs Eleven Labs Inc.)
- Amount difference: 106.88 (95.5%)
- Date difference: 87 days
WARNING:services.ai_matcher:Failed to parse AI response for receipt: Eleven Labs Inc.
WARNING:services.ai_matcher:No match found for receipt: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Processing receipt 4/8: Twitter, Inc. - $4.0
WARNING:services.ai_matcher:No candidates found for receipt: Twitter, Inc. - $4.0
WARNING:services.ai_matcher:No match found for receipt: Twitter, Inc. - $4.0
INFO:services.ai_matcher:Processing receipt 5/8: PAYPAL *BZABAWSKYJ - $37.55
INFO:services.ai_matcher:Found 1 candidates for receipt: PAYPAL *BZABAWSKYJ
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: BOOKS BY BESSIE (score: 0.000)
INFO:services.ai_matcher:Found match: 0.000 - No meaningful similarity
This is because none of the candidate transactions have a perfect match with the receipt. However, I must return the candidate with the highest match score, even if it's very low.
To calculate the match score, I considered the following:
- Vendor name similarity: None of the candidate transactions have a vendor name that matches the receipt.
- Amount accuracy: The amount on the receipt ($37.55) does not match any of the candidate transactions.
- Date proximity: The date on the receipt (2023-05-22) is significantly different from the dates on the candidate transactions.
- Description/notes relevance: None of the candidate transactions have a description or notes that match the receipt.
Since none of the candidate transactions have a meaningful similarity with the receipt, the best match is the one with the lowest possible score, which is 0.0.
INFO:services.ai_matcher:Processing receipt 6/8: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 1 candidates for receipt: Figma, Inc.
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: BOOKS BY BESSIE (score: 0.000)
INFO:services.ai_matcher:Found match: 0.000 - No meaningful similarity
However, since I must return the candidate with the highest match score, even if it's very low, I will provide the next best option:
5|0.2|Minimal similarity due to vendor name difference, amount difference of $28.0, and 136 days apart
INFO:services.ai_matcher:Processing receipt 7/8: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Found 3 candidates for receipt: Eleven Labs Inc.
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.ai_matcher:Could not parse single match response: To determine the best match, I will analyze each candidate transaction against the given receipt.
Candidate 1:
- Vendor similarity: 0.0 (A1 RENTAL BACKHOE DEPOSIT REFUND vs Eleven Labs Inc.)
- Amount difference: 88.13 (78.8%)
- Date difference: 115 days
- Description/notes relevance: 0.0 (no relevance)
- Overall score: 0.0 (no meaningful similarity)
Candidate 2:
- Vendor similarity: 0.0 (BOOKS BY BESSIE vs Eleven Labs Inc.)
- Amount difference: 56.87 (50.8%)
- Date difference: 145 days
- Description/notes relevance: 0.0 (no relevance)
- Overall score: 0.0 (no meaningful similarity)
Candidate 3:
- Vendor similarity: 0.0 (No Vendor vs Eleven Labs Inc.)
- Amount difference: 106.88 (95.5%)
WARNING:services.ai_matcher:Failed to parse AI response for receipt: Eleven Labs Inc.
WARNING:services.ai_matcher:No match found for receipt: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Processing receipt 8/8: Twitter, Inc. - $4.0
WARNING:services.ai_matcher:No candidates found for receipt: Twitter, Inc. - $4.0
WARNING:services.ai_matcher:No match found for receipt: Twitter, Inc. - $4.0
INFO:services.ai_matcher:AI matching completed. Found 4 matches
INFO:__main__:Matching completed, got 4 results
INFO:__main__:Generated stats: {'total': 4, 'high_confidence': 0, 'low_confidence': 4, 'avg_score': 0.0}
INFO:__main__:Match-specific completed successfully with 4 matches
INFO: 102.89.45.216:14600 - "POST /match-specific HTTP/1.1" 200 OK
INFO: 102.89.45.216:16587 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 102.89.45.216:16587 - "POST /process/a8969315-6ed6-4dcd-9a47-3eb542d85d64 HTTP/1.1" 200 OK
INFO: 102.89.45.216:16587 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 102.89.45.216:16587 - "POST /process/9845ef9d-2bd3-4803-93f8-d8d5bca0de7b HTTP/1.1" 200 OK
INFO: 102.89.45.216:16587 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.document_processor:Initial JSON parsing failed: Extra data: line 10 column 4 (char 246)
INFO: 102.89.45.216:16587 - "POST /process/ba36aa95-8fdb-4f16-973e-479f99da3100 HTTP/1.1" 200 OK
INFO: 102.89.45.216:16587 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 102.89.45.216:16587 - "POST /process/dc542f59-1105-470c-a401-56407f2bbecf HTTP/1.1" 200 OK
INFO: 102.89.45.216:16587 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 102.89.45.216:16587 - "POST /process/d0d43d67-1e25-47b8-bf74-8ce9695cb699 HTTP/1.1" 200 OK
INFO: 102.89.45.216:16533 - "POST /transactions/import/csv HTTP/1.1" 200 OK
INFO:__main__:Starting match-specific for file IDs: ['d0d43d67-1e25-47b8-bf74-8ce9695cb699', 'dc542f59-1105-470c-a401-56407f2bbecf', 'ba36aa95-8fdb-4f16-973e-479f99da3100', '9845ef9d-2bd3-4803-93f8-d8d5bca0de7b', 'a8969315-6ed6-4dcd-9a47-3eb542d85d64', '0b3d64a4-c558-43cb-bf57-a6561205f1e6', 'e96d57f5-2070-43d6-8044-1d68106a3c27', 'bae25e20-2425-4db3-a3fc-adcb09c7d431', 'bfb36530-62f6-489a-b0b9-970ab8e7c20c', '0b4db1d9-670b-4dd7-bd3a-dfa39897acbb', '8fbf46d7-5f7b-4b01-a5d1-173adcb55748', 'e779f8ce-9f9a-4575-af8c-4558c6405977', 'ee595b47-e9b8-4c82-82e6-7490d716baa7'], categorization_id: cat_mgci9kky_b9qz7l
INFO:__main__:Found 7 transactions in database
INFO:__main__:Converted 7 transactions
INFO:__main__:Successfully loaded receipt for file_id: d0d43d67-1e25-47b8-bf74-8ce9695cb699
INFO:__main__:Successfully loaded receipt for file_id: dc542f59-1105-470c-a401-56407f2bbecf
INFO:__main__:Successfully loaded receipt for file_id: ba36aa95-8fdb-4f16-973e-479f99da3100
INFO:__main__:Successfully loaded receipt for file_id: 9845ef9d-2bd3-4803-93f8-d8d5bca0de7b
INFO:__main__:Successfully loaded receipt for file_id: a8969315-6ed6-4dcd-9a47-3eb542d85d64
INFO:__main__:Successfully loaded receipt for file_id: 0b3d64a4-c558-43cb-bf57-a6561205f1e6
INFO:__main__:Successfully loaded receipt for file_id: e96d57f5-2070-43d6-8044-1d68106a3c27
INFO:__main__:Successfully loaded receipt for file_id: bae25e20-2425-4db3-a3fc-adcb09c7d431
INFO:__main__:Successfully loaded receipt for file_id: bfb36530-62f6-489a-b0b9-970ab8e7c20c
INFO:__main__:Successfully loaded receipt for file_id: 0b4db1d9-670b-4dd7-bd3a-dfa39897acbb
INFO:__main__:Successfully loaded receipt for file_id: 8fbf46d7-5f7b-4b01-a5d1-173adcb55748
INFO:__main__:Successfully loaded receipt for file_id: e779f8ce-9f9a-4575-af8c-4558c6405977
INFO:__main__:Successfully loaded receipt for file_id: ee595b47-e9b8-4c82-82e6-7490d716baa7
INFO:__main__:Found 13 receipts, 0 missing
INFO:__main__:Starting matching with 13 receipts and 7 transactions
INFO:services.ai_matcher:Starting AI matching for 13 receipts against 7 transactions
INFO:services.ai_matcher:Processing receipt 1/13: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 1 candidates for receipt: Figma, Inc.
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: BOOKS BY BESSIE (score: 0.000)
INFO:services.ai_matcher:Found match: 0.000 - No meaningful similarity
However, I can provide a more detailed analysis of why this is the case and what the closest match is.
The receipt has a vendor name of "Figma, Inc.", which does not match any of the candidate transactions. The closest match in terms of vendor name similarity is none, as there are no similar names.
The amount on the receipt is $27.0, which is significantly different from the amounts on the candidate transactions. The closest match in terms of amount accuracy is Candidate 1, but it has a difference of $28.0, which is a 103.7% difference.
The date on the receipt is 2025-06-19, which is also significantly different from the dates on the candidate transactions. The closest match in terms of date proximity is Candidate 1, but it is 136 days apart.
The description on the receipt is
INFO:services.ai_matcher:Processing receipt 2/13: Google LLC - $21.15
INFO:services.ai_matcher:Found 1 candidates for receipt: Google LLC
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: BOOKS BY BESSIE (score: 0.000)
INFO:services.ai_matcher:Found match: 0.000 - No meaningful similarity
The reason for this low score is that there are significant differences between the receipt and the candidate transactions. The vendor name is completely different ("Google LLC" vs. "BOOKS BY BESSIE"), the amount is significantly different ($21.15 vs. $55.0), and the date is 155 days apart.
INFO:services.ai_matcher:Processing receipt 3/13: PAYPAL *BZAABAWSKYJ - $37.55
INFO:services.ai_matcher:Found 1 candidates for receipt: PAYPAL *BZAABAWSKYJ
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: BOOKS BY BESSIE (score: 0.000)
INFO:services.ai_matcher:Found match: 0.000 - No meaningful similarity
However, since I must return the candidate with the highest match score, even if it's very low, I will provide the next best option:
5|0.15|Best available option despite significant differences in vendor and amount
INFO:services.ai_matcher:Processing receipt 4/13: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Found 3 candidates for receipt: Eleven Labs Inc.
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: A1 RENTAL BACKHOE DEPOSIT REFUND (score: 0.000)
INFO:services.ai_matcher:Found match: 0.000 - No meaningful similarity
Explanation: None of the candidate transactions match the receipt in terms of vendor name, amount, date, or description. However, I must return a candidate, so I'm returning the first one with a confidence score of 0.0, indicating no meaningful similarity.
INFO:services.ai_matcher:Processing receipt 5/13: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 1 candidates for receipt: Figma, Inc.
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: BOOKS BY BESSIE (score: 0.000)
INFO:services.ai_matcher:Found match: 0.000 - No meaningful similarity
The reason for this low score is that there are significant differences in vendor name, amount, date, and description between the receipt and the candidate transactions. The vendor name is completely different, the amount is off by $28, the date is 136 days apart, and the description does not match.
INFO:services.ai_matcher:Processing receipt 6/13: PAYPAL *BZA BAWSKYJ - $37.55
INFO:services.ai_matcher:Found 1 candidates for receipt: PAYPAL *BZA BAWSKYJ
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: BOOKS BY BESSIE (score: 0.000)
INFO:services.ai_matcher:Found match: 0.000 - No meaningful similarity
However, since I'm not allowed to return "NONE" and must return the best match, I'll provide the next best option:
1|0.0|No meaningful similarity
Since there are no perfect matches, I'll look for the next best option.
Candidate 1 has a significant difference in vendor name (46.5%), amount difference (46.5%), and a large date difference (895 days). However, it's the only candidate available, so it's the best match.
1|0.0|No meaningful similarity
INFO:services.ai_matcher:Processing receipt 7/13: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 1 candidates for receipt: Figma, Inc.
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: BOOKS BY BESSIE (score: 0.000)
INFO:services.ai_matcher:Found match: 0.000 - No meaningful similarity
However, I can provide a more detailed explanation of why this is the case. None of the candidate transactions match the receipt perfectly, but I can calculate a score for each candidate based on the given criteria.
Candidate 1:
- Vendor name similarity: 0 ( BOOKS BY BESSIE vs Figma, Inc. )
- Amount accuracy: 0 ( $55.0 vs $27.0 )
- Date proximity: 0.007 ( 136 days difference )
- Description/notes relevance: 0 ( No relevance )
- Amount difference: 103.7% ( significant difference )
- Overall score: 0.0
Since none of the candidate transactions match the receipt perfectly, I will return the candidate with the highest score, which is still 0.0. However, I can suggest that the best available option is actually none of the
INFO:services.ai_matcher:Processing receipt 8/13: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Found 3 candidates for receipt: Eleven Labs Inc.
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.ai_matcher:Could not parse single match response: To determine the best match, I will evaluate each candidate transaction based on the scoring criteria.
Candidate 1:
- Vendor similarity: 0.0 (Eleven Labs Inc. vs A1 RENTAL BACKHOE DEPOSIT REFUND)
- Amount difference: 88.13 (78.8%)
- Date difference: 115 days
- Description/notes relevance: 0.0 (no relevance)
- Total score: 0.0
Candidate 2:
- Vendor similarity: 0.0 (Eleven Labs Inc. vs BOOKS BY BESSIE)
- Amount difference: 56.87 (50.8%)
- Date difference: 145 days
- Description/notes relevance: 0.0 (no relevance)
- Total score: 0.0
Candidate 3:
- Vendor similarity: 0.0 (Eleven Labs Inc. vs No Vendor)
- Amount difference: 106.88 (95.5%)
-
WARNING:services.ai_matcher:Failed to parse AI response for receipt: Eleven Labs Inc.
WARNING:services.ai_matcher:No match found for receipt: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Processing receipt 9/13: Twitter, Inc. - $4.0
WARNING:services.ai_matcher:No candidates found for receipt: Twitter, Inc. - $4.0
WARNING:services.ai_matcher:No match found for receipt: Twitter, Inc. - $4.0
INFO:services.ai_matcher:Processing receipt 10/13: PAYPAL *BZABAWSKYJ - $37.55
INFO:services.ai_matcher:Found 1 candidates for receipt: PAYPAL *BZABAWSKYJ
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: BOOKS BY BESSIE (score: 0.000)
INFO:services.ai_matcher:Found match: 0.000 - No meaningful similarity
The reason for this low score is that there are significant differences in vendor name, amount, and date between the receipt and the candidate transactions. The vendor name is completely different, the amount is off by $17.45, and the date is 895 days apart.
INFO:services.ai_matcher:Processing receipt 11/13: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 1 candidates for receipt: Figma, Inc.
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: BOOKS BY BESSIE (score: 0.000)
INFO:services.ai_matcher:Found match: 0.000 - No meaningful similarity
However, since I must return the candidate with the highest match score, even if it's very low, I will provide the next best option:
5|0.2|Minimal similarity due to vendor name difference, but same category and date proximity
INFO:services.ai_matcher:Processing receipt 12/13: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Found 3 candidates for receipt: Eleven Labs Inc.
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.ai_matcher:Could not parse single match response: To determine the best match, I will evaluate each candidate transaction based on the scoring criteria.
Candidate 1:
- Vendor similarity: 0.0 (Eleven Labs Inc. vs A1 RENTAL BACKHOE DEPOSIT REFUND)
- Amount accuracy: 0.0 (no exact match)
- Date proximity: 0.0 (115 days difference)
- Description/notes relevance: 0.0 (no relevance)
Total score: 0.0
Candidate 2:
- Vendor similarity: 0.0 (Eleven Labs Inc. vs BOOKS BY BESSIE)
- Amount accuracy: 0.0 (no exact match)
- Date proximity: 0.0 (145 days difference)
- Description/notes relevance: 0.0 (no relevance)
Total score: 0.0
Candidate 3:
- Vendor similarity: 0.0 (Eleven Labs Inc. vs No Vendor)
- Amount accuracy: 0
WARNING:services.ai_matcher:Failed to parse AI response for receipt: Eleven Labs Inc.
WARNING:services.ai_matcher:No match found for receipt: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Processing receipt 13/13: Twitter, Inc. - $4.0
WARNING:services.ai_matcher:No candidates found for receipt: Twitter, Inc. - $4.0
WARNING:services.ai_matcher:No match found for receipt: Twitter, Inc. - $4.0
INFO:services.ai_matcher:AI matching completed. Found 9 matches
INFO:__main__:Matching completed, got 9 results
INFO:__main__:Generated stats: {'total': 9, 'high_confidence': 0, 'low_confidence': 9, 'avg_score': 0.0}
INFO:__main__:Match-specific completed successfully with 9 matches
INFO: 102.89.45.216:11676 - "POST /match-specific HTTP/1.1" 200 OK
INFO: 102.89.45.216:28828 - "POST /transactions/import/csv HTTP/1.1" 200 OK
INFO: 102.89.45.216:14522 - "POST /transactions/import/csv HTTP/1.1" 200 OK
INFO: 102.89.45.216:2730 - "POST /transactions/import/csv HTTP/1.1" 200 OK
INFO:__main__:Starting match-specific for file IDs: ['d0d43d67-1e25-47b8-bf74-8ce9695cb699', 'dc542f59-1105-470c-a401-56407f2bbecf', 'ba36aa95-8fdb-4f16-973e-479f99da3100', '9845ef9d-2bd3-4803-93f8-d8d5bca0de7b', 'a8969315-6ed6-4dcd-9a47-3eb542d85d64', '0b3d64a4-c558-43cb-bf57-a6561205f1e6', 'e96d57f5-2070-43d6-8044-1d68106a3c27', 'bae25e20-2425-4db3-a3fc-adcb09c7d431', 'bfb36530-62f6-489a-b0b9-970ab8e7c20c', '0b4db1d9-670b-4dd7-bd3a-dfa39897acbb', '8fbf46d7-5f7b-4b01-a5d1-173adcb55748', 'e779f8ce-9f9a-4575-af8c-4558c6405977', 'ee595b47-e9b8-4c82-82e6-7490d716baa7'], categorization_id: cat_mgcolko1_wmfzzd
INFO:__main__:Found 119 transactions in database
INFO:__main__:Converted 119 transactions
INFO:__main__:Successfully loaded receipt for file_id: d0d43d67-1e25-47b8-bf74-8ce9695cb699
INFO:__main__:Successfully loaded receipt for file_id: dc542f59-1105-470c-a401-56407f2bbecf
INFO:__main__:Successfully loaded receipt for file_id: ba36aa95-8fdb-4f16-973e-479f99da3100
INFO:__main__:Successfully loaded receipt for file_id: 9845ef9d-2bd3-4803-93f8-d8d5bca0de7b
INFO:__main__:Successfully loaded receipt for file_id: a8969315-6ed6-4dcd-9a47-3eb542d85d64
INFO:__main__:Successfully loaded receipt for file_id: 0b3d64a4-c558-43cb-bf57-a6561205f1e6
INFO:__main__:Successfully loaded receipt for file_id: e96d57f5-2070-43d6-8044-1d68106a3c27
INFO:__main__:Successfully loaded receipt for file_id: bae25e20-2425-4db3-a3fc-adcb09c7d431
INFO:__main__:Successfully loaded receipt for file_id: bfb36530-62f6-489a-b0b9-970ab8e7c20c
INFO:__main__:Successfully loaded receipt for file_id: 0b4db1d9-670b-4dd7-bd3a-dfa39897acbb
INFO:__main__:Successfully loaded receipt for file_id: 8fbf46d7-5f7b-4b01-a5d1-173adcb55748
INFO:__main__:Successfully loaded receipt for file_id: e779f8ce-9f9a-4575-af8c-4558c6405977
INFO:__main__:Successfully loaded receipt for file_id: ee595b47-e9b8-4c82-82e6-7490d716baa7
INFO:__main__:Found 13 receipts, 0 missing
INFO:__main__:Starting matching with 13 receipts and 119 transactions
INFO:services.ai_matcher:Starting AI matching for 13 receipts against 119 transactions
INFO:services.ai_matcher:Processing receipt 1/13: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 44 candidates for receipt: Figma, Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 8: Unknown (score: 0.290)
INFO:services.ai_matcher:Found match: 0.290 - Close amount match, relevant note about office expenses, but significant date difference
This candidate has a relatively low confidence score due to the significant date difference (85 days apart) and the fact that the vendor name is unknown. However, the amount difference is moderate ($8.03), and the note mentions "Bought lunch for crew 102" which could be related to office expenses, making it a slightly better match than the other candidates.
INFO:services.ai_matcher:Processing receipt 2/13: Google LLC - $21.15
INFO:services.ai_matcher:Found 25 candidates for receipt: Google LLC
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 7: Unknown (score: 0.140)
INFO:services.ai_matcher:Found match: 0.140 - Closest amount match, but significant difference in vendor name and date
Reasoning:
- Vendor name similarity: 0 (Unknown vs Google LLC)
- Amount accuracy: 0.14 (18.08 vs 21.15, 14.5% difference)
- Date proximity: 0 (93 days difference)
- Description/notes relevance: 0 (Office Supplies vs Google Workspace)
Although the amount match is the closest among all candidates, the significant differences in vendor name and date result in a low confidence score.
INFO:services.ai_matcher:Processing receipt 3/13: PAYPAL *BZAABAWSKYJ - $37.55
INFO:services.ai_matcher:Found 62 candidates for receipt: PAYPAL *BZAABAWSKYJ
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.ai_matcher:Could not parse single match response: After analyzing the receipt against all the candidate transactions, I found the best match to be:
Candidate 1: 0.09|Vendor name similarity, significant amount difference, and large date difference
Reason: Although the vendor name is unknown, the amount difference is relatively minor ($3.55) compared to other candidates. However, the date difference is significant (864 days), and the vendor name is unknown, resulting in a low confidence score.
WARNING:services.ai_matcher:Failed to parse AI response for receipt: PAYPAL *BZAABAWSKYJ
WARNING:services.ai_matcher:No match found for receipt: PAYPAL *BZAABAWSKYJ - $37.55
INFO:services.ai_matcher:Processing receipt 4/13: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Found 90 candidates for receipt: Eleven Labs Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Unknown (score: 0.900)
INFO:services.ai_matcher:Found match: 0.900 - Same amount, minor difference in vendor name, and relatively close date
Reasoning:
- The amount matches exactly, with a minor difference of 0.1%.
- Although the vendor name is unknown, it's likely a typo or variation of Eleven Labs Inc.
- The date difference is 87 days, which is relatively close considering the other options.
This candidate has the highest match score, despite not being a perfect match.
INFO:services.ai_matcher:Processing receipt 5/13: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 44 candidates for receipt: Figma, Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 8: Unknown (score: 0.290)
INFO:services.ai_matcher:Found match: 0.290 - Close amount match, relevant note about office expenses, but significant date difference
This candidate has a close amount match ($18.97 vs $27.0), a relevant note about office expenses, but a significant date difference of 85 days.
INFO:services.ai_matcher:Processing receipt 6/13: PAYPAL *BZA BAWSKYJ - $37.55
INFO:services.ai_matcher:Found 62 candidates for receipt: PAYPAL *BZA BAWSKYJ
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.ai_matcher:Could not parse single match response: After analyzing the receipt against all the candidate transactions, I found the best match to be:
Candidate 1: 0.09|Vendor name similarity, significant amount difference, and large date difference
Reason: Although the vendor name is unknown, the description in the receipt contains the vendor's name, which is a good match. However, the amount difference is significant (9.5%), and the date difference is large (864 days). This is the best available option despite the significant differences.
WARNING:services.ai_matcher:Failed to parse AI response for receipt: PAYPAL *BZA BAWSKYJ
WARNING:services.ai_matcher:No match found for receipt: PAYPAL *BZA BAWSKYJ - $37.55
INFO:services.ai_matcher:Processing receipt 7/13: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 44 candidates for receipt: Figma, Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 9.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Unknown (score: 0.190)
INFO:services.ai_matcher:Found match: 0.190 - Vendor name similarity, amount difference of 9.8%, and no description match
This is because Candidate 1 has the closest vendor name similarity (Unknown vs Figma, Inc. is not possible, but it's the closest) and the smallest amount difference among all the candidates. Although the date difference is significant (62 days), it's still the best available option given the other factors.
INFO:services.ai_matcher:Processing receipt 8/13: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Found 90 candidates for receipt: Eleven Labs Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 11.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Unknown (score: 0.900)
INFO:services.ai_matcher:Found match: 0.900 - Vendor name similarity, exact amount match, 87 days apart
Reasoning:
- Vendor name similarity: Although the vendor name is unknown, it's likely that Eleven Labs Inc. is a similar or related entity to the vendor in Candidate 1, given the context of the transaction.
- Amount accuracy: The amount in Candidate 1 ($112.0) is very close to the amount in the receipt ($111.87), with a difference of only 0.1%.
- Date proximity: The date in Candidate 1 (2025-09-05) is 87 days apart from the date in the receipt (2025-06-10), which is a relatively small difference.
- Description/notes relevance: Although the description in Candidate 1 is not directly related to the receipt, it mentions "Bank Equipment rental for 5 days," which could
INFO:services.ai_matcher:Processing receipt 9/13: Twitter, Inc. - $4.0
INFO:services.ai_matcher:Found 2 candidates for receipt: Twitter, Inc.
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 7.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.ai_matcher:Could not parse single match response: Based on the given receipt and candidate transactions, I will analyze each candidate and return the best match.
Candidate 1:
- Vendor: Unknown (0.0 similarity)
- Amount: $3.86 (3.5% difference from $4.0)
- Date: 2025-09-03 (65 days difference)
- Notes: Bank No Description (no relevance to "X Premium Basic")
Score: 0.6 (Medium confidence due to minor amount difference, but unknown vendor and no description relevance)
Candidate 2:
- Vendor: Unknown (0.0 similarity)
- Amount: $5.66 (41.5% difference from $4.0)
- Date: 2025-08-29 (60 days difference)
- Notes: Bank No Description (no relevance to "X Premium Basic")
Score: 0.4 (Low confidence due to significant amount difference and unknown vendor)
Since neither candidate has a perfect match, I will choose
WARNING:services.ai_matcher:Failed to parse AI response for receipt: Twitter, Inc.
WARNING:services.ai_matcher:No match found for receipt: Twitter, Inc. - $4.0
INFO:services.ai_matcher:Processing receipt 10/13: PAYPAL *BZABAWSKYJ - $37.55
INFO:services.ai_matcher:Found 62 candidates for receipt: PAYPAL *BZABAWSKYJ
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 9.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.ai_matcher:Could not parse single match response: After analyzing the receipt against all the candidate transactions, I found the best match to be:
Candidate 1: 0.09|Vendor name similarity, significant amount difference, and large date difference
Reason: Although the amount difference is significant (9.5%), the vendor name similarity is the closest match among all candidates. The date difference is also substantial, but it's the best available option given the other differences.
WARNING:services.ai_matcher:Failed to parse AI response for receipt: PAYPAL *BZABAWSKYJ
WARNING:services.ai_matcher:No match found for receipt: PAYPAL *BZABAWSKYJ - $37.55
INFO:services.ai_matcher:Processing receipt 11/13: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 44 candidates for receipt: Figma, Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 10.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 8: Unknown (score: 0.290)
INFO:services.ai_matcher:Found match: 0.290 - Closest amount match, minor date difference, and relevant note about office expenses
This candidate has a relatively low confidence score due to significant differences in vendor name and amount. However, it is the best available option given the provided candidate transactions.
INFO:services.ai_matcher:Processing receipt 12/13: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Found 90 candidates for receipt: Eleven Labs Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 11.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Unknown (score: 0.900)
INFO:services.ai_matcher:Found match: 0.900 - Same amount, minor difference in vendor name, and relatively close date
Reasoning:
- Vendor name similarity: The vendor name is unknown in both the receipt and the candidate transaction, so it's not a strong match. However, it's not a major difference either.
- Amount accuracy: The amount is $111.87 in the receipt and $112.0 in the candidate transaction, which is a minor difference of 0.1%.
- Date proximity: The date is 2025-06-10 in the receipt and 2025-09-05 in the candidate transaction, which is a difference of 87 days. This is not ideal, but it's not a major difference either.
- Description/notes relevance: There is no description or notes in the receipt, but the candidate transaction has a note about bank equipment rental. This is not directly
INFO:services.ai_matcher:Processing receipt 13/13: Twitter, Inc. - $4.0
INFO:services.ai_matcher:Found 2 candidates for receipt: Twitter, Inc.
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 7.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.ai_matcher:Could not parse single match response: Based on the provided receipt and candidate transactions, I will analyze each candidate and return the best match.
Candidate 1:
- Vendor: Unknown (0.0 similarity)
- Amount: $3.86 (3.5% difference from $4.0)
- Date: 2025-09-03 (65 days difference)
- Notes: Bank No Description (no relevance to "X Premium Basic")
- Amount difference: $0.14000000000000012 (3.5%)
Score: 0.6 (Medium confidence, minor differences in amount and date)
Candidate 2:
- Vendor: Unknown (0.0 similarity)
- Amount: $5.66 (41.5% difference from $4.0)
- Date: 2025-08-29 (60 days difference)
- Notes: Bank No Description (no relevance to "X Premium Basic")
- Amount difference: $1.6600000000000001 (41.5
WARNING:services.ai_matcher:Failed to parse AI response for receipt: Twitter, Inc.
WARNING:services.ai_matcher:No match found for receipt: Twitter, Inc. - $4.0
INFO:services.ai_matcher:AI matching completed. Found 8 matches
INFO:__main__:Matching completed, got 8 results
INFO:__main__:Generated stats: {'total': 8, 'high_confidence': 3, 'low_confidence': 5, 'avg_score': 0.49}
INFO:__main__:Match-specific completed successfully with 8 matches
INFO:__main__:Starting match-specific for file IDs: ['d0d43d67-1e25-47b8-bf74-8ce9695cb699', 'dc542f59-1105-470c-a401-56407f2bbecf', 'ba36aa95-8fdb-4f16-973e-479f99da3100', '9845ef9d-2bd3-4803-93f8-d8d5bca0de7b', 'a8969315-6ed6-4dcd-9a47-3eb542d85d64', '0b3d64a4-c558-43cb-bf57-a6561205f1e6', 'e96d57f5-2070-43d6-8044-1d68106a3c27', 'bae25e20-2425-4db3-a3fc-adcb09c7d431', 'bfb36530-62f6-489a-b0b9-970ab8e7c20c', '0b4db1d9-670b-4dd7-bd3a-dfa39897acbb', '8fbf46d7-5f7b-4b01-a5d1-173adcb55748', 'e779f8ce-9f9a-4575-af8c-4558c6405977', 'ee595b47-e9b8-4c82-82e6-7490d716baa7'], categorization_id: cat_mgcolko1_wmfzzd
INFO:__main__:Found 119 transactions in database
INFO:__main__:Converted 119 transactions
INFO:__main__:Successfully loaded receipt for file_id: d0d43d67-1e25-47b8-bf74-8ce9695cb699
INFO:__main__:Successfully loaded receipt for file_id: dc542f59-1105-470c-a401-56407f2bbecf
INFO:__main__:Successfully loaded receipt for file_id: ba36aa95-8fdb-4f16-973e-479f99da3100
INFO:__main__:Successfully loaded receipt for file_id: 9845ef9d-2bd3-4803-93f8-d8d5bca0de7b
INFO:__main__:Successfully loaded receipt for file_id: a8969315-6ed6-4dcd-9a47-3eb542d85d64
INFO:__main__:Successfully loaded receipt for file_id: 0b3d64a4-c558-43cb-bf57-a6561205f1e6
INFO:__main__:Successfully loaded receipt for file_id: e96d57f5-2070-43d6-8044-1d68106a3c27
INFO:__main__:Successfully loaded receipt for file_id: bae25e20-2425-4db3-a3fc-adcb09c7d431
INFO:__main__:Successfully loaded receipt for file_id: bfb36530-62f6-489a-b0b9-970ab8e7c20c
INFO:__main__:Successfully loaded receipt for file_id: 0b4db1d9-670b-4dd7-bd3a-dfa39897acbb
INFO:__main__:Successfully loaded receipt for file_id: 8fbf46d7-5f7b-4b01-a5d1-173adcb55748
INFO:__main__:Successfully loaded receipt for file_id: e779f8ce-9f9a-4575-af8c-4558c6405977
INFO:__main__:Successfully loaded receipt for file_id: ee595b47-e9b8-4c82-82e6-7490d716baa7
INFO:__main__:Found 13 receipts, 0 missing
INFO:__main__:Starting matching with 13 receipts and 119 transactions
INFO:services.ai_matcher:Starting AI matching for 13 receipts against 119 transactions
INFO:services.ai_matcher:Processing receipt 1/13: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 44 candidates for receipt: Figma, Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 8: Unknown (score: 0.390)
INFO:services.ai_matcher:Found match: 0.390 - Date proximity, description relevance, but significant amount difference
INFO:services.ai_matcher:Processing receipt 2/13: Google LLC - $21.15
INFO:services.ai_matcher:Found 25 candidates for receipt: Google LLC
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 7: Unknown (score: 0.140)
INFO:services.ai_matcher:Found match: 0.140 - Vendor name similarity (Google LLC vs Unknown), exact amount match is not possible, but amount difference is moderate, and date proximity is relatively good (93 days difference)
Note: The confidence score is low due to significant differences in vendor name and amount, but it's the best available option given the provided candidate transactions.
INFO:services.ai_matcher:Processing receipt 3/13: PAYPAL *BZAABAWSKYJ - $37.55
INFO:services.ai_matcher:Found 62 candidates for receipt: PAYPAL *BZAABAWSKYJ
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Unknown (score: 0.950)
INFO:services.ai_matcher:Found match: 0.950 - Exact amount match, minor date difference
Reasoning:
- The amount on the receipt ($37.55) matches exactly with Candidate 1 ($34.0, but considering the absolute value, it's $34.0).
- Although the date difference is significant (864 days), the amount match is a strong indicator of a potential match.
- The vendor name is unknown, but the description is not provided for any candidate, so it's not a deciding factor in this case.
Note that the confidence score is high despite the significant date difference, as the amount match is a strong indicator of a potential match.
INFO:services.ai_matcher:Processing receipt 4/13: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Found 90 candidates for receipt: Eleven Labs Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Unknown (score: 0.900)
INFO:services.ai_matcher:Found match: 0.900 - Same amount, minor difference in vendor name, and relatively close date
Reasoning:
- Vendor name similarity: 0.8 (unknown vs Eleven Labs Inc. is not a perfect match, but the difference is minor)
- Amount accuracy: 0.95 (amount difference is 0.1%, which is considered minor)
- Date proximity: 0.9 (87 days difference is relatively close)
- Description/notes relevance: 0.8 (the description is not directly related to the receipt, but it's a plausible explanation for the transaction)
The confidence score is 0.9, which falls under the high confidence category.
INFO:services.ai_matcher:Processing receipt 5/13: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 44 candidates for receipt: Figma, Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Unknown (score: 0.890)
INFO:services.ai_matcher:Found match: 0.890 - Close vendor name match, minor amount difference, and relatively close date
Reasoning:
- Vendor name similarity: Figma, Inc. is not explicitly mentioned in the candidate transactions, but "Unknown" is a close match to the vendor name.
- Amount accuracy: The amount difference is $2.64, which is a relatively minor difference of 9.8%.
- Date proximity: The date difference is 62 days, which is not ideal but still relatively close.
- Description/notes relevance: There is no description or notes in the candidate transactions, so this factor does not contribute to the match score.
Note that while the match score is not perfect, Candidate 1 has the highest score among all the candidate transactions, making it the best available option despite significant differences in vendor and amount.
INFO:services.ai_matcher:Processing receipt 6/13: PAYPAL *BZA BAWSKYJ - $37.55
INFO:services.ai_matcher:Found 62 candidates for receipt: PAYPAL *BZA BAWSKYJ
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 1.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.ai_matcher:Could not parse single match response: After analyzing the receipt against all the candidate transactions, I found the best match to be:
Candidate 1: 0.09|Vendor name similarity, but significant amount difference and large date gap
This candidate has the highest match score despite significant differences in amount and date. The vendor name similarity is the primary reason for this match, but the large date gap and significant amount difference reduce the overall confidence score.
WARNING:services.ai_matcher:Failed to parse AI response for receipt: PAYPAL *BZA BAWSKYJ
WARNING:services.ai_matcher:No match found for receipt: PAYPAL *BZA BAWSKYJ - $37.55
INFO:services.ai_matcher:Processing receipt 7/13: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 44 candidates for receipt: Figma, Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 11.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 8: Unknown (score: 0.290)
INFO:services.ai_matcher:Found match: 0.290 - Closest amount match, minor difference in vendor name, and some relevance in the notes (Bought lunch for crew, which might be related to office expenses)
Note: Although the amount difference is significant (29.7%), it's the closest match in terms of amount, and the notes provide some relevance to the office category.
INFO:services.ai_matcher:Processing receipt 8/13: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Found 90 candidates for receipt: Eleven Labs Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 10.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Unknown (score: 0.900)
INFO:services.ai_matcher:Found match: 0.900 - Same amount, minor date difference, unknown vendor matches with the receipt's unknown vendor
Explanation:
- Vendor similarity: The receipt's vendor is unknown, and Candidate 1's vendor is also unknown, so this is a perfect match in terms of vendor similarity.
- Amount accuracy: The amount on the receipt ($111.87) is very close to the amount in Candidate 1 ($112.0), with a difference of only $0.12999999999999545 (0.1%).
- Date proximity: The date on the receipt (2025-06-10) is 87 days apart from the date in Candidate 1 (2025-09-05), which is a relatively minor difference.
- Description/notes relevance: While the description in Candidate 1 does not match the description on the receipt, the notes mention "Bank Equipment rental
INFO:services.ai_matcher:Processing receipt 9/13: Twitter, Inc. - $4.0
INFO:services.ai_matcher:Found 2 candidates for receipt: Twitter, Inc.
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 7.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.ai_matcher:Could not parse single match response: Based on the provided receipt and candidate transactions, I will analyze each candidate and return the best match.
Candidate 1:
- Vendor: Unknown (0.0 similarity)
- Amount: $3.86 (3.5% difference from $4.0)
- Date: 2025-09-03 (65 days difference)
- Notes: Bank No Description (no relevance to "X Premium Basic")
- Amount difference: $0.14000000000000012 (3.5%)
Score: 0.6 (Medium confidence due to minor amount difference and lack of vendor and date match)
Candidate 2:
- Vendor: Unknown (0.0 similarity)
- Amount: $5.66 (41.5% difference from $4.0)
- Date: 2025-08-29 (60 days difference)
- Notes: Bank No Description (no relevance to "X Premium Basic")
- Amount difference: $1.660000000000000
WARNING:services.ai_matcher:Failed to parse AI response for receipt: Twitter, Inc.
WARNING:services.ai_matcher:No match found for receipt: Twitter, Inc. - $4.0
INFO:services.ai_matcher:Processing receipt 10/13: PAYPAL *BZABAWSKYJ - $37.55
INFO:services.ai_matcher:Found 62 candidates for receipt: PAYPAL *BZABAWSKYJ
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 10.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Unknown (score: 0.190)
INFO:services.ai_matcher:Found match: 0.190 - Vendor name similarity, amount difference of 9.5%
This is because the vendor name is similar (PAYPAL *BZABAWSKYJ vs Unknown), but the amount is off by 9.5%. The date difference is significant (864 days), and the description/notes do not match. However, this is the best available option given the significant differences in the other candidates.
INFO:services.ai_matcher:Processing receipt 11/13: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 44 candidates for receipt: Figma, Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 10.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Unknown (score: 0.600)
INFO:services.ai_matcher:Found match: 0.600 - Vendor name similarity (Figma, Inc. is similar to Unknown), moderate amount difference ($2.64), and date proximity (62 days apart)
Note: Although the amount difference is significant, the vendor name similarity and date proximity contribute to a moderate confidence score.
INFO:services.ai_matcher:Processing receipt 12/13: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Found 90 candidates for receipt: Eleven Labs Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 11.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Unknown (score: 0.900)
INFO:services.ai_matcher:Found match: 0.900 - Same vendor name similarity (Eleven Labs Inc. and Unknown), minor amount difference (0.1%), and relatively close date (87 days apart)
Note that while the vendor name is not an exact match, it is the closest match available, and the amount difference is minor. The date difference is also relatively close, considering the time frame.
INFO:services.ai_matcher:Processing receipt 13/13: Twitter, Inc. - $4.0
INFO:services.ai_matcher:Found 2 candidates for receipt: Twitter, Inc.
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 6.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.ai_matcher:Could not parse single match response: Based on the provided receipt and candidate transactions, I will analyze each candidate and return the best match.
Candidate 1:
- Vendor: Unknown (0.0 similarity to Twitter, Inc.)
- Amount: $3.86 (3.5% difference from $4.0)
- Date: 2025-09-03 (65 days difference)
- Notes: Bank No Description (no relevance to "X Premium Basic")
- Amount difference: $0.14000000000000012 (3.5%)
Score: 0.15 (Minimal similarity due to significant vendor name difference and moderate amount difference)
Candidate 2:
- Vendor: Unknown (0.0 similarity to Twitter, Inc.)
- Amount: $5.66 (41.5% difference from $4.0)
- Date: 2025-08-29 (60 days difference)
- Notes: Bank No Description (no relevance to "X Premium Basic")
- Amount difference: $1
WARNING:services.ai_matcher:Failed to parse AI response for receipt: Twitter, Inc.
WARNING:services.ai_matcher:No match found for receipt: Twitter, Inc. - $4.0
INFO:services.ai_matcher:AI matching completed. Found 10 matches
INFO:__main__:Matching completed, got 10 results
INFO:__main__:Generated stats: {'total': 10, 'high_confidence': 5, 'low_confidence': 4, 'avg_score': 0.61}
INFO:__main__:Match-specific completed successfully with 10 matches
INFO: 102.89.45.216:29795 - "POST /match-specific HTTP/1.1" 200 OK
INFO: 102.89.45.216:22092 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 102.89.45.216:22092 - "POST /process/82e672e4-a1a1-4df2-9b7d-f0cfa3307ed9 HTTP/1.1" 200 OK
INFO: 102.89.45.216:22092 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 102.89.45.216:22092 - "POST /process/c4a7f61d-9d2a-4e6a-b86d-bb958a06d5f3 HTTP/1.1" 200 OK
INFO: 102.89.45.216:22092 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.document_processor:Initial JSON parsing failed: Extra data: line 10 column 4 (char 246)
INFO: 102.89.45.216:22092 - "POST /process/1281627c-59fc-4efa-beae-a8a69f3dd508 HTTP/1.1" 200 OK
INFO: 102.89.45.216:22092 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 102.89.45.216:22092 - "POST /process/ee93fc23-e6f6-47ee-81da-c5b41319d1bc HTTP/1.1" 200 OK
INFO: 102.89.45.216:22092 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 102.89.45.216:22092 - "POST /process/058a0bcf-d25e-49b3-903c-45559de871ad HTTP/1.1" 200 OK
INFO: 199.241.139.243:49820 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 199.241.139.243:49836 - "POST /process/2d005728-3cce-4456-be4a-952188203772 HTTP/1.1" 200 OK
INFO: 199.241.139.243:49850 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 199.241.139.243:49866 - "POST /process/de39fc65-0565-4c45-a559-bcda66af9c4a HTTP/1.1" 200 OK
INFO: 199.241.139.243:17706 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 199.241.139.243:17710 - "POST /process/0f9b5c0f-ab7f-47f6-8edf-f5dab0badd64 HTTP/1.1" 200 OK
INFO: 199.241.139.243:17714 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.document_processor:Initial JSON parsing failed: Extra data: line 10 column 4 (char 246)
INFO: 199.241.139.243:17730 - "POST /process/cd679479-376d-42f0-ad9e-0743c89cd9fe HTTP/1.1" 200 OK
INFO: 199.241.139.243:17740 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 199.241.139.243:17754 - "POST /process/0046dcd7-86a7-4153-be65-cddd3774a232 HTTP/1.1" 200 OK
INFO: 199.241.139.243:39628 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 199.241.139.243:39644 - "POST /process/d0fe3ebb-094b-4191-9202-9ab216811ec9 HTTP/1.1" 200 OK
INFO: 199.241.139.243:39652 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 199.241.139.243:39656 - "POST /process/1a23de15-07a5-4998-9d3f-6a6345aba237 HTTP/1.1" 200 OK
INFO: 199.241.139.243:39658 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 199.241.139.243:39674 - "POST /process/cd3cc6e2-100e-462a-ba4a-3d03ee2da57f HTTP/1.1" 200 OK
INFO: 199.241.139.243:26574 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:services.document_processor:Initial JSON parsing failed: Extra data: line 10 column 4 (char 246)
INFO: 199.241.139.243:26586 - "POST /process/ffb999aa-bfd1-4a8a-a7e6-4700b284c30a HTTP/1.1" 200 OK
INFO: 199.241.139.243:26596 - "POST /upload-multiple HTTP/1.1" 200 OK
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO: 199.241.139.243:26602 - "POST /process/a1a16ce3-ef6d-466c-8606-4ba9501f86a7 HTTP/1.1" 200 OK
INFO: 199.241.139.243:46078 - "POST /transactions/import/csv HTTP/1.1" 200 OK
INFO:__main__:Starting match-specific for file IDs: ['a1a16ce3-ef6d-466c-8606-4ba9501f86a7', 'ffb999aa-bfd1-4a8a-a7e6-4700b284c30a', 'cd3cc6e2-100e-462a-ba4a-3d03ee2da57f', '1a23de15-07a5-4998-9d3f-6a6345aba237', 'd0fe3ebb-094b-4191-9202-9ab216811ec9', '0046dcd7-86a7-4153-be65-cddd3774a232', 'cd679479-376d-42f0-ad9e-0743c89cd9fe', '0f9b5c0f-ab7f-47f6-8edf-f5dab0badd64', 'de39fc65-0565-4c45-a559-bcda66af9c4a', '2d005728-3cce-4456-be4a-952188203772'], categorization_id: cat_mgcvsk8r_6upxfy
INFO:__main__:Found 123 transactions in database
INFO:__main__:Converted 123 transactions
INFO:__main__:Successfully loaded receipt for file_id: a1a16ce3-ef6d-466c-8606-4ba9501f86a7
INFO:__main__:Successfully loaded receipt for file_id: ffb999aa-bfd1-4a8a-a7e6-4700b284c30a
INFO:__main__:Successfully loaded receipt for file_id: cd3cc6e2-100e-462a-ba4a-3d03ee2da57f
INFO:__main__:Successfully loaded receipt for file_id: 1a23de15-07a5-4998-9d3f-6a6345aba237
INFO:__main__:Successfully loaded receipt for file_id: d0fe3ebb-094b-4191-9202-9ab216811ec9
INFO:__main__:Successfully loaded receipt for file_id: 0046dcd7-86a7-4153-be65-cddd3774a232
INFO:__main__:Successfully loaded receipt for file_id: cd679479-376d-42f0-ad9e-0743c89cd9fe
INFO:__main__:Successfully loaded receipt for file_id: 0f9b5c0f-ab7f-47f6-8edf-f5dab0badd64
INFO:__main__:Successfully loaded receipt for file_id: de39fc65-0565-4c45-a559-bcda66af9c4a
INFO:__main__:Successfully loaded receipt for file_id: 2d005728-3cce-4456-be4a-952188203772
INFO:__main__:Found 10 receipts, 0 missing
INFO:__main__:Starting matching with 10 receipts and 123 transactions
INFO:services.ai_matcher:Starting AI matching for 10 receipts against 123 transactions
INFO:services.ai_matcher:Processing receipt 1/10: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Found 94 candidates for receipt: Eleven Labs Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Unknown (score: 0.870)
INFO:services.ai_matcher:Found match: 0.870 - Same vendor name, exact amount match, 87 days apart
Reasoning:
- Vendor name similarity: 0.95 (perfect match)
- Amount accuracy: 0.95 (exact match)
- Date proximity: 0.8 (87 days apart, which is a relatively minor difference)
- Description/notes relevance: 0.8 (no direct relevance, but the vendor name is the same)
The candidate with the highest match score is Candidate 1, with a confidence score of 0.87.
INFO:services.ai_matcher:Processing receipt 2/10: PAYPAL *BZAABAWSKYJ - $37.55
INFO:services.ai_matcher:Found 66 candidates for receipt: PAYPAL *BZAABAWSKYJ
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Books by Bessie (score: 0.950)
INFO:services.ai_matcher:Found match: 0.950 - Perfect match in vendor name, exact amount match, and exact date match.
This is because Candidate 1 has a perfect match in vendor name, amount, and date, which is the highest scoring criteria.
INFO:services.ai_matcher:Processing receipt 3/10: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 48 candidates for receipt: Figma, Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Books by Bessie (score: 0.950)
INFO:services.ai_matcher:Found match: 0.950 - Same vendor name, exact amount match, exact date match
This is because Candidate 1 has the exact same vendor name, amount, and date as the receipt, resulting in a perfect match score of 0.95.
INFO:services.ai_matcher:Processing receipt 4/10: Google LLC - $21.15
INFO:services.ai_matcher:Found 29 candidates for receipt: Google LLC
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Books by Bessie (score: 0.950)
INFO:services.ai_matcher:Found match: 0.950 - Perfect match in vendor name, exact amount match, and exact date match
INFO:services.ai_matcher:Processing receipt 5/10: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 48 candidates for receipt: Figma, Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Books by Bessie (score: 0.950)
INFO:services.ai_matcher:Found match: 0.950 - Perfect match in vendor name, exact amount match, and exact date match
INFO:services.ai_matcher:Processing receipt 6/10: Eleven Labs Inc. - $111.87
INFO:services.ai_matcher:Found 94 candidates for receipt: Eleven Labs Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Unknown (score: 0.870)
INFO:services.ai_matcher:Found match: 0.870 - Same vendor name, exact amount match, 87 days apart
Reasoning:
- Vendor name similarity: 0.95 (same vendor name, Eleven Labs Inc.)
- Amount accuracy: 0.95 (exact amount match, $111.87)
- Date proximity: 0.85 (87 days apart, which is a relatively small difference)
- Description/notes relevance: 0.80 (no direct match, but the transaction is related to a bank equipment rental)
The candidate with the highest match score is Candidate 1, with a confidence score of 0.87.
INFO:services.ai_matcher:Processing receipt 7/10: PAYPAL *BZA BAWSKYJ - $37.55
INFO:services.ai_matcher:Found 66 candidates for receipt: PAYPAL *BZA BAWSKYJ
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 11.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Books by Bessie (score: 0.950)
INFO:services.ai_matcher:Found match: 0.950 - Perfect match in vendor name, exact amount match, and exact date match
This is because Candidate 1 has a perfect match in vendor name ("PAYPAL *BZA BAWSKYJ" vs "PAYPAL *BZABAWSKYJ"), exact amount match ($37.55), and exact date match (2023-05-22).
INFO:services.ai_matcher:Processing receipt 8/10: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 48 candidates for receipt: Figma, Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 11.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Books by Bessie (score: 0.950)
INFO:services.ai_matcher:Found match: 0.950 - Perfect match in vendor name, exact amount match, and exact date match
INFO:services.ai_matcher:Processing receipt 9/10: Google LLC - $21.15
INFO:services.ai_matcher:Found 29 candidates for receipt: Google LLC
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 11.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Books by Bessie (score: 0.950)
INFO:services.ai_matcher:Found match: 0.950 - Perfect match: same vendor, amount, and date
This candidate has a perfect match score of 0.95 due to the exact match in vendor name, amount, and date.
INFO:services.ai_matcher:Processing receipt 10/10: Figma, Inc. - $27.0
INFO:services.ai_matcher:Found 48 candidates for receipt: Figma, Inc.
INFO:services.ai_matcher:Limited candidates to top 10 by amount similarity
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
INFO:groq._base_client:Retrying request to /openai/v1/chat/completions in 10.000000 seconds
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
INFO:services.ai_matcher:AI selected candidate 1: Books by Bessie (score: 0.950)
INFO:services.ai_matcher:Found match: 0.950 - Same vendor name, exact amount match, exact date match
This is because Candidate 1 has an exact match in vendor name, amount, and date, which meets the scoring criteria for a perfect match.
INFO:services.ai_matcher:AI matching completed. Found 10 matches
INFO:__main__:Matching completed, got 10 results
INFO:__main__:Generated stats: {'total': 10, 'high_confidence': 10, 'low_confidence': 0, 'avg_score': 0.97}
INFO:__main__:Match-specific completed successfully with 10 matches
INFO: 199.241.139.243:50450 - "POST /match-specific HTTP/1.1" 200 OK
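Note on the log above: each receipt's candidate list is cut down ("Limited candidates to top 10 by amount similarity") before the LLM request is made. The matcher's actual code is not shown in this diff, so the following is only a minimal sketch of how such a pre-filter could look. The `Transaction` type, the `top_candidates_by_amount` name, and the use of absolute amount difference as the similarity measure are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical minimal transaction record; the real schema is not shown in the diff.
    description: str
    amount: float


def top_candidates_by_amount(receipt_amount, candidates, limit=10):
    """Return up to `limit` candidates whose amounts are closest to the receipt amount.

    Assumption: "amount similarity" means the smallest absolute difference.
    """
    return heapq.nsmallest(
        limit,
        candidates,
        key=lambda tx: abs(tx.amount - receipt_amount),
    )


if __name__ == "__main__":
    transactions = [
        Transaction("A", 100.00),
        Transaction("B", 111.87),
        Transaction("C", 112.00),
        Transaction("D", 5.00),
    ]
    for tx in top_candidates_by_amount(111.87, transactions, limit=2):
        print(tx.description, tx.amount)
```

Sending fewer candidates per request keeps each prompt small. It does not reduce the number of requests, so it would not by itself avoid the 429 retries seen for receipts 7 to 10.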
+3 -1
View File
@@ -13,4 +13,6 @@ aiofiles
google-auth
google-auth-oauthlib
google-auth-httplib2
google-api-python-client
google-api-python-client
sqlalchemy
pydantic-settings