Add test script for JSON extraction functionality

This commit introduces a new test script, `test_json_extraction.py`, which verifies the correctness of the JSON extraction logic. The script includes a function to extract the first valid JSON object from raw input and a series of test cases covering various scenarios, such as clean JSON, JSON with extra text, nested JSON, and escaped quotes. The tests ensure that the extraction function behaves as expected and handles edge cases appropriately.
This commit is contained in:
2025-10-09 19:56:22 +00:00
parent 2e020437a8
commit 3559cbe19d
5 changed files with 769 additions and 891 deletions
+5 -1
View File
@@ -152,9 +152,11 @@ SCORING CRITERIA:
- Minimal similarity: 0.1-0.19
- No meaningful similarity: 0.0-0.09
The most important factor to consider is the Amount for both the transaction and the receipt. The closer the amounts, the higher the score. If the amounts are different or not close return a low score (0-0.1) based on other factors.
Consider vendor name similarity, amount accuracy, date proximity, and description/notes relevance.
IMPORTANT: You MUST return the candidate with the highest match score, even if it's very low. Never return NONE.
IMPORTANT:
You MUST return the candidate with the highest match score, even if it's very low. Never return NONE.
Return ONLY the best match in this exact format:
CANDIDATE_NUMBER|CONFIDENCE_SCORE|REASON
@@ -338,6 +340,8 @@ Example of low match: 5|0.15|Best available option despite significant differenc
Consider description and category similarity in your scoring.
The most important factor to consider is the Amount for both the transaction and the receipt. The closer the amounts, the higher the score. If the amounts are different or not close return a low score (0-0.1) based on other factors.
IMPORTANT: Return ONLY the score and reason separated by a pipe character.
Format: [score]|[reason]
Example: 0.85|Same vendor, same amount, 2 days apart