ds_quickbooks/__pycache__/document_processor.cpython-311.pyc

100 lines
11 KiB

§
Æ•bhõãóˆddlZddlZddlZddlmZddlZddlmZmZm Z m
Z
ddl Z ddl Z ddl
Z
ddlmZGdd¦«ZdS)éN)ÚImage)ÚDictÚAnyÚListÚOptional)ÚdatetimecóìeZdZdZdededeeeffdZdedeeeffdZdedefdZ d edeeeffd
Z
d edefd Z d edeeeffd
Z dedeeeffdZ
dededefdZdS)ÚDocumentProcessorcó\tjtj¬¦«|_d|_dS)N)Úapi_keyz)meta-llama/llama-4-scout-17b-16e-instruct)ÚgroqÚGroqÚconfigÚ GROQ_API_KEYÚclientÚmodel)Úselfs ú0/Users/user/mkd/quickbooks/document_processor.pyÚ__init__zDocumentProcessor.__init__
s$Ý”i­Ô(;Ð<ˆŒ Ø@ˆŒ
ˆ
ˆ
óÚ file_pathÚ file_typeÚreturncƒóDK | ¦«dvr| |¦«ƒd{VS| ¦«dkr| |¦«ƒd{VStd|¦«#t$r}dt |¦«icYd}~Sd}~wwxYw)z.Process uploaded file and extract receipt data)ÚjpgÚjpegÚpngÚgifÚbmpNÚpdfzUnsupported file type: Úerror)ÚlowerÚ_process_imageÚ _process_pdfÚ
ValueErrorÚ ExceptionÚstr)rrrÚes rÚ process_filezDocumentProcessor.process_fileèèðŠÑ Ô Ð$HÐÑÑ" !×.¨yÑ Ð!F¸9Ð!FÐ!FÑGøÝðS ™VœVÐ $øøøøð %øøøs(„0A:µ2A:Á(A:Á:
BÂBÂBÂBÚ
image_pathcƒó„K | |¦«}d}|jjj dd|dœddd|idœgd œg|jd
d ¬ ¦«}|jd
jj  ¦«}| 
|¦«S#t$r}ddt|¦«icYd}~Sd}~wwxYw)z)Extract data from image using Groq visiona^
Analyze this receipt image and extract the following information in JSON format:
{
"vendor": "Store/company name",
"total_amount": 0.00,
"tax_amount": 0.00,
"date": "YYYY-MM-DD",
"category": "Food/Transport/Office/Other",
"confidence": 0.95
}
Rules:
- Extract vendor name as it appears on receipt
- Total amount should be the final total including tax
- Tax amount is separate tax line if available
- Date should be the date on the receipt
- Categorize based on vendor type (Starbucks=Food, Shell=Transport, etc.)
- Confidence score 0-1 based on how clear the receipt is
Return only valid JSON.
ÚuserÚtext)Útyper-Ú image_urlÚurlzdata:image/jpeg;base64,)r.r/©ÚroleÚcontentéôçš™™™™™¹?)ÚmessagesrÚ
max_tokensÚ temperaturerr!zImage processing error: N)
Ú
_encode_imagerÚchatÚ completionsÚcreaterÚchoicesÚmessager3ÚstripÚ_parse_extraction_resultr&r')rr*Ú base64_imageÚpromptÚresponseÚ result_textr(s rr#z DocumentProcessor._process_imagesèèð5 Bà×-¨jÑ9ˆˆFð.”{Ô3×!'à%+°VÐ<à(3à$)Ð+SÀ\Ð+SÐ+Sð."ððð$ð ð ð
ð”jØØð#ôˆHð**¨1Ô=×EˆKØ×Ñ =øåð Bð Bð BØÐ¸A¹¼Ð Aøøøøð Bøøøs„BBÂ
B?Â!B:Â4B?Â:B?cóÈt|d¦«5}tj| ¦«¦« d¦«cddd¦«S#1swxYwYdS)zEncode image to base64 stringÚrbzutf-8N)ÚopenÚbase64Ú b64encodeÚreadÚdecode)rr*Ú
image_files rr9zDocumentProcessor._encode_imageVå
*˜
 G Ô# J§O¢OÑ$5Ô$5Ñ6×=¸ Gð Gð Gð Gñ Gô Gð Gð Gð Gð Gð Gð Gøøøð Gð Gð Gð Gð Gð Gs9AÁAÁAÚpdf_pathcƒó²K | |¦«}| |¦«S#t$r}ddt|¦«icYd}~Sd}~wwxYw)z2Extract data from PDF by converting to image firstr!zPDF processing error: N)Ú_extract_text_from_pdfÚ_process_text_contentr&r')rrMÚ text_contentr(s rr$zDocumentProcessor._process_pdf[sèèð @à×6°xÑ@ˆLØ×-¨lÑ ;øåð @ð @ð @ØÐ>µc¸!±f´ ?øøøøð @øøøs„).®
A¸AÁ AÁAcó t|d¦«5}tj|¦«}d}|jD]}|| ¦«dzz
}Œ|cddd¦«S#1swxYwYdS#t
$r }Yd}~dSd}~wwxYw)zExtract text from PDFrFÚú
N)rGÚPyPDF2Ú PdfReaderÚpagesÚ extract_textr&)rrMÚfileÚ
pdf_readerr-Úpager(s rrOz(DocumentProcessor._extract_text_from_pdfeð ÝÑ
¨Ý-¨dÑ3
ØØ7˜/°$Ñ6Dð 
ð
ð
ð
ñ
ô
ð
ð
ð
ð
ð
ð
øøøð
ð
ð
ð
ð
ð
øõ ð ð ð Ø22222øøøøð øøøs4A(<AÁ A(ÁAÁA(Á"AÁ#A(Á(
A=Á8A=rQcóD d|d}|jjj |jd|dœgdd¬¦«}|jdjj ¦«}|  |¦«S#t$r}d d
t|¦«icYd }~Sd }~wwxYw) z3Process text content using Groq (fallback for PDFs)z
Analyze this receipt text and extract the following information in JSON format:
Receipt Text:

Extract:
{
"vendor": "Store/company name",
"total_amount": 0.00,
"tax_amount": 0.00,
"date": "YYYY-MM-DD",
"category": "Food/Transport/Office/Other",
"confidence": 0.95
}
Rules:
- Extract vendor name as it appears on receipt
- Total amount should be the final total including tax
- Tax amount is separate tax line if available
- Date should be the date on the receipt
- Categorize based on vendor type
- Confidence score 0-1 based on clarity
Return only valid JSON.
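
The prompt above asks the model to return a single JSON object with six fields. As an illustration only, here is a minimal sketch of how such a reply might be checked against that schema. The function name `parse_receipt_reply`, the `ALLOWED_CATEGORIES` set, and the fallback values are assumptions made for this example; this is not the file's own `_parse_extraction_result`, whose body is not readable in this dump.

```python
import json
from typing import Any, Dict

# Categories listed in the prompt's schema ("Food/Transport/Office/Other").
ALLOWED_CATEGORIES = {"Food", "Transport", "Office", "Other"}


def parse_receipt_reply(reply_text: str) -> Dict[str, Any]:
    """Hypothetical helper: validate a model reply against the prompt's schema."""
    # Replies may wrap the JSON in prose or code fences,
    # so keep only the span from the first "{" to the last "}".
    start = reply_text.find("{")
    end = reply_text.rfind("}")
    if start == -1 or end <= start:
        return {"error": "No JSON object found in reply"}

    try:
        data = json.loads(reply_text[start:end + 1])
    except json.JSONDecodeError as exc:
        return {"error": f"Invalid JSON: {exc}"}

    try:
        result = {
            "vendor": str(data.get("vendor", "")).strip(),
            "total_amount": float(data.get("total_amount", 0.0)),
            "tax_amount": float(data.get("tax_amount", 0.0)),
            "date": str(data.get("date", "")),  # format is not validated here
            "category": str(data.get("category", "Other")),
            "confidence": float(data.get("confidence", 0.0)),
        }
    except (TypeError, ValueError) as exc:
        return {"error": f"Unexpected field type: {exc}"}

    # Assumed fallback: unknown categories collapse to "Other".
    if result["category"] not in ALLOWED_CATEGORIES:
        result["category"] = "Other"
    # The prompt defines confidence as a score between 0 and 1.
    result["confidence"] = min(1.0, max(0.0, result["confidence"]))
    return result
```

Returning an `{"error": ...}` dictionary instead of raising mirrors the error-handling style visible in the readable fragments of the class (for example `"Image processing error: "`), but the exact behaviour of the original code is not confirmed by this dump.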