ds_fire_fighter/data_ingestion/__pycache__/utils.cpython-311.pyc

55 lines
13 KiB
Plaintext

2024-08-05 22:14:19 +01:00
§
0µf¾)ãó€ddlmZddlmZddlZddlmZddlmZddl m
2024-08-07 17:50:40 +01:00
Z
ddl m Z ddl m Z dd l
mZdd
lmZdd lmZddlZddlZddlZddlZdd lmZe¦«ejd
¦«adZe¦«ZdZdZdZdZ dZ!dZ"dZ#dZ$d#dZ%edfdZ&de'fdZ(de)de)de)fdZ*de)de)de)fd „Z+d$d"„Z,dS)%é©ÚHuggingFaceBgeEmbeddings)ÚRecursiveCharacterTextSplitterN)ÚInMemoryDocstore)ÚFAISS)Ú PyPDFLoader)Ú
TextLoader)ÚDocx2txtLoader)Úuuid4)ÚDocument)Ú
TextExtractor)Ú load_dotenvÚOPENAI_API_KEYcó>d}ddi}ddi}t|||¬¦«}|S)NzBAAI/bge-small-enÚdeviceÚcudaÚnormalize_embeddingsT)Ú
model_nameÚ model_kwargsÚ
encode_kwargsr)rrrÚ
embeddingss úWc:\Users\timmy_3aupohg\Downloads\Manaknight Projects\ds_aiindex\data_ingestion\utils.pyÚload_embedding_modelrs?Ø$€JؘfÐ%€LØ+¨TÐ2€MÝ%°LÐP]ðñô€Jð ÐócóL|dj}|dj}tddtd¬¦«}| |g¦«}g}t |¦«D]I\}}| ¦«}||d<t|j|¬¦«} | | ¦«ŒJ|S)Nré
F)Ú
chunk_sizeÚ
chunk_overlapÚlength_functionÚis_separator_regexÚpage©Ú page_contentÚmetadata) r#r$rÚlenÚcreate_documentsÚ enumerateÚcopyr Úappend)
2024-08-08 14:58:44 +01:00
ÚdocÚtextr$Ú
text_splitterÚdocsÚ documentsÚchunkÚ doc_metadataÚdocuments
rr&r&%Ø ˆqŒ6Ô €DØ1ŒvŒ€HÝØÝØ ð ñô€Mð × )¨4¨&Ñ 1€Dà€Iݘd‘O”Oð#‰ˆˆ5à—}’}‘ˆ Ø ˆ ݨÔ);ÀlÐØ×Ò˜Ñ Ðrcóš t|¦«}| ¦«}t|¦«}|S#td|¦«xYw©NzError loading -- )r Úloadr&Ú
ValueError)Ú
document_pathÚtxt_docr+r-s rÚload_txt_documentr9:sSð˜Ø|Š|‰~Œ~ˆå Ñ؈ øðÐ<¨]Ð=øøøó 36A
cóš t|¦«}| ¦«}t|¦«}|S#td|¦«xYwr4)r
r5r&r6)r7Údocx_docr+r-s rÚload_docx_documentr=EsSð! Ø}Š}‰Œˆå Ñ؈ øðÐ<¨]Ð=øøør:có| t|¦«}| ¦«}|S#td|¦«xYwr4)rÚload_and_splitr6)r7Úpdf_docÚpagess rÚload_pdf_documentrBQsJð˜,ˆØ×(ˆØˆ øðÐ<¨]Ð=øøøs$'§;cóþ| d¦«rt|¦«S| d¦«rt|¦«S| d¦«rt|¦«St d|¦«)Nz.pdfz.txtz.docxzUnsupported document type for )ÚendswithrBr9r=r6)r7s rÚ
load_documentrE[sˆØ×Ò˜fÑKÝ  Ñ× Ò  Ñ 'Ô 'ðKÝ  Ñ× Ò  Ñ (Ô (ðKÝ! ÐI¸JrcóÈt|d¦«5}tj| ¦«¦« d¦«cddd¦«S#1swxYwYdS)rbzutf-8)ÚopenÚbase64Ú b64encodeÚreadÚdecode)Ú
image_pathÚ
image_files rÚ encode_imagerOfs™Ý ˆJ˜ÑÔðÝ Ô ˜JŸOšOÑ .× 5°gÑ ?øøøð?s9AÁAÁAcó(t|¦«}ddtdœ} dddddœd d
d |id œgd
œgddœ}tjd||¬¦«}| ¦«dddd}n#t
$r }d}Yd}~nd}~wwxYw|S)Nzapplication/jsonzBearer )z Content-TypeÚ
Authorizationz gpt-4o-miniÚuserr+uWhat’s in this image?)Útyper+Ú image_urlÚurlzdata:image/jpeg;base64,)rSrT)ÚroleÚcontenti,)ÚmodelÚmessagesÚ
max_tokensz*https://api.openai.com/v1/chat/completions)ÚheadersÚjsonÚchoicesrÚmessagerWú$Image not good enough for processing)rOÚapi_keyÚrequestsÚpostr\Ú Exception)rMÚ base64_imager[ÚpayloadÚresponseÚes rÚ
process_imagerhksõ  
Ñ+€Lð,¥7Ðð€Gð
%+Ø$=ððð
%0à %Ð'OÀÐ'OÐ'Oð*ððð ðððð"ð'
ð
ˆõ,”=Ð!MÐW^ÐelÐmˆà—=’=‘?”? 9Ô-¨aÔÔ;¸IÔˆøÝ ðˆˆˆˆˆøøøøð:øøøð €OsžAA9Á9
BÂB
Â
BcóX| d¦«d d¦«d}d|i}t¦«}| |¦«}d d|D¦«¦«}|dkrt |¦«}|dkrt ||¬ ¦«}|gSdS)
/éÿÿÿÿú.rÚfilenameÚc3óvK|]4}| ¦«s| ¦«s|dk¯0|VŒ5dS)ú
N)ÚisalnumÚisspace)Ú.0rgs rú <genexpr>z(create_image_document.<locals>.<genexpr>Ÿs@èèРa§i¢i¡k¤kÐN°Q·Y²Y±[´[ÐNÀAÈÂIÀI1ÀIÀIÀIÀIÐNrr_r")Úsplitr
Úread_text_from_imageÚjoinrhr )rMÚ
image_namer$Útext_extractorr+r*s rÚcreate_image_documentrzà×! & *×Ñ5°aÔ8€Jà˜'€HÝ"_”_€NØ × .¨zÑ :€Dà
7Š7ÐN˜dÐ N€Dð ˆr‚z€zݘZÑð РD°8Ð<ˆàˆuˆ à ˆrÚdatacóT| d|¦«td¦«dS)vec-db/index/faiss_index_zEmbeddings saved)Ú
save_localÚprint)rÚkeys rÚsave_embedded_datar¯s4Ø ×ÒÐ9°CÐÐÑÔÐÐÐrcó:tjd||d¬¦«}|S)Nr}T)Úallow_dangerous_deserialization)rÚ
load_local)rr€Úembed_dbs rÚload_embedded_datar†³s(Ý
Ô
Ð?¸ÐmqÐ
r€(Ø €/rÚdirectory_pathcó:gd¢}gd¢}gd¢}gd¢}tj|¦«}g}g}g}|D]S} tj || ¦«}
|  d¦«d|  d¦«d} } | |vrot |
¦«}
| |
¦«| t|