ds_fire_fighter/__pycache__/utils.cpython-311.pyc

175 lines
29 KiB

2024-08-05 22:14:19 +01:00
§
2024-08-16 21:39:28 +01:00
F¨¿fs\ã óddlmZddlmZddlZddlmZddlmZddl m
2024-08-07 17:50:40 +01:00
Z
ddl m Z ddl m Z dd l
2024-08-16 17:37:28 +01:00
mZdd
lmZdd lmZdd lmZdd
2024-08-15 23:17:17 +01:00
lmZddlmZddlZddlZej ej ej ej e ¦«d¦«¦«¦«ddl!m"Z"ddl#Z#ddl$m%Z%m&Z&m'Z'ddl(m)Z)ddl*Z*ddl+Z+ddl,m-Z-ddl.Z.ddl/Z/ddl0Z0ddl1Z2ddl3m4Z4ddl5Z5ddl6Z6ddl7m8Z8ddl9Z9ddl:m;Z;e;¦«ej<d¦«a=ej<d¦«ej>d<e-ej<d¦«¬¦«Z?dZ@eddd¬¦«ZAdZBe"jCd¦«eB¦«ZDe"jCd ¦«dGd"„ZEd#„ZFd$„ZGd%„ZHd&„ZId'„ZJd(„ZKdHd*„ZLd+„ZMdId-„ZNdJd/„ZOdKd1„ZPd2eQd3eRfd4„ZSd5eTd6eQfd7„ZUdLd9„ZVeDd8fd:„ZWd;„ZXd<eQfd=„ZYd>eTd?eTd@eTfdA„ZZd>eTd?eTd@eTfdB„Z[dC„Z\dD„Z]dMdF„Z^dS)Né©ÚHuggingFaceBgeEmbeddings)ÚRecursiveCharacterTextSplitterN)ÚInMemoryDocstore)ÚFAISS)Ú PyPDFLoader)Ú
TextLoader)ÚDocx2txtLoader)ÚChatGroq)ÚPromptTemplate)ÚStrOutputParser)Úuuid4)ÚDocument)Ú
TextExtractorz..)Úlogger)ÚImageÚ ImageDrawÚ ImageFont)ÚThreadPoolExecutor)ÚGroq)Ú AudioSegment)Ú
VideoFileClip)Ú load_dotenvÚOPENAI_API_KEYÚ GROQ_API_KEY)Úapi_keyúwhisper-large-v3zllama3-8b-8192éd)Ú temperatureÚ
model_nameÚ
max_tokenscó>d}ddi}ddi}t|||¬¦«}|S)NzBAAI/bge-small-enÚdeviceÚcudaÚnormalize_embeddingsT)r Ú model_kwargsÚ
encode_kwargsr)r r&r'Ú
embeddingss úHc:\Users\timmy_3aupohg\Downloads\Manaknight Projects\ds_aiindex\utils.pyÚload_embedding_modelr*2s?Ø$€Jؘ%€LØ+¨TÐ2€MÝ%°LÐP]ðñô€Jð ÐózLoading the embedding modelzEmbedding model loadedÚtextcó~tjd¦«|dj}|dj}t ddt
d¬¦«}| |g¦«}g}t|¦«D]N\}}| ¦«} || d<|| d<t|j| ¬ ¦«}
| 
|
¦«ŒO|S)
NzCreating documents from textréèé
F)Ú
chunk_sizeÚ
chunk_overlapÚlength_functionÚis_separator_regexÚpageÚ file_type©Ú page_contentÚmetadata) rÚinfor7r8rÚlenÚcreate_documentsÚ enumerateÚcopyrÚappend) Údocr5r,r8Ú
2024-08-08 14:58:44 +01:00
text_splitterÚdocsÚ documentsÚchunkÚ doc_metadataÚdocuments r)r;r;CÝ
„KÐ ˆqŒ6Ô €DØ1ŒvŒ€HÝØÝØ ð ñô€Mð × )¨4¨&Ñ 1€Dà€Iݘd‘O”Oð#‰ˆˆ5à—}’}‘ˆ Ø ˆ Ø$-ˆ ¨Ô);ÀlÐØ×Ò˜Ñ Ðr+cóÈtjd|¦« t|¦«}| ¦«}t |¦«}|S#t d|¦«xYw)NzLoading text document from úError loading -- )rr9r Úloadr;Ú
ValueError)Ú
document_pathÚtxt_docr,rAs r)Úload_txt_documentrMZslÝ
„KÐ=¨mИØ|Š|‰~Œ~ˆå Ñ؈ øðÐ<¨]Ð=øøøó ™3A
Á
A!cóÈtjd|¦« t|¦«}| ¦«}t |¦«}|S#t d|¦«xYw)NzLoading docx document from rH)rr9r
rIr;rJ)rKÚdocx_docr,rAs r)Úload_docx_documentrQfslÝ
„KÐ=¨mÐ! Ø}Š}‰Œˆå Ñ؈ øðÐ<¨]Ð=øøørNcóªtjd|¦« t|¦«}| ¦«}|S#t d|¦«xYw)NzLoading pdf document from rH)rr9rÚload_and_splitrJ)rKÚpdf_docÚpagess r)Úload_pdf_documentrVsscÝ
„KÐ<¨]И,ˆØ×(ˆØˆ øðÐ<¨]Ð=øøøs ™$>¾Acóþ| d¦«rt|¦«S| d¦«rt|¦«S| d¦«rt|¦«St d|¦«).pdfú.txtú.docxzUnsupported document type for )ÚendswithrVrMrQrJ)rKs r)Ú
load_documentr\~sˆØ×Ò˜fÑKÝ  Ñ× Ò  Ñ 'Ô 'ðKÝ  Ñ× Ò  Ñ (Ô (ðKÝ! ÐI¸Jr+cóötjd|¦«t|d¦«5}tj| ¦«¦« d¦«cddd¦«S#1swxYwYdS)NzEncoding image Úrbzutf-8)rr9ÚopenÚbase64Ú b64encodeÚreadÚdecode)Ú
image_pathÚ
image_files r)Ú encode_imagerfŠÝ
„KÐ. 
ˆj˜$Ñ Ô ðC :ÝÔ 
§¢Ñ 1Ô 1Ñ2×9¸CðCðCðCñCôCðCðCðCðCðCðCøøøðCðCðCðCðCðCs¨9A.Á.A2Á5A2cóVtjd|¦«t|¦«}ddtdœ} ddddd œd
d d |id
œgdœgddœ}t jd||¬¦«}| ¦«dddd}n#t$r }d}Yd}~nd}~wwxYw|S)NzProcessing image zapplication/jsonzBearer )z Content-TypeÚ
2024-08-09 16:33:21 +01:00
Authorizationz gpt-4o-miniÚuserr,uWhat’s in this image?)Útyper,Ú image_urlÚurlzdata:image/jpeg;base64,)rjrk)ÚroleÚcontenti,)ÚmodelÚmessagesr!z*https://api.openai.com/v1/chat/completions)ÚheadersÚjsonÚchoicesrÚmessagernú$Image not good enough for processing)rr9rfrÚrequestsÚpostrrÚ Exception)rdÚ base64_imagerqÚpayloadÚresponseÚes r)Ú
process_imager}sÝ
„KÐ0 JÐ  
Ñ+€Lð,¥7Ðð€Gð
%+Ø$=ððð
%0à %Ð'OÀÐ'OÐ'Oð*ððð ðððð"ð'
ð
ˆõ,”=Ð!MÐW^ÐelÐmˆà—=’=‘?”? 9Ô-¨aÔÔ;¸IÔˆøÝ ðˆˆˆˆˆøøøøð:øøøð €OsµABÂ
B&ÂB!Â!B&Úimagecóˆtjd|¦«| d¦«d d¦«d}||dœ}t¦«}| |¦«}d d|D¦«¦«}|dkrt
|¦«}|d krt||¬
¦«}|gSdS) NzCreating image document from ú\éÿÿÿÿú.r)Úsourcer5Úc3óvK|]4}| ¦«s| ¦«s|dk¯0|VŒ5dS)ú
N)ÚisalnumÚisspace)Ú.0r|s r)ú <genexpr>z(create_image_document.<locals>.<genexpr>Æs@èèРa§i¢i¡k¤kÐN°Q·Y²Y±[´[ÐNÀAÈÂIÀI1ÀIÀIÀIÀIÐNr+rur6)rr9ÚsplitrÚread_text_from_imageÚjoinr}r)rdr5Ú
image_namer8Útext_extractorr,r?s r)Úcreate_image_documentr½Ý
„KÐ
Ð×! Ô+×1°#Ñ6°qÔ9€Jà$°9Ð=€HÝ"_”_€NØ × .¨zÑ :€Dà
7Š7ÐN˜dÐ N€Dð ˆr‚z€zݘZÑð РD°8Ð<ˆàˆuˆ à ˆr+cótjd|¦«t|d¦«5}tjj || ¦«fd¬¦«}ddd¦«n #1swxYwY|jS)NzTranscribing audio file r^r)Úfilero) rr9r_ÚclientÚaudioÚ translationsÚcreaterbr,)ÚfilepathrÚ translations r)Ú
audio_to_textr™×Ý
„KÐ5¨8Ð
ˆh˜Ñ Ô ð
 Ý”lÔ˜DŸIšI™KœKÐ
ô
ˆ ð
ð
ð
ñ
ô
ð
ð
ð
ð
ð
2024-08-15 21:18:38 +01:00
ð
øøøð
ð
ð
ð
ð
Ô Ðs¨;A/Á/A3Á6A3Tcó^tjd|d¦«|dzdz}tj|¦«}t |¦«}t
j |¦« d¦«d}|d}t
j  |¦«st j
|¦«g}||kr§tj ||z ¦«} tt| ¦«¦«D]r}
|
|z} t| |z|¦«} || | }
|d|d |
d
zd }|
 |d ¬
¦«| |¦«|rt%d|¦«ŒsnH|d|d}| |d ¬
¦«| |¦«|rt%d|¦«||fS)NzSplitting audio file z by durationé<r.rrÚ_chunksú_chunkéú.mp3Úmp3)Úformatz
Exporting z _chunk1.mp3)rr9rÚ from_filer:ÚosÚpathÚbasenamerÚexistsÚmakedirsÚmathÚceilÚrangeÚintÚminÚexportr>Úprint)Úaudio_file_pathÚchunk_duration_minutesÚ print_outputÚchunk_length_msr”Úaudio_duration_msÚ
base_filenameÚ chunk_folderÚ chunk_pathsÚ
2024-08-13 21:30:01 +01:00
num_chunksrCÚstart_msÚend_msrDÚchunk_filenames r)Úsplit_audio_by_durationr¼áÝ
„KÐÐ,¨rÑ1°DÑ8€Oõ
Ô "  3€EݘE™
œ
Ðõ”G×$ _Ñ;¸CÑÔC€MØ,€LÝ
Œ7>Š>˜ 
Œ €Kà˜?Ò”YÐ0°?Ñ
å•s˜:Ñ 5ˆ˜*ˆHݘ OÑ3Ð5FÑGˆFؘ( 6˜*ˆEØ ,ÐM¨}ÐMÀAÀaÁCÐMˆNØ LŠL˜°ˆLÑ × Ò ˜~Ñ ð
Ð3 4øðE¨=ÐØ
Š ^¨Eˆ Ñ×Ò˜>Ñ ð Ð/˜~Ð ˜Ð $r+r”c ó8tjd|¦«t||¦«\}}g}|D]Á}t|¦«}tj |¦«}tjd|¦«} | r8|   d¦«}
t|   d¦«¦«} n'tj  |¦«d}
d} | dz
|z} | |z}
t|
ttj|¦«¦«dz¦«}| dzdkr%t| ¦«d} t|
¦«d}
n|t!| ¦« d¦«\}}t!|
¦« d¦«\}}t|¦«d z}t|¦«d z}|d
|} |d
|}
|
| d |
|d œ}t%||¬
¦«}| |¦«ŒÃt)j|¦«t-jd¦«|S)NzTranscribing audio chunks from z(.*)_chunk(\d+)\.mp3$rŸéri`êz:00réú-)Ú timestampr5r6gš™™™™™É?)rr9r™ÚreÚsearchÚgroupr¬Úsplitextr­r:rÚstrrrr>ÚshutilÚrmtreeÚtimeÚsleep)r5rBÚ
chunk_pathÚ
transcriptr»ÚmatchrµÚ chunk_indexÚ start_minÚend_minÚactual_end_minÚ
start_min_intÚ
start_min_decÚ end_min_intÚ end_min_decÚ start_secÚend_secr8rFs r)Útranscribe_audio_chunksrÙ sIÝ
„KÐC°/ÐDå 7¸ÐI_Ñ `Ô `Ñ€L€IØ))#ˆ
å" .ˆ
õœ×)¨*ÑÝ” Ð2°NÑØ ð Ø!ŸKšK¨™NœNˆMݘeŸkšk¨!™nœnÑ-ˆKˆKõœG×,¨^Ñ<¸?ˆˆ! 1_Ð(>Ñ>ˆ ØÐ 6Ñ6ˆÝ˜W¥s­<Ô+AÀ/Ñ+RÔ+RÑ'SÔ'SÐW\Ñ'\Ñ^ˆð q‰=˜ Рݘy™>œ>Ð.ˆ˜W™œÐ*ˆGˆ,/¨y©>¬>×+?Ò+?ÀÑ+DÔ+DÑ (ˆM˜=Ý'*¨7¡|¤|×'9Ò'9¸#Ñ'>Ô'>Ñ $ˆK˜å˜*¨QÑ.ˆ˜Ñ*ˆ6¨9Ð6ˆ0 0ˆ(Ø )Ð5¨GÐðˆõ
¨¸hÐØ×Ò˜Ñ „MÔÐõ „JˆsO„O€Oà Ðr+écó(t|||¦«}|S)N))r5rBs r)Úcreate_audio_documentrÜEsÝÐ9OÐQZÑ[€IØ Ðr+Ú
video_pathÚ
time_intervalcó>tjd|¦«t|¦«}|j}| dd¦«}|j |¦«}tj  tj 
|¦«¦«d}tj  tj  |¦«|¦«}tj
|d¬¦«tj|¦«}t!|dd¦«}t#dt%|¦«|¦«D]} | }
t'| |zt%|¦«¦«} |
d | } tj  || d
¦«}
tj||
¬ ¦« |
d ¬
¦« ¦«ŒŒt/d|d¦«t1|dd¬¦«}tjd|¦«tj|¦«|S)NzPreprocessing video data from z.mp4r rÚexist_okr¢ÚdurationrÁzs.png)ÚssrŸ)ÚvframeszSnapshots saved in rçà?Úvideo)r5zDocuments created from video )rr9rÚreplacer”Úwrite_audiofiler¤rÚdirnamer¨ÚffmpegÚprobeÚfloatr«r­ÚinputÚoutputÚrunr¯Úremove)Ú
audio_pathÚ
video_nameÚ snapshot_dirrërCÚ
start_timeÚend_timeÚ interval_strÚ frame_imgrBs r)Úpreprocess_video_datarùJsÝ
„KÐÐ ˜ %€EðŒ~€Hð×# F¨FÑ3€JØ
Œ ×# JÑ/€Aõ×!¥"¤'×"2Ò"2°:Ñ">Ô">ÑÔB€Jõ”7—<¤§¢°
Ñ ;Ô ;À
¸_ÑM€LÝ„K  tÐ
ŒL˜Ñ $€EÝU˜8”_ 1€Hõ1•c˜(m”m 

ð
ˆØˆ
Ýq˜=Ñ(­#¨h©-¬-Ñ8ˆð1 xÐ õ”G—LL °,Ð/EÐ/EÐ/EÑFˆ õ
Ü
: 


ŠVI qˆ
ŠS‰UŒUˆUˆ
Ð
/  Ð
& ÐX_Ð`€IÝ
„KÐ
ЄIˆjÑÔÐØ Ðr+Ú
document_pageÚreturncó²tjd¦«tddg¬¦«}|tzt ¦«z}| d|i¦«}|S)NzSummarizing documentao<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Create a short summary of the document based on the provided text.
Start with: This document is about...
<|eot_id|><|start_header_id|>user<|end_header_id|>
DOCUMENT: {document_page}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>rú)ÚtemplateÚinput_variables)rr9r ÚGROQ_LLMr
Úinvoke)Úinitiator_promptÚinitiator_routerrîs r)Údoc_summarizerrskÝ
„KÐCð ñ ô Ðð(­(Ñ2µ_Ñ5FÔ5FÑØ