Anton_wireframe/app/services/__pycache__/llm_parser.cpython-312.pyc

190 lines
33 KiB

Ë
Níäh<†ãóºddlZddlZddlZddlmZddlZddlmZddl m
Z
m Z m Z m
Z
mZmZddlmZddlmZddlmZmZddlmZGd „d
e«ZGd d «Zy)
éN)ÚOptional)Úget_db_session)Ú
CompanyMemberÚ CompanyTableÚ FundTableÚInvestorMemberÚ
InvestorTableÚ SectorTable)Ú
ChatOpenAI)Ú BaseModel)Ú CompanyDataÚ InvestorData)ÚSessioncó<eZdZUdZdZeed<dZeed<dZ eed<y) ÚCurrencyConversionz,Schema for LLM currency conversion responsesrÚ
amount_usdÚhighÚ
confidenceÚÚnotesN)
Ú__name__Ú
__module__Ú __qualname__Ú__doc__rÚintÚ__annotations__rÚstrr©óúN/home/oluwasanmi/Documents/Work/MKD/anton_wireframe/app/services/llm_parser.pyrrs"Ù€JÓØ€JÓØ€Eˆ3„Orrc ó‚eZdZdZdedeefdZdedeefdZ deded edeefd
Z
d deded ed edeef
d
Z de dedee
fdZde dedeefdZde dedefdZde dedefdZ d!de dedede
fdZ d"dej2dededeeezfdZd"dej6defdZd"dej6defdZy )#ÚInvestorProcessorcó*ttjd«ddd¬«|_|jj t
«|_|jj t«|_|jj t«|_
y)OPENROUTER_API_KEYzhttps://openrouter.ai/api/v1zopenai/gpt-4o-minir)Úapi_keyÚbase_urlÚmodelÚ temperature) r ÚosÚgetenvÚllmÚwith_structured_outputrÚcurrency_converter_llmrÚinvestor_structured_llmr
Úcompany_structured_llm)Úselfs r Ú__init__zInvestorProcessor.__init__spÜÜ—Iô 
ˆŒð'+§h¡h×&EÑ&EÜ ó'
ˆÔ(,§x¡x×'FÑ'FÄ|Ó'TˆÔ$Ø&*§h¡h×&EÑ&EÄkÓ&RˆÕ#rÚ
amount_strÚreturncƒóK|r
|dk(s|dk(ry d|d}|jj|«ƒd{}|jdkDr |jSdS7Œ!#t$r}t d|d|«Yd}~yd}~wwxYw­w) zØ
Use LLM to convert currency amounts to USD integers.
Handles formats like:
- "EUR 850,000,000"
- "$5M"
- "GBP 10-20 million"
- "Approximately EUR 100 million"
ú
Not AvailableÚ0NzáConvert this amount to USD as an integer (whole number, no decimals).
If it's a range, use the midpoint. If already in USD, just extract the number.
Remove all commas and convert millions/billions to actual numbers.
Amount: a

Examples:
- "EUR 850,000,000" -> 935000000 (assuming EUR to USD rate ~1.10)
- "$5M" -> 5000000
- "GBP 10-20 million" -> 18000000 (midpoint 15M * 1.20 rate)
- "Approximately EUR 100 million" -> 110000000
Return only the USD integer amount with current exchange rates.rzError converting currency 'z': )r-ÚainvokerÚ ExceptionÚprint)r0r2ÚpromptÚresultÚes r Úconvert_to_usdz InvestorProcessor.convert_to_usd/èø€ñ˜Z¨?Ò:¸jÈCÒ>OØð ð ð
ˆ ð@ð Cˆ ×6×>¸F×FˆFØ(.×(9Ñ(9¸AÒ(=6× GÀ4Ð Gùäò Ü Ð
¨|¸3¸q¸cÐ ûð üsJB‘$AµAAÁBÁAÁBÁAÁ A=Á"A8Á3BÁ8A=Á=BÚjson_strcó¾|rtj|«ry tj|«}|S#tj$r}t d|«Yd}~yd}~wwxYw)z„
Manually parse the JSON profile from the CSV.
Returns a cleaned dictionary with the investor profile data.
NzError parsing JSON: )ÚpdÚisnaÚjsonÚloadsÚJSONDecodeErrorr9)r0r>Úprofiler<s r Úparse_json_profilez$InvestorProcessor.parse_json_profilePsVñ
œ2Ÿ7™7 8Ôð ä—j‘j Ó*ˆG؈NøÜ× Ü Ð¨Ð ûð úsš1±AÁAÁAÚnameÚwebsiteÚ profile_jsoncƒó–K|j|«}|sy |r|j«nd|r|j«nd|jd«|jd«ddd|jdg«|jdg«|jdg«|jd«|jdg«|jd i«ggd
œ}|jd i«}|rnt|t«r^|jd «}|rK|d
k7rF|j |«ƒd{}||d<|jd«|d<|jd«|d<|jdg«} | D]{}
t|
t«sŒ|
jd«sŒ&|dj
|
jd«|
jd«|
jd«d|
jd«dœ«Œ}|jdg«} | D]"} t| t«sŒ| jd«d| jd«d| jd«| jd«| jdg«| jdg«| jdg«dœ }
| jd «}|r.|d
k7r)|j |«ƒd{}|rt|«|
d!<| jd"«}|r.|d
k7r)|j |«ƒd{}|rt|«|
d#<|dj
|
«Œ%|S7Œ7Œq7Œ2#t$r}td$|d%|«Yd}~yd}~wwxYw­w)&z…
Process investor profile from CSV data.
Manually extracts fields and uses LLM only for currency conversion.
headquartersÚinvestorDescriptionÚinvestmentThesisFocusÚportfolioHighlightsÚlinkedDocumentsÚresearcherNotesÚmissingImportantFieldsÚsources)rGrHrKÚ descriptionÚaumÚaum_as_of_dateÚaum_source_urlÚinvestment_thesisÚportfolio_highlightsÚlinked_documentsÚresearcher_notesÚmissing_important_fieldsrRÚ team_membersÚfundsÚoverallAssetsUnderManagementÚ aumAmountr5rTÚasOfDaterUÚ sourceUrlrVÚseniorLeadershiprGr\Útitle)rGrcÚroleÚemailÚ
source_urlr]ÚfundNameÚfundSizeSourceUrlÚsourceProviderÚgeographicFocusÚinvestmentStageFocusÚ sectorFocus) Ú fund_nameÚ fund_sizeÚfund_size_source_urlÚestimated_investment_sizerfÚsource_providerÚgeographic_focusÚinvestment_stage_focusÚ sector_focusÚfundSizernÚestimatedInvestmentSizerpz&Error processing investor profile for ú: )
rFÚstripÚgetÚ
isinstanceÚdictr=Úappendrr8r9)r0rGrHrIrEÚ
investor_dataÚaum_dataÚ
aum_amountÚaum_usdÚsenior_leadershipÚmemberr]ÚfundÚ fund_dataÚ
fund_size_strÚ
fund_size_usdÚ est_size_strÚ est_size_usdr<s r Úprocess_investor_profilez*InvestorProcessor.process_investor_profile`s8èø€ð×)¨,ÓÙØðQ ñ)-˜Ÿ
œ °$Ù.5˜7Ÿ=™=œ?¸4Ø '§ ¡ ¨NÓ ;Ø&Ÿ{™{Ð+@ÓØ"&Ø"&Ø%,§[¡[Ð1HÈ"Ó%MØ(/¯ © Ð4IÈ2Ó(NØ$+§K¡KÐ0AÀ2Ó$FØ$+§K¡KÐ0AÓ$BØ,3¯K©KÐ8PÐRTÓ,UØ"Ÿ;™; y°"Ó5Ø "ØñˆMð&—{{Ð#AÀ2ÓFˆœJ x´Ô%Ÿ\™\¨+Ó6
Ù Ò"?à$(×$7Ñ$7¸
Ó$C×CGØ+2M (Ø6>·l±lÀ:Ó6NMÐ"2Ñ3Ø6>·l±lÀ;Ó6OMÐ"2Ñ!(§ ¡ Ð,>ÀÓ CÐ Ø
ܘf¤dÕ·
±
¸6Õ0BØ! 1×8à$*§J¡J¨vÓ$6Ø%+§Z¡Z°Ó%8Ø$*§J¡J¨wÓ$7Ø%)Ø*0¯*©*°[Ó*Añ õð
ð—K‘K ¨Ó,ˆEØó
=ܘd¤DÕ)à%)§X¡X¨jÓ%9Ø%)Ø04·±Ð9LÓ0MØ59Ø&*§h¡h¨{Ó&;Ø+/¯8©8Ð4DÓ+EØ,0¯H©HÐ5FÈÓ,KØ26·(±(Ð;QÐSUÓ2VØ(,¯©°ÀÓ(Cñ
!%)§H¡H¨ZÓ$8¸/Ò)IØ.2×.AÑ.AÀ-Ó.P×(P˜
Ù(Ü58¸Ó5G˜I $(§8¡8Ð,EÓ#F ¸Ò(GØ-1×-@Ñ-@ÀÓ-N×'N˜ Ù'ÜEHÈÓEV˜IÐ&AÑ! *×1°)Ö<ð9
=ð< ðgDùðL)Qøð(Oùôò Ü Ð:¸4¸&ÀÀ1À#Ð ûð üs{M ˜DL"ÄLÄAL"Å5L"ÆBL"ÈB$L"Ê,LÊ-AL"Ë-L Ë.,L"ÌM ÌL"ÌL"Ì L"Ì" MÌ+MÌ<M ÍMÍM investor_namescƒóK|j|«}|sy |r|j«nd|r|j«nd|jd«|jd«|jd«dg|jdg«|jd«|jdg«|jd«|jd g«|jd
i«gd œ}|r`tj|«rKt |«j
d «Dcgc]}|j«Œ}}|Dcgc]}|sŒ|Œ c}|d
<|jdg«} | s|jdg«} | D]j}
t|
t«sŒ|
jd«sŒ&|dj|
jd«|
jd«|
jd«dœ«Œl|jdd«} | rdgd¢} | D][}
tj|
| tj«}|sŒ+ t|jd««}d|cxkrdkr nn||d<|SŒ]|Scc}wcc}w#t$rYŒuwxYw#t$r}t!d|d|«Yd}~yd}~wwxYw­w)zl
Process company profile from CSV data.
Manually extracts fields without using LLM.
2025-09-25 17:00:38 +01:00
companyDescriptionrjÚsectorDescriptionÚclientCategoriesÚproductDescriptionrOrPrQrR)rGrHrSÚlocationÚindustryÚ founded_yearÚkey_executivesÚclient_categoriesÚproduct_descriptionrYrZr[rRú,rŠÚ
keyExecutivesrbrGr“rcra)rGrcrfrSr)zfounded in (\d{4})zfounded (\d{4})uGegründet (\d{4})zestablished in (\d{4})z
since (\d{4})z \((\d{4})\)éilrz%Error processing company profile for rw)rFrxryr@ÚnotnarÚsplitrzr{r|ÚreÚsearchÚ
IGNORECASErÚgroupr8r9)r0rGrHrIrEÚ company_dataÚinvÚ investorsr“Ú exec_memberrSÚ
year_patternsÚpatternÚmatchÚyearr<s r Úprocess_company_profilez)InvestorProcessor.process_company_profile¾swèø€ð×)¨,ÓÙØðD ñ)-˜Ÿ
œ °$Ù.5˜7Ÿ=™=œ?¸&Ÿ{™{Ð+?Ó#ŸK™KÐ(9Ó#ŸK™KÐ(;Ó<Ø $Ø"$Ø%,§[¡[Ð1CÀRÓ%HØ'.§{¡{Ð3GÓ'HØ$+§K¡KÐ0AÀ2Ó$FØ$+§K¡KÐ0AÓ$BØ,3¯K©KÐ8PÐRTÓ,UØ"Ÿ;™; y°"Ó5Ø"$ñˆLñ$¤"§(¡(¨>Ô":ä47¸Ó4G×4MÑ4MÈcÓ4RÖS¨S˜SŸY™YS ÐSØAJÖ1R¸#Êc²#Ò1R Ð%Ÿ[™[¨¸=ˆ!à!(§¡Ð-?ÀÓ!Dà
 ܘk¬4Õ0°[·_±_ÀVÕ5LØ Ð!1Ñ2×9à$/§O¡O°FÓ$;Ø%0§_¡_°WÓ%=Ø*5¯/©/¸+Ó*Fñõð
ð'×*¨=¸=ˆò!
ð %ŸI™I g¨{¼B¿M¹MÓJð%Ü#& u§{¡{°1£~Ó#6˜# 3¨tÕ3Ø?C  ¨^Ñ <Ø %ð Ð øð  Ð ùòUTùÚ1RøôL%ûô
ò Ü Ð9¸$¸¸rÀ!ÀÐ ûð üsJ ˜C3I"Ä I Ä"I"Ä(IÄ0IÄ4AI"Å6I"ÆB I"È-IÉI"ÉJ ÉI"ÉJ É
I"É IÉI"ÉIÉI"É" JÉ+JÉ<J ÊJÊJ ÚdbrŸc
ó  |jt«j|d¬«j«}|r®|}|j d«xs |j
|_|j d«xs |j |_|j d«xs |j|_|j d«xs |j|_|j d«r|d|_ n€t|d|j d«|j d«|j d«|j d«|j d«¬«}|j|«|j«|r=|jt«j|j¬ «j«|j d
g«D]Y}t|j d«|j d «|j d «|j¬
«}|j|«Œ[|j dg«D]p}|jt«j|j!«¬«j«}|sŒG||j"vsŒV|j"j%|«Œr|S#t&$r(} t)d| «|j+«Yd} ~ yd} ~ wwxYw)z-Save manually parsed company data to databaserG©rGrHrrSrr)rGrHrrSrr)Ú
company_idr“rcrf)rGrdÚlinkedinr«z"Error saving company to database: N)ÚqueryrÚ filter_byÚfirstryrHrrSrrÚaddÚflushrÚidÚdeleter rxÚportfolio_companiesr|r8r9Úrollback)
r0Úexisting_companyÚcompanyÚ exec_datarÚ
investor_nameÚinvestorr<s
r Ú_save_parsed_company_to_dbz,InvestorProcessor._save_parsed_company_to_dbszð> ðœÓ&×0°lÀ6Ñ6JÐK×
ñ à*Ø".×"2Ñ"2°9Ó"=Ò"PÀÇÁØ#/×#3Ñ#3°JÓ#?Ò#SÀ7×CSÑCSÔ à ×$ J°w×7JÑ7JðÔ$0×#3Ñ#3°JÓ#?Ò#SÀ7×CSÑCSÔ Ø×# 3Ø+7¸Ñ+G% (×,¨YÓ-¨jÓ9Ø ,× 0Ñ 0°Ó ?Ø)×-¨jÓ9Ø!-×!1Ñ!1°.Ó!Aô
ðw”Ø
ñ ØœÓ1¸W¿Z¹ZÐH×)×-Ð.>ÀÓ
 Ü  &Ÿ]™]¨<Ó&Ÿz™zô ð v•ð
ð".×!1Ñ!1Ð2BÀBÓ!Gò
E
ð—HHœ]ÓY M×$7Ñ$7Ó$9U“Wðò
à h×&BÑ&BÒ ×4×;¸
EðˆNøäò Ü Ð6°q°cÐ K‰KŒMÜûð ús$I+JÉ.JÉ=JÊ K
Ê%KËK
r}cóÜ |jt«j|d¬«j«}|r³|}|j d«xs |j
|_|j d«xs |j |_|j d«xs |j|_|j d«xs |j|_|j d«xs |j|_ |j d«xs |j|_
|j d «xs |j|_ |j d
«xs |j|_ |j d «xs |j|_
|j d «xs |j|_|j d
«xs |j|_|j d«xs |j |_nðt|d|j d«|j d«|j d«|j d«|j d«|j d«|j d «|j d
«|j d «|j d «|j d
«|j d«¬«
}|j#|«|j%«|r=|jt&«j|j(¬«j+«|j dg«D]y}t'|j d«|j d«|j d«|j d«|j d«|j(¬«}|j#|«Œ{|r=|jt,«j|j(¬«j+«|j dg«D}t-|j(|j d«|j d«|j d«|j d«|j d«|j d«|j d«|j d«|j d«¬ «
}|j#|«Œ»|S#t.$r(} t1d!| «|j3«Yd"} ~ y"d"} ~ wwxYw)#z.Save manually parsed investor data to databaserGrHrKrSrTrUrVrWrXrYrZr[rR)
rGrHrKrSrTrUrVrWrXrYrZr[rR)Ú investor_idr\rdrcrerf)rGrdrcrerfr]rmrnrorprqrrrsrt)
rmrnrorprfrqrrrsrtz#Error saving investor to database: N)r­r ryrHrKrSrTrUrVrWrXrYrZr[rRrrr8r9)
r0r}Úexisting_investorrºÚ member_datarr„r<s
r Ú_save_parsed_investor_to_dbz-InvestorProcessor._save_parsed_investor_to_dbSsnðj ðœÓ'×1°}ÀVÑ7LÐM×
ò,Ø#0×#4Ñ#4°YÓ#?Ò#SÀ8×CSÑCSÔ à!×% ×9NÑ9NðÔ"×% L¸×8LÑ8LðÔ -×ÓG¸8¿<¹< à!×%Ð&6ÓR¸8×;RÑ;RðÔ"×%Ð&6ÓR¸8×;RÑ;RðÔ"×%Ð&9ÓX¸h×>XÑ>XðÔ"×%Ð&<Ó×Ô
"×%Ð&8ÓV¸X×=VÑ=VðÔ"×%Ð&8ÓV¸X×=VÑ=VðÔ"×%Ð&@Ó×Ô$1×#4Ñ#4°YÓ#?Ò#SÀ8×CSÑCSÕ ô& )×-¨iÓ8Ø!.×!2Ñ!2°>Ó!BØ -× 1Ñ 1°-Ó @Ø%×)¨%Ó0Ø#0×#4Ñ#4Ð5EÓ#FØ#0×#4Ñ#4Ð5EÓ#FØ&3×&7Ñ&7Ð8KÓ&LØ)6×):Ñ):Ð;QÓ)RØ%2×%6Ñ%6Ð7IÓ%JØ%2×%6Ñ%6Ð7IÓ%JØ-:×->Ñ->Ø.ð*×-¨iÓð" Ø
ñœÓ2¸x¿{¹{ÐÀÓ
 ܨӨÓ%Ÿ/™/¨'Ó%Ÿ/™/¨'Ó¨|Ó<Ø (§ ¡ ô
ðv•ð
ñœÓ-¸(¿+¹+ÐF×.¨w¸Ó
 Ü Ø (§ ¡ Ø'Ÿm™m¨KÓ'Ÿm™m¨KÓ8Ø)2¯©Ð7MÓ)NØ.7¯m©mØ/ð )Ÿ}™}¨\Ó:Ø$-§M¡MÐ2CÓ$DØ%.§]¡]Ð3EÓ%FØ+4¯=©=Ð9QÓ+RØ!*§¡¨~Ó!>ô
ðt• ð
ð"ˆOøäò Ü Ð°sÐ K‰KŒMÜûð úsR7R:Ò: S+ÓS&Ó&S+Ú sector_namecóê|jt«jtj|k(«j «}|s-t|¬«}|j |«|j
«|S)z%Get existing sector or create new onerª)r­r
ÚfilterrG)r0Úsectors r Ú_get_or_create_sectorz'InvestorProcessor._get_or_create_sectorÃsUàœ+Ó&×-¬k×.>Ñ.>À+Ñ.MÓN×VˆÙÜ  kÔ2ˆFØ F‰F6ŒNØ H‰HŒJ؈
rc ódt|jj|jj|jj|jj
|jj |jj|jj¬«}|j|«|j«|jD]J}t|j|j|j|j¬«}|j|«ŒL|j D]9}|j#||j«}|j j%|«Œ;|j&D]@}t)|ggg¬«} |j+|| d¬«}
|j&j%|
«ŒB|S)zSave investor data to database)rGrSrTÚcheck_size_lowerÚcheck_size_upperrrÚnumber_of_investments)rGrdre)ÚsectorsÚmembersr¡T)Úskip_investors)r rGrSrTrrr\rrdrer|r´r
Ú_save_company_to_db) r0r}r¿rÚ sector_datarÄÚcompany_schemarŸs r Ú_save_investor_to_dbz&InvestorProcessor._save_investor_to_dbÌô
×'×%×.××&×*×3×*×3×DØ"/×"8Ñ"8×"NÑ"Nô
ˆð ˆØ
Œ
ð ˆKÜ × ×!×$ŸK™Kô ˆ
F‰F6 ð)× ,ˆ×°K×4DÑ4DÓEˆFØ × Ñ × # FÕ 
,× 9ˆØØô ˆ ×.¨r°<ÐPTÐUˆ × Õ ˆrcóh|jt«jtj|jjk(«j «}|r|St|jj|jj |jj|jj|jj|jj¬«}|j|«|j«|jD]W}|jsŒt|j|j|j |j"¬«}|j|«ŒY|j$D]9}|j'||j«} |j$j)| «Œ;|s||j*D]m}
|jt,«jt,j|
jk(«j «} | sŒS|j*j)| «Œo|S)zSave company data to database)rGrrrSrrH)rGrd)r­rrGrrrSrrHrrdr|r ) r0r¿rr}s r z%InvestorProcessor._save_company_to_dbúð
H‰H”\Ó
‰V”L××)=Ñ)=×)BÑ)BÑ
‰U ñ
Ø Ø× ×

ˆð ˆØ
Œ
ð ˆKØ×ÓÜ$×(×$×&Ÿz™zô ð v•ð ð +ˆ×°K×4DÑ4DÓEˆ O‰O× "  
Ø!-×!7Ñ!7ò
@