Anton_wireframe/app/services/__pycache__/llm_parser.cpython-312.pyc
Ë
³¯ãh‘aãóÂddlZddlZddlZddlZddlmZddlZddlm Z ddl
m Z m Z m
Z
mZmZmZddlmZddlmZddlmZmZddlmZGd „d
e«ZGd d «Zy)
éN)ÚOptional)Úget_db_session)Ú
CompanyMemberÚ CompanyTableÚ FundTableÚInvestorMemberÚ
InvestorTableÚ SectorTable)Ú
ChatOpenAI)Ú BaseModel)Ú CompanyDataÚ InvestorData)ÚSessioncó<eZdZUdZdZeed<dZeed<dZ eed<y) ÚCurrencyConversionz,Schema for LLM currency conversion responsesrÚ
amount_usdÚhighÚ
confidenceÚÚnotesN)
Ú__name__Ú
__module__Ú __qualname__Ú__doc__rÚintÚ__annotations__rÚstrr©óúN/home/oluwasanmi/Documents/Work/MKD/anton_wireframe/app/services/llm_parser.pyrrs"Ù€JÓØ€JÓØ€Eˆ3„Orrc ó*eZdZdZdedeefdZdedeefdZ deded edeefd
Z
d e d edee fd
Z
d e dedefdZd e d ede fdZ dd e dededefdZ ddej.dededeeezfdZddej2defdZddefdZy)ÚInvestorProcessorcó*ttjd«ddd¬«|_|jj t
«|_|jj t«|_|jj t«|_
y)OPENROUTER_API_KEYzhttps://openrouter.ai/api/v1zopenai/gpt-4o-minir)Úapi_keyÚbase_urlÚmodelÚ temperature) r ÚosÚgetenvÚllmÚwith_structured_outputrÚcurrency_converter_llmrÚinvestor_structured_llmr
Úcompany_structured_llm)Úselfs r Ú__init__zInvestorProcessor.__init__ spÜÜ—Iô 
ˆŒð'+§h¡h×&EÑ&EÜ ó'
ˆÔ(,§x¡x×'FÑ'FÄ|Ó'TˆÔ$Ø&*§h¡h×&EÑ&EÄkÓ&RˆÕ#rÚ
amount_strÚreturncƒóK|r
|dk(s|dk(ry d|d}|jj|«ƒd{}|jdkDr |jSdS7Œ!#t$r}t d|d|«Yd}~yd}~wwxYw­w) zØ
Use LLM to convert currency amounts to USD integers.
Handles formats like:
- "EUR 850,000,000"
- "$5M"
- "GBP 10-20 million"
- "Approximately EUR 100 million"
ú
Not AvailableÚ0NzáConvert this amount to USD as an integer (whole number, no decimals).
If it's a range, use the midpoint. If already in USD, just extract the number.
Remove all commas and convert millions/billions to actual numbers.
Amount: a

Examples:
- "EUR 850,000,000" -> 935000000 (assuming EUR to USD rate ~1.10)
- "$5M" -> 5000000
- "GBP 10-20 million" -> 18000000 (midpoint 15M * 1.20 rate)
- "Approximately EUR 100 million" -> 110000000
Return only the USD integer amount with current exchange rates.rzError converting currency 'z': )r-ÚainvokerÚ ExceptionÚprint)r0r2ÚpromptÚresultÚes r Úconvert_to_usdz InvestorProcessor.convert_to_usd0èø€ñ˜Z¨?Ò:¸jÈCÒ>OØð ð ð
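The prompt embedded in `InvestorProcessor.convert_to_usd` spells out the conversion rules: strip commas, expand millions and billions, take the midpoint of a range, and apply an exchange rate. The compiled method delegates all of this to the LLM through `ainvoke` with the `CurrencyConversion` schema. As a hedged illustration only, a rule-based fallback following the same rules might look like the sketch below. The function name, the regular expressions and the fixed rates are assumptions; the rates are copied from the prompt's own examples and are not live exchange rates.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed fixed rates, copied from the examples in the prompt (EUR ~1.10,
# GBP 1.20); a real fallback would need current exchange rates.
RATES_TO_USD = {
    "USD": Decimal("1"),
    "$": Decimal("1"),
    "EUR": Decimal("1.10"),
    "GBP": Decimal("1.20"),
}
MULTIPLIERS = {"thousand": 10**3, "k": 10**3, "million": 10**6, "m": 10**6,
               "billion": 10**9, "b": 10**9}
# A digit, optional spaces, then a scale word or suffix ("5M", "20 million")
SCALE_RE = re.compile(r"\d\s*(thousand|million|billion|k|m|b)\b", re.IGNORECASE)


def convert_to_usd_fallback(amount_str: Optional[str]) -> Optional[int]:
    """Hypothetical rule-based version of the prompt's conversion rules."""
    # Same early exits as convert_to_usd in the compiled module
    if not amount_str or amount_str in ("Not Available", "0"):
        return None
    text = amount_str.replace(",", "")  # "850,000,000" -> "850000000"
    # First currency marker found in the text; default to USD
    currency = next((c for c in RATES_TO_USD if c in text.upper()), "USD")
    numbers = [Decimal(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    if not numbers:
        return None
    value = sum(numbers) / len(numbers)  # a range "10-20" becomes its midpoint 15
    scale = SCALE_RE.search(text)
    if scale:
        value *= MULTIPLIERS[scale.group(1).lower()]
    usd = int(value * RATES_TO_USD[currency])
    # Like the original, treat a non-positive result as "no amount"
    return usd if usd > 0 else None
```

Using `Decimal` avoids float artifacts such as `850000000 * 1.10` not landing exactly on an integer. The rules are deliberately naive: a string that mixes a year with an amount, for example, would be averaged incorrectly. That kind of messy input is presumably why the original asks an LLM instead.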
ˆ ð@ð Cˆ ×6×>¸F×FˆFØ(.×(9Ñ(9¸AÒ(=6× GÀ4Ð Gùäò Ü Ð
¨|¸3¸q¸cÐ ûð üsJB‘$AµAAÁBÁAÁBÁAÁ A=Á"A8Á3BÁ8A=Á=BÚjson_strcó¾|rtj|«ry tj|«}|S#tj$r}t d|«Yd}~yd}~wwxYw)z„
Manually parse the JSON profile from the CSV.
Returns a cleaned dictionary with the investor profile data.
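Judging from the names in the bytecode, `parse_json_profile` returns `None` for an empty or `pd.isna` cell, otherwise calls `json.loads`, and prints the error and returns `None` on `json.JSONDecodeError`. Below is a minimal stdlib sketch of that behaviour. It assumes that a float NaN is how pandas represents an empty CSV cell, so `math.isnan` stands in for `pd.isna`. The final check that the result is a `dict` is an addition that the compiled code does not appear to make.

```python
import json
import math
from typing import Any, Optional


def parse_json_profile(json_str: Any) -> Optional[dict]:
    """Parse one CSV cell holding a JSON investor profile; None if unusable."""
    # Empty cells arrive as "" or None; pandas represents missing values as float NaN
    if not json_str or (isinstance(json_str, float) and math.isnan(json_str)):
        return None
    try:
        profile = json.loads(json_str)
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON: {e}")
        return None
    # Extra guard (not in the original): valid JSON that is not an object
    return profile if isinstance(profile, dict) else None
```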
NzError parsing JSON: )ÚpdÚisnaÚjsonÚloadsÚJSONDecodeErrorr9)r0r>Úprofiler<s r Úparse_json_profilez$InvestorProcessor.parse_json_profileQsVñ
œ2Ÿ7™7 8Ôð ä—j‘j Ó*ˆG؈NøÜ× Ü Ð¨Ð ûð úsš1±AÁAÁAÚnameÚwebsiteÚ profile_jsoncƒó–K|j|«}|sy |r|j«nd|r|j«nd|jd«|jd«ddd|jdg«|jdg«|jdg«|jd«|jdg«|jd i«ggd
œ}|jd i«}|rnt|t«r^|jd «}|rK|d
k7rF|j |«ƒd{}||d<|jd«|d<|jd«|d<|jdg«} | D]{}
t|
t«sŒ|
jd«sŒ&|dj
|
jd«|
jd«|
jd«d|
jd«dœ«Œ}|jdg«} | D]"} t| t«sŒ| jd«d| jd«d| jd«| jd«| jdg«| jdg«| jdg«dœ }
| jd «}|r.|d
k7r)|j |«ƒd{}|rt|«|
d!<| jd"«}|r.|d
k7r)|j |«ƒd{}|rt|«|
d#<|dj
|
«Œ%|S7Œ7Œq7Œ2#t$r}td$|d%|«Yd}~yd}~wwxYw­w)&z…
Process investor profile from CSV data.
Manually extracts fields and uses LLM only for currency conversion.
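The docstring says the profile fields are copied by hand and the LLM is used only for currency conversion. The constant names in the bytecode suggest a camelCase-to-snake_case mapping (for example `investorDescription` to `description` and `investmentThesisFocus` to `investment_thesis`). They also suggest that AUM is read from `overallAssetsUnderManagement.aumAmount` and converted only when it is present and not `"Not Available"`.

The synchronous sketch below reproduces just the top-level fields and AUM, leaving out the team-member and fund loops. The function name `map_investor_profile`, the injected `convert` callable and the default values are assumptions inferred from the compiled names, not the project's code. The original method is `async` and awaits `convert_to_usd`.

```python
import copy
from typing import Any, Callable, Optional

# camelCase profile key -> (snake_case field, default); names mirror those
# visible in the compiled module, the defaults are assumptions
FIELD_MAP = {
    "headquarters": ("headquarters", None),
    "investorDescription": ("description", None),
    "investmentThesisFocus": ("investment_thesis", []),
    "portfolioHighlights": ("portfolio_highlights", []),
    "linkedDocuments": ("linked_documents", []),
    "researcherNotes": ("researcher_notes", None),
    "missingImportantFields": ("missing_important_fields", []),
    "sources": ("sources", {}),
}


def map_investor_profile(name: Optional[str], website: Optional[str], profile: dict,
                         convert: Callable[[str], Optional[int]]) -> dict:
    """Hypothetical synchronous sketch of the manual field extraction."""
    data: dict[str, Any] = {
        "name": name.strip() if name else None,
        "website": website.strip() if website else None,
        "aum": None, "aum_as_of_date": None, "aum_source_url": None,
        "team_members": [], "funds": [],
    }
    for key, (field, default) in FIELD_MAP.items():
        # Copy mutable defaults so separate records never share one list/dict
        data[field] = profile.get(key, copy.copy(default))
    aum = profile.get("overallAssetsUnderManagement") or {}
    if isinstance(aum, dict):
        amount = aum.get("aumAmount")
        if amount and amount != "Not Available":
            data["aum"] = convert(amount)  # the only step that needs the LLM
            data["aum_as_of_date"] = aum.get("asOfDate")
            data["aum_source_url"] = aum.get("sourceUrl")
    return data
```

Passing the converter in as a parameter keeps the mapping testable without network access.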
headquartersÚinvestorDescriptionÚinvestmentThesisFocusÚportfolioHighlightsÚlinkedDocumentsÚresearcherNotesÚmissingImportantFieldsÚsources)rGrHrKÚ descriptionÚaumÚaum_as_of_dateÚaum_source_urlÚinvestment_thesisÚportfolio_highlightsÚlinked_documentsÚresearcher_notesÚmissing_important_fieldsrRÚ team_membersÚfundsÚoverallAssetsUnderManagementÚ aumAmountr5rTÚasOfDaterUÚ sourceUrlrVÚseniorLeadershiprGr\Útitle)rGrcÚroleÚemailÚ
source_urlr]ÚfundNameÚfundSizeSourceUrlÚsourceProviderÚgeographicFocusÚinvestmentStageFocusÚ sectorFocus) Ú fund_nameÚ fund_sizeÚfund_size_source_urlÚestimated_investment_sizerfÚsource_providerÚgeographic_focusÚinvestment_stage_focusÚ sector_focusÚfundSizernÚestimatedInvestmentSizerpz&Error processing investor profile for ú: )
rFÚstripÚgetÚ
isinstanceÚdictr=Úappendrr8r9)r0rGrHrIrEÚ
investor_dataÚaum_dataÚ
2025-09-25 17:00:38 +01:00
aum_amountÚaum_usdÚsenior_leadershipÚmemberr]ÚfundÚ fund_dataÚ
fund_size_strÚ
fund_size_usdÚ est_size_strÚ est_size_usdr<s r Úprocess_investor_profilez*InvestorProcessor.process_investor_profileas8èø€ð×)¨,ÓÙØðQ ñ)-˜Ÿ
œ °$Ù.5˜7Ÿ=™=œ?¸4Ø '§ ¡ ¨NÓ ;Ø&Ÿ{™{Ð+@ÓØ"&Ø"&Ø%,§[¡[Ð1HÈ"Ó%MØ(/¯ © Ð4IÈ2Ó(NØ$+§K¡KÐ0AÀ2Ó$FØ$+§K¡KÐ0AÓ$BØ,3¯K©KÐ8PÐRTÓ,UØ"Ÿ;™; y°"Ó5Ø "ØñˆMð&—{{Ð#AÀ2ÓFˆœJ x´Ô%Ÿ\™\¨+Ó6
Ù Ò"?à$(×$7Ñ$7¸
Ó$C×CGØ+2M (Ø6>·l±lÀ:Ó6NMÐ"2Ñ3Ø6>·l±lÀ;Ó6OMÐ"2Ñ!(§ ¡ Ð,>ÀÓ CÐ Ø
ܘf¤dÕ·
±
¸6Õ0BØ! 1×8à$*§J¡J¨vÓ$6Ø%+§Z¡Z°Ó%8Ø$*§J¡J¨wÓ$7Ø%)Ø*0¯*©*°[Ó*Añ õð
ð—K‘K ¨Ó,ˆEØó
=ܘd¤DÕ)à%)§X¡X¨jÓ%9Ø%)Ø04·±Ð9LÓ0MØ59Ø&*§h¡h¨{Ó&;Ø+/¯8©8Ð4DÓ+EØ,0¯H©HÐ5FÈÓ,KØ26·(±(Ð;QÐSUÓ2VØ(,¯©°ÀÓ(Cñ
!%)§H¡H¨ZÓ$8¸/Ò)IØ.2×.AÑ.AÀ-Ó.P×(P˜
Ù(Ü58¸Ó5G˜I $(§8¡8Ð,EÓ#F ¸Ò(GØ-1×-@Ñ-@ÀÓ-N×'N˜ Ù'ÜEHÈÓEV˜IÐ&AÑ! *×1°)Ö<ð9
=ð< ðgDùðL)Qøð(Oùôò Ü Ð:¸4¸&ÀÀ1À#Ð ûð üs{M ˜DL"ÄLÄAL"Å5L"ÆBL"ÈB$L"Ê,LÊ-AL"Ë-L Ë.,L"ÌM ÌL"ÌL"Ì L"Ì" MÌ+MÌ<M ÍMÍM Údbr}cóÜ |jt«j|d¬«j«}|r³|}|j d«xs |j
|_|j d«xs |j |_|j d«xs |j|_|j d«xs |j|_|j d«xs |j|_ |j d«xs |j|_
|j d «xs |j|_ |j d
«xs |j|_ |j d «xs |j|_
|j d «xs |j|_|j d
«xs |j|_|j d«xs |j |_nðt|d|j d«|j d«|j d«|j d«|j d«|j d«|j d «|j d
«|j d «|j d «|j d
«|j d«¬«
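When `_save_parsed_investor_to_db` finds an investor with the same name, the bytecode appears to update each column as `investor_data.get(...) or <current value>`, so a field missing from the new parse does not erase stored data. The investor's members and funds then appear to be deleted and re-added by `investor_id`.

The sketch below isolates only that merge rule. `merge_investor_fields` and `InvestorRecord` are hypothetical names, and a plain dataclass stands in for the SQLAlchemy model.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class InvestorRecord:  # hypothetical stand-in for an InvestorTable row
    name: str
    website: Optional[str] = None
    headquarters: Optional[str] = None


def merge_investor_fields(existing: Any, new_data: dict, fields: Iterable[str]) -> Any:
    """Overwrite each field only when the freshly parsed value is truthy."""
    for field in fields:
        # Mirrors `existing.x = new.get("x") or existing.x`; note that under
        # this rule 0, "" and [] also count as missing and keep the old value
        setattr(existing, field, new_data.get(field) or getattr(existing, field))
    return existing
```

The `or` idiom is compact but cannot express "clear this value". An explicit `is not None` check would be needed if a legitimate zero or empty list should overwrite the stored data.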
}|j#|«|j%«|r=|jt&«j|j(¬«j+«|j dg«D]y}t'|j d«|j d«|j d«|j d«|j d«|j(¬«}|j#|«Œ{|r=|jt,«j|j(¬«j+«|j dg«D}t-|j(|j d«|j d«|j d«|j d«|j d«|j d«|j d«|j d«|j d«¬ «
}|j#|«Œ»|S#t.$r(} t1d!| «|j3«Yd"} ~ y"d"} ~ wwxYw)#z.Save manually parsed investor data to databaserG©rGrHrKrSrTrUrVrWrXrYrZr[rR)
rGrHrKrSrTrUrVrWrXrYrZr[rR)Ú investor_idr\rdrcrerf)rGrdrcrerfrr]rmrnrorprqrrrsrt)
rrmrnrorprfrqrrrsrtz#Error saving investor to database: N)Úqueryr Ú filter_byÚfirstryrHrKrSrTrUrVrWrXrYrZr[rRÚaddÚflushrÚidÚdeleterr8r9Úrollback)
r0r}Úexisting_investorÚinvestorÚ member_datarr„r<s
r Ú_save_parsed_investor_to_dbz-InvestorProcessor._save_parsed_investor_to_db¿sAðT ðœÓ 
¨fÑ 5Óð
ò ,Ø#0×#4Ñ#4°YÓ#?Ò#SÀ8×CSÑCSÔ Ø(5×(9Ñ(9¸.Ó(IÒ(bÈX×MbÑMbÔ%Ø'4×'8Ñ'8¸Ó'GÒ'_È8×K_ÑK_Ô,×ÓG¸8¿<¹< Ø*7×*;Ñ*;Ð<LÓ*MÒ*hÐQY×QhÑQhÔ'Ø*7×*;Ñ*;Ð<LÓ*MÒ*hÐQY×QhÑQhÔ'Ø-:×->Ñ->Ð?RÓ-SÒ-qÐW_×WqÑWqÔ*Ø0=×0AÑ0AÐBXÓ0YÒ0zÐ]e×]zÑ]zÔ-Ø,9×,=Ñ,=Ð>PÓ,QÒ,nÐU]×UnÑUnÔ)Ø,9×,=Ñ,=Ð>PÓ,QÒ,nÐU]×UnÑUnÔ)Ø4A×4EÑ4EÐF`Ó4aò5GÐem÷fGñfGÔ1Ø#0×#4Ñ#4°YÓ#?Ò#SÀ8×CSÑCSÕ ô& )×-¨iÓ8Ø!.×!2Ñ!2°>Ó!BØ -× 1Ñ 1°-Ó @Ø%×)¨%Ó0Ø#0×#4Ñ#4Ð5EÓ#FØ#0×#4Ñ#4Ð5EÓ#FØ&3×&7Ñ&7Ð8KÓ&LØ)6×):Ñ):Ð;QÓ)RØ%2×%6Ñ%6Ð7IÓ%JØ%2×%6Ñ%6Ð7IÓ%JØ-:×->Ñ->Ð?YÓ-ZØ)×-¨iÓð Ø
ñœÓ2¸x¿{¹{ÐÀÓ
 ܨӨÓ%Ÿ/™/¨'Ó%Ÿ/™/¨'Ó¨|Ó<Ø (§ ¡ ô
ðv•ð
ñœÓ-¸(¿+¹+ÐF×.¨w¸Ó
 Ü Ø (§ ¡ Ø'Ÿm™m¨KÓ'Ÿm™m¨KÓ8Ø)2¯©Ð7MÓ)NØ.7¯m©mÐ<WÓ.XØ(Ÿ}™}¨\Ó:Ø$-§M¡MÐ2CÓ$DØ%.§]¡]Ð3EÓ%FØ+4¯=©=Ð9QÓ+RØ!*§¡¨~Ó!>ô ðt• ð
ðˆOøäò Ü Ð°sÐ K‰KŒMÜûð úsR7R:Ò: S+ÓS&Ó&S+Ú sector_namecóê|jt«jtj|k(«j «}|s-t|¬«}|j |«|j
«|S)z%Get existing sector or create new onerŒ)r
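`_get_or_create_sector` ("Get existing sector or create new one") appears to query `SectorTable` by name, and if no row is found, `add` a new one and `flush` the session without committing. The sketch below expresses the same get-or-create pattern with the stdlib `sqlite3` module in place of SQLAlchemy. The table name `sector` and its columns are assumptions made for the example.

```python
import sqlite3


def get_or_create_sector(conn: sqlite3.Connection, sector_name: str) -> int:
    """Return the id of the sector named sector_name, inserting it if absent."""
    row = conn.execute(
        "SELECT id FROM sector WHERE name = ?", (sector_name,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute("INSERT INTO sector (name) VALUES (?)", (sector_name,))
    # No commit: as with flush() in the original, the caller owns the transaction
    return cur.lastrowid
```

A `UNIQUE` constraint on `name` is a sensible companion to this pattern, since a select followed by an insert can race when two writers run concurrently.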
ÚfilterrGrrr)r0Úsectors r Ú_get_or_create_sectorz'InvestorProcessor._get_or_create_sectorsUàœ+Ó&×-¬k×.>Ñ.>À+Ñ.MÓN×VˆÙÜ  kÔ2ˆFØ F‰F6ŒNØ H‰HŒJ؈
rc
óŽt|jj|jj|jj|jj
|jj |jj|jj|jj¬«}|j|«|j«|jD]J}t|j|j|j|j ¬«}|j|«ŒL|j"D]9}|j%||j«}|j"j'|«Œ;|j(D]@}t+|ggg¬«} |j-|| d¬«}
|j(j'|
«ŒB|S)zSave investor data to database)rGrSrTÚcheck_size_lowerÚcheck_size_upperrrÚ stage_focusÚnumber_of_investments)rGrdrer)ÚcompanyÚsectorsÚmembersÚ investorsT)Úskip_investors)r r—rGrSrTr rrrrr\rrdrer“r|Úportfolio_companiesr
Ú_save_company_to_db) r0r}r—r˜rÚ sector_datarÚcompany_schemaÚ company_datar¤s r Ú_save_investor_to_dbz&InvestorProcessor._save_investor_to_db"sô
×'×%×.××&×*×3×%×.×:Ø"/×"8Ñ"8×"NÑ"Nô 
ˆð ˆØ
Œ
ð ˆKÜ × ×!×$ŸK™Kô ˆ
F‰F6 ð)× ,ˆ×°K×4DÑ4DÓEˆFØ × Ñ × # FÕ 
,× 9ˆØØô ˆ ×.¨r°<ÐPTÐUˆ × Õ ˆrr­cóh|jt«jtj|jjk(«j «}|r|St|jj|jj |jj|jj|jj|jj¬«}|j|«|j«|jD]W}|jsŒt|j|j|j |j"¬«}|j|«ŒY|j$D]9}|j'||j«} |j$j)| «Œ;|s||j*D]m}
|jt,«jt,j|
jk(«j «} | sŒS|j*j)| «Œo|S)zSave company data to database)rGÚindustryÚlocationrSÚ founded_yearrH)rGÚlinkedinrdÚ
company_id)rrGrrSrHrrrrdr“r|r ) r0r­Úexisting_companyr¤r˜rrr}rs r z%InvestorProcessor._save_company_to_dbQð
H‰H”\Ó
‰V”L××)=Ñ)=×)BÑ)BÑ
‰U ñ
Ø Ø× ×

ˆð ˆØ
Œ
ð ˆKØ×ÓÜ$×(×$×&Ÿz™zô ð v•ð ð +ˆ×°K×4DÑ4DÓEˆ O‰O× "  
Ø!-×!7Ñ!7ò
@
ð—H‘Hœ]Ó‘VœM×.°-×2DÑ2DÑU“Wð
×%×,Ð->Õ
@ðˆrÚrowÚrow_idxÚ is_investorc ƒóÎKi}|j«D]s\}}tj|«sŒt|«j dd«j dd«j dd«}dj d|D««}|||<Œudj |j«Dcgc] \}}|d|Œc}}«} t
d |d
zd «|r$|jj|«ƒd {} n#|jj|«ƒd {} | r| j«Sy cc}}w7ŒA7Œ#t$r}