Anton_wireframe/app/services/__pycache__/llm_parser.cpython-312.pyc
Ë
Tæh÷ŒãóÔddlZddlZddlZddlmZddlZddlmZddl m
Z
m Z m Z m
Z
mZmZmZddlmZddlmZddlmZmZddlmZGd „d
e«ZGd d e«ZGd
d«Zy)éN)ÚOptional)Úget_db_session)Ú
CompanyMemberÚ CompanyTableÚ FundTableÚInvestmentStageTableÚInvestorMemberÚ
InvestorTableÚ SectorTable)Ú
ChatOpenAI)Ú BaseModel)Ú CompanyDataÚ InvestorData)ÚSessioncó<eZdZUdZdZeed<dZeed<dZ eed<y) ÚCurrencyConversionz,Schema for LLM currency conversion responsesrÚ
amount_usdÚhighÚ
confidenceÚÚnotesN)
Ú__name__Ú
__module__Ú __qualname__Ú__doc__rÚintÚ__annotations__rÚstrr©óúN/home/oluwasanmi/Documents/Work/MKD/anton_wireframe/app/services/llm_parser.pyrrs"Ù€JÓØ€JÓØ€Eˆ3„Or rcóJeZdZUdZdZeed<dZeed<dZe ed<dZ
e ed<y )
ÚCheckSizeRangezFSchema for LLM check size range parsing from estimated investment sizerÚlower_bound_usdÚupper_bound_usdrrrrN) rrrrr$rrr%rrrrr r!r#r#s,Ù€OØ€OØ€JÓØ€Eˆ3„Or r#c ó¼eZdZdZdedeefdZdedeeeeeffdZ dedee
fdZ d ed
ed edee
fd Z d$d ed
ed ededee
f
dZ
dede
deefdZdede
deefdZdededefdZdededefdZdededefdZ d%dedededefdZ d&dej:dededeeezfdZd&d ej>d!efd"„Z d&d ej>d!efd#„Z!y
)'ÚInvestorProcessorcórttjd«ddd¬«|_|jj t
«|_|jj t«|_|jj t«|_
|jj t«|_ y)OPENROUTER_API_KEYzhttps://openrouter.ai/api/v1zopenai/gpt-4o-minir)Úapi_keyÚbase_urlÚmodelÚ temperature)
r ÚosÚgetenvÚllmÚwith_structured_outputrÚcurrency_converter_llmr#Úcheck_size_parser_llmrÚinvestor_structured_llmrÚcompany_structured_llm)Úselfs r!Ú__init__zInvestorProcessor.__init__)s‰ÜÜ—Iô 
ˆŒð'+§h¡h×&EÑ&EÜ ó'
ˆÔ&*§X¡X×%DÑ%DÄ^Ó%TˆÔ(,§x¡x×'FÑ'FÄ|Ó'TˆÔ$Ø&*§h¡h×&EÑ&EÄkÓ&RˆÕ#r Ú
amount_strÚreturncƒóK|r
|dk(s|dk(ry d|d}|jj|«ƒd{}|jdkDr |jSdS7Œ!#t$r}t d|d|«Yd}~yd}~wwxYw­w) zØ
Use LLM to convert currency amounts to USD integers.
Handles formats like:
- "EUR 850,000,000"
- "$5M"
- "GBP 10-20 million"
- "Approximately EUR 100 million"
ú
Not AvailableÚ0NzáConvert this amount to USD as an integer (whole number, no decimals).
If it's a range, use the midpoint. If already in USD, just extract the number.
Remove all commas and convert millions/billions to actual numbers.
Amount: a

Examples:
- "EUR 850,000,000" -> 935000000 (assuming EUR to USD rate ~1.10)
- "$5M" -> 5000000
- "GBP 10-20 million" -> 18000000 (midpoint 15M * 1.20 rate)
- "Approximately EUR 100 million" -> 110000000
Return only the USD integer amount with current exchange rates.rzError converting currency 'ú': )r2ÚainvokerÚ ExceptionÚprint)r6r8ÚpromptÚresultÚes r!Úconvert_to_usdz InvestorProcessor.convert_to_usd;èø€ñ˜Z¨?Ò:¸jÈCÒ>OØð ð ð
ˆ ð@ð Cˆ ×6×>¸F×FˆFØ(.×(9Ñ(9¸AÒ(=6× GÀ4Ð Gùäò Ü Ð
¨|¸3¸q¸cÐ ûð üsJB‘$AµAAÁBÁAÁBÁAÁ A=Á"A8Á3BÁ8A=Á=BÚestimated_investment_strcƒóFK|r
|dk(s|dk(ry d|d}|jj|«ƒd{}|jdkDr |jnd}|jdkDr |jnd}||fS7ŒB#t$r}t d|d |«Yd}~yd}~wwxYw­w)
aF
Use LLM to parse check size range from estimated investment size string.
Returns tuple of (lower_bound_usd, upper_bound_usd).
Handles formats like:
- "EUR 1,000 to 2,000"
- "$100K-$500K"
- "Between $1M and $5M"
- "Up to EUR 10 million"
- "$2M typical"
r;r<)NNz_Parse this check size/investment range into lower and upper bounds in USD as integers.
Input: a«
Instructions:
- If it's a range (e.g., "EUR 1M to 5M"), extract both bounds
- If it's a single amount (e.g., "$2M typical"), use it as both lower and upper
- If it says "up to X", use 0 as lower and X as upper
- Convert all currencies to USD using current exchange rates
- Return integers (whole numbers, no decimals)
Examples:
- "EUR 1,000 to 2,000" -> lower: 1100, upper: 2200
- "$100K-$500K" -> lower: 100000, upper: 500000
- "Between $1M and $5M" -> lower: 1000000, upper: 5000000
- "Up to EUR 10 million" -> lower: 0, upper: 11000000
- "$2M typical" -> lower: 2000000, upper: 2000000
- "GBP 500K-2M" -> lower: 600000, upper: 2400000
Return the lower and upper bounds in USD.Nrz Error parsing check size range 'r=)r3r>r$r%r?r@)r6rErArBÚlowerÚupperrCs r!Úparse_check_size_rangez(InvestorProcessor.parse_check_size_range\èø€ñ'¨?Ò'¨3Òð ðà Ð"*ð-ˆFð* ×5×=¸E×EˆFØ.4×.DÑ.DÀqÒ.HF×*ÈdˆEØ.4×.DÑ.DÀqÒ.HF×*Èdˆ˜% ðFùôò Ü Ð4Ð5MÐ4NÈcÐRSÐQTÐ ûð üs?B!$A:µA8¶AA:Á7B!Á8A:Á: BÂBÂB!ÂBÂB!Újson_strcó¾|rtj|«ry tj|«}|S#tj$r}t d|«Yd}~yd}~wwxYw)z„
Manually parse the JSON profile from the CSV.
Returns a cleaned dictionary with the investor profile data.
NzError parsing JSON: )ÚpdÚisnaÚjsonÚloadsÚJSONDecodeErrorr@)r6rJÚprofilerCs r!Úparse_json_profilez$InvestorProcessor.parse_json_profilesVñ
œ2Ÿ7™7 8Ôð ä—j‘j Ó*ˆG؈NøÜ× Ü Ð¨Ð ûð úsš1±AÁAÁAÚnameÚwebsiteÚ profile_jsoncƒóØK|j|«}|sy |r|j«nd|r|j«nd|jd«|jd«ddd|jdg«|jdg«|jdg«|jd«|jdg«|jd i«ggd
œ}|jd i«}|rnt|t«r^|jd «}|rK|d
k7rF|j |«ƒd{}||d<|jd«|d<|jd«|d<|jdg«} | D]{}
t|
t«sŒ|
jd«sŒ&|dj
|
jd«|
jd«|
jd«d|
jd«dœ«Œ}|jdg«} | D]C} t| t«sŒ| jd«d| jd«dd| jd«| jd«d| jdg«| jdg«dœ
}
| jdg«}|r$t|t«rd j|«|
d!<| jd"«}|r%|d
k7r |j |«ƒd{}|r||
d#<| jd$«}|r/|d
k7r*|j|«ƒd{\}}|||
d%<|||
d&<|dj
|
«ŒF|S7Œ!7Œi7Œ3#t$r}td'|d(|«Yd}~yd}~wwxYw­w))z…
Process investor profile from CSV data.
Manually extracts fields and uses LLM only for currency conversion.
headquartersÚinvestorDescriptionÚinvestmentThesisFocusÚportfolioHighlightsÚlinkedDocumentsÚresearcherNotesÚmissingImportantFieldsÚsources)rSrTrWÚ descriptionÚaumÚaum_as_of_dateÚaum_source_urlÚinvestment_thesisÚportfolio_highlightsÚlinked_documentsÚresearcher_notesÚmissing_important_fieldsr^Ú team_membersÚfundsÚoverallAssetsUnderManagementÚ aumAmountr;r`ÚasOfDateraÚ sourceUrlrbÚseniorLeadershiprSrhÚtitle)rSroÚroleÚemailÚ
source_urlriÚfundNameÚfundSizeSourceUrlÚsourceProviderÚinvestmentStageFocusÚ sectorFocus)
Ú fund_nameÚ fund_sizeÚfund_size_source_urlÚcheck_size_lowerÚcheck_size_upperrrÚsource_providerÚgeographic_focusÚinvestment_stage_namesÚ sector_namesÚgeographicFocusú, r~ÚfundSizeryÚestimatedInvestmentSizer{r|z&Error processing investor profile for ú: ) rRÚstripÚgetÚ
isinstanceÚdictrDÚappendÚlistÚjoinrIr?r@)r6rSrTrUrQÚ
investor_dataÚaum_dataÚ
aum_amountÚaum_usdÚsenior_leadershipÚmemberriÚfundÚ fund_dataÚ geo_focusÚ
fund_size_strÚ
fund_size_usdÚ est_size_strÚ check_lowerÚ check_upperrCs r!Úprocess_investor_profilez*InvestorProcessor.process_investor_profileŸsyèø€ð×)¨,ÓÙØð[ ñ)-˜Ÿ
œ °$Ù.5˜7Ÿ=™=œ?¸4Ø '§ ¡ ¨NÓ ;Ø&Ÿ{™{Ð+@ÓØ"&Ø"&Ø%,§[¡[Ð1HÈ"Ó%MØ(/¯ © Ð4IÈ2Ó(NØ$+§K¡KÐ0AÀ2Ó$FØ$+§K¡KÐ0AÓ$BØ,3¯K©KÐ8PÐRTÓ,UØ"Ÿ;™; y°"Ó5Ø "ØñˆMð&—{{Ð#AÀ2ÓFˆœJ x´Ô%Ÿ\™\¨+Ó6
Ù Ò"?à$(×$7Ñ$7¸
Ó$C×CGØ+2M (Ø6>·l±lÀ:Ó6NMÐ"2Ñ3Ø6>·l±lÀ;Ó6OMÐ"2Ñ!(§ ¡ Ð,>ÀÓ CÐ Ø
ܘf¤dÕ·
±
¸6Õ0BØ! 1×8à$*§J¡J¨vÓ$6Ø%+§Z¡Z°Ó%8Ø$*§J¡J¨wÓ$7Ø%)Ø*0¯*©*°[Ó*Añ õð
ð—K‘K ¨Ó,ˆó&
=ܘd¤DÕ)à%)§X¡X¨jÓ%9Ø%)Ø04·±Ð9LÓ0MØ,0Ø,0Ø&*§h¡h¨{Ó&;Ø+/¯8©8Ð4DÓ+EØ,0Ø26·(±(Ð;QÐSUÓ2VØ(,¯©°ÀÓ(Cñ !!%§¡Ð):¸BÓ ? ¤Z° ¼4Ô%@Ø8<¿ ¹ À)Ó8L˜ Ð"4Ñ%)§H¡H¨ZÓ$8¸/Ò)IØ.2×.AÑ.AÀ-Ó.P×(P˜
Ù(Ø5B˜I $(§8¡8Ð,EÓ#F ¸Ò(GØ9=×9TÑ9TØ:÷4Ñ0˜  2Ø<G˜IÐ&8Ñ2Ø<G˜IÐ&8Ñ! *×1°)Ö<ðM&
=ðP ð{DùðX)Qøð4ùôò Ü Ð:¸4¸&ÀÀ1À#Ð ûð üszM*˜DMÄL<ÄAMÅ5MÆBMÈC
MËL?Ë7MÌ
MÌ-MÌ;M*Ì<MÌ?MÍMÍ M'Í M"ÍM*Í"M'Í'M*investor_namescƒóîK|j|«}|sy |r|j«nddgdœ}|jdg«}|s|jdg«}|D]j}t|t«sŒ|jd«sŒ&|dj |jd«|jd«|jd«d œ«Œl|jd
d «} | rdgd ¢}
|
D][} t
j| | t j«} | sŒ+ t| jd
««}
d|
cxkrdkr nn|
|d<|SŒ]|S#t$rYŒkwxYw#t$r}td|d|«Yd}~yd}~wwxYw­w)
Process company profile from CSV data.
Only extracts founded_year and key_executives - rest is in base database.
N)rSÚ founded_yearÚkey_executivesÚ
keyExecutivesrnrSrorm)rSrorrÚcompanyDescriptionr)zfounded in (\d{4})zfounded (\d{4})uGegründet (\d{4})zestablished in (\d{4})z
since (\d{4})z \((\d{4})\)éilz%Error processing company profile for r…)
rRr†r‡rˆr‰ÚreÚsearchÚ
IGNORECASErÚgroupr?r@)r6rSrTrUrQÚ company_datarŸÚ exec_memberr_Ú
year_patternsÚpatternÚmatchÚyearrCs r!Úprocess_company_profilez)InvestorProcessor.process_company_profileèø€ð×)¨,ÓÙØð3 ñ)-˜Ÿ
œ °$Ø $Ø"$ñˆ%Ÿ[™[¨¸=ˆ!à!(§¡Ð-?ÀÓ!Dà
 ܘk¬4Õ0°[·_±_ÀVÕ5LØ Ð!1Ñ2×9à$/§O¡O°FÓ$;Ø%0§_¡_°WÓ%=Ø*5¯/©/¸+Ó*Fñõð
ð"Ÿ+™+Ð&:¸?ˆò!
ð %ŸI™I g¨{¼B¿M¹MÓJð%Ü#& u§{¡{°1£~Ó#6˜# 3¨tÕ3Ø?C  ¨^Ñ <Ø %ð Ð øð  Ð øô%ûô
ò Ü Ð9¸$¸¸rÀ!ÀÐ ûð üslE5˜AEÁ,EÁ>B EÄ -D?Ä8EÄ:E5Ä;EÄ>E5Ä? E ÅEÅ
E Å EÅ E2ÅE-Å(E5Å-E2Å2E5Údbr§có˜ |jt«j|d¬«j«}|r|}|j d«r|d|_nt
d|dd«y|jt«j|j¬«j«|j dg«D]Y}t|j d«|j d «|j d
«|j¬ «}|j|«Œ[|S#t$r(}t
d |«|j«Yd}~yd}~wwxYw)
z\Save manually parsed company data to database - only updates founded_year and key_executivesrS©rSuâš ï¸ Company 'z'' not found in base database - skippingN)Ú
company_idrŸrorr)rSrpÚlinkedinr±z"Error saving company to database: )
ÚqueryrÚ filter_byÚfirstr‡r@rÚidÚdeleteÚaddr?Úrollback)r6Úexisting_companyÚcompanyÚ exec_datarrCs r!Ú_save_parsed_company_to_dbz,InvestorProcessor._save_parsed_company_to_dbGs=ð& ðœÓ&×0°lÀ6Ñ6JÐK×
ñ à*Ø×# NÔ3Ø+7¸Ñ+GØ °VÑ(<Ð'=Ð=dÐðð
H‰H”]Ó ¿¹Ð )×-Ð.>ÀÓ
 Ü  &Ÿ]™]Øð 'Ÿz™zô
ðv•ð
ðˆNøäò Ü Ð6°q°cÐ K‰KŒMÜûð úsA(DÁ+B,DÄ E Ä!EÅE rcóð
|jt«j|d¬«j«}|r³|}|j d«xs |j
|_|j d«xs |j |_|j d«xs |j|_|j d«xs |j|_|j d«xs |j|_ |j d«xs |j|_
|j d «xs |j|_ |j d
«xs |j|_ |j d «xs |j|_
|j d «xs |j|_|j d
«xs |j|_|j d«xs |j |_nðt|d|j d«|j d«|j d«|j d«|j d«|j d«|j d «|j d
«|j d «|j d «|j d
«|j d«¬«
}|j#|«|j%«|r=|jt&«j|j(¬«j+«|j dg«D]y}t'|j d«|j d«|j d«|j d«|j d«|j(¬«}|j#|«Œ{|r=|jt,«j|j(¬«j+«|j dg«D]B}t-|j(|j d«|j d«|j d«|j d«|j d«|j d«|j d«|j d«¬« }|j#|«|j%«|j d g«D]/} |j/|| «}
|j0j3|
«Œ1|j d!g«D]/} |j5|| «} |j6j3| «Œ1ŒE|S#t8$r(}
t;d"|
«|j=«Yd#}
~
y#d#}
~
wwxYw)$z.Save manually parsed investor data to databaserSrTrWr_r`rarbrcrdrerfrgr^)
rSrTrWr_r`rarbrcrdrerfrgr^)Ú investor_idrhrprorqrr)rSrprorqrrr¿rirxryrzr{r|r}r~) r¿rxryrzr{r|rrr}r~rr€z#Error saving investor to database: N)r
r´r‡rTrWr_r`rarbrcrdrerfrgr^r¸Úflushr rÚ_get_or_create_investment_stageÚinvestment_stagesrŠÚ_get_or_create_sectorÚsectorsr?r@)r6rÚexisting_investorÚinvestorÚ member_datarr”r“Ú
stage_nameÚstageÚ sector_nameÚsectorrCs r!Ú_save_parsed_investor_to_dbz-InvestorProcessor._save_parsed_investor_to_dbsðr ðœÓ'×1°}ÀVÑ7LÐM×
ò,Ø#0×#4Ñ#4°YÓ#?Ò#SÀ8×CSÑCSÔ à!×% ×9NÑ9NðÔ"×% L¸×8LÑ8LðÔ -×ÓG¸8¿<¹< à!×%Ð&6ÓR¸8×;RÑ;RðÔ"×%Ð&6ÓR¸8×;RÑ;RðÔ"×%Ð&9ÓX¸h×>XÑ>XðÔ"×%Ð&<Ó×Ô
"×%Ð&8ÓV¸X×=VÑ=VðÔ"×%Ð&8ÓV¸X×=VÑ=VðÔ"×%Ð&@Ó×Ô$1×#4Ñ#4°YÓ#?Ò#SÀ8×CSÑCSÕ ô& )×-¨iÓ8Ø!.×!2Ñ!2°>Ó!BØ -× 1Ñ 1°-Ó @Ø%×)¨%Ó0Ø#0×#4Ñ#4Ð5EÓ#FØ#0×#4Ñ#4Ð5EÓ#FØ&3×&7Ñ&7Ð8KÓ&LØ)6×):Ñ):Ð;QÓ)RØ%2×%6Ñ%6Ð7IÓ%JØ%2×%6Ñ%6Ð7IÓ%JØ-:×->Ñ->Ø.ð*×-¨iÓð" Ø
ñœÓ2¸x¿{¹{ÐÀÓ
 ܨӨÓ%Ÿ/™/¨'Ó%Ÿ/™/¨'Ó¨|Ó<Ø (§ ¡ ô
ðv•ð
ñœÓ-¸(¿+¹+ÐF×.¨w¸Ó
0 Ü Ø (§ ¡ Ø'Ÿm™m¨KÓ'Ÿm™m¨KÓ8Ø)2¯©Ð7MÓ)NØ%.§]¡]Ð3EÓ%FØ%.§]¡]Ð3EÓ%FØ(Ÿ}™}¨\Ó:Ø$-§M¡MÐ2CÓ$DØ%.§]¡]Ð3EÓ%Fô
ðt” Ø
ð#,§-¡-Ð0HÈ"Ó"Mò9 ×ÀZÓP×1°%Õ
$-§=¡=°ÀÓ#Dò07¸¸H—LL×Õ0ð+
0ð2ˆOøäò Ü Ð°sÐ K‰KŒMÜûð úsUUÕ U5Õ
U0Õ0U5cóàddlm}|j|«j|j|k(«j «}|s*||¬«}|j
|«|j«|S)z/Get existing investment stage or create new oner)r)Ú db.modelsrÚfilterrSr¸)r6rs r!z1InvestorProcessor._get_or_create_investment_stageës`õ
H‰HÐ
‰VÐ(×Ñ
‰U ñ
Ù(¨jÔ9ˆ F‰F5ŒMØ H‰HŒJ؈ r cóê|jt«jtj|k(«j «}|s-t|¬«}|j |«|j
«|S)z%Get existing sector or create new oner°)r rSr¸)r6s r!z'InvestorProcessor._get_or_create_sectorüsUàœ+Ó&×-¬k×.>Ñ.>À+Ñ.MÓN×VˆÙÜ  kÔ2ˆFØ F‰F6ŒNØ H‰HŒJ؈
r c ódt|jj|jj|jj|jj
|jj |jj|jj¬«}|j|«|j«|jD]J}t|j|j|j|j¬«}|j|«ŒL|j D]9}|j#||j«}|j j%|«Œ;|j&D]@}t)|ggg¬«} |j+|| d¬«}
|j&j%|
«ŒB|S)zSave investor data to database)rSr_r`r{r|r~Únumber_of_investments)rSrprqr¿)ÚmembersÚ investorsT)Úskip_investors)r
rSr_r`r{r|r~r¸rhr rprqÚportfolio_companiesrÚ_save_company_to_db) r6rrÚ sector_datarËÚcompany_schemar§s r!Ú_save_investor_to_dbz&InvestorProcessor._save_investor_to_dbô
×'×%×.××&×*×3×*×3×DØ"/×"8Ñ"8×"NÑ"Nô
ˆð ˆØ
Œ
ð ˆKÜ × ×!×$ŸK™Kô ˆ
F‰F6 ð)× ,ˆ×°K×4DÑ4DÓEˆFØ × Ñ × # FÕ 
,× 9ˆØØô ˆ ×.¨r°<ÐPTÐUˆ × Õ ˆr cóh|jt«jtj|jjk(«j «}|r|St|jj|jj |jj|jj|jj|jj¬«}|j|«|j«|jD]W}|jsŒt|j|j|j |j"¬«}|j|«ŒY|j$D]9}|j'||j«} |j$j)| «Œ;|s||j*D]m}
|jt,«jt,j|
jk(«j «} | sŒS|j*j)| «Œo|S)zSave company data to database)rSÚindustryÚlocationr_rT)rSrp)rrSr_rTr¸rrpr