Database Schema Update - Enriched Investor Data & Funds
Overview
Updated the database schema to support enriched investor data with multiple funds per investor.
Key Changes
1. InvestorTable - New Fields
Basic Info
- headquarters - Investor headquarters location
- website - Investor website URL (moved from nullable)
AUM (Assets Under Management)
- aum - Changed from Integer to String to preserve currency (e.g., "EUR 850,000,000")
- aum_as_of_date - Date when AUM was measured
- aum_source_url - Source URL for AUM information
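Because aum is now stored as text, any numeric comparison has to parse it first. The following is a minimal sketch, assuming values follow the "<CURRENCY> <amount with comma separators>" shape shown in the example; parse_money is a hypothetical helper, not part of the codebase.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Tuple


def parse_money(value: Optional[str]) -> Optional[Tuple[str, Decimal]]:
    """Split a string such as "EUR 850,000,000" into (currency, amount).

    Returns None for missing or unparseable values such as "Not Available".
    """
    if not value:
        return None
    parts = value.strip().split(maxsplit=1)
    if len(parts) != 2:
        return None
    currency, amount = parts
    try:
        return currency.upper(), Decimal(amount.replace(",", ""))
    except InvalidOperation:
        # Raised when the amount part is not a plain number.
        return None


parsed = parse_money("EUR 850,000,000")
missing = parse_money("Not Available")
```

Ranges such as "EUR 1,000 to 2,000" are not handled by this sketch and also come back as None.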
Investment Information
- investment_thesis - JSON array of thesis statements
- portfolio_highlights - JSON array of notable portfolio companies
- linked_documents - JSON array of document URLs
Research Metadata
- researcher_notes - Free-text notes from research
- missing_important_fields - JSON array of field names that are missing
- sources - JSON object mapping field names to source URLs
Deprecated Fields (kept for backward compatibility)
- check_size_lower/upper - Now handled at fund level
- geographic_focus - Now handled at fund level
- stage_focus - Now handled at fund level
2. FundTable - NEW TABLE
Represents individual funds managed by an investor. One investor can have multiple funds.
Fields:
- id - Primary key
- investor_id - Foreign key to InvestorTable
- fund_name - Name of the fund
- fund_size - Size of fund (string to preserve currency)
- fund_size_source_url - Source URL for fund size
- estimated_investment_size - Typical investment range (e.g., "EUR 1,000 to 2,000")
- source_url - Source URL for fund information
- source_provider - Provider of information (e.g., "Perplexity")
- geographic_focus - JSON array of regions/countries
- investment_stage_focus - JSON array of investment stages
- sector_focus - JSON array of sectors
Relationship:
- Many-to-One with InvestorTable
- Cascade delete (deleting investor deletes all funds)
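The many-to-one relationship and the cascade delete can be sketched in plain SQLite. This is an illustration only: the project defines its schema in SQLAlchemy models, the funds table name is an assumption, and only a few of the columns listed above are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys (and therefore cascades) only when this is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE investors (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE funds (
    id INTEGER PRIMARY KEY,
    investor_id INTEGER NOT NULL REFERENCES investors(id) ON DELETE CASCADE,
    fund_name TEXT NOT NULL,
    fund_size TEXT,         -- string to preserve currency
    geographic_focus TEXT   -- JSON array stored as text
);
""")

conn.execute("INSERT INTO investors (id, name) VALUES (1, 'Anaxago')")
conn.executemany(
    "INSERT INTO funds (investor_id, fund_name) VALUES (?, ?)",
    [(1, "Fund 1"), (1, "Fund 2")],
)
funds_before = conn.execute("SELECT COUNT(*) FROM funds").fetchone()[0]

# Deleting the investor removes its funds through ON DELETE CASCADE.
conn.execute("DELETE FROM investors WHERE id = 1")
remaining_funds = conn.execute("SELECT COUNT(*) FROM funds").fetchone()[0]
```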
3. InvestorMember - Enhanced
Added fields for senior leadership data:
- title - Alternative to role field
- source_url - URL where member info was found
Data Model
InvestorTable (1) -----> (Many) FundTable
      |
      |-----> (Many) InvestorMember
      |-----> (Many) CompanyTable (portfolio_companies)
      |-----> (Many) SectorTable
      |-----> (Many) InvestmentStageTable
Frontend Strategy
Flattened Response
The frontend will receive a flattened view where each fund appears as a separate investor entry:
Investor A + Fund 1 → Row 1
Investor A + Fund 2 → Row 2
Investor A + Fund 3 → Row 3
Investor B + Fund 1 → Row 4
Benefits:
- ✅ No frontend schema changes needed
- ✅ Each row represents a distinct investment opportunity
- ✅ Filtering and querying work naturally
- ✅ Compatibility scoring can be done per fund
- ✅ Backend maintains proper normalization
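The flattening described above can be sketched as a small pure-Python function. It is a sketch under assumptions: investors are represented as plain dicts with a "funds" list, flatten_investors and FUND_FIELDS are hypothetical names, and only a subset of fund fields is copied.

```python
from typing import Any, Dict, List

# Subset of fund fields used for illustration.
FUND_FIELDS = ("fund_name", "fund_size", "geographic_focus")


def flatten_investors(investors: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return one row per (investor, fund) pair.

    An investor without funds still produces one row, with fund fields set to None.
    """
    rows = []
    for investor in investors:
        base = {key: value for key, value in investor.items() if key != "funds"}
        funds = investor.get("funds") or [None]
        for fund in funds:
            row = dict(base)
            for field in FUND_FIELDS:
                row[field] = fund.get(field) if fund else None
            rows.append(row)
    return rows


investors = [
    {"name": "Investor A", "funds": [{"fund_name": "Fund 1"}, {"fund_name": "Fund 2"}]},
    {"name": "Investor B", "funds": []},
]
rows = flatten_investors(investors)
```

Here Investor A yields two rows and Investor B yields one row whose fund fields are all None.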
Files Modified
Preprocessor
- preprocessor/models.py - Updated schema with all new fields and FundTable
- preprocessor/enrich_investors.py - NEW: script to ingest enriched data
App
- app/db/models.py - Updated schema to match preprocessor
Usage
1. Run Initial Data Ingestion (if not done)
cd preprocessor
python main.py
2. Run Enrichment
cd preprocessor
python enrich_investors.py enriched_investors.csv investor_name enriched_data
CSV Format:
| investor_name | enriched_data |
|---|---|
| Anaxago | {"funds": [...], "headquarters": "...", ...} |
| VC Firm B | {...} |
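Reading such a file might look like the sketch below. It assumes a comma-separated file with standard CSV quoting around the JSON cell; read_enriched_rows is a hypothetical helper, not the actual code in enrich_investors.py.

```python
import csv
import io
import json
from typing import Any, Dict, Iterator, TextIO, Tuple


def read_enriched_rows(
    handle: TextIO,
    name_column: str = "investor_name",
    data_column: str = "enriched_data",
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (investor name, parsed enrichment JSON) for each CSV row."""
    for row in csv.DictReader(handle):
        yield row[name_column].strip(), json.loads(row[data_column])


# The JSON cell is quoted, with inner double quotes doubled per CSV rules.
sample = (
    "investor_name,enriched_data\n"
    'Anaxago,"{""headquarters"": ""Paris, France"", ""funds"": []}"\n'
)
parsed_rows = list(read_enriched_rows(io.StringIO(sample)))
```

The two column names mirror the positional arguments passed to the script in the command above.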
3. Reinitialize Database (if needed)
# Backup first!
cp version_two.db version_two.db.backup
# Delete and reinitialize
rm version_two.db
python main.py # Run initial ingestion
python enrich_investors.py enriched_investors.csv # Run enrichment
Enrichment Script Features
- ✅ Upsert Logic - Creates new investors or updates existing ones
- ✅ Duplicate Prevention - Won't create duplicate funds or team members
- ✅ Flexible Matching - Matches by name or website
- ✅ Batch Commits - Commits every 10 investors for performance
- ✅ Error Handling - Continues on errors, reports at end
- ✅ Detailed Logging - Shows progress and summary
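The upsert, batching, and continue-on-error behaviour can be illustrated with a simplified sketch using the standard sqlite3 module. It is not the actual script: upsert_fund and enrich are hypothetical names, the funds table is reduced to three columns, and funds are matched on investor_id plus fund_name.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

BATCH_SIZE = 10  # commit every 10 investors


def upsert_fund(conn: sqlite3.Connection, investor_id: int, fund: Dict[str, Any]) -> None:
    """Update the fund matching (investor_id, fund_name), or insert it if absent."""
    cur = conn.execute(
        "UPDATE funds SET fund_size = ? WHERE investor_id = ? AND fund_name = ?",
        (fund.get("fundSize"), investor_id, fund["fundName"]),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO funds (investor_id, fund_name, fund_size) VALUES (?, ?, ?)",
            (investor_id, fund["fundName"], fund.get("fundSize")),
        )


def enrich(conn: sqlite3.Connection, records: List[Tuple[int, Dict[str, Any]]]) -> List[str]:
    """Process all records, collecting errors instead of stopping on the first one."""
    errors = []
    for count, (investor_id, data) in enumerate(records, start=1):
        try:
            for fund in data.get("funds", []):
                upsert_fund(conn, investor_id, fund)
        except (KeyError, sqlite3.Error) as exc:
            errors.append(f"investor {investor_id}: {exc}")
        if count % BATCH_SIZE == 0:
            conn.commit()
    conn.commit()  # flush the final partial batch
    return errors


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE funds (id INTEGER PRIMARY KEY, investor_id INTEGER, "
    "fund_name TEXT, fund_size TEXT)"
)
records = [
    (1, {"funds": [{"fundName": "Crowdfunding Immobilier", "fundSize": "Not Available"}]}),
    (2, {"funds": [{"fundSize": "EUR 10"}]}),  # missing fundName, reported as an error
]
errors = enrich(conn, records)
enrich(conn, records[:1])  # a second run updates the existing fund instead of duplicating it
fund_count = conn.execute("SELECT COUNT(*) FROM funds").fetchone()[0]
```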
Next Steps
1. Create Compatibility Scorer Service
See the design doc for the CompatibilityScorer service that will:
- Calculate match scores for both filtered and queried results
- Provide detailed breakdown of scoring
- Work with fund-level criteria
2. Update API Endpoints
- Modify GET /investors to flatten funds
- Update GET /investors/filter to query funds table
- Enhance /query endpoint to extract parameters and score
3. Update Frontend Schemas (Pydantic)
Add optional fields to response schemas:
- compatibility_score: Optional[float]
- match_details: Optional[dict]
- Fund-related fields in InvestorData
Example Enriched JSON
{
"websiteURL": "http://www.anaxago.com",
"headquarters": "Paris, France",
"investorDescription": "Anaxago is an investment group...",
"overallAssetsUnderManagement": {
"aumAmount": "EUR 850,000,000",
"asOfDate": "Not Available",
"sourceUrl": "http://www.anaxago.com"
},
"investmentThesisFocus": ["Sustainable real estate", "Climate tech"],
"portfolioHighlights": ["Tilak Healthcare", "Innovorder"],
"funds": [
{
"fundName": "Crowdfunding Immobilier",
"fundSize": "Not Available",
"estimatedInvestmentSize": "EUR 1,000 to 2,000",
"geographicFocus": ["France"],
"investmentStageFocus": ["Seed", "Early Stage"],
"sectorFocus": ["Real Estate"],
"sourceUrl": "http://www.anaxago.com/investissement"
}
],
"seniorLeadership": [
{
"name": "Joachim Dupont",
"title": "Co-fondateur et président",
"sourceUrl": "https://capital.anaxago.com/equipe"
}
],
"researcherNotes": "No explicit official fund sizes found",
"missingImportantFields": ["fundSize"],
"sources": {
"funds": "http://www.anaxago.com/investissement",
"headquarters": "http://www.anaxago.com/contact"
}
}
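One way to map this JSON onto the InvestorTable columns listed earlier is sketched below. The key-to-column pairing is inferred from the names and is an assumption, as is treating "Not Available" as a missing value; investor_columns and clean are hypothetical helpers.

```python
import json
from typing import Any, Dict, Optional

NOT_AVAILABLE = "Not Available"


def clean(value: Optional[str]) -> Optional[str]:
    """Treat empty strings and the "Not Available" placeholder as missing."""
    return None if value in (None, "", NOT_AVAILABLE) else value


def investor_columns(data: Dict[str, Any]) -> Dict[str, Any]:
    """Map enriched JSON keys to (assumed) InvestorTable column names."""
    aum = data.get("overallAssetsUnderManagement") or {}
    return {
        "website": clean(data.get("websiteURL")),
        "headquarters": clean(data.get("headquarters")),
        "aum": clean(aum.get("aumAmount")),
        "aum_as_of_date": clean(aum.get("asOfDate")),
        "aum_source_url": clean(aum.get("sourceUrl")),
        # JSON columns are serialized to text here for illustration.
        "investment_thesis": json.dumps(data.get("investmentThesisFocus", [])),
        "portfolio_highlights": json.dumps(data.get("portfolioHighlights", [])),
        "researcher_notes": clean(data.get("researcherNotes")),
        "missing_important_fields": json.dumps(data.get("missingImportantFields", [])),
        "sources": json.dumps(data.get("sources", {})),
    }


columns = investor_columns({
    "websiteURL": "http://www.anaxago.com",
    "headquarters": "Paris, France",
    "overallAssetsUnderManagement": {
        "aumAmount": "EUR 850,000,000",
        "asOfDate": "Not Available",
        "sourceUrl": "http://www.anaxago.com",
    },
    "investmentThesisFocus": ["Sustainable real estate", "Climate tech"],
})
```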
Database Migration
If you have existing data:
# Migration script (if needed)
from sqlalchemy import text

from models import engine

# Note: create_all() only creates missing tables (such as the new funds table).
# It does not add or alter columns on existing tables, so changes to the
# existing investors table need a manual migration like the one below.
# DROP COLUMN requires SQLite 3.35 or later.
with engine.connect() as conn:
    # Convert AUM from Integer to String
    conn.execute(text("ALTER TABLE investors ADD COLUMN aum_new TEXT"))
    conn.execute(text("UPDATE investors SET aum_new = CAST(aum AS TEXT) WHERE aum IS NOT NULL"))
    conn.execute(text("ALTER TABLE investors DROP COLUMN aum"))
    conn.execute(text("ALTER TABLE investors RENAME COLUMN aum_new TO aum"))
    conn.commit()
Questions?
- Q: What if an investor has no funds?
  A: They appear once, with all fund fields set to NULL.
- Q: How do we handle fund updates?
  A: The enrichment script updates existing funds, matched on fund_name + investor_id.
- Q: Can we query by fund criteria?
  A: Yes. Join InvestorTable with FundTable and filter on fund fields.
- Q: How does compatibility scoring work?
  A: See the separate CompatibilityScorer service design.
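The first and third answers can be illustrated with plain SQL, run here through the standard sqlite3 module. This is a sketch under assumptions: the funds table name and the reduced column set are illustrative, and the LIKE match on the JSON text is a crude stand-in for proper JSON filtering.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE investors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE funds (
    id INTEGER PRIMARY KEY,
    investor_id INTEGER REFERENCES investors(id),
    fund_name TEXT,
    geographic_focus TEXT  -- JSON array stored as text
);
INSERT INTO investors (id, name) VALUES (1, 'Investor A'), (2, 'Investor B');
""")
conn.executemany(
    "INSERT INTO funds (investor_id, fund_name, geographic_focus) VALUES (?, ?, ?)",
    [(1, "Fund 1", json.dumps(["France"])), (1, "Fund 2", json.dumps(["Germany"]))],
)

# LEFT JOIN keeps Investor B, who has no funds, as one row with NULL fund fields.
all_rows = conn.execute(
    "SELECT i.name, f.fund_name FROM investors i "
    "LEFT JOIN funds f ON f.investor_id = i.id ORDER BY i.id, f.id"
).fetchall()

# Inner join plus a filter on a fund field.
french_funds = conn.execute(
    "SELECT i.name, f.fund_name FROM investors i "
    "JOIN funds f ON f.investor_id = i.id WHERE f.geographic_focus LIKE ?",
    ('%"France"%',),
).fetchall()
```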