✅ Base Database Ingestion Complete!
Date: October 5, 2025
Database: version_two.db
📊 Summary Statistics
| Entity | Count |
|---|---|
| Investors | 9,315 |
| Companies | 6,877 |
| Sectors | 639 |
| Investor-Company Relationships | 22,548 |
| Investor-Sector Relationships | 75,307 |
🎯 Top Investors by Portfolio Size
- Bpifrance - 211 companies
- European Innovation Council - 183 companies
- Business Growth Fund - 84 companies
- HTGF (High-Tech Gruenderfonds) - 74 companies
- EIT InnoEnergy - 72 companies
📁 Source Files
- Companies CSV: 13,027 rows
- Investors CSV: 11,045 rows
- Investors Ingested: 9,315 (1,730 of the 11,045 CSV rows were filtered out as duplicates or invalid entries)
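The filtering rules used during ingestion are not spelled out in this report. As a rough illustration only, the sketch below drops rows with a blank name or a name already seen (compared case-insensitively). The `name` column and the function `filter_investor_rows` are assumptions for the example, not the actual ingestion code.

```python
def filter_investor_rows(rows, name_key="name"):
    """Drop rows whose name is blank or already seen (case-insensitive).

    `rows` is an iterable of dicts, e.g. from csv.DictReader.
    `name_key` is a hypothetical column name; adjust to the real CSV header.
    Returns (kept_rows, dropped_count).
    """
    seen = set()
    kept, dropped = [], 0
    for row in rows:
        name = (row.get(name_key) or "").strip()
        key = name.casefold()
        if not name or key in seen:
            dropped += 1
            continue
        seen.add(key)
        kept.append(row)
    return kept, dropped
```

Comparing `len(kept)` and `dropped` against the counts above is a quick way to see whether a given rule set explains the gap between CSV rows and ingested investors.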
🗃️ Database Structure
Tables Created:
- ✅ investors - Core investor data
- ✅ companies - Portfolio companies
- ✅ sectors - Industry sectors
- ✅ funds - (Empty, will be populated during enrichment)
- ✅ investor_members - (Empty, will be populated during enrichment)
- ✅ company_members - Company team members
- ✅ investment_stages - Investment stage definitions
- ✅ Association tables for relationships
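To check the tables listed above against the summary statistics, row counts can be read directly from the database. This is a minimal sketch that assumes `version_two.db` is a SQLite file and that the table names match the list above exactly; `count_rows` is a helper written for this example.

```python
import sqlite3

# Table names as listed in this report; adjust if the real schema differs.
EXPECTED_TABLES = [
    "investors", "companies", "sectors", "funds",
    "investor_members", "company_members", "investment_stages",
]

def count_rows(conn, tables=EXPECTED_TABLES):
    """Return {table: row_count}, with None for tables that do not exist."""
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    counts = {}
    for table in tables:
        if table in existing:
            # Names come from the fixed list above, never from user input.
            counts[table] = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        else:
            counts[table] = None
    return counts
```

Usage would look like `count_rows(sqlite3.connect("version_two.db"))`. A `None` value flags a table that is missing or named differently from the list.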
Current Data:
- ✅ Investor names and basic info (website, investment count)
- ✅ Company details (name, location, industry, description)
- ✅ Sectors extracted from company industries
- ✅ Investor → Company relationships (who invested in what)
- ✅ Investor → Sector relationships (derived from portfolio)
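The investor-to-sector links are described as derived from the portfolio. One way to picture that derivation, assuming it is a plain join of investor-company pairs with company-sector pairs, is the sketch below. The function name and the tuple-based inputs are illustrative and not taken from the ingestion code.

```python
from collections import defaultdict

def derive_investor_sectors(investor_companies, company_sectors):
    """Join (investor, company) pairs with (company, sector) pairs.

    Returns the set of distinct (investor, sector) pairs, so an investor
    holding several companies in one sector is linked to it only once.
    """
    sectors_by_company = defaultdict(set)
    for company, sector in company_sectors:
        sectors_by_company[company].add(sector)

    pairs = set()
    for investor, company in investor_companies:
        for sector in sectors_by_company.get(company, ()):
            pairs.add((investor, sector))
    return pairs
```

Because the result is a set of distinct pairs, its size depends on how many sectors each company carries, which is why the investor-sector count can exceed the investor-company count.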
Missing (To Be Added via Enrichment):
- ⏳ Investor headquarters
- ⏳ AUM (Assets Under Management) details
- ⏳ Investment thesis
- ⏳ Portfolio highlights
- ⏳ Fund details (multiple funds per investor)
- ⏳ Senior leadership/team members
- ⏳ Research notes and sources
🔄 Next Steps
1. Prepare Enriched Data CSV
Your enriched CSV should have this structure:
investor_name,enriched_data
"212","{\"websiteURL\": \"...\", \"funds\": [...], ...}"
"301","{...}"
2. Run Enrichment Script
cd preprocessor
python enrich_investors.py enriched_investors.csv investor_name enriched_data
This will:
- ✅ Add fund details (multiple funds per investor)
- ✅ Update AUM information
- ✅ Add investment thesis
- ✅ Add portfolio highlights
- ✅ Add senior leadership
- ✅ Add research notes and sources
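Before a full enrichment run, it can be worth checking that every `enriched_data` cell in the CSV from step 1 parses as a JSON object. The sketch below assumes the backslash-escaped quoting shown in that example. How `enrich_investors.py` itself reads the file is not documented here, so the dialect settings are a guess, and `find_invalid_rows` is a helper invented for this example.

```python
import csv
import json

def find_invalid_rows(lines, name_col="investor_name", data_col="enriched_data"):
    """Return (investor_name, error_message) for rows whose JSON does not parse.

    `lines` is any iterable of text lines, such as an open file.
    escapechar="\\" makes the reader treat \" as a literal quote, as in the
    sample rows. Note that it also consumes other backslashes in the cell.
    """
    reader = csv.DictReader(lines, escapechar="\\")
    problems = []
    for row in reader:
        try:
            parsed = json.loads(row.get(data_col))
        except (TypeError, json.JSONDecodeError) as exc:
            problems.append((row.get(name_col), str(exc)))
            continue
        if not isinstance(parsed, dict):
            problems.append((row.get(name_col), "enriched_data is not a JSON object"))
    return problems
```

Calling it as `find_invalid_rows(open("enriched_investors.csv", newline="", encoding="utf-8"))` returns an empty list when every row parses.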
3. Verify Enriched Data
python3 << 'EOF'
from models import InvestorTable, FundTable, get_db_session
session = get_db_session()
# Check enriched data
investor = session.query(InvestorTable).filter_by(name="Anaxago").first()
if investor:
    print(f"Investor: {investor.name}")
    print(f"HQ: {investor.headquarters}")
    print(f"AUM: {investor.aum}")
    print(f"Funds: {len(investor.funds)}")
    for fund in investor.funds:
        print(f" - {fund.fund_name}")
session.close()
EOF
📝 Sample Queries
Get Investor with Portfolio
from models import InvestorTable, get_db_session
session = get_db_session()
investor = session.query(InvestorTable).filter_by(name="Bpifrance").first()
print(f"Investor: {investor.name}")
print(f"Website: {investor.website}")
print(f"Investments: {investor.number_of_investments}")
print(f"Portfolio Companies: {len(investor.portfolio_companies)}")
print(f"Sectors: {[s.name for s in investor.sectors[:5]]}")
session.close()
Get Companies by Sector
from models import CompanyTable, SectorTable, get_db_session
session = get_db_session()
sector = session.query(SectorTable).filter_by(name="AgTech").first()
print(f"Sector: {sector.name}")
print(f"Companies: {len(sector.companies)}")
for company in sector.companies[:5]:
    print(f" - {company.name}")
session.close()
Get Investor's Sector Distribution
from models import InvestorTable, get_db_session
session = get_db_session()
investor = session.query(InvestorTable).filter_by(name="Bpifrance").first()
sectors = {}
for company in investor.portfolio_companies:
    for sector in company.sectors:
        sectors[sector.name] = sectors.get(sector.name, 0) + 1
# Top sectors
for sector, count in sorted(sectors.items(), key=lambda x: x[1], reverse=True)[:5]:
    print(f"{sector}: {count} companies")
session.close()
⚠️ Known Issues
Investors Not Found in DB
Some companies reference investors that weren't in the investors CSV:
- The Venture Collective
- Sarah Leary
- Transpose
- ND Capital
- InvestSud
- Third Swedish National Pension Fund
- Union Tech Ventures
- Vasuki Tech Fund
- MSA Novo
- And others...
These are likely individual angel investors or smaller funds not in the main investor list. They are recorded but not linked.
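To get the full list rather than the sample above, the investor names referenced in the companies CSV can be compared with the names in the investors CSV. This sketch assumes both sets of names have already been extracted into plain lists and that matching is case-insensitive; `find_unlinked_investors` is a hypothetical helper, not part of the ingestion scripts.

```python
def find_unlinked_investors(referenced_names, known_names):
    """Return sorted investor names that are referenced but not known.

    Matching ignores case and surrounding whitespace. Each unmatched name
    is reported once, using the first spelling encountered.
    """
    known = {name.strip().casefold() for name in known_names}
    unlinked = {}
    for name in referenced_names:
        cleaned = name.strip()
        key = cleaned.casefold()
        if cleaned and key not in known:
            unlinked.setdefault(key, cleaned)
    return sorted(unlinked.values())
```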
🔒 Backup
A backup of the database was created before ingestion:
version_two.db.backup_YYYYMMDD_HHMMSS
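A backup with the same naming pattern can be recreated before later runs, for example ahead of enrichment. This is a sketch under the assumption that the database is a single file with no process writing to it during the copy; both function names are illustrative.

```python
import shutil
from datetime import datetime

def backup_name(db_path, now=None):
    """Build '<db_path>.backup_YYYYMMDD_HHMMSS' from the given or current time."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{db_path}.backup_{stamp}"

def backup_database(db_path, now=None):
    """Copy the database file next to itself and return the backup path."""
    target = backup_name(db_path, now)
    shutil.copy2(db_path, target)  # copies contents and file metadata
    return target
```

Calling `backup_database("version_two.db")` from the directory that holds the database writes the copy alongside it.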
📧 Support
For issues or questions:
- Check the logs for error messages
- Verify CSV file formats
- Ensure all required columns are present
- Check for duplicate entries
Status: ✅ Base database created successfully
Ready for: Enrichment phase with detailed investor data