# Fund Relationship Schema Update

## Summary of Changes

### Database Schema Changes

**FundTable Updated:**

1. `geographic_focus`: Changed from `JSON` array to `STRING` (comma-separated values)
2. `investment_stage_focus`: **REMOVED** - replaced with many-to-many relationship
3. `sector_focus`: **REMOVED** - replaced with many-to-many relationship

**New Tables:**

1. `investment_stages` - Stores investment stage names (replaces enum)
2. `fund_investment_stages` - Association table for fund ↔ stage many-to-many
3. `fund_sectors` - Association table for fund ↔ sector many-to-many

### Why These Changes?

#### 1. Geographic Focus: JSON → String

- **Before**: `["Europe", "North America", "Asia"]`
- **After**: `"Europe, North America, Asia"`
- **Reason**: Simpler to display and easier to search with `LIKE` queries. The trade-off is that `LIKE '%...%'` is a substring match, so a pattern can also match longer region names that contain it.

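The conversion can be sketched in both directions as below. This is a minimal illustration, not the project's code: the helper names `geo_list_to_str` and `geo_str_to_list` are hypothetical, and the only conversion the LLM parser actually performs is the list → string join with `", ".join(...)`.

```python
from __future__ import annotations


def geo_list_to_str(regions: list[str] | None) -> str | None:
    """Join regions into the comma-separated form stored in `geographic_focus`."""
    cleaned = [r.strip() for r in (regions or []) if r and r.strip()]
    # Store NULL rather than an empty string when there is nothing to keep
    return ", ".join(cleaned) or None


def geo_str_to_list(value: str | None) -> list[str]:
    """Split a stored `geographic_focus` string back into a list of regions."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


print(geo_list_to_str(["Europe", "North America", "Asia"]))  # → Europe, North America, Asia
print(geo_str_to_list("Europe, North America, Asia"))  # → ['Europe', 'North America', 'Asia']
```

The round trip only works as long as no region name itself contains a comma.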
#### 2. Investment Stages: JSON → Many-to-Many Relationship

- **Before**: JSON array stored in fund table
- **After**: Proper many-to-many relationship via association table
- **Benefits**:
  - Can filter funds by specific stages efficiently
  - Can find every fund that shares a given stage with a single join
  - Centralized stage management
  - Better data normalization

#### 3. Sectors: JSON → Many-to-Many Relationship

- **Before**: JSON array stored in fund table
- **After**: Proper many-to-many relationship with the existing `SectorTable`
- **Benefits**:
  - Reuses existing sector data
  - Can filter/aggregate by sector across funds
  - Maintains referential integrity
  - Consistent with the investor-sector relationship pattern

## Migration Details

### Successfully Executed

✅ **411 fund records** migrated

✅ **377 stage relationships** created from old JSON data

✅ **1,445 sector relationships** created from old JSON data

✅ **11 investment stages** seeded: Seed, Pre-Seed, Series A, Series B, Series C, Series D+, Growth, Late Stage, IPO, Venture, Early Stage

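The migration script itself (`preprocessor/migrate_fund_relationships.py`) is not reproduced in this document. As a rough illustration only, a JSON → association-table backfill could look like the sketch below. It uses the standard-library `sqlite3` module and assumes the legacy `investment_stage_focus` / `sector_focus` JSON columns are still present while the backfill runs; the function names and demo rows are hypothetical, and a real migration would also need to drop the legacy columns afterwards.

```python
import json
import sqlite3


def _get_or_create_id(conn: sqlite3.Connection, table: str, name: str) -> int:
    """Return the id of `name` in `table`, inserting a row if needed.

    `table` is only ever one of the constants used below, never user input.
    """
    row = conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0]
    return conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,)).lastrowid


def migrate_fund_relationships(conn: sqlite3.Connection) -> tuple[int, int]:
    """Backfill association rows from the legacy JSON columns on `funds`.

    Returns the total number of (fund, stage) and (fund, sector) links.
    """
    rows = conn.execute(
        "SELECT id, investment_stage_focus, sector_focus FROM funds"
    ).fetchall()
    for fund_id, stages_json, sectors_json in rows:
        for name in json.loads(stages_json or "[]"):
            stage_id = _get_or_create_id(conn, "investment_stages", name)
            # OR IGNORE skips repeats, which would violate the composite primary key
            conn.execute(
                "INSERT OR IGNORE INTO fund_investment_stages (fund_id, stage_id) VALUES (?, ?)",
                (fund_id, stage_id),
            )
        for name in json.loads(sectors_json or "[]"):
            sector_id = _get_or_create_id(conn, "sectors", name)
            conn.execute(
                "INSERT OR IGNORE INTO fund_sectors (fund_id, sector_id) VALUES (?, ?)",
                (fund_id, sector_id),
            )
    conn.commit()
    stage_links = conn.execute("SELECT COUNT(*) FROM fund_investment_stages").fetchone()[0]
    sector_links = conn.execute("SELECT COUNT(*) FROM fund_sectors").fetchone()[0]
    return stage_links, sector_links


# Throwaway demo schema with the legacy JSON columns still present
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE funds (id INTEGER PRIMARY KEY, investment_stage_focus TEXT, sector_focus TEXT);
CREATE TABLE investment_stages (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE sectors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE fund_investment_stages (
    fund_id INTEGER NOT NULL, stage_id INTEGER NOT NULL, PRIMARY KEY (fund_id, stage_id)
);
CREATE TABLE fund_sectors (
    fund_id INTEGER NOT NULL, sector_id INTEGER NOT NULL, PRIMARY KEY (fund_id, sector_id)
);
""")
conn.executemany(
    "INSERT INTO funds VALUES (?, ?, ?)",
    [
        (1, '["Seed", "Series A"]', '["Fintech", "Healthcare"]'),
        (2, '["Series A", "Series A"]', None),  # duplicate stage is linked once
    ],
)
print(migrate_fund_relationships(conn))  # → (3, 2)
```

Because of the get-or-create lookups and `INSERT OR IGNORE`, re-running the backfill on the same data leaves the link counts unchanged.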
### Data Transformation Examples

**Geographic Focus:**

```python
# Before
fund.geographic_focus = ["Europe", "North America"]  # JSON

# After
fund.geographic_focus = "Europe, North America"  # String
```

**Investment Stages:**

```python
# Before
fund.investment_stage_focus = ["Seed", "Series A"]  # JSON

# After
fund.investment_stages = [
    InvestmentStageTable(id=1, name="Seed"),
    InvestmentStageTable(id=3, name="Series A"),
]  # Relationship
```

**Sectors:**

```python
# Before
fund.sector_focus = ["Fintech", "Healthcare"]  # JSON

# After
fund.sectors = [
    SectorTable(id=5, name="Fintech"),
    SectorTable(id=12, name="Healthcare"),
]  # Relationship
```

## Database Schema

### Investment Stages Table

```sql
CREATE TABLE investment_stages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name VARCHAR NOT NULL UNIQUE,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME
);
```

### Fund Investment Stages Association

```sql
CREATE TABLE fund_investment_stages (
    fund_id INTEGER NOT NULL,
    stage_id INTEGER NOT NULL,
    PRIMARY KEY (fund_id, stage_id),
    FOREIGN KEY (fund_id) REFERENCES funds (id) ON DELETE CASCADE,
    FOREIGN KEY (stage_id) REFERENCES investment_stages (id) ON DELETE CASCADE
);
```

### Fund Sectors Association

```sql
CREATE TABLE fund_sectors (
    fund_id INTEGER NOT NULL,
    sector_id INTEGER NOT NULL,
    PRIMARY KEY (fund_id, sector_id),
    FOREIGN KEY (fund_id) REFERENCES funds (id) ON DELETE CASCADE,
    FOREIGN KEY (sector_id) REFERENCES sectors (id) ON DELETE CASCADE
);
```

### Updated Funds Table

```sql
CREATE TABLE funds (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    investor_id INTEGER NOT NULL,
    fund_name VARCHAR,
    fund_size INTEGER,
    fund_size_source_url VARCHAR,
    check_size_lower INTEGER,
    check_size_upper INTEGER,
    source_url VARCHAR,
    source_provider VARCHAR,
    geographic_focus VARCHAR,  -- Changed from JSON to VARCHAR
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME,
    FOREIGN KEY (investor_id) REFERENCES investors (id)
);
```

## Code Changes

### 1. Models (Both app/db/models.py and preprocessor/models.py)

**Added Association Tables:**

```python
from sqlalchemy import Column, ForeignKey, Integer, Table

# `Base` is the project's declarative base, defined elsewhere in models.py.
# primary_key / ondelete mirror the PRIMARY KEY and ON DELETE CASCADE
# clauses in the SQL schema for these tables.

# Association table for fund-stage many-to-many
fund_investment_stages_association = Table(
    "fund_investment_stages",
    Base.metadata,
    Column("fund_id", Integer, ForeignKey("funds.id", ondelete="CASCADE"), primary_key=True),
    Column("stage_id", Integer, ForeignKey("investment_stages.id", ondelete="CASCADE"), primary_key=True),
)

# Association table for fund-sector many-to-many
fund_sectors_association = Table(
    "fund_sectors",
    Base.metadata,
    Column("fund_id", Integer, ForeignKey("funds.id", ondelete="CASCADE"), primary_key=True),
    Column("sector_id", Integer, ForeignKey("sectors.id", ondelete="CASCADE"), primary_key=True),
)
```

**Updated FundTable:**

```python
class FundTable(Base, TimestampMixin):
    __tablename__ = "funds"

    id = Column(Integer, primary_key=True, index=True)
    investor_id = Column(Integer, ForeignKey("investors.id"), nullable=False)

    # Fund details
    fund_name = Column(String, nullable=True)
    fund_size = Column(Integer, nullable=True)
    fund_size_source_url = Column(String, nullable=True)
    check_size_lower = Column(Integer, nullable=True)
    check_size_upper = Column(Integer, nullable=True)
    source_url = Column(String, nullable=True)
    source_provider = Column(String, nullable=True)

    # Geographic focus as simple string
    geographic_focus = Column(String, nullable=True)

    # Relationships
    investor = relationship("InvestorTable", back_populates="funds")
    investment_stages = relationship(
        "InvestmentStageTable",
        secondary=fund_investment_stages_association,
        back_populates="funds",
    )
    sectors = relationship(
        "SectorTable",
        secondary=fund_sectors_association,
        back_populates="funds",
    )
```

**New InvestmentStageTable:**

```python
class InvestmentStageTable(Base, TimestampMixin):
    __tablename__ = "investment_stages"

    id = Column(Integer, primary_key=True, index=True)
    name = Column(String, nullable=False, unique=True)

    # Relationships
    funds = relationship(
        "FundTable",
        secondary=fund_investment_stages_association,
        back_populates="investment_stages",
    )
```

**Updated SectorTable:**

```python
class SectorTable(Base, TimestampMixin):
    __tablename__ = "sectors"

    id = Column(Integer, primary_key=True, index=True)
    name = Column(String, nullable=False)

    # Relationships
    investors = relationship(...)
    companies = relationship(...)
    projects = relationship(...)
    funds = relationship(  # NEW
        "FundTable",
        secondary=fund_sectors_association,
        back_populates="sectors",
    )
```

### 2. Router Schemas (app/schemas/router_schemas.py)

**New InvestmentStageSchema:**

```python
from pydantic import BaseModel


class InvestmentStageSchema(BaseModel):
    id: int
    name: str

    class Config:
        from_attributes = True  # Pydantic v2: allow building from ORM objects
```

**Updated FundSchema:**

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel

# InvestmentStageSchema and SectorSchema are assumed to be defined in this module


class FundSchema(BaseModel):
    id: int
    fund_name: str | None
    fund_size: int | None
    fund_size_source_url: str | None
    check_size_lower: int | None
    check_size_upper: int | None
    source_url: str | None
    source_provider: str | None
    geographic_focus: str | None  # Changed from List[str]
    investment_stages: List[InvestmentStageSchema] | None  # Changed from List[str]
    sectors: List[SectorSchema] | None  # Changed from List[str]
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None

    class Config:
        from_attributes = True
```

**Updated InvestorFundData:**

```python
class InvestorFundData(BaseModel):
    # ... investor fields ...

    # Fund fields
    fund_id: int | None
    fund_name: str | None
    fund_size: int | None
    fund_size_source_url: str | None
    check_size_lower: int | None
    check_size_upper: int | None
    geographic_focus: str | None  # Changed from List[str]
    fund_investment_stages: List[InvestmentStageSchema] | None  # NEW name
    fund_sectors: List[SectorSchema] | None  # NEW name

    # ... related data ...
```

### 3. LLM Parser (app/services/llm_parser.py)

**Updated Fund Processing:**

```python
# Process funds
funds = profile.get("funds", [])
for fund in funds:
    if isinstance(fund, dict):
        fund_data = {
            "fund_name": fund.get("fundName"),
            "fund_size": None,
            "fund_size_source_url": fund.get("fundSizeSourceUrl"),
            "check_size_lower": None,
            "check_size_upper": None,
            "source_url": fund.get("sourceUrl"),
            "source_provider": fund.get("sourceProvider"),
            "geographic_focus": None,  # Filled in below as a comma-separated string
            # `or []` also covers an explicit null in the LLM output
            "investment_stage_names": fund.get("investmentStageFocus") or [],
            "sector_names": fund.get("sectorFocus") or [],
        }

        # Convert geographic focus from array to comma-separated string
        geo_focus = fund.get("geographicFocus") or []
        if geo_focus and isinstance(geo_focus, list):
            fund_data["geographic_focus"] = ", ".join(geo_focus)
```

**Updated Fund Saving:**

```python
for fund_data in investor_data.get("funds", []):
    fund = FundTable(
        investor_id=investor.id,
        fund_name=fund_data.get("fund_name"),
        fund_size=fund_data.get("fund_size"),
        fund_size_source_url=fund_data.get("fund_size_source_url"),
        check_size_lower=fund_data.get("check_size_lower"),
        check_size_upper=fund_data.get("check_size_upper"),
        source_url=fund_data.get("source_url"),
        source_provider=fund_data.get("source_provider"),
        geographic_focus=fund_data.get("geographic_focus"),  # String
    )
    db.add(fund)
    db.flush()  # Get the fund ID

    # Add investment stages (many-to-many). Skip repeats: a duplicate
    # (fund_id, stage_id) pair would violate the association table's primary key.
    for stage_name in fund_data.get("investment_stage_names", []):
        stage = self._get_or_create_investment_stage(db, stage_name)
        if stage not in fund.investment_stages:
            fund.investment_stages.append(stage)

    # Add sectors (many-to-many), skipping repeats for the same reason
    for sector_name in fund_data.get("sector_names", []):
        sector = self._get_or_create_sector(db, sector_name)
        if sector not in fund.sectors:
            fund.sectors.append(sector)
```

**New Helper Method:**

```python
def _get_or_create_investment_stage(
    self, db: Session, stage_name: str
) -> "InvestmentStageTable":
    """Get existing investment stage or create new one"""
    # Imported locally, so the return annotation above is quoted: an unquoted
    # name would be evaluated when the method is defined, before this import runs
    from db.models import InvestmentStageTable

    stage = (
        db.query(InvestmentStageTable)
        .filter(InvestmentStageTable.name == stage_name)
        .first()
    )
    if not stage:
        stage = InvestmentStageTable(name=stage_name)
        db.add(stage)
        db.flush()  # Assigns stage.id
    return stage
```

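Because `investment_stages.name` is unique and each `(fund_id, stage_id)` pair may appear only once, it can help to clean the LLM-provided names before the get-or-create calls. The helper below is a hypothetical addition, not part of the current parser: it trims whitespace, drops blanks and non-strings, and removes case-insensitive duplicates while keeping the first spelling and the original order.

```python
from __future__ import annotations


def normalize_names(raw: list | None) -> list[str]:
    """Clean a list of stage or sector names coming from the LLM output."""
    seen: set[str] = set()
    result: list[str] = []
    for item in raw or []:
        if not isinstance(item, str):
            continue  # ignore nulls or other non-string values
        name = " ".join(item.split())  # trim and collapse internal whitespace
        key = name.casefold()
        if name and key not in seen:
            seen.add(key)
            result.append(name)
    return result


print(normalize_names([" Series A", "series a", "Seed", "", None]))  # → ['Series A', 'Seed']
```

This only dedupes within one fund's list. Mapping an input such as `"series a"` onto the seeded `"Series A"` row would additionally need a case-insensitive lookup in the helper, for example filtering on `func.lower(InvestmentStageTable.name) == stage_name.lower()`.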
### 4. Router (app/routers/investors.py)

**Updated InvestorFundData Instantiation:**

```python
# Before
geographic_focus=fund.geographic_focus,  # Was List[str]
investment_stage_focus=fund.investment_stage_focus,  # Was List[str]
sector_focus=fund.sector_focus,  # Was List[str]

# After
geographic_focus=fund.geographic_focus,  # Now str
fund_investment_stages=fund.investment_stages,  # Now relationship
fund_sectors=fund.sectors,  # Now relationship
```

## API Response Changes

### Before

```json
{
  "fund_id": 1,
  "fund_name": "Growth Fund",
  "geographic_focus": ["Europe", "North America"],
  "investment_stage_focus": ["Series A", "Series B"],
  "sector_focus": ["Fintech", "Healthcare"]
}
```

### After

```json
{
  "fund_id": 1,
  "fund_name": "Growth Fund",
  "geographic_focus": "Europe, North America",
  "fund_investment_stages": [
    { "id": 3, "name": "Series A" },
    { "id": 4, "name": "Series B" }
  ],
  "fund_sectors": [
    { "id": 5, "name": "Fintech" },
    { "id": 12, "name": "Healthcare" }
  ]
}
```

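For consumers that still expect the old shape during the transition, the new response can be mapped back mechanically. A minimal sketch, assuming the response is already parsed into a `dict`; `to_legacy_fund_fields` is a hypothetical helper, not part of the API.

```python
from typing import Any


def to_legacy_fund_fields(fund: dict[str, Any]) -> dict[str, Any]:
    """Map the new fund fields back to the old list-of-strings shape."""
    geo = fund.get("geographic_focus")
    return {
        "fund_id": fund.get("fund_id"),
        "fund_name": fund.get("fund_name"),
        # "Europe, North America" -> ["Europe", "North America"]
        "geographic_focus": [g.strip() for g in geo.split(",") if g.strip()] if geo else [],
        # [{"id": 3, "name": "Series A"}, ...] -> ["Series A", ...]
        "investment_stage_focus": [s["name"] for s in fund.get("fund_investment_stages") or []],
        "sector_focus": [s["name"] for s in fund.get("fund_sectors") or []],
    }


new_response = {
    "fund_id": 1,
    "fund_name": "Growth Fund",
    "geographic_focus": "Europe, North America",
    "fund_investment_stages": [{"id": 3, "name": "Series A"}, {"id": 4, "name": "Series B"}],
    "fund_sectors": [{"id": 5, "name": "Fintech"}, {"id": 12, "name": "Healthcare"}],
}
print(to_legacy_fund_fields(new_response))
```

Applied to the "After" example, this reproduces the "Before" payload; `null` relationship lists come back as empty lists.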
## Query Examples

### Find Funds by Investment Stage

```python
# SQLAlchemy
funds = (
    db.query(FundTable)
    .join(FundTable.investment_stages)
    .filter(InvestmentStageTable.name == "Series A")
    .all()
)
```

```sql
-- SQL
SELECT f.* FROM funds f
JOIN fund_investment_stages fis ON f.id = fis.fund_id
JOIN investment_stages s ON fis.stage_id = s.id
WHERE s.name = 'Series A';
```

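With stages in their own table, multi-stage filters become plain SQL. The sketch below runs against a throwaway in-memory SQLite database with made-up funds and shows both an "any of these stages" filter and an "all of these stages" filter (the latter via `GROUP BY` plus `HAVING COUNT(DISTINCT ...)`). The function name and sample data are illustrative assumptions, not part of the codebase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE funds (id INTEGER PRIMARY KEY, fund_name TEXT);
CREATE TABLE investment_stages (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE fund_investment_stages (
    fund_id INTEGER NOT NULL, stage_id INTEGER NOT NULL,
    PRIMARY KEY (fund_id, stage_id)
);
INSERT INTO funds VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO investment_stages VALUES (1, 'Seed'), (3, 'Series A'), (4, 'Series B');
INSERT INTO fund_investment_stages VALUES (1, 1), (1, 3), (2, 3), (2, 4), (3, 1);
""")


def funds_with_stages(conn: sqlite3.Connection, stages: list[str], match_all: bool) -> list[str]:
    """Return fund names that invest in any (or all) of the given stages."""
    placeholders = ", ".join("?" for _ in stages)
    sql = f"""
        SELECT f.fund_name
        FROM funds f
        JOIN fund_investment_stages fis ON f.id = fis.fund_id
        JOIN investment_stages s ON fis.stage_id = s.id
        WHERE s.name IN ({placeholders})
        GROUP BY f.id, f.fund_name
    """
    params: list = list(stages)
    if match_all:
        # Relational division: the fund must be linked to every requested stage
        sql += " HAVING COUNT(DISTINCT s.id) = ?"
        params.append(len(set(stages)))
    sql += " ORDER BY f.fund_name"
    return [row[0] for row in conn.execute(sql, params)]


print(funds_with_stages(conn, ["Seed", "Series A"], match_all=False))  # → ['Alpha', 'Beta', 'Gamma']
print(funds_with_stages(conn, ["Seed", "Series A"], match_all=True))  # → ['Alpha']
```

The `GROUP BY` also keeps a fund that matches several requested stages from appearing more than once in the "any" case.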
### Find Funds by Sector

```python
# SQLAlchemy
funds = (
    db.query(FundTable)
    .join(FundTable.sectors)
    .filter(SectorTable.name == "Fintech")
    .all()
)
```

```sql
-- SQL
SELECT f.* FROM funds f
JOIN fund_sectors fs ON f.id = fs.fund_id
JOIN sectors s ON fs.sector_id = s.id
WHERE s.name = 'Fintech';
```

### Find Funds by Geographic Focus

```python
# SQLAlchemy
funds = (
    db.query(FundTable)
    .filter(FundTable.geographic_focus.ilike("%Europe%"))
    .all()
)
```

```sql
-- SQL
SELECT * FROM funds
WHERE geographic_focus LIKE '%Europe%';
```

In SQLite, `LIKE` is already case-insensitive for ASCII characters, so the plain SQL behaves like the `ilike` call here.

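Since `geographic_focus` is a single string, `LIKE '%Asia%'` also matches a value such as `"Southeast Asia"`. One way to match whole entries only is to wrap the stored value in the same delimiter the parser uses. The sketch below demonstrates both patterns with the standard-library `sqlite3` module; it assumes entries are always separated by exactly `", "` (as produced by `", ".join(...)`), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE funds (id INTEGER PRIMARY KEY, geographic_focus TEXT)")
conn.executemany(
    "INSERT INTO funds VALUES (?, ?)",
    [(1, "Europe, North America"), (2, "Southeast Asia"), (3, "Asia, Europe")],
)

# Plain substring match: also hits "Southeast Asia"
substring = conn.execute(
    "SELECT id FROM funds WHERE geographic_focus LIKE ? ORDER BY id", ("%Asia%",)
).fetchall()

# Whole-entry match: pad the stored value with the delimiter on both sides
token = conn.execute(
    "SELECT id FROM funds WHERE ', ' || geographic_focus || ', ' LIKE ? ORDER BY id",
    ("%, Asia, %",),
).fetchall()

print([r[0] for r in substring])  # → [2, 3]
print([r[0] for r in token])  # → [3]
```

The delimiter trick breaks if any row was written with a different separator (for example `","` without a space), so it relies on all writes going through the same join.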
### Complex Query: Funds Investing in Fintech at Series A in Europe

```python
funds = (
    db.query(FundTable)
    .join(FundTable.investment_stages)
    .join(FundTable.sectors)
    .filter(
        InvestmentStageTable.name == "Series A",
        SectorTable.name == "Fintech",
        FundTable.geographic_focus.ilike("%Europe%"),
    )
    .all()
)
```

## Benefits

### 1. Better Data Normalization ✨

- Investment stages and sectors are now properly normalized
- Each stage or sector name is stored once, instead of being repeated inside every fund's JSON array
- Single source of truth for stage/sector names

### 2. Efficient Filtering 🔍

- Can filter funds by stages/sectors using SQL JOINs
- No need to parse JSON for queries
- Database indexes can be used: the composite primary keys index the association tables by `fund_id` first, so lookups that start from a stage or sector may benefit from an additional index on `stage_id` / `sector_id`

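### 3. Data Integrity 🛡️

- Foreign key constraints ensure referential integrity
- Can't reference non-existent stages or sectors
- Cascade deletes remove a fund's association rows when the fund (or a stage or sector) is deleted

If the database is SQLite (the `AUTOINCREMENT` syntax in the schema suggests it is), foreign keys, including `ON DELETE CASCADE`, are only enforced when `PRAGMA foreign_keys = ON` is set on each connection; SQLite leaves it off by default.
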
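The integrity guarantees above can be checked directly against the `fund_investment_stages` DDL from the schema section. A minimal sketch using the standard-library `sqlite3` module on an in-memory database, with invented rows and the `funds` / `investment_stages` tables trimmed to the columns the check needs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE funds (id INTEGER PRIMARY KEY AUTOINCREMENT, fund_name VARCHAR);
CREATE TABLE investment_stages (id INTEGER PRIMARY KEY AUTOINCREMENT, name VARCHAR NOT NULL UNIQUE);
CREATE TABLE fund_investment_stages (
    fund_id INTEGER NOT NULL,
    stage_id INTEGER NOT NULL,
    PRIMARY KEY (fund_id, stage_id),
    FOREIGN KEY (fund_id) REFERENCES funds (id) ON DELETE CASCADE,
    FOREIGN KEY (stage_id) REFERENCES investment_stages (id) ON DELETE CASCADE
);
INSERT INTO funds (id, fund_name) VALUES (1, 'Growth Fund');
INSERT INTO investment_stages (id, name) VALUES (3, 'Series A');
INSERT INTO fund_investment_stages VALUES (1, 3);
""")

# Referencing a stage that does not exist is rejected
try:
    conn.execute("INSERT INTO fund_investment_stages VALUES (1, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting the fund cascades to its association rows
conn.execute("DELETE FROM funds WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM fund_investment_stages").fetchone()[0]

print(rejected, remaining)  # → True 0
```

Without the `PRAGMA` line, SQLite would accept the dangling `stage_id = 99` and leave the association row behind after the delete.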
### 4. Easier Aggregations 📊

```sql
-- Count funds per investment stage
SELECT s.name, COUNT(DISTINCT f.id) AS fund_count
FROM investment_stages s
LEFT JOIN fund_investment_stages fis ON s.id = fis.stage_id
LEFT JOIN funds f ON fis.fund_id = f.id
GROUP BY s.name;

-- Count funds per sector
SELECT s.name, COUNT(DISTINCT f.id) AS fund_count
FROM sectors s
LEFT JOIN fund_sectors fs ON s.id = fs.sector_id
LEFT JOIN funds f ON fs.fund_id = f.id
GROUP BY s.name;
```

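To sanity-check the per-stage query, it can be run against a throwaway in-memory SQLite database. The sample rows are invented for illustration, and the `ORDER BY` is added only to make the output deterministic. Stages with no funds still appear, with a count of 0, because of the `LEFT JOIN`s.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE funds (id INTEGER PRIMARY KEY);
CREATE TABLE investment_stages (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE fund_investment_stages (
    fund_id INTEGER, stage_id INTEGER, PRIMARY KEY (fund_id, stage_id)
);
INSERT INTO funds VALUES (1), (2);
INSERT INTO investment_stages VALUES (1, 'Seed'), (3, 'Series A'), (9, 'IPO');
INSERT INTO fund_investment_stages VALUES (1, 1), (1, 3), (2, 3);
""")

rows = conn.execute("""
    SELECT s.name, COUNT(DISTINCT f.id) AS fund_count
    FROM investment_stages s
    LEFT JOIN fund_investment_stages fis ON s.id = fis.stage_id
    LEFT JOIN funds f ON fis.fund_id = f.id
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()

print(rows)  # → [('IPO', 0), ('Seed', 1), ('Series A', 2)]
```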
### 5. Consistent Pattern 🎯

- Follows the same many-to-many pattern as:
  - Investors ↔ Sectors
  - Companies ↔ Sectors
  - Projects ↔ Sectors
- Makes the codebase more maintainable

## Frontend Updates Required

### Geographic Focus

```typescript
// OLD
const geoList = fund.geographic_focus.join(", ");

// NEW
const geoStr = fund.geographic_focus ?? ""; // Already a string (may be null)
```

### Investment Stages

```typescript
// OLD
const stages = fund.investment_stage_focus; // string[]

// NEW: fund_investment_stages is InvestmentStageSchema[] (or null); map to names
const stages = (fund.fund_investment_stages ?? []).map((s) => s.name); // string[]
```

### Sectors

```typescript
// OLD
const sectors = fund.sector_focus; // string[]

// NEW: fund_sectors is SectorSchema[] (or null); map to names
const sectors = (fund.fund_sectors ?? []).map((s) => s.name); // string[]
```

## Files Modified

1. ✅ `preprocessor/models.py` - Updated FundTable, added association tables
2. ✅ `app/db/models.py` - Updated FundTable, added InvestmentStageTable
3. ✅ `app/schemas/router_schemas.py` - Updated FundSchema, InvestorFundData
4. ✅ `app/services/llm_parser.py` - Updated fund processing logic
5. ✅ `app/routers/investors.py` - Updated response formatting
6. ✅ `preprocessor/migrate_fund_relationships.py` - Migration script (NEW)

## Migration Status

✅ **Database migrated**: 411 fund records updated

✅ **377 stage relationships** created from old JSON data

✅ **1,445 sector relationships** created from old JSON data

✅ **11 investment stages** seeded

✅ **All code updated**: Models, schemas, parsers, routers

✅ **No errors**: All files compile successfully

## Next Steps

1. **Test the API** with new response structure
2. **Update frontend** to use new field formats
3. **Re-parse CSV** (optional) to ensure all new data uses the correct structure
4. **Update filtering UI** to leverage the new relationships

## Summary

The fund schema has been successfully refactored to:

- Store `geographic_focus` as a simple string for easier display
- Use proper many-to-many relationships for `investment_stages`
- Use proper many-to-many relationships with the existing `sectors` table
- Enable efficient filtering and aggregation by stage/sector
- Maintain better data normalization and integrity

This enables powerful queries like "Show me all Fintech funds investing at Series A in Europe" with simple SQL JOINs! 🎉