Compare commits: 7b58834316...main (4 commits)

Commits:
b1b1c5ea1e
29d9292cbd
edd0ae910b
84cbb888e6
File diff suppressed because one or more lines are too long
@@ -1,29 +1,38 @@
-# LLM-Powered Investor Parser
+# LLM-Powered Investor & Company Management API
 
-A comprehensive system for parsing investor data from CSV files and storing it in both SQL and vector databases for efficient retrieval and semantic search.
+A comprehensive FastAPI-based system for managing investor and company data with LLM-powered CSV parsing, semantic search, and advanced filtering capabilities.
 
 ## Features
 
-- **CSV Data Processing**: Parses complex investor data from CSV files with nested JSON fields
-- **Dual Database Storage**: Saves structured data to SQL database and text data to vector database
-- **LLM Enhancement**: Optional OpenAI GPT integration for data cleaning and enhancement
-- **Semantic Search**: Vector similarity search for finding relevant investors
-- **Robust Error Handling**: Graceful handling of malformed JSON and missing data
-- **Command-Line Interface**: Easy-to-use CLI for batch processing and search
+- **FastAPI REST API**: Modern, auto-documented API with OpenAPI/Swagger support
+- **CSV Data Processing**: Parse complex investor data from CSV files using LLM assistance
+- **Dual Database Storage**: Structured data in SQL database and semantic search via ChromaDB
+- **Natural Language Queries**: AI-powered query processing for complex investor searches
+- **Advanced Filtering**: Filter investors and companies by multiple criteria
+- **Relationship Management**: Many-to-many relationships between investors, companies, and sectors
+- **Auto-Generated Documentation**: Interactive API docs at `/docs`
 
 ## Architecture
 
 ### Components
 
-1. **Schema (`schema.py`)**: SQLAlchemy models and Pydantic validators
-2. **Database (`db.py`)**: SQL database connection and session management
-3. **Parser (`investor_parser.py`)**: Main parsing logic with LLM integration
-4. **Test Parser (`test_parser.py`)**: Simplified parser without LLM dependencies
+1. **FastAPI Application (`app/main.py`)**: Main API server with route configuration
+2. **Database Models (`app/db/models.py`)**: SQLAlchemy models for investors, companies, sectors
+3. **Pydantic Schemas (`app/py_schemas.py`)**: Request/response validation and serialization
+4. **API Routes**:
+   - `app/api/investors.py`: Investor CRUD operations and filtering
+   - `app/api/companies.py`: Company CRUD operations and filtering
+5. **Services**:
+   - `app/services/openrouter.py`: LLM-powered CSV processing
+   - `app/services/querying.py`: Natural language query processing
+6. **Database (`app/db/`)**: Database connection, models, and schemas
 
 ### Data Flow
 
 ```
-CSV File → JSON Parsing → Data Extraction → LLM Enhancement → SQL Storage → Vector Storage
+CSV Upload → LLM Processing → Data Extraction → SQL Storage → Vector Storage → API Endpoints
+↓
+Natural Language Query → AI Analysis → Database Filtering → Structured Response
 ```
 
 ## Installation
@@ -31,7 +40,7 @@ CSV File → JSON Parsing → Data Extraction → LLM Enhancement → SQL Storag
 ### Prerequisites
 
 - Python 3.12+
-- UV package manager (or pip)
+- FastAPI and dependencies
 
 ### Setup
 
@@ -41,104 +50,244 @@ CSV File → JSON Parsing → Data Extraction → LLM Enhancement → SQL Storag
    cd /path/to/anton_wireframe
    ```
 
-2. Create and activate virtual environment using UV:
+2. Install dependencies:
 
    ```bash
-   uv venv
-   source .venv/bin/activate # On Linux/Mac
+   pip install -r requirements.txt
    ```
 
-3. Install dependencies:
-
-   ```bash
-   uv pip install pandas sqlalchemy chromadb openai python-dotenv pydantic
-   ```
-
-4. Configure environment variables (optional for LLM features):
+3. Configure environment variables:
 
    ```bash
    cp .env.example .env
-   # Edit .env and add your OpenAI API key
+   # Edit .env and add your OpenRouter API key for LLM features
    ```
 
+4. Initialize the database:
+
+   ```bash
+   cd app
+   python -c "from db.db import init_database; init_database()"
+   ```
+
+5. Start the API server:
+
+   ```bash
+   cd app
+   uvicorn main:app --reload --host localhost --port 8000
+   ```
+
+The API will be available at:
+
+- **API Base**: http://localhost:8000
+- **Interactive Docs**: http://localhost:8000/docs
+- **ReDoc**: http://localhost:8000/redoc
+
 ## Database Schema
 
 ### SQL Database (SQLite)
 
-The `investors` table contains:
+#### Investors Table
 
-- **Basic Info**: name, website, headquarters
-- **Investment Focus**: investor_description, investment_thesis_focus
-- **Financial Data**: AUM amount, date, source URL
-- **Fund Information**: JSON array of fund details
-- **Raw Data**: Original CSV fields for reference
+- **Basic Info**: name, description, geographic_focus
+- **Investment Data**: aum, check_size_lower, check_size_upper
+- **Stage Focus**: investment stage (SEED, SERIES_A, etc.)
+- **Relationships**: Many-to-many with companies and sectors
+- **Team**: One-to-many with team members
+- **Metadata**: created_at, updated_at timestamps
 
+#### Companies Table
+
+- **Basic Info**: name, industry, location
+- **Details**: founded_year, website
+- **Relationships**: Many-to-many with investors
+- **Metadata**: created_at, updated_at timestamps
+
+#### Association Tables
+
+- **investor_companies**: Links investors to their portfolio companies
+- **investor_sectors**: Links investors to their focus sectors
+- **investor_team**: Team member details for each investor
+
+#### Supporting Tables
+
+- **sectors**: Investment focus areas (fintech, healthcare, etc.)
+
 ### Vector Database (ChromaDB)
 
-Stores embeddings of:
+Stores embeddings for semantic search of:
 
 - Investor descriptions
 - Investment thesis focus areas
-- Combined text for semantic search
+- Combined investor profiles
 
-## Usage
+## API Usage
 
-### Command Line Interface
+### Interactive Documentation
 
-#### Process CSV File (Simple Mode)
+Visit http://localhost:8000/docs for the auto-generated Swagger UI where you can:
 
+- Explore all endpoints
+- Test API calls directly
+- View request/response schemas
+- See example requests
+
+### Core Endpoints
+
+#### Investor Management
+
 ```bash
-python investor_parser.py --file "path/to/investors.csv" --limit 50
+# Get all investors with relationships
+GET /investors
+
+# Filter investors by criteria
+GET /investors/filter?stage=GROWTH&geography=US&sector=fintech&min_check_size=1000000
+
+# Get specific investor
+GET /investors/{investor_id}
+
+# Create new investor
+POST /investors
+{
+  "name": "Example VC",
+  "description": "Early stage fintech investor",
+  "aum": 50000000,
+  "check_size_lower": 100000,
+  "check_size_upper": 2000000,
+  "geographic_focus": "US",
+  "stage_focus": "SEED",
+  "number_of_investments": 25
+}
+
+# Update investor
+PUT /investors/{investor_id}
+
+# Delete investor
+DELETE /investors/{investor_id}
 ```
 
-#### Process CSV File (LLM-Enhanced Mode)
+#### Company Management
 
 ```bash
-python investor_parser.py --file "path/to/investors.csv" --limit 50 --use-llm
+# Get all companies with investor relationships
+GET /companies
+
+# Filter companies by criteria
+GET /companies/filter?industry=fintech&location=San Francisco&founded_after=2015
+
+# Get specific company
+GET /companies/{company_id}
+
+# Create new company
+POST /companies
+{
+  "name": "Example Startup",
+  "industry": "fintech",
+  "location": "San Francisco",
+  "founded_year": 2020,
+  "website": "https://example.com"
+}
+
+# Update company
+PUT /companies/{company_id}
+
+# Delete company
+DELETE /companies/{company_id}
 ```
 
-#### Search Investors
+#### CSV Processing
 
 ```bash
-python investor_parser.py --search "bioeconomy sustainable agriculture" --search-limit 10
+# Upload and process CSV file
+POST /parse-csv
+Content-Type: multipart/form-data
+File: investors.csv
 ```
 
-#### View Help
+#### Natural Language Queries
 
 ```bash
-python investor_parser.py --help
+# Query investors using natural language
+POST /query
+{
+  "question": "Show me growth stage fintech investors in Silicon Valley with check sizes over $1 million"
+}
 ```
 
-### Python API
+### Advanced Filtering Examples
 
-#### Basic Usage
+#### Investor Filters
 
-```python
-from investor_parser import InvestorParser
+```bash
+# Early stage investors in Europe
+GET /investors/filter?stage=SEED&geography=Europe
 
-# Initialize parser (with or without LLM)
-parser = InvestorParser(use_llm=True)
+# High AUM growth investors
+GET /investors/filter?stage=GROWTH&min_aum=100000000
 
-# Process CSV file
-processed, errors = parser.process_csv_file("investors.csv", limit=100)
+# Healthcare investors with large checks
+GET /investors/filter?sector=healthcare&min_check_size=5000000
 
-# Search investors
-results = parser.search_investors("venture capital fintech", limit=5)
+# Specific geographic focus
+GET /investors/filter?geography=Silicon Valley
 ```
 
-#### Direct Database Access
+#### Company Filters
 
-```python
-from db import get_session
-from schema import Investor
-from sqlalchemy import select
+```bash
+# Recent fintech companies
+GET /companies/filter?industry=fintech&founded_after=2020
 
-# Query database
-with get_session() as session:
-    investors = session.execute(select(Investor)).scalars().all()
-    for investor in investors:
-        print(f"{investor.name}: {investor.website}")
+# Companies with websites
+GET /companies/filter?has_website=true
+
+# Companies backed by specific investor
+GET /companies/filter?investor_name=Sequoia
+
+# Location-based filtering
+GET /companies/filter?location=New York
 ```
 
+### Response Format
+
+All endpoints return structured JSON with full relationship data:
+
+```json
+{
+  "investor": {
+    "id": 1,
+    "name": "Example VC",
+    "description": "Early stage investor",
+    "aum": 50000000,
+    "check_size_lower": 100000,
+    "check_size_upper": 2000000,
+    "geographic_focus": "US",
+    "stage_focus": "SEED",
+    "number_of_investments": 25
+  },
+  "portfolio_companies": [
+    {
+      "id": 1,
+      "name": "StartupCo",
+      "industry": "fintech",
+      "location": "San Francisco"
+    }
+  ],
+  "team_members": [
+    {
+      "id": 1,
+      "name": "John Partner",
+      "role": "Managing Partner",
+      "email": "john@examplevc.com"
+    }
+  ],
+  "sectors": [
+    {
+      "id": 1,
+      "name": "fintech"
+    }
+  ]
+}
+```
+
 ## Data Processing Pipeline
@@ -185,148 +334,234 @@ When `--use-llm` is enabled:
 ### Environment Variables (.env)
 
 ```bash
-# OpenAI API Configuration (required for LLM features)
-OPENAI_API_KEY=your_openai_api_key_here
+# OpenRouter API Configuration (required for LLM features)
+OPENROUTER_API_KEY=your_openrouter_api_key_here
 
-# Database Configuration
+# Database Configuration (optional, defaults to SQLite)
 DATABASE_URL=sqlite:///investors.db
+
+# FastAPI Configuration
+API_HOST=localhost
+API_PORT=8000
 ```
 
 ### LLM Configuration
 
-- Model: GPT-3.5-turbo (configurable)
-- Temperature: 0.3 for enhancement, 0 for JSON cleaning
-- Max tokens: Automatically managed
-- Fallback: Graceful degradation when API unavailable
+- **Provider**: OpenRouter (supports multiple models)
+- **Default Model**: google/gemini-2.5-flash-lite
+- **Temperature**: 0.3 for enhancement, 0 for structured data
+- **Fallback**: Graceful degradation when API unavailable
 
-## Search Capabilities
+## Natural Language Query Processing
 
-### Vector Search Examples
+The system supports intelligent natural language queries that automatically extract filters and search criteria:
 
+### Query Examples
+
 ```bash
-# Find sustainable/ESG investors
-python investor_parser.py --search "sustainability ESG impact investing"
+# Stage-based queries
+"Show me seed stage investors"
+"Find growth stage VCs"
 
-# Find fintech investors
-python investor_parser.py --search "financial technology digital payments"
+# Geographic queries
+"Investors in Silicon Valley"
+"European venture capital firms"
 
-# Find biotech/healthcare investors
-python investor_parser.py --search "biotechnology healthcare pharmaceuticals"
+# Sector-specific queries
+"Fintech investors"
+"Healthcare and biotech VCs"
 
-# Find early-stage investors
-python investor_parser.py --search "seed series A early stage venture"
+# Size-based queries
+"Investors with $5M+ check sizes"
+"High AUM growth investors"
+
+# Combined queries
+"Growth stage fintech investors in the US with check sizes over $1 million"
+"European healthcare investors focusing on early stage"
 ```
 
-### Search Results Include
+### Query Processing Features
 
-- Investor name and website
-- Headquarters location
-- Number of focus areas
-- Similarity score (lower = more similar)
+- **Automatic Filter Extraction**: Detects investment stages, geographies, sectors, and check sizes
+- **Semantic Understanding**: Uses AI to interpret complex queries
+- **Database Integration**: Combines AI analysis with efficient SQL filtering
+- **Complete Relationships**: Returns full investor data with portfolio companies, team members, and sectors
 
+### Query Response
+
+The `/query` endpoint returns a structured `InvestorList` with complete relationship data, making it easy to get comprehensive information about matching investors.
+
 ## Error Handling
 
+### API Error Responses
+
+The API provides clear HTTP status codes and error messages:
+
+```json
+// 404 Not Found
+{
+  "detail": "Investor not found"
+}
+
+// 422 Validation Error
+{
+  "detail": [
+    {
+      "loc": ["body", "stage_focus"],
+      "msg": "value is not a valid enumeration member",
+      "type": "type_error.enum"
+    }
+  ]
+}
+```
+
 ### Robust Processing
 
-- Malformed JSON handling with LLM backup
-- Missing data graceful degradation
-- Individual row error isolation
-- Comprehensive logging
+- **Data Validation**: Pydantic models ensure data integrity
+- **Relationship Management**: Automatic handling of foreign key constraints
+- **LLM Fallbacks**: Graceful degradation when AI services unavailable
+- **Transaction Safety**: Database rollbacks on errors
+- **Comprehensive Logging**: Detailed error tracking and debugging
 
 ### Common Issues and Solutions
 
-1. **Invalid JSON in CSV**
+1. **Invalid Enum Values**
 
-   - Solution: Enable LLM mode for automatic cleaning
-   - Fallback: Empty object insertion
+   - Solution: Use uppercase enum values (SEED, GROWTH, etc.)
+   - Check: Investment stages must match defined enum
 
-2. **Missing OpenAI API Key**
+2. **Missing OpenRouter API Key**
 
-   - Solution: System automatically disables LLM features
-   - Falls back to basic parsing mode
+   - Solution: Set OPENROUTER_API_KEY in environment
+   - Fallback: CSV processing continues without LLM enhancement
 
 3. **Database Connection Issues**
-   - Solution: Uses SQLite by default (no external dependencies)
-   - Configurable via DATABASE_URL
+
+   - Solution: Verify DATABASE_URL configuration
+   - Default: Uses SQLite (no external dependencies)
 
+4. **Relationship Errors**
+   - Solution: Ensure proper foreign key relationships
+   - Check: Use existing sector/company IDs or create new ones
+
 ## Performance
 
 ### Benchmarks (Approximate)
 
-- **Simple Mode**: ~2-5 seconds per row
-- **LLM Mode**: ~5-15 seconds per row (depends on API latency)
-- **Search**: <100ms for vector similarity queries
+- **API Response Time**: <200ms for standard queries
+- **Database Queries**: <50ms for filtered searches with relationships
+- **CSV Processing**: ~5-15 seconds per row (depends on LLM API latency)
+- **Natural Language Queries**: ~2-5 seconds (AI processing + database query)
+- **Vector Search**: <100ms for semantic similarity queries
 
-### Optimization Tips
+### Optimization Features
 
-1. Use `--limit` for testing and development
-2. Process in batches for large datasets
-3. Enable LLM mode only when data quality is crucial
-4. Use local vector database for faster searches
+1. **Eager Loading**: Efficient relationship loading with `selectinload()`
+2. **Query Optimization**: Smart filtering to reduce database load
+3. **Caching**: Database connection pooling and session management
+4. **Pagination**: Built-in limits to prevent overwhelming responses
+5. **Async Processing**: FastAPI async capabilities for better performance
 
+### Production Recommendations
+
+1. **Database**: Consider PostgreSQL for production workloads
+2. **Caching**: Add Redis for frequently accessed data
+3. **Load Balancing**: Deploy multiple API instances behind a load balancer
+4. **Monitoring**: Implement logging and metrics collection
+5. **Rate Limiting**: Add API rate limiting for public endpoints
+
 ## File Structure
 
 ```
 anton_wireframe/
-├── schema.py # Database models and validators
-├── db.py # Database connection management
-├── investor_parser.py # Main parser with CLI
-├── test_parser.py # Simplified parser for testing
-├── .env # Environment configuration
-├── investors.db # SQLite database (created automatically)
-├── chroma_db/ # Vector database directory
-└── README.md # This documentation
+├── app/
+│   ├── main.py # FastAPI application and main endpoints
+│   ├── py_schemas.py # Pydantic models for validation
+│   ├── settings.py # Configuration management
+│   ├── api/
+│   │   ├── __init__.py
+│   │   ├── investors.py # Investor CRUD and filtering endpoints
+│   │   └── companies.py # Company CRUD and filtering endpoints
+│   ├── db/
+│   │   ├── __init__.py
+│   │   ├── db.py # Database connection and session management
+│   │   ├── models.py # SQLAlchemy database models
+│   │   └── new_schema.py # Additional schema definitions
+│   └── services/
+│       ├── __init__.py
+│       ├── openrouter.py # LLM-powered CSV processing
+│       ├── querying.py # Natural language query processing
+│       └── langgraph_agent.py # AI agent configuration
+├── chroma_db/ # Vector database directory
+├── requirements.txt # Python dependencies
+├── README.md # This documentation
+└── .env # Environment configuration
 ```
 
-## Example Output
+## Example Usage Scenarios
 
-### Processing Log
-
-```
-2025-08-27 19:45:46,614 - INFO - Database initialized successfully!
-2025-08-27 19:45:46,690 - INFO - Starting to process CSV file: investors.csv
-2025-08-27 19:45:46,690 - INFO - Loaded 82 rows from CSV
-2025-08-27 19:45:46,690 - INFO - Processing limited to 20 rows
-2025-08-27 19:45:46,691 - INFO - Processing row 1/20: European Circular Bioeconomy Fund
-2025-08-27 19:45:46,692 - INFO - Creating new investor: European Circular Bioeconomy Fund
-2025-08-27 19:45:46,693 - INFO - Added investor European Circular Bioeconomy Fund to vector database
-...
-2025-08-27 19:45:50,828 - INFO - Processing complete! Processed: 20, Errors: 0
-```
-
-### Search Results
+### 1. Upload and Process Investor Data
 
 ```bash
-$ python investor_parser.py --search "circular bioeconomy"
-
-Found 4 similar investors:
-1. European Circular Bioeconomy Fund
-   Website: https://www.ecbf.vc
-   HQ: ECBF Management GmbH, Poppelsdorfer Allee 175, 53115 Bonn, Germany
-   Focus areas: 6
-   Similarity score: 0.979
-
-2. Astanor
-   Website: https://www.astanor.com/
-   HQ:
-   Focus areas: 5
-   Similarity score: 1.080
+# Upload CSV file via API
+curl -X POST "http://localhost:8000/parse-csv" \
+  -H "Content-Type: multipart/form-data" \
+  -F "file=@investors.csv"
 ```
 
-## Contributing
+### 2. Find Specific Investors
 
-### Development Setup
+```bash
+# Natural language search
+curl -X POST "http://localhost:8000/query" \
+  -H "Content-Type: application/json" \
+  -d '{"question": "Show me growth stage fintech investors in Silicon Valley with check sizes over $2 million"}'
 
-1. Install development dependencies
-2. Run tests: `python test_parser.py`
-3. Lint code: Follow PEP 8 standards
-4. Test with sample data before processing full datasets
+# Structured filtering
+curl "http://localhost:8000/investors/filter?stage=GROWTH&sector=fintech&geography=Silicon%20Valley&min_check_size=2000000"
+```
 
-### Adding Features
+### 3. Company Research
 
-- New data extractors: Extend `extract_structured_data()`
-- New LLM prompts: Modify `enhance_with_llm()`
-- New search capabilities: Extend ChromaDB integration
+```bash
+# Find companies in specific sector
+curl "http://localhost:8000/companies/filter?industry=fintech&founded_after=2020"
+
+# Find companies backed by specific investor
+curl "http://localhost:8000/companies/filter?investor_name=Sequoia"
+```
+
+### 4. Investment Analysis
+
+```bash
+# Get investor with full portfolio
+curl "http://localhost:8000/investors/1"
+
+# Find all companies in a specific location
+curl "http://localhost:8000/companies/filter?location=San%20Francisco"
+```
+
+## Development
+
+### Running in Development Mode
+
+```bash
+cd app
+uvicorn main:app --reload --host localhost --port 8000
+```
+
+### Testing the API
+
+1. **Interactive Testing**: Visit http://localhost:8000/docs
+2. **Manual Testing**: Use curl or Postman with the examples above
+3. **Database Inspection**: Use SQLite browser to inspect `investors_2.db`
+
+### Adding New Features
+
+1. **New Endpoints**: Add routes to `api/investors.py` or `api/companies.py`
+2. **New Models**: Update `db/models.py` and `py_schemas.py`
+3. **New Filters**: Extend filtering logic in route handlers
+4. **New LLM Features**: Modify `services/openrouter.py` or `services/querying.py`
 
 ## License
 
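The filter examples in the README diff write query strings in two ways. The `GET` lines use literal spaces (for example `GET /companies/filter?location=New York`), while the curl examples percent-encode them (`Silicon%20Valley`). The standard-library sketch below builds correctly encoded filter URLs on the client side. It assumes the server runs at `http://localhost:8000`, as in the setup steps. The `build_filter_url` and `fetch_json` helpers are hypothetical and not part of the project.

```python
import json
from typing import Any, Dict, Mapping, Optional, Union
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # host/port taken from the README setup steps

ParamValue = Optional[Union[str, int, bool]]


def build_filter_url(
    resource: str, params: Mapping[str, ParamValue], base_url: str = BASE_URL
) -> str:
    """Build a percent-encoded URL for GET /<resource>/filter (hypothetical helper)."""
    cleaned: Dict[str, Union[str, int]] = {}
    for key, value in params.items():
        if value is None:
            continue  # unset filters are simply not sent
        if isinstance(value, bool):
            value = "true" if value else "false"  # checked before int: bool is an int subclass
        cleaned[key] = value
    # quote (rather than the default quote_plus) encodes spaces as %20,
    # matching the curl examples in the README
    query = urlencode(cleaned, quote_via=quote)
    url = f"{base_url}/{resource}/filter"
    return f"{url}?{query}" if query else url


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """GET a URL and decode the JSON body (needs the API server to be running)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(build_filter_url("investors", {
        "stage": "GROWTH",
        "sector": "fintech",
        "geography": "Silicon Valley",
        "min_check_size": 2000000,
    }))
    # http://localhost:8000/investors/filter?stage=GROWTH&sector=fintech&geography=Silicon%20Valley&min_check_size=2000000
```

Parameters set to `None` are dropped, so the endpoint's optional filters stay unset. Booleans are sent as `true`/`false`, which FastAPI accepts for `Optional[bool]` query parameters such as `has_website`.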
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
+205
-5
@@ -1,8 +1,208 @@
|
||||
from fastapi.routing import apirouter
|
||||
from typing import List, Optional
|
||||
|
||||
router = apirouter()
|
||||
from db.db import get_db
|
||||
from db.models import CompanyTable, InvestorTable
|
||||
from fastapi import APIRouter, Depends, HTTPException, Query
|
||||
from py_schemas import CompanySchema
|
||||
from pydantic import BaseModel
|
||||
from sqlalchemy.orm import Session, selectinload
|
||||
|
||||
@router.get("/companies")
|
||||
def read_companies():
|
||||
return {"message": "list of companies"}
|
||||
router = APIRouter(tags=["Company Routes"])
|
||||
|
||||
|
||||
# Request schemas for creating/updating
|
||||
class CompanyCreate(BaseModel):
|
||||
name: str
|
||||
industry: str
|
||||
location: str
|
||||
founded_year: Optional[int] = None
|
||||
website: Optional[str] = None
|
||||
|
||||
|
||||
class CompanyUpdate(BaseModel):
|
||||
name: Optional[str] = None
|
||||
industry: Optional[str] = None
|
||||
location: Optional[str] = None
|
||||
founded_year: Optional[int] = None
|
||||
website: Optional[str] = None
|
||||
|
||||
|
||||
# Response schema with relationships
|
||||
class CompanyData(BaseModel):
|
||||
"""Comprehensive company data schema"""
|
||||
|
||||
company: CompanySchema
|
||||
investors: List["InvestorBasic"] = []
|
||||
|
||||
class Config:
|
||||
from_attributes = True
|
||||
|
||||
|
||||
class InvestorBasic(BaseModel):
|
||||
"""Basic investor info for company responses"""
|
||||
|
||||
id: int
|
||||
name: str
|
||||
geographic_focus: str
|
||||
stage_focus: str
|
||||
check_size_lower: int
|
||||
check_size_upper: int
|
||||
|
||||
class Config:
|
||||
from_attributes = True
|
||||
|
||||
|
||||
@router.get("/companies", response_model=List[CompanyData])
|
||||
def read_companies(db: Session = Depends(get_db)):
|
||||
"""Get all companies with their investor relationships"""
|
||||
companies = (
|
||||
db.query(CompanyTable).options(selectinload(CompanyTable.investors)).all()
|
||||
)
|
||||
|
||||
# Transform CompanyTable objects to CompanyData format
|
||||
company_data_list = []
|
||||
for company in companies:
|
||||
company_data = CompanyData(company=company, investors=company.investors)
|
||||
company_data_list.append(company_data)
|
||||
|
||||
return company_data_list
|
||||
|
||||
|
||||
@router.get("/companies/filter", response_model=List[CompanyData])
|
||||
def filter_companies(
|
||||
industry: Optional[str] = Query(
|
||||
None, description="Filter by industry (partial match)"
|
||||
),
|
||||
location: Optional[str] = Query(
|
||||
None, description="Filter by location (partial match)"
|
||||
),
|
||||
founded_after: Optional[int] = Query(None, description="Founded after year"),
|
||||
founded_before: Optional[int] = Query(None, description="Founded before year"),
|
||||
has_website: Optional[bool] = Query(
|
||||
None, description="Filter companies with/without website"
|
||||
),
|
||||
investor_name: Optional[str] = Query(
|
||||
None, description="Filter by investor name (partial match)"
|
||||
),
|
||||
db: Session = Depends(get_db),
|
||||
):
|
||||
"""Filter companies based on various criteria"""
|
||||
|
||||
# Start with base query
|
||||
query = db.query(CompanyTable).options(selectinload(CompanyTable.investors))
|
||||
|
||||
# Apply filters
|
||||
if industry:
|
||||
query = query.filter(CompanyTable.industry.ilike(f"%{industry}%"))
|
||||
|
||||
if location:
|
||||
query = query.filter(CompanyTable.location.ilike(f"%{location}%"))
|
||||
|
||||
if founded_after is not None:
|
||||
query = query.filter(CompanyTable.founded_year >= founded_after)
|
||||
|
||||
if founded_before is not None:
|
||||
query = query.filter(CompanyTable.founded_year <= founded_before)
|
||||
|
||||
if has_website is not None:
|
||||
if has_website:
|
||||
query = query.filter(CompanyTable.website.isnot(None))
|
||||
else:
|
||||
query = query.filter(CompanyTable.website.is_(None))
|
||||
|
||||
# Filter by investor if provided
|
||||
if investor_name:
|
||||
query = query.join(CompanyTable.investors).filter(
|
||||
InvestorTable.name.ilike(f"%{investor_name}%")
|
||||
)
|
||||
|
||||
companies = query.all()
|
||||
|
||||
# Transform to CompanyData format
|
||||
company_data_list = []
|
||||
for company in companies:
|
||||
company_data = CompanyData(company=company, investors=company.investors)
|
||||
company_data_list.append(company_data)
|
||||
|
||||
return company_data_list
|
||||
|
||||
|
||||
@router.get("/companies/{company_id}", response_model=CompanyData)
|
||||
def read_company(company_id: int, db: Session = Depends(get_db)):
|
||||
"""Get a specific company by ID with its investors"""
|
||||
company = (
|
||||
db.query(CompanyTable)
|
||||
.options(selectinload(CompanyTable.investors))
|
||||
.filter(CompanyTable.id == company_id)
|
||||
.first()
|
||||
)
|
||||
|
||||
if not company:
|
||||
raise HTTPException(status_code=404, detail="Company not found")
|
||||
|
    # Transform to CompanyData format
    return CompanyData(company=company, investors=company.investors)


@router.post("/companies", response_model=CompanyData)
def create_company(company: CompanyCreate, db: Session = Depends(get_db)):
    """Create a new company"""
    db_company = CompanyTable(**company.dict())
    db.add(db_company)
    db.commit()
    db.refresh(db_company)

    # Reload with relationships
    company_with_relations = (
        db.query(CompanyTable)
        .options(selectinload(CompanyTable.investors))
        .filter(CompanyTable.id == db_company.id)
        .first()
    )

    # Transform to CompanyData format
    return CompanyData(
        company=company_with_relations, investors=company_with_relations.investors
    )


@router.put("/companies/{company_id}", response_model=CompanyData)
def update_company(
    company_id: int, company: CompanyUpdate, db: Session = Depends(get_db)
):
    """Update an existing company"""
    db_company = db.query(CompanyTable).filter(CompanyTable.id == company_id).first()
    if not db_company:
        raise HTTPException(status_code=404, detail="Company not found")

    update_data = company.dict(exclude_unset=True)
    for field, value in update_data.items():
        setattr(db_company, field, value)

    db.commit()
    db.refresh(db_company)

    # Reload with relationships
    company_with_relations = (
        db.query(CompanyTable)
        .options(selectinload(CompanyTable.investors))
        .filter(CompanyTable.id == company_id)
        .first()
    )

    # Transform to CompanyData format
    return CompanyData(
        company=company_with_relations, investors=company_with_relations.investors
    )


@router.delete("/companies/{company_id}")
def delete_company(company_id: int, db: Session = Depends(get_db)):
    """Delete a company"""
    db_company = db.query(CompanyTable).filter(CompanyTable.id == company_id).first()
    if not db_company:
        raise HTTPException(status_code=404, detail="Company not found")

    db.delete(db_company)
    db.commit()
    return {"message": "Company deleted successfully"}

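`update_company` (like `update_investor` in the investors router) copies only the fields the client actually sent, because `company.dict(exclude_unset=True)` omits fields that were never set in the request body. The stdlib-only sketch below imitates that behaviour with a sentinel, so an explicit `null` can still clear a column while omitted fields stay untouched. `UNSET`, `CompanyPatch`, `set_fields`, `apply_partial_update` and the `Row` stand-in are hypothetical names, and the field names are borrowed from the `CompanyTable` columns that appear elsewhere in this diff.

```python
from dataclasses import dataclass, fields
from typing import Any

# Sentinel meaning "the client did not send this field" (hypothetical helper).
UNSET: Any = object()


@dataclass
class CompanyPatch:
    """Stand-in for a CompanyUpdate body: every field defaults to UNSET."""

    name: Any = UNSET
    industry: Any = UNSET
    location: Any = UNSET


def set_fields(patch: CompanyPatch) -> dict:
    """Roughly what dict(exclude_unset=True) returns: only explicitly provided fields."""
    return {
        f.name: getattr(patch, f.name)
        for f in fields(patch)
        if getattr(patch, f.name) is not UNSET
    }


def apply_partial_update(target: object, patch: CompanyPatch) -> object:
    """Copy only the provided fields onto target, like update_company's setattr loop."""
    for field_name, value in set_fields(patch).items():
        setattr(target, field_name, value)
    return target


class Row:
    """Minimal stand-in for a CompanyTable ORM instance."""

    def __init__(self, name: str, industry: str, location: str):
        self.name = name
        self.industry = industry
        self.location = location


row = Row("Acme", "SaaS", "Berlin")
apply_partial_update(row, CompanyPatch(industry="Fintech", location=None))
print(row.name, row.industry, row.location)  # Acme Fintech None
```

Under Pydantic v2, which `parse_csv` already targets through `model_dump()`, the non-deprecated spelling of the same call is `company.model_dump(exclude_unset=True)`.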
+230
-5
@@ -1,8 +1,233 @@
from fastapi import APIRouter
from typing import List, Optional

router = APIRouter()
from db.db import get_db
from db.models import InvestorTable, SectorTable
from fastapi import APIRouter, Depends, HTTPException, Query
from py_schemas import InvestmentStage, InvestorData
from pydantic import BaseModel
from sqlalchemy.orm import Session, selectinload

@router.get("/investors")
def read_investors():
    return {"message": "list of investors"}
router = APIRouter(tags=["Investor Routes"])


# Request schemas for creating/updating
class InvestorCreate(BaseModel):
    name: str
    description: str = None
    aum: int
    check_size_lower: int
    check_size_upper: int
    geographic_focus: str
    stage_focus: InvestmentStage
    number_of_investments: int = 0


class InvestorUpdate(BaseModel):
    name: str = None
    description: str = None
    aum: int = None
    check_size_lower: int = None
    check_size_upper: int = None
    geographic_focus: str = None
    stage_focus: InvestmentStage = None
    number_of_investments: int = None


@router.get("/investors", response_model=List[InvestorData])
def read_investors(db: Session = Depends(get_db)):
    """Get all investors with their related data"""
    investors = (
        db.query(InvestorTable)
        .options(
            selectinload(InvestorTable.portfolio_companies),
            selectinload(InvestorTable.team_members),
            selectinload(InvestorTable.sectors),
        )
        .all()
    )

    # Transform InvestorTable objects to InvestorData format
    investor_data_list = []
    for investor in investors:
        investor_data = InvestorData(
            investor=investor,  # This maps to InvestorSchema
            portfolio_companies=investor.portfolio_companies,
            team_members=investor.team_members,
            sectors=investor.sectors,
        )
        investor_data_list.append(investor_data)

    return investor_data_list


@router.get("/investors/filter", response_model=List[InvestorData])
def filter_investors(
    stage: Optional[InvestmentStage] = Query(
        None, description="Filter by investment stage"
    ),
    min_check_size: Optional[int] = Query(None, description="Minimum check size"),
    max_check_size: Optional[int] = Query(None, description="Maximum check size"),
    geography: Optional[str] = Query(
        None, description="Geographic focus (partial match)"
    ),
    sector: Optional[str] = Query(None, description="Sector name (partial match)"),
    min_aum: Optional[int] = Query(None, description="Minimum AUM"),
    max_aum: Optional[int] = Query(None, description="Maximum AUM"),
    db: Session = Depends(get_db),
):
    """Filter investors based on various criteria"""

    # Start with base query
    query = db.query(InvestorTable).options(
        selectinload(InvestorTable.portfolio_companies),
        selectinload(InvestorTable.team_members),
        selectinload(InvestorTable.sectors),
    )

    # Apply filters
    if stage:
        query = query.filter(InvestorTable.stage_focus == stage)

    if min_check_size is not None:
        query = query.filter(InvestorTable.check_size_lower >= min_check_size)

    if max_check_size is not None:
        query = query.filter(InvestorTable.check_size_upper <= max_check_size)

    if geography:
        query = query.filter(InvestorTable.geographic_focus.ilike(f"%{geography}%"))

    if min_aum is not None:
        query = query.filter(InvestorTable.aum >= min_aum)

    if max_aum is not None:
        query = query.filter(InvestorTable.aum <= max_aum)

    # Filter by sector if provided
    if sector:
        query = query.join(InvestorTable.sectors).filter(
            SectorTable.name.ilike(f"%{sector}%")
        )

    investors = query.all()

    # Transform to InvestorData format
    investor_data_list = []
    for investor in investors:
        investor_data = InvestorData(
            investor=investor,
            portfolio_companies=investor.portfolio_companies,
            team_members=investor.team_members,
            sectors=investor.sectors,
        )
        investor_data_list.append(investor_data)

    return investor_data_list


@router.get("/investors/{investor_id}", response_model=InvestorData)
def read_investor(investor_id: int, db: Session = Depends(get_db)):
    """Get a specific investor by ID"""
    investor = (
        db.query(InvestorTable)
        .options(
            selectinload(InvestorTable.portfolio_companies),
            selectinload(InvestorTable.team_members),
            selectinload(InvestorTable.sectors),
        )
        .filter(InvestorTable.id == investor_id)
        .first()
    )

    if not investor:
        raise HTTPException(status_code=404, detail="Investor not found")

    # Transform to InvestorData format
    return InvestorData(
        investor=investor,
        portfolio_companies=investor.portfolio_companies,
        team_members=investor.team_members,
        sectors=investor.sectors,
    )


@router.post("/investors", response_model=InvestorData)
def create_investor(investor: InvestorCreate, db: Session = Depends(get_db)):
    """Create a new investor"""
    db_investor = InvestorTable(**investor.dict())
    db.add(db_investor)
    db.commit()
    db.refresh(db_investor)

    # Reload with relationships
    investor_with_relations = (
        db.query(InvestorTable)
        .options(
            selectinload(InvestorTable.portfolio_companies),
            selectinload(InvestorTable.team_members),
            selectinload(InvestorTable.sectors),
        )
        .filter(InvestorTable.id == db_investor.id)
        .first()
    )

    # Transform to InvestorData format
    return InvestorData(
        investor=investor_with_relations,
        portfolio_companies=investor_with_relations.portfolio_companies,
        team_members=investor_with_relations.team_members,
        sectors=investor_with_relations.sectors,
    )


@router.put("/investors/{investor_id}", response_model=InvestorData)
def update_investor(
    investor_id: int, investor: InvestorUpdate, db: Session = Depends(get_db)
):
    """Update an existing investor"""
    db_investor = (
        db.query(InvestorTable).filter(InvestorTable.id == investor_id).first()
    )
    if not db_investor:
        raise HTTPException(status_code=404, detail="Investor not found")

    update_data = investor.dict(exclude_unset=True)
    for field, value in update_data.items():
        setattr(db_investor, field, value)

    db.commit()
    db.refresh(db_investor)

    # Reload with relationships
    investor_with_relations = (
        db.query(InvestorTable)
        .options(
            selectinload(InvestorTable.portfolio_companies),
            selectinload(InvestorTable.team_members),
            selectinload(InvestorTable.sectors),
        )
        .filter(InvestorTable.id == investor_id)
        .first()
    )

    # Transform to InvestorData format
    return InvestorData(
        investor=investor_with_relations,
        portfolio_companies=investor_with_relations.portfolio_companies,
        team_members=investor_with_relations.team_members,
        sectors=investor_with_relations.sectors,
    )


@router.delete("/investors/{investor_id}")
def delete_investor(investor_id: int, db: Session = Depends(get_db)):
    """Delete an investor"""
    db_investor = (
        db.query(InvestorTable).filter(InvestorTable.id == investor_id).first()
    )
    if not db_investor:
        raise HTTPException(status_code=404, detail="Investor not found")

    db.delete(db_investor)
    db.commit()
    return {"message": "Investor deleted successfully"}

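`filter_investors` narrows the base query one optional parameter at a time and joins through the sectors relationship only when a sector is requested. The sketch below shows the same pattern in raw SQL with the stdlib `sqlite3` module: each supplied filter adds one parameterized `WHERE` clause, and `lower(...) LIKE ?` stands in for `ilike(f"%{...}%")`. The table and column names (`investors`, `sectors`, `investor_sectors`) are assumptions, not the project's actual schema. A plain SQL join returns one row per matching sector, so the sketch uses `DISTINCT`; SQLAlchemy's legacy `Query` de-duplicates full-entity results by primary key, which is why the ORM version does not need it.

```python
import sqlite3
from typing import Optional


def filter_investors(
    conn: sqlite3.Connection,
    stage: Optional[str] = None,
    min_check_size: Optional[int] = None,
    max_check_size: Optional[int] = None,
    geography: Optional[str] = None,
    sector: Optional[str] = None,
) -> list:
    """Build a parameterized query that only includes the filters actually supplied."""
    sql = "SELECT DISTINCT i.id, i.name FROM investors AS i"
    where, params = [], []
    if sector:
        # Join through the (assumed) association table only when a sector is requested.
        sql += (
            " JOIN investor_sectors AS isec ON isec.investor_id = i.id"
            " JOIN sectors AS s ON s.id = isec.sector_id"
        )
        where.append("lower(s.name) LIKE ?")
        params.append(f"%{sector.lower()}%")
    if stage:
        where.append("i.stage_focus = ?")
        params.append(stage)
    if min_check_size is not None:
        where.append("i.check_size_lower >= ?")
        params.append(min_check_size)
    if max_check_size is not None:
        where.append("i.check_size_upper <= ?")
        params.append(max_check_size)
    if geography:
        # Case-insensitive partial match, mirroring ilike(f"%{geography}%").
        where.append("lower(i.geographic_focus) LIKE ?")
        params.append(f"%{geography.lower()}%")
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += " ORDER BY i.id"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE investors (id INTEGER PRIMARY KEY, name TEXT, stage_focus TEXT,
        check_size_lower INTEGER, check_size_upper INTEGER, geographic_focus TEXT);
    CREATE TABLE sectors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE investor_sectors (investor_id INTEGER, sector_id INTEGER);
    INSERT INTO investors VALUES
        (1, 'Alpha Ventures', 'SEED', 250000, 1000000, 'United States'),
        (2, 'Beta Capital', 'GROWTH', 5000000, 20000000, 'Europe');
    INSERT INTO sectors VALUES (1, 'Fintech'), (2, 'Fintech Infrastructure'), (3, 'Healthcare');
    INSERT INTO investor_sectors VALUES (1, 1), (1, 2), (2, 3);
    """
)
# Alpha matches two fintech sectors but is returned once.
print(filter_investors(conn, sector="fintech"))  # [(1, 'Alpha Ventures')]
```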
@@ -0,0 +1,46 @@
from sqlalchemy.orm import Session
from db.models import InvestorTable
from db.db import get_db


def update_stage_focus_values():
    """Update existing stage_focus values from lowercase to uppercase"""
    db = next(get_db())

    try:
        # Mapping of old lowercase values to new uppercase values
        stage_mappings = {
            'seed': 'SEED',
            'series_a': 'SERIES_A',
            'series_b': 'SERIES_B',
            'series_c': 'SERIES_C',
            'growth': 'GROWTH',
            'late_stage': 'LATE_STAGE'
        }

        updated_count = 0

        for old_value, new_value in stage_mappings.items():
            # Update records with the old value
            result = db.query(InvestorTable).filter(
                InvestorTable.stage_focus == old_value
            ).update(
                {InvestorTable.stage_focus: new_value},
                synchronize_session=False
            )

            updated_count += result
            print(f"Updated {result} records from '{old_value}' to '{new_value}'")

        db.commit()
        print(f"Successfully updated {updated_count} total records")

    except Exception as e:
        db.rollback()
        print(f"Error updating stage_focus values: {e}")
        raise
    finally:
        db.close()


# Run the update
if __name__ == "__main__":
    update_stage_focus_values()

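The migration script issues one bulk `UPDATE` per legacy value and sums the affected-row counts returned by `Query.update()`, committing once at the end or rolling back on error. The same idea in plain SQL might look like the sketch below, which uses the stdlib `sqlite3` module and an assumed `investors` table. Here `cursor.rowcount` plays the role of the integer returned by `Query.update()`. One caveat: if `stage_focus` is declared with SQLAlchemy's `Enum(InvestmentStage)`, that type persists member names rather than values by default. It is therefore worth checking which strings are actually stored before relying on the lowercase mapping.

```python
import sqlite3

# Legacy lowercase values mapped to the new uppercase enum values.
STAGE_MAPPINGS = {
    "seed": "SEED",
    "series_a": "SERIES_A",
    "series_b": "SERIES_B",
    "series_c": "SERIES_C",
    "growth": "GROWTH",
    "late_stage": "LATE_STAGE",
}


def migrate_stage_focus(conn: sqlite3.Connection) -> int:
    """Rewrite legacy stage_focus values in one transaction; return rows changed."""
    updated = 0
    try:
        for old, new in STAGE_MAPPINGS.items():
            cur = conn.execute(
                "UPDATE investors SET stage_focus = ? WHERE stage_focus = ?", (new, old)
            )
            updated += cur.rowcount
        conn.commit()
    except sqlite3.Error:
        conn.rollback()  # leave the table untouched on failure
        raise
    return updated


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE investors (id INTEGER PRIMARY KEY, stage_focus TEXT)")
conn.executemany(
    "INSERT INTO investors (stage_focus) VALUES (?)",
    [("seed",), ("growth",), ("SEED",)],
)
conn.commit()
print(migrate_stage_focus(conn))  # 2
```

Running it a second time changes nothing, so the migration is safe to re-run.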
Binary file not shown.
Binary file not shown.
+1
-1
@@ -9,7 +9,7 @@ from sqlalchemy.orm import Session, sessionmaker
Base = declarative_base()

# Database configuration
DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///investors_2.db")
DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///investors.db")

# Create engine
engine = create_engine(DATABASE_URL, echo=False)

+6
-7
@@ -9,13 +9,12 @@ from db.db import Base


class InvestmentStage(enum.Enum):
    SEED = "seed"
    SERIES_A = "series_a"
    SERIES_B = "series_b"
    SERIES_C = "series_c"
    GROWTH = "growth"
    LATE_STAGE = "late_stage"

    SEED = "SEED"
    SERIES_A = "SERIES_A"
    SERIES_B = "SERIES_B"
    SERIES_C = "SERIES_C"
    GROWTH = "GROWTH"
    LATE_STAGE = "LATE_STAGE"

# Association table for many-to-many relationship between investors and companies
investor_company_association = Table(

+34
-10
@@ -1,23 +1,36 @@
import io

import pandas as pd
from api import investors
from api import companies, investors
from db.db import db_dependency, init_database
from fastapi import FastAPI, File, UploadFile
from services.openrouter import InvestorProcessor
from py_schemas import InvestorList
from pydantic import BaseModel
from services.openrouter_v2 import InvestorProcessor
from services.querying import QueryProcessor

app = FastAPI()
app.include_router(investors.router)
init_database()


# Request models
class QueryRequest(BaseModel):
    question: str

    class Config:
        json_schema_extra = {
            "example": {
                "question": "Show me growth stage fintech investors in the US with check sizes over $1 million"
            }
        }


@app.get("/")
def read_root():
def health():
    return {"Hello": "World"}


@app.post("/parse-csv")
@app.post("/parse-csv", tags=["CSV Upload"], response_model=list[dict])
async def parse_csv(db: db_dependency, file: UploadFile = File(...)):
    # Read uploaded CSV with pandas
    content = await file.read()
@@ -28,16 +41,27 @@ async def parse_csv(db: db_dependency, file: UploadFile = File(...)):
    results = await processor.process_csv(df)

    # Convert Pydantic objects to dictionaries
    return {"results": [r.dict() for r in results]}
    return [r.model_dump() for r in results]


@app.post("/query")
async def query_investors(db: db_dependency, question: str):
@app.post("/query", response_model=InvestorList, tags=["Querying"])
async def query_investors(db: db_dependency, request: QueryRequest):
    """
    Query investors using natural language.

    Supports queries like:
    - "Show me seed stage investors"
    - "Find fintech investors in Silicon Valley"
    - "Growth stage investors with $5M+ check sizes"
    - "Healthcare investors in Europe"
    """
    processor = QueryProcessor(sql_session=db)
    results = processor.process_query(question)
    return {"results": results}
    results = processor.process_query(request.question)
    return results


app.include_router(investors.router)
app.include_router(companies.router)
if __name__ == "__main__":
    import uvicorn

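With this change, `/query` no longer reads `question` as a query-string parameter. It now expects a JSON body validated by `QueryRequest`. A client call might look like the stdlib sketch below. The base URL `http://localhost:8000` is an assumption (uvicorn's default port), and `build_query_request` is a hypothetical helper. The request is only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local uvicorn default


def build_query_request(question: str) -> urllib.request.Request:
    """Build (but do not send) a POST /query request carrying a QueryRequest JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request("Healthcare investors in Europe")
# Against a running server, the endpoint's declared response_model is InvestorList:
# with urllib.request.urlopen(req) as resp:
#     payload = json.load(resp)  # expected shape: {"investors": [...]}
```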
+12
-10
@@ -1,16 +1,17 @@
from pydantic import BaseModel
from datetime import datetime
from typing import List, Optional
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel


class InvestmentStage(str, Enum):
    SEED = "seed"
    SERIES_A = "series_a"
    SERIES_B = "series_b"
    SERIES_C = "series_c"
    GROWTH = "growth"
    LATE_STAGE = "late_stage"
    SEED = "SEED"
    SERIES_A = "SERIES_A"
    SERIES_B = "SERIES_B"
    SERIES_C = "SERIES_C"
    GROWTH = "GROWTH"
    LATE_STAGE = "LATE_STAGE"


class SectorSchema(BaseModel):
@@ -64,6 +65,7 @@ class InvestorSchema(BaseModel):

class InvestorData(BaseModel):
    """Comprehensive investor data schema for LLM processing"""

    investor: InvestorSchema
    portfolio_companies: List[CompanySchema] = []
    team_members: List[InvestorTeamMemberSchema] = []
@@ -71,7 +73,7 @@ class InvestorData(BaseModel):

    class Config:
        from_attributes = True


class InvestorList(BaseModel):
    investors: List[InvestorData]

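This commit switches the `InvestmentStage` values from lowercase to uppercase in both `db.models` and `py_schemas`. With a `str`-based enum, lookup by value (`InvestmentStage("SEED")`) now rejects the old lowercase strings. Any input that may still carry them, such as older rows, LLM output or free-text filters, therefore needs normalizing first. The `parse_stage` helper below is a hypothetical sketch, not part of the project. It upper-cases the input and maps spaces and hyphens to underscores before the value lookup.

```python
from enum import Enum
from typing import Optional


class InvestmentStage(str, Enum):
    """Mirrors the updated enum in py_schemas."""

    SEED = "SEED"
    SERIES_A = "SERIES_A"
    SERIES_B = "SERIES_B"
    SERIES_C = "SERIES_C"
    GROWTH = "GROWTH"
    LATE_STAGE = "LATE_STAGE"


def parse_stage(raw: Optional[str]) -> Optional[InvestmentStage]:
    """Accept 'seed', 'Series A', 'late-stage', 'GROWTH', ... and return the member, or None."""
    if not raw:
        return None
    key = raw.strip().upper().replace("-", "_").replace(" ", "_")
    try:
        return InvestmentStage(key)  # lookup by value
    except ValueError:
        return None


print(parse_stage("Series A").value)  # SERIES_A
```

Because the enum mixes in `str`, members still compare equal to their plain string values (`InvestmentStage.GROWTH == "GROWTH"`), which keeps JSON round-trips simple.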
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -9,7 +9,7 @@ from dotenv import load_dotenv
from openai import OpenAI

from db import get_session, init_database
from schema import CSVRow, Investor
from py_schemas import CSVRow, Investor

# Load environment variables
load_dotenv()

@@ -0,0 +1,290 @@
import asyncio
from typing import List, Optional

import chromadb
import pandas as pd
from db.models import CompanyTable, InvestorTable, InvestorTeamMember, SectorTable
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from py_schemas import InvestorData
from pydantic import BaseModel
from settings import settings


class InvestorOutput(BaseModel):
    """Schema for LLM structured output"""

    investor_data: InvestorData


class InvestorProcessor:
    def __init__(
        self,
        sql_session: Optional[object] = None,
        vector_db_client: Optional[object] = None,
    ):
        self.template = """You are an expert data extraction assistant. Extract investor information from the provided CSV data and return it as a structured record.

Given the following CSV data row:
{question}

Extract and structure the following fields for the investor:
- name: The investor's full name
- description: Description of the investor
- aum: Assets under management (as integer, use 0 if not available)
- check_size_lower: Lower bound of investment check size (as integer)
- check_size_upper: Upper bound of investment check size (as integer)
- geographic_focus: Geographic region focus
- stage_focus: Investment stage focus (must be one of: SEED, SERIES_A, SERIES_B, SERIES_C, GROWTH, LATE_STAGE)
- number_of_investments: Number of investments made (default 0)

Also extract related data:
- portfolio_companies: List of companies they've invested in
- team_members: List of team members with name, role, email
- sectors: List of sectors they focus on

Important:
- If a field is not available, use appropriate defaults
- stage_focus must be one of the valid enum values
- Return clean, valid JSON only

Return the data as a single comprehensive investor data record."""

        self.prompt = PromptTemplate(
            template=self.template, input_variables=["question"]
        )

        self.llm = ChatOpenAI(
            api_key=settings.OPENROUTER_API_KEY,
            base_url="https://openrouter.ai/api/v1",
            model="google/gemini-2.5-flash-lite",
            temperature=0,
        )

        self.structured_llm = self.llm.with_structured_output(InvestorOutput)
        self.sql_session = sql_session
        self.vector_db_client = vector_db_client

        self.vector_db_client = chromadb.PersistentClient(path="./chroma_db")
        self.collection = self.vector_db_client.get_or_create_collection(
            name="investor_descriptions",
            metadata={
                "description": "Investor descriptions and investment thesis focus"
            },
        )

    async def _process_row(
        self, row: pd.Series, row_idx: int
    ) -> Optional[InvestorData]:
        """Process a single row of data"""
        # Clean values to remove control characters
        cleaned_row = {}
        for key, value in row.items():
            if pd.notna(value):
                # Convert to string and clean control characters
                clean_value = (
                    str(value)
                    .replace("\n", " ")
                    .replace("\r", " ")
                    .replace("\t", " ")
                )
                # Remove other control characters
                clean_value = "".join(
                    char
                    for char in clean_value
                    if ord(char) >= 32 or char in ["\n", "\r", "\t"]
                )
                cleaned_row[key] = clean_value

        row_str = ", ".join(
            [f"{key}: {value}" for key, value in cleaned_row.items()]
        )

        try:
            print(f"Processing row {row_idx + 1}...")
            result = await (self.prompt | self.structured_llm).ainvoke({"question": row_str})
            if result.investor_data:
                return result.investor_data
            return None
        except Exception as e:
            print(f"Error processing row {row_idx + 1}: {e}")
            return None

    async def _save_to_sql(self, investor_data_list: List[InvestorData]) -> None:
        """Save investors and related data to SQL database"""
        if not self.sql_session:
            return

        try:
            for investor_data in investor_data_list:
                # Save investor
                db_investor = InvestorTable(
                    name=investor_data.investor.name,
                    description=investor_data.investor.description,
                    aum=investor_data.investor.aum,
                    check_size_lower=investor_data.investor.check_size_lower,
                    check_size_upper=investor_data.investor.check_size_upper,
                    geographic_focus=investor_data.investor.geographic_focus,
                    stage_focus=investor_data.investor.stage_focus,
                    number_of_investments=investor_data.investor.number_of_investments,
                )
                self.sql_session.add(db_investor)
                self.sql_session.flush()  # Get the ID

                # Save sectors and create associations
                for sector_data in investor_data.sectors:
                    # Check if sector exists, create if not
                    existing_sector = (
                        self.sql_session.query(SectorTable)
                        .filter(SectorTable.name == sector_data.name)
                        .first()
                    )

                    if not existing_sector:
                        db_sector = SectorTable(name=sector_data.name)
                        self.sql_session.add(db_sector)
                        self.sql_session.flush()
                        # Add sector to investor's sectors
                        db_investor.sectors.append(db_sector)
                    else:
                        # Add existing sector to investor if not already there
                        if existing_sector not in db_investor.sectors:
                            db_investor.sectors.append(existing_sector)

                # Save companies and create portfolio associations
                for company_data in investor_data.portfolio_companies:
                    # Check if company exists, create if not
                    existing_company = (
                        self.sql_session.query(CompanyTable)
                        .filter(CompanyTable.name == company_data.name)
                        .first()
                    )

                    if not existing_company:
                        db_company = CompanyTable(
                            name=company_data.name,
                            industry=company_data.industry,
                            location=company_data.location,
                            founded_year=company_data.founded_year,
                            website=company_data.website,
                        )
                        self.sql_session.add(db_company)
                        self.sql_session.flush()

                        # Add to investor's portfolio
                        db_investor.portfolio_companies.append(db_company)
                    else:
                        # Add existing company to portfolio if not already there
                        if existing_company not in db_investor.portfolio_companies:
                            db_investor.portfolio_companies.append(existing_company)

                # Save team members
                for team_member_data in investor_data.team_members:
                    # Check if team member exists
                    existing_member = (
                        self.sql_session.query(InvestorTeamMember)
                        .filter(InvestorTeamMember.email == team_member_data.email)
                        .first()
                    )

                    if not existing_member:
                        db_team_member = InvestorTeamMember(
                            name=team_member_data.name,
                            role=team_member_data.role,
                            email=team_member_data.email,
                            investor_id=db_investor.id,
                        )
                        self.sql_session.add(db_team_member)

            self.sql_session.commit()
            print(f"Successfully saved {len(investor_data_list)} investors to database")

        except Exception as e:
            self.sql_session.rollback()
            print(f"Error saving to SQL database: {e}")
            raise

    async def _save_to_vector_db(self, investor_data_list: List[InvestorData]) -> None:
        """Save investors to vector database"""
        if not self.vector_db_client:
            return

        documents = []
        metadatas = []
        ids = []

        for i, investor_data in enumerate(investor_data_list):
            investor = investor_data.investor
            sectors = ", ".join([s.name for s in investor_data.sectors])
            companies = ", ".join([c.name for c in investor_data.portfolio_companies])

            doc_text = f"""
            Investor: {investor.name}
            Description: {investor.description or "N/A"}
            AUM: ${investor.aum:,}
            Check Size: ${investor.check_size_lower:,} - ${investor.check_size_upper:,}
            Geographic Focus: {investor.geographic_focus}
            Stage Focus: {investor.stage_focus.value}
            Sectors: {sectors}
            Portfolio Companies: {companies}
            """.strip()

            documents.append(doc_text)
            metadatas.append(
                {
                    "name": investor.name,
                    "stage_focus": investor.stage_focus.value,
                    "geographic_focus": investor.geographic_focus,
                    "aum": investor.aum,
                }
            )
            ids.append(
                f"investor_{i}_{investor.name.replace(' ', '_').replace('/', '_')}"
            )

        if documents:
            try:
                self.collection.add(documents=documents, metadatas=metadatas, ids=ids)
                print(
                    f"Successfully saved {len(documents)} investors to vector database"
                )
            except Exception as e:
                print(f"Error saving to vector database: {e}")

    async def process_csv(
        self, df: pd.DataFrame, max_concurrent: int = 10
    ) -> List[InvestorData]:
        """Process CSV rows concurrently (at most max_concurrent at a time) and save to databases"""
        results = []

        # Create semaphore for concurrency control
        semaphore = asyncio.Semaphore(max_concurrent)

        async def process_row_with_semaphore(row_data):
            row, row_idx = row_data
            async with semaphore:
                return await self._process_row(row, row_idx)

        # Create row tasks
        row_tasks = []
        for idx, row in df.iterrows():
            row_tasks.append((row, idx))

        # Execute all rows concurrently
        row_results = await asyncio.gather(
            *[process_row_with_semaphore(row_data) for row_data in row_tasks],
            return_exceptions=True,
        )

        # Collect results, filtering out exceptions and None values
        for row_result in row_results:
            if not isinstance(row_result, Exception) and row_result is not None:
                results.append(row_result)

        # Save to databases
        if results:
            print(f"Successfully processed {len(results)} investors")
            await self._save_to_sql(results)
            await self._save_to_vector_db(results)

        return results
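`_process_row` flattens each CSV row into a `key: value, ...` string after turning newlines and tabs into spaces and dropping other control characters. `process_csv` then fans the rows out with `asyncio.gather` behind a semaphore, so at most `max_concurrent` LLM calls are in flight. Failed rows (exceptions) and unparsed rows (`None`) are discarded. The stdlib sketch below reproduces that pipeline under a few assumptions: rows are plain dicts rather than pandas Series, `value is None` stands in for `pd.notna`, and the hypothetical `fake_extract` coroutine replaces the OpenRouter call.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional


def row_to_prompt_text(row: Dict[str, object]) -> str:
    """Flatten a row to 'key: value, ...'; newlines/tabs become spaces, other control chars are dropped."""
    parts = []
    for key, value in row.items():
        if value is None:  # stands in for pd.notna() in the original
            continue
        text = str(value).replace("\n", " ").replace("\r", " ").replace("\t", " ")
        text = "".join(ch for ch in text if ord(ch) >= 32)
        parts.append(f"{key}: {text}")
    return ", ".join(parts)


async def process_rows(
    rows: List[Dict[str, object]],
    extract: Callable[[int, str], Awaitable[Optional[str]]],
    max_concurrent: int = 10,
) -> List[str]:
    """Run extract() on every row with at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(idx: int, row: Dict[str, object]):
        async with semaphore:
            return await extract(idx, row_to_prompt_text(row))

    outcomes = await asyncio.gather(
        *(guarded(i, r) for i, r in enumerate(rows)), return_exceptions=True
    )
    # Drop failed rows (exceptions) and rows the extractor could not parse (None).
    return [o for o in outcomes if o is not None and not isinstance(o, Exception)]


in_flight = 0
peak = 0


async def fake_extract(idx: int, text: str) -> Optional[str]:
    """Hypothetical stand-in for the structured LLM call; tracks peak concurrency."""
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    if idx == 1:
        raise RuntimeError("LLM error")
    if idx == 2:
        return None
    return text


rows = [
    {"name": "Alpha\nVentures", "aum": 5},
    {"name": "Broken"},
    {"name": "Empty"},
    {"name": "Gamma", "note": None},
]
results = asyncio.run(process_rows(rows, fake_extract, max_concurrent=2))
print(results)  # ['name: Alpha Ventures, aum: 5', 'name: Gamma']
print(peak)  # 2
```

`gather` preserves input order, so surviving results line up with their source rows even though calls overlap.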
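`_save_to_vector_db` builds each document ID from the row position `i` within the current batch plus the investor's name. Re-importing the same investor at a different position therefore creates a second document instead of replacing the first. A content-derived ID keeps re-imports idempotent. One possible approach, sketched with stdlib `hashlib` and a hypothetical `investor_doc_id` helper, normalizes the name and appends a short hash.

```python
import hashlib
import re


def investor_doc_id(name: str) -> str:
    """Derive a stable, position-independent document ID from an investor name."""
    normalized = re.sub(r"\s+", " ", name).strip().lower()
    # Readable slug for debugging, plus a short hash so distinct names stay distinct.
    slug = re.sub(r"[^a-z0-9]+", "_", normalized).strip("_")[:40]
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]
    return f"investor_{slug}_{digest}"


print(investor_doc_id("Acme Ventures / Fund II"))
```

Paired with ChromaDB's `Collection.upsert`, which takes the same `ids`, `documents` and `metadatas` arguments as `add`, such IDs would overwrite an existing entry rather than duplicate it. The trade-off is that two different investors with identical names would collapse into one document.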
+201
-4
@@ -1,13 +1,15 @@
from typing import Optional
from typing import List, Optional

import chromadb
from db.models import InvestorTable
from langchain import hub
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from py_schemas import InvestorList
from py_schemas import InvestorData, InvestorList
from settings import settings
from sqlalchemy.orm import selectinload

# Connect to SQLite

@@ -25,6 +27,7 @@ class QueryProcessor:
        sql_session: Optional[object] = None,
        vector_db_client: Optional[object] = None,
    ):
        self.sql_session = sql_session
        self.llm = ChatOpenAI(
            api_key=settings.OPENROUTER_API_KEY,
            base_url="https://openrouter.ai/api/v1",
@@ -36,7 +39,6 @@ class QueryProcessor:
            model=self.llm,
            tools=self.toolkit.get_tools() + [self.query_vector_database],
            prompt=system_message,
            response_format=InvestorList,
        )
        self.vector_db_client = vector_db_client

@@ -77,7 +79,202 @@ class QueryProcessor:

    def process_query(self, question: str) -> InvestorList:
        """Process a query using the LLM and return structured investor data."""
        # Extract filters from the query first
        filters = self._extract_filters_from_query(question)

        # Get AI response for additional context
        response = self.agent.invoke(
            {"messages": [("user", question)]},
        )
        return response

        # Extract the actual message content
        ai_response = (
            response["messages"][-1].content if response.get("messages") else ""
        )

        # Try to extract investor IDs or names from the AI response
        investor_ids = self._extract_investor_info_from_response(ai_response)

        # Fetch filtered investor data with relationships from database
        return self._fetch_investors_with_relationships(investor_ids, filters)

    def _extract_investor_info_from_response(self, ai_response: str) -> List[int]:
        """Extract investor IDs from AI response. This is a simple implementation."""
        # This is a basic implementation - you might want to make it more sophisticated
        # based on how your AI formats responses
        investor_ids = []

        # If the AI can't provide structured data, fall back to getting all investors
        # that match basic criteria
        try:
            # Try to extract numbers that might be IDs
            import re

            ids = re.findall(r"\bid:\s*(\d+)", ai_response.lower())
            investor_ids = [int(id_str) for id_str in ids]
        except Exception:
            pass

        return investor_ids if investor_ids else []

    def _extract_filters_from_query(self, question: str) -> dict:
        """Extract filter criteria from natural language query."""
        question_lower = question.lower()
        filters = {}

        # Extract stage filters
        if any(
            stage in question_lower
            for stage in [
                "seed",
                "series a",
                "series b",
                "series c",
                "growth",
                "late stage",
            ]
        ):
            if "seed" in question_lower:
                filters["stage"] = "SEED"
            elif "series a" in question_lower:
                filters["stage"] = "SERIES_A"
            elif "series b" in question_lower:
                filters["stage"] = "SERIES_B"
            elif "series c" in question_lower:
                filters["stage"] = "SERIES_C"
            elif "growth" in question_lower:
                filters["stage"] = "GROWTH"
            elif "late stage" in question_lower:
                filters["stage"] = "LATE_STAGE"

        # Extract geographic filters
        if any(
            geo in question_lower
            for geo in [
                "us",
                "usa",
                "united states",
                "europe",
                "asia",
                "silicon valley",
                "bay area",
            ]
        ):
            if (
                "us" in question_lower
                or "usa" in question_lower
                or "united states" in question_lower
            ):
                filters["geography"] = "US"
            elif "europe" in question_lower:
                filters["geography"] = "Europe"
            elif "asia" in question_lower:
                filters["geography"] = "Asia"
            elif "silicon valley" in question_lower or "bay area" in question_lower:
                filters["geography"] = "Silicon Valley"

        # Extract sector filters
        sectors = [
            "fintech",
            "healthcare",
            "saas",
            "ai",
            "biotech",
            "consumer",
            "enterprise",
            "crypto",
            "blockchain",
        ]
        for sector in sectors:
            if sector in question_lower:
                filters["sector"] = sector
                break

        # Extract check size filters (simple patterns); the unit is captured next to
        # the number instead of being searched for anywhere in the question
        import re

        amounts = re.findall(
            r"\$?(\d+(?:,\d{3})*(?:\.\d+)?)\s*(million|thousand|m|k)\b", question_lower
        )
        if amounts:
            amount, unit = amounts[0]
            value = float(amount.replace(",", ""))
            if unit in ("million", "m"):
                filters["min_check_size"] = int(value * 1000000)
            else:  # "thousand" or "k"
                filters["min_check_size"] = int(value * 1000)

        return filters
|
||||
|
||||
    def _fetch_investors_with_relationships(
        self, investor_ids: List[int] | None = None, filters: dict | None = None
    ) -> InvestorList:
        """Fetch investors with all their relationships from the database."""
        if not self.sql_session:
            return InvestorList(investors=[])

        # Import here to avoid circular imports
        from db.models import InvestmentStage, SectorTable

        # Build query with all relationships eagerly loaded
        query = self.sql_session.query(InvestorTable).options(
            selectinload(InvestorTable.portfolio_companies),
            selectinload(InvestorTable.team_members),
            selectinload(InvestorTable.sectors),
        )

        # Apply filters if provided
        if filters:
            if "stage" in filters:
                # Skip stage names that are not InvestmentStage members
                # instead of raising AttributeError.
                stage_enum = getattr(InvestmentStage, filters["stage"], None)
                if stage_enum is not None:
                    query = query.filter(InvestorTable.stage_focus == stage_enum)

            if "geography" in filters:
                query = query.filter(
                    InvestorTable.geographic_focus.ilike(f"%{filters['geography']}%")
                )

            if "min_check_size" in filters:
                query = query.filter(
                    InvestorTable.check_size_lower >= filters["min_check_size"]
                )

            if "max_check_size" in filters:
                query = query.filter(
                    InvestorTable.check_size_upper <= filters["max_check_size"]
                )

            if "min_aum" in filters:
                query = query.filter(InvestorTable.aum >= filters["min_aum"])

            if "max_aum" in filters:
                query = query.filter(InvestorTable.aum <= filters["max_aum"])

            if "sector" in filters:
                query = query.join(InvestorTable.sectors).filter(
                    SectorTable.name.ilike(f"%{filters['sector']}%")
                )

        # Filter by IDs if provided
        if investor_ids:
            query = query.filter(InvestorTable.id.in_(investor_ids))
        elif not filters:
            # No IDs and no filters: cap the result size to avoid an overwhelming response
            query = query.limit(10)

        investors = query.all()

        # Transform to InvestorData format
        investor_data_list = [
            InvestorData(
                investor=investor,
                portfolio_companies=investor.portfolio_companies,
                team_members=investor.team_members,
                sectors=investor.sectors,
            )
            for investor in investors
        ]

        return InvestorList(investors=investor_data_list)
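The code above turns a free-text question into a `filters` dict with keyword heuristics and then maps each key to one SQL predicate. The following is a minimal, self-contained sketch of that pipeline using only the standard library (`re`, `sqlite3`) in place of SQLAlchemy. The functions `extract_filters` and `build_query`, the `investors` / `investor_sectors` tables, and the sample rows are all hypothetical; only the column names mirror those used in `_fetch_investors_with_relationships`.

```python
import re
import sqlite3

# Hypothetical keyword list, mirroring the sectors checked in the code above.
SECTORS = ["fintech", "healthcare", "saas", "ai", "biotech",
           "consumer", "enterprise", "crypto", "blockchain"]

# Number followed by a unit; \b stops "5 months" from reading as "5 m".
AMOUNT_RE = re.compile(r"\$?(\d+(?:,\d{3})*(?:\.\d+)?)\s*(million|thousand|m|k)\b")


def extract_filters(question: str) -> dict:
    """Turn a free-text question into a filters dict (keyword heuristics only)."""
    q = question.lower()
    filters: dict = {}
    if "silicon valley" in q or "bay area" in q:
        filters["geography"] = "Silicon Valley"
    for sector in SECTORS:
        # Whole-word match so "ai" does not fire on "said" or "again".
        if re.search(rf"\b{re.escape(sector)}\b", q):
            filters["sector"] = sector
            break
    match = AMOUNT_RE.search(q)
    if match:
        amount = float(match.group(1).replace(",", ""))
        multiplier = 1_000_000 if match.group(2) in ("million", "m") else 1_000
        filters["min_check_size"] = int(amount * multiplier)
    return filters


def build_query(filters: dict) -> tuple[str, list]:
    """Compose a parameterized SELECT, adding one predicate per filter present."""
    sql = "SELECT DISTINCT i.name FROM investors i"
    clauses: list[str] = []
    params: list = []
    if "sector" in filters:
        sql += " JOIN investor_sectors s ON s.investor_id = i.id"
        clauses.append("s.sector LIKE ?")  # SQLite LIKE is case-insensitive for ASCII, like ilike
        params.append(f"%{filters['sector']}%")
    if "geography" in filters:
        clauses.append("i.geographic_focus LIKE ?")
        params.append(f"%{filters['geography']}%")
    if "min_check_size" in filters:
        clauses.append("i.check_size_lower >= ?")
        params.append(filters["min_check_size"])
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY i.name"
    if not clauses:
        sql += " LIMIT 10"  # same cap as the ORM version when no filters apply
    return sql, params


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE investors (id INTEGER PRIMARY KEY, name TEXT,
                        geographic_focus TEXT, check_size_lower INTEGER);
CREATE TABLE investor_sectors (investor_id INTEGER, sector TEXT);
INSERT INTO investors VALUES
  (1, 'Alpha Ventures', 'Silicon Valley', 2000000),
  (2, 'Beta Capital', 'New York', 5000000),
  (3, 'Gamma Fund', 'Bay Area / Silicon Valley', 500000);
INSERT INTO investor_sectors VALUES (1, 'AI'), (1, 'SaaS'), (2, 'AI'), (3, 'Fintech');
""")

filters = extract_filters("Which AI investors in the Bay Area write $1.5M checks?")
sql, params = build_query(filters)
matches = [row[0] for row in conn.execute(sql, params)]
print(filters)  # {'geography': 'Silicon Valley', 'sector': 'ai', 'min_check_size': 1500000}
print(matches)  # ['Alpha Ventures']
```

Values are bound through `?` placeholders rather than formatted into the SQL string, so text taken from the user's question cannot inject SQL; the ORM version gets the same protection because SQLAlchemy binds the `ilike` and comparison operands as parameters.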
+139
-16
@@ -1,16 +1,139 @@
-# Core dependencies
-pandas>=2.0.0
-sqlalchemy>=2.0.0
-pydantic>=2.0.0
-
-# Vector database
-chromadb>=0.4.0
-
-# LLM integration
-openai>=1.0.0
-
-# Environment management
-python-dotenv>=1.0.0
-
-# Additional dependencies for data processing
-typing-extensions>=4.0.0
+aiohappyeyeballs==2.6.1
+aiohttp==3.12.15
+aiosignal==1.4.0
+annotated-types==0.7.0
+anyio==4.10.0
+attrs==25.3.0
+backoff==2.2.1
+bcrypt==4.3.0
+build==1.3.0
+cachetools==5.5.2
+certifi==2025.8.3
+charset-normalizer==3.4.3
+chromadb==1.0.20
+click==8.2.1
+coloredlogs==15.0.1
+dataclasses-json==0.6.7
+distro==1.9.0
+dnspython==2.7.0
+durationpy==0.10
+email-validator==2.3.0
+fastapi==0.116.1
+fastapi-cli==0.0.8
+fastapi-cloud-cli==0.1.5
+filelock==3.19.1
+flatbuffers==25.2.10
+frozenlist==1.7.0
+fsspec==2025.7.0
+google-auth==2.40.3
+googleapis-common-protos==1.70.0
+greenlet==3.2.4
+grpcio==1.74.0
+h11==0.16.0
+hf-xet==1.1.8
+httpcore==1.0.9
+httptools==0.6.4
+httpx==0.28.1
+httpx-sse==0.4.1
+huggingface-hub==0.34.4
+humanfriendly==10.0
+idna==3.10
+importlib-metadata==8.7.0
+importlib-resources==6.5.2
+itsdangerous==2.2.0
+jinja2==3.1.6
+jiter==0.10.0
+jsonpatch==1.33
+jsonpointer==3.0.0
+jsonschema==4.25.1
+jsonschema-specifications==2025.4.1
+kubernetes==33.1.0
+langchain==0.3.27
+langchain-community==0.3.29
+langchain-core==0.3.75
+langchain-openai==0.3.32
+langchain-text-splitters==0.3.10
+langgraph==0.6.6
+langgraph-checkpoint==2.1.1
+langgraph-prebuilt==0.6.4
+langgraph-sdk==0.2.4
+langsmith==0.4.20
+markdown-it-py==4.0.0
+markupsafe==3.0.2
+marshmallow==3.26.1
+mdurl==0.1.2
+mmh3==5.2.0
+mpmath==1.3.0
+multidict==6.6.4
+mypy-extensions==1.1.0
+numpy==2.3.2
+oauthlib==3.3.1
+onnxruntime==1.22.1
+openai==1.102.0
+opentelemetry-api==1.36.0
+opentelemetry-exporter-otlp-proto-common==1.36.0
+opentelemetry-exporter-otlp-proto-grpc==1.36.0
+opentelemetry-proto==1.36.0
+opentelemetry-sdk==1.36.0
+opentelemetry-semantic-conventions==0.57b0
+orjson==3.11.3
+ormsgpack==1.10.0
+overrides==7.7.0
+packaging==25.0
+pandas==2.3.2
+pip==25.2
+posthog==5.4.0
+propcache==0.3.2
+protobuf==6.32.0
+pyasn1==0.6.1
+pyasn1-modules==0.4.2
+pybase64==1.4.2
+pydantic==2.11.7
+pydantic-core==2.33.2
+pydantic-extra-types==2.10.5
+pydantic-settings==2.10.1
+pygments==2.19.2
+pypika==0.48.9
+pyproject-hooks==1.2.0
+python-dateutil==2.9.0.post0
+python-dotenv==1.1.1
+python-multipart==0.0.20
+pytz==2025.2
+pyyaml==6.0.2
+referencing==0.36.2
+regex==2025.7.34
+requests==2.32.5
+requests-oauthlib==2.0.0
+requests-toolbelt==1.0.0
+rich==14.1.0
+rich-toolkit==0.15.0
+rignore==0.6.4
+rpds-py==0.27.1
+rsa==4.9.1
+sentry-sdk==2.35.1
+shellingham==1.5.4
+six==1.17.0
+sniffio==1.3.1
+sqlalchemy==2.0.43
+starlette==0.47.3
+sympy==1.14.0
+tenacity==9.1.2
+tiktoken==0.11.0
+tokenizers==0.21.4
+tqdm==4.67.1
+typer==0.16.1
+typing-extensions==4.15.0
+typing-inspect==0.9.0
+typing-inspection==0.4.1
+tzdata==2025.2
+ujson==5.11.0
+urllib3==2.5.0
+uvicorn==0.35.0
+uvloop==0.21.0
+watchfiles==1.1.0
+websocket-client==1.8.0
+websockets==15.0.1
+xxhash==3.5.0
+yarl==1.20.1
+zipp==3.23.0
+zstandard==0.24.0