Commit Graph

26 Commits

Author SHA1 Message Date
bolade 84e3c7b72a feat: Implement database ingestion for investors and companies
- Added main ingestion logic in main.py to process CSV files for investors and companies.
- Implemented data cleaning functions for names, strings, integers, and websites.
- Established relationships between investors, companies, and sectors using SQLAlchemy ORM.
- Created models for investors, companies, sectors, and their relationships in models.py.
- Set up logging for error tracking during data processing.
- Initialized database and created necessary tables.
2025-10-07 20:01:19 +01:00
bolade a9589e54f3 feat: Refactor Fund schema to use many-to-many relationships for investment stages and sectors
- Updated FundTable to replace JSON fields for investment stages and sectors with relationships.
- Introduced InvestmentStageTable and fund_investment_stages association table.
- Created fund_sectors association table for many-to-many relationship with sectors.
- Changed geographic_focus from JSON array to a simple string.
- Migrated existing data to new schema, ensuring data integrity and normalization.
- Updated related schemas, routers, and services to reflect new structure.
- Added migration script to handle data transformation and schema updates.
- Implemented tests to verify new relationships and data integrity.
2025-10-07 15:57:29 +01:00
bolade d341cacb9a Refactor investor and fund schemas to support new check size range
- Removed deprecated `stage_focus` column from `InvestorTable` and `InvestorSchema`.
- Updated `FundTable` to change `fund_size` from VARCHAR to INTEGER and added `check_size_lower` and `check_size_upper` columns.
- Modified API routes to return investor-fund combinations as separate entries.
- Created new `InvestorFundData` schema for combined investor-fund responses.
- Implemented LLM parsing for check size range from estimated investment size.
- Updated database migration script to reflect schema changes and ensure data integrity.
- Removed obsolete verification and test scripts related to the old schema.
2025-10-07 15:24:36 +01:00
bolade c0fbbdd917 Implement manual JSON parsing for company profiles; enhance data extraction and processing efficiency; add comprehensive test script for validation 2025-10-07 12:07:43 +01:00
bolade 1f3f08e80d Remove deprecated stage_focus column and update database path for consistency; add schema verification script and document schema mismatch fixes 2025-10-07 11:31:16 +01:00
bolade cd7172ed9f Add test script for manual JSON parser with LLM currency conversion
- Implemented a new test script `test_parser.py` to validate the functionality of the manual JSON parser.
- The script loads investor data from a CSV file and processes a sample of three investors.
- Results include detailed information about each investor, their funds, team members, and investment thesis.
- Added error handling for missing API key in the environment variables.
2025-10-06 14:07:28 +01:00
bolade c199f5423a Refactor code structure for improved readability and maintainability 2025-10-06 12:57:08 +01:00
bolade a2b3ceedbe Added funds table 2025-10-05 19:16:03 +01:00
bolade 3842171549 Update .gitignore to exclude preprocessor directory; refactor find_similar_investors function to improve similarity scoring based on investor characteristics and add limit parameter for results. 2025-10-01 23:29:29 +01:00
bolade 17bc5acbc8 Refactor investor similarity search to utilize AI for improved query generation; adjust DataFrame parsing to skip initial rows for better data handling. 2025-09-29 15:58:09 +01:00
bolade 6caea96658 Update server host and port configuration for deployment 2025-09-27 11:16:18 +01:00
bolade 6d902345c0 Refactor investor and company schemas to allow optional fields; update filtering logic in read_companies function and add find_similar_investors endpoint; change LLM model in InvestorProcessor and QueryProcessor for improved performance. 2025-09-27 10:45:08 +01:00
bolade d36367fbe9 Add project management functionality with CRUD operations and associations; introduce project schemas and update main application routing. 2025-09-27 08:53:59 +01:00
bolade abac19c6ae Update .gitignore to exclude __pycache__ directories and modify schemas to allow optional fields for better flexibility; adjust batch size in InvestorProcessor for improved processing efficiency. 2025-09-26 15:56:29 +01:00
bolade f2bbcb96f3 Refactor database models and schemas to allow nullable fields; update init_database function for improved initialization. 2025-09-26 15:24:42 +01:00
bolade 0f7beca5e1 made version 2 2025-09-25 17:00:38 +01:00
bolade b1b1c5ea1e Made improvements to parsing 2025-09-11 16:23:22 +01:00
bolade 29d9292cbd Fix database URL in db.py and update import path for schemas in llm_parser.py 2025-09-11 15:46:39 +01:00
bolade edd0ae910b Refactor investor and company management API with FastAPI integration
- Updated README.md to reflect new features and architecture.
- Implemented company management routes in app/api/companies.py.
- Enhanced main FastAPI application in app/main.py to include company routes and query processing.
- Improved querying capabilities in app/services/querying.py with natural language processing for investor searches.
- Updated requirements.txt to include necessary dependencies for FastAPI and related libraries.
- Added comprehensive error handling and response formatting for API endpoints.
2025-09-03 10:32:19 +01:00
bolade 84cbb888e6 Refactor investor-related schemas and models; implement investor CRUD operations and update stage_focus values to uppercase 2025-09-03 09:41:19 +01:00
bolade 7b58834316 Refactor investor-related schemas and models; update database configuration and enhance investor processing logic 2025-09-02 15:51:35 +01:00
bolade 65b5df3a43 Add CompanyTable model and refactor query handling; update requirements for new dependencies 2025-09-02 12:22:50 +01:00
bolade 74931f235e Refactor imports and enhance query functionality with LangGraph integration; update requirements for new dependencies 2025-08-30 13:56:19 +01:00
bolade ba0ed169ce Implement investor processing and querying functionality
- Added InvestorProcessor class for processing CSV data in batches and saving to SQL and vector databases.
- Introduced QueryProcessor class for querying investor information from SQL and vector databases.
- Integrated OpenAI's ChatGPT for structured output generation.
- Implemented data cleaning and control character removal in CSV processing.
- Added asynchronous processing capabilities for batch handling.
- Established connection to ChromaDB for vector storage of investor descriptions.
- Defined structured output schemas using Pydantic for investor data validation.
- Enhanced settings management for API key and database configurations.
2025-08-29 18:42:55 +01:00
bolade 4c99638d94 Remove deprecated demo, ingest, schema, and test parser files; add new LLM parser implementation and settings configuration 2025-08-28 23:09:14 +01:00
bolade bbf6af58f0 Implement LLM-powered Investor Parser with CSV processing, SQL and vector database integration
- Added FastAPI application with a simple root endpoint.
- Developed LLMInvestorParser class for processing investor data from CSV files.
- Integrated OpenAI API for LLM enhancements and JSON cleaning.
- Implemented structured data extraction and saving to SQL database.
- Added functionality to save investor descriptions to ChromaDB for vector similarity search.
- Created command-line interface for processing files and searching investors.
- Added schema definitions for Investor and related data models using SQLAlchemy and Pydantic.
- Implemented logging for better traceability and error handling.
- Included requirements.txt for dependency management.
2025-08-28 22:51:58 +01:00
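
Note on commit 84e3c7b72a: its message mentions data cleaning functions for names, strings, integers, and websites, but the log does not show them. The following is only a minimal sketch of what such helpers might look like, using the Python standard library. The function names (clean_string, clean_name, clean_integer, clean_website) and the specific normalization rules are assumptions for illustration, not the repository's actual code.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def clean_string(value: Optional[str]) -> Optional[str]:
    """Collapse runs of whitespace; return None for empty or missing values."""
    if value is None:
        return None
    text = re.sub(r"\s+", " ", str(value)).strip()
    return text or None


def clean_name(value: Optional[str]) -> Optional[str]:
    """Normalize a name: single-spaced, with surrounding quotes removed."""
    text = clean_string(value)
    if text is None:
        return None
    return text.strip("\"'") or None


def clean_integer(value: Optional[str]) -> Optional[int]:
    """Parse integers such as '1,200'; return None when the value is not an integer."""
    text = clean_string(value)
    if text is None:
        return None
    try:
        return int(text.replace(",", ""))
    except ValueError:
        return None


def clean_website(value: Optional[str]) -> Optional[str]:
    """Add a scheme if missing and lowercase the host (assumed rules)."""
    text = clean_string(value)
    if text is None:
        return None
    if "://" not in text:
        # Assumption: bare domains in the CSV are treated as https.
        text = "https://" + text
    parsed = urlparse(text)
    if not parsed.netloc:
        return None
    path = parsed.path.rstrip("/")
    return f"{parsed.scheme.lower()}://{parsed.netloc.lower()}{path}"
```

Each helper returns None rather than raising on bad input, so a CSV row with one malformed field could still be ingested with that column left empty. Whether the real ingestion code takes this approach is not stated in the commit message.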