7 Commits

Author SHA1 Message Date
bolade 84e3c7b72a feat: Implement database ingestion for investors and companies
- Added main ingestion logic in main.py to process CSV files for investors and companies.
- Implemented data cleaning functions for names, strings, integers, and websites.
- Established relationships between investors, companies, and sectors using SQLAlchemy ORM.
- Created models for investors, companies, sectors, and their relationships in models.py.
- Set up logging for error tracking during data processing.
- Initialized database and created necessary tables.
2025-10-07 20:01:19 +01:00
bolade 1f3f08e80d Remove deprecated stage_focus column and update database path for consistency; add schema verification script and document schema mismatch fixes 2025-10-07 11:31:16 +01:00
bolade f2bbcb96f3 Refactor database models and schemas to allow nullable fields; update init_database function for improved initialization. 2025-09-26 15:24:42 +01:00
bolade 0f7beca5e1 made version 2 2025-09-25 17:00:38 +01:00
bolade b1b1c5ea1e Made improvements to parsing 2025-09-11 16:23:22 +01:00
bolade 7b58834316 Refactor investor-related schemas and models; update database configuration and enhance investor processing logic 2025-09-02 15:51:35 +01:00
bolade ba0ed169ce Implement investor processing and querying functionality
- Added InvestorProcessor class for processing CSV data in batches and saving to SQL and vector databases.
- Introduced QueryProcessor class for querying investor information from SQL and vector databases.
- Integrated OpenAI's ChatGPT for structured output generation.
- Implemented data cleaning and control character removal in CSV processing.
- Added asynchronous processing capabilities for batch handling.
- Established connection to ChromaDB for vector storage of investor descriptions.
- Defined structured output schemas using Pydantic for investor data validation.
- Enhanced settings management for API key and database configurations.
2025-08-29 18:42:55 +01:00
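
Commits 84e3c7b72a and ba0ed169ce mention data cleaning functions for names, strings, integers, and websites, plus control character removal during CSV processing. The log does not show that code, so the following is only a minimal sketch of what such helpers might look like. Every function name and every cleaning rule here (collapsing whitespace, treating blanks as missing, defaulting to https) is an assumption made for illustration and is not taken from the repository.

```python
import re
import unicodedata
from typing import Optional
from urllib.parse import urlparse


def strip_control_chars(value: str) -> str:
    """Hypothetical helper: remove Unicode control characters (category Cc)."""
    # Turn tabs and newlines into spaces first so adjacent words do not fuse.
    value = re.sub(r"[\t\r\n]+", " ", value)
    return "".join(ch for ch in value if unicodedata.category(ch) != "Cc")


def clean_string(value) -> Optional[str]:
    """Hypothetical helper: trim, collapse whitespace, map blanks to None."""
    if value is None:
        return None
    text = re.sub(r"\s+", " ", strip_control_chars(str(value))).strip()
    return text or None


def clean_name(value) -> Optional[str]:
    """Hypothetical helper: clean a name and drop surrounding quotes."""
    text = clean_string(value)
    if text is None:
        return None
    return text.strip("\"'").strip() or None


def clean_int(value) -> Optional[int]:
    """Hypothetical helper: parse values like '1,200' or '12.0', else None."""
    text = clean_string(value)
    if text is None:
        return None
    digits = re.sub(r"[,\s]", "", text)
    try:
        return int(float(digits))
    except (ValueError, OverflowError):
        return None


def clean_website(value) -> Optional[str]:
    """Hypothetical helper: normalise a URL, assuming https when no scheme."""
    text = clean_string(value)
    if text is None:
        return None
    if "://" not in text:
        text = "https://" + text
    parsed = urlparse(text)
    host = parsed.netloc.lower()
    # Reject values that do not look like a host name at all.
    if "." not in host or " " in host:
        return None
    return f"{parsed.scheme}://{host}{parsed.path.rstrip('/')}"
```

Returning None for unusable values fits the nullable fields introduced in commit f2bbcb96f3, but whether the real ingestion code handles bad input this way is not stated in the log.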