feat: Implement stock listing extraction and database population
- Added `extract_listings.py` for extracting stock listings from TSX, TSXV, CSE, and CBOE using Playwright.
- Created `main.py` to orchestrate the entire stock intelligence system, including extraction, database import, financial scraping, news scraping, and report generation.
- Developed `populate_database.py` to populate the database with existing JSON data.
- Introduced `scrape_nasdaq_tsx_only.py` for focused scraping of NASDAQ and TSX stocks.
- Added `setup.py` for initial setup and testing of the system.
- Created `watchlist.txt` template for user-defined stock tracking.
- Generated `final_test_output.txt` to log the results of the test run.
+179
@@ -0,0 +1,179 @@
"""
Visual diagram of the data flow
"""

FLOW_DIAGRAM = """
═══════════════════════════════════════════════════════════════════════
                 STOCK INTELLIGENCE SYSTEM - DATA FLOW
═══════════════════════════════════════════════════════════════════════

STEP 1: EXTRACT STOCK LISTINGS
═══════════════════════════════
    ┌─────────────────────┐
    │  Exchange Websites  │
    │  - tsx.com          │
    │  - thecse.com       │
    │  - cboe.com         │
    └──────────┬──────────┘
               │ (Playwright scrapes)
               ▼
        ┌─────────────┐
        │ JSON Files  │ → data/listings/tsx_tsxv_listings.json
        │ with stock  │ → data/listings/cse_listings.json
        │ listings    │ → data/listings/cboe_listings.json
        └──────┬──────┘ → data/listings/all_listings_combined.json
               │

STEP 2: IMPORT TO DATABASE
═══════════════════════════════
               │
               ▼
      ┌────────────────┐
      │   SQLite DB    │
      │   stocks.db    │  Tables:
      │                │  - stocks_master (ABC, 123 companies)
      │                │  - coverage_report
      └────────┬───────┘
               │

STEP 3: SCRAPE FINANCIALS
═══════════════════════════════
               │
      ┌────────┴────────────────────────┐
      │  For each stock in database:    │
      │  ticker = "ABC"                 │
      └────────┬────────────────────────┘
               │
               ▼
      ┌──────────────────┐
      │  Yahoo Finance   │
      │ finance.yahoo.com│
      │ /quote/ABC.TO    │
      └────────┬─────────┘
               │ (Scrape)
               ▼
      ┌──────────────────┐
      │ JSON File        │ → data/financials/ABC_yahoo.json
      │ - Price          │
      │ - Market Cap     │
      │ - Financials     │
      │ - Ratios         │
      └────────┬─────────┘
               │
               ▼
      ┌──────────────────┐
      │ Update Database  │
      │ has_financials=1 │
      └────────┬─────────┘

STEP 4: SCRAPE NEWS & PRESS RELEASES
═══════════════════════════════════════
               │
      ┌────────┴────────────────────────┐
      │  For each stock:                │
      │  company = "ABC Corp"           │
      └────────┬────────────────────────┘
               │
      ┌────────┴──────────┬─────────────────────┐
      │                   │                     │
      ▼                   ▼                     ▼
 ┌──────────┐      ┌──────────────┐      ┌─────────────┐
 │ Google   │      │GlobeNewswire │      │Newswire.ca  │
 │ News     │      │.com/ABC      │      │/search/ABC  │
 └────┬─────┘      └──────┬───────┘      └──────┬──────┘
      │                   │                     │
      │ (Scrape)          │ (Scrape)            │ (Scrape)
      │                   │                     │
      └───────────────────┴─────────────────────┘
                          │
                          ▼
                 ┌────────────────┐
                 │ JSON File      │ → data/news/ABC_news_pr.json
                 │ - News (15)    │
                 │ - PR (8)       │
                 └────────┬───────┘
                          │
                          ▼
                 ┌────────────────┐
                 │ Update DB      │
                 │ has_news=1     │
                 │ has_pr=1       │
                 └────────┬───────┘

STEP 5: GENERATE REPORTS
═══════════════════════════════
                          │
                 ┌────────┴───────────────────────┐
                 │ Combine all data for each stock│
                 └────────┬───────────────────────┘
                          │
                          ▼
                 ┌────────────────┐
                 │ Text Report    │ → data/reports/ABC_report.txt
                 │                │
                 │ [TICKER INFO]  │
                 │ [FINANCIALS]   │
                 │ [NEWS]         │
                 │ [PRESS REL]    │
                 └────────────────┘

═══════════════════════════════════════════════════════════════════════
                             FINAL OUTPUT
═══════════════════════════════════════════════════════════════════════

data/
├── listings/
│   └── all_listings_combined.json    [List of all stocks]
│
├── financials/
│   ├── ABC_yahoo.json                [Financial data per stock]
│   ├── XYZ_yahoo.json
│   └── ...
│
├── news/
│   ├── ABC_news_pr.json              [News & PR per stock]
│   ├── XYZ_news_pr.json
│   └── ...
│
├── reports/
│   ├── ABC_report.txt                [Final report per stock]
│   ├── XYZ_report.txt
│   └── ...
│
└── stocks.db                         [SQLite database with everything]

═══════════════════════════════════════════════════════════════════════
                          DATA RELATIONSHIPS
═══════════════════════════════════════════════════════════════════════

stocks_master (Main table)
    ↓
    ├── financial_statements   (Links via stock_id)
    ├── financial_metrics      (Links via stock_id)
    ├── news_articles          (Links via stock_id)
    ├── press_releases         (Links via stock_id)
    ├── filings                (Links via stock_id)
    ├── agm_info               (Links via stock_id)
    └── tax_disclosures        (Links via stock_id)

coverage_report (Tracks completeness)
    ↓
    Links to stocks_master via ticker

═══════════════════════════════════════════════════════════════════════

EXECUTION FLOW:

1. User runs: python main.py
2. System extracts listings from exchanges
3. System imports to database
4. System scrapes Yahoo Finance for each stock
5. System scrapes news for each stock
6. System generates reports
7. Done! All data in data/ folder

═══════════════════════════════════════════════════════════════════════
"""

if __name__ == "__main__":
    print(FLOW_DIAGRAM)
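
The DATA RELATIONSHIPS part of the diagram names the tables and their link columns but not the schema itself. The following is a minimal sketch of how those links could look in SQLite, not the schema this commit ships. Only the table names, `stock_id`, `ticker`, and the `has_*` flags come from the diagram; every other column, the placement of the flags on `coverage_report`, and the helper names `build_demo_db` and `coverage_for` are assumptions made for illustration.

```python
import sqlite3

# Hypothetical minimal schema. Only one child table (news_articles) is shown;
# the other child tables in the diagram would link via stock_id the same way.
SCHEMA = """
CREATE TABLE stocks_master (
    stock_id INTEGER PRIMARY KEY,
    ticker   TEXT NOT NULL UNIQUE,
    company  TEXT
);
CREATE TABLE news_articles (
    id       INTEGER PRIMARY KEY,
    stock_id INTEGER NOT NULL REFERENCES stocks_master(stock_id),
    title    TEXT
);
CREATE TABLE coverage_report (
    ticker         TEXT PRIMARY KEY REFERENCES stocks_master(ticker),
    has_financials INTEGER NOT NULL DEFAULT 0,
    has_news       INTEGER NOT NULL DEFAULT 0,
    has_pr         INTEGER NOT NULL DEFAULT 0
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one example stock."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO stocks_master (stock_id, ticker, company) VALUES (1, 'ABC', 'ABC Corp')"
    )
    conn.execute("INSERT INTO news_articles (stock_id, title) VALUES (1, 'Example headline')")
    conn.execute("INSERT INTO coverage_report (ticker, has_news) VALUES ('ABC', 1)")
    conn.commit()
    return conn


def coverage_for(conn: sqlite3.Connection, ticker: str):
    """Return (ticker, has_financials, has_news, has_pr, news_count), or None if unknown."""
    return conn.execute(
        """
        SELECT s.ticker, c.has_financials, c.has_news, c.has_pr, COUNT(n.id)
        FROM stocks_master AS s
        JOIN coverage_report AS c ON c.ticker = s.ticker
        LEFT JOIN news_articles AS n ON n.stock_id = s.stock_id
        WHERE s.ticker = ?
        GROUP BY s.ticker
        """,
        (ticker,),
    ).fetchone()
```

Calling `coverage_for(build_demo_db(), "ABC")` joins the coverage flags (via `ticker`) with the number of linked news rows (via `stock_id`), which mirrors the two link styles the diagram describes.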
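
The diagram also implies a fixed naming convention that maps a ticker to its Yahoo Finance quote page and to its output files under `data/`. A small sketch of that mapping is below. The function names `yahoo_quote_url` and `artifact_paths`, the `https://` scheme, and the idea of passing the exchange suffix as a parameter are assumptions; the diagram only shows the `.TO` suffix and the three file name patterns.

```python
from pathlib import Path

DATA_DIR = Path("data")  # root output folder shown in the FINAL OUTPUT tree


def yahoo_quote_url(ticker: str, suffix: str = ".TO") -> str:
    """Build the quote URL from the diagram, e.g. /quote/ABC.TO.

    Only ".TO" appears in the diagram; other exchanges would presumably
    need a different suffix, which is why it is a parameter here.
    """
    return f"https://finance.yahoo.com/quote/{ticker}{suffix}"


def artifact_paths(ticker: str) -> dict[str, Path]:
    """Return the per-stock output files named in the diagram."""
    return {
        "financials": DATA_DIR / "financials" / f"{ticker}_yahoo.json",
        "news": DATA_DIR / "news" / f"{ticker}_news_pr.json",
        "report": DATA_DIR / "reports" / f"{ticker}_report.txt",
    }
```

Keeping the path construction in one place would let the scraping and report steps agree on file names without repeating string formatting in each script.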