80ee708348
- Added `extract_listings.py` for extracting stock listings from TSX, TSXV, CSE, and CBOE using Playwright.
- Created `main.py` to orchestrate the entire stock intelligence system, including extraction, database import, financial scraping, news scraping, and report generation.
- Developed `populate_database.py` to populate the database with existing JSON data.
- Introduced `scrape_nasdaq_tsx_only.py` for focused scraping of NASDAQ and TSX stocks.
- Added `setup.py` for initial setup and testing of the system.
- Created `watchlist.txt` template for user-defined stock tracking.
- Generated `final_test_output.txt` to log the results of the test run.
180 lines
8.6 KiB
Python
"""
Visual diagram of the data flow
"""

FLOW_DIAGRAM = """
═══════════════════════════════════════════════════════════════════════
                 STOCK INTELLIGENCE SYSTEM - DATA FLOW
═══════════════════════════════════════════════════════════════════════

STEP 1: EXTRACT STOCK LISTINGS
═══════════════════════════════
     ┌─────────────────────┐
     │  Exchange Websites  │
     │  - tsx.com          │
     │  - thecse.com       │
     │  - cboe.com         │
     └──────────┬──────────┘
                │ (Playwright scrapes)
                ▼
         ┌─────────────┐
         │ JSON Files  │  → data/listings/tsx_tsxv_listings.json
         │ with stock  │  → data/listings/cse_listings.json
         │ listings    │  → data/listings/cboe_listings.json
         └──────┬──────┘  → data/listings/all_listings_combined.json
                │

STEP 2: IMPORT TO DATABASE
═══════════════════════════════
                │
                ▼
       ┌────────────────┐
       │ SQLite DB      │
       │ stocks.db      │  Tables:
       │                │  - stocks_master (ABC, 123 companies)
       │                │  - coverage_report
       └────────┬───────┘
                │

STEP 3: SCRAPE FINANCIALS
═══════════════════════════════
                │
       ┌────────┴────────────────────────┐
       │ For each stock in database:     │
       │   ticker = "ABC"                │
       └────────┬────────────────────────┘
                │
                ▼
       ┌──────────────────┐
       │ Yahoo Finance    │
       │ finance.yahoo.com│
       │ /quote/ABC.TO    │
       └────────┬─────────┘
                │ (Scrape)
                ▼
       ┌──────────────────┐
       │ JSON File        │  → data/financials/ABC_yahoo.json
       │ - Price          │
       │ - Market Cap     │
       │ - Financials     │
       │ - Ratios         │
       └────────┬─────────┘
                │
                ▼
       ┌──────────────────┐
       │ Update Database  │
       │ has_financials=1 │
       └────────┬─────────┘

STEP 4: SCRAPE NEWS & PRESS RELEASES
═══════════════════════════════════════
                │
       ┌────────┴────────────────────────┐
       │ For each stock:                 │
       │   company = "ABC Corp"          │
       └────────┬────────────────────────┘
                │
       ┌────────┴──────────┬─────────────────────┐
       │                   │                     │
       ▼                   ▼                     ▼
  ┌──────────┐      ┌──────────────┐      ┌─────────────┐
  │ Google   │      │GlobeNewswire │      │Newswire.ca  │
  │ News     │      │.com/ABC      │      │/search/ABC  │
  └────┬─────┘      └──────┬───────┘      └──────┬──────┘
       │                   │                     │
       │ (Scrape)          │ (Scrape)            │ (Scrape)
       │                   │                     │
       └────────┬──────────┴─────────────────────┘
                │
                ▼
       ┌────────────────┐
       │ JSON File      │  → data/news/ABC_news_pr.json
       │ - News (15)    │
       │ - PR (8)       │
       └────────┬───────┘
                │
                ▼
       ┌────────────────┐
       │ Update DB      │
       │ has_news=1     │
       │ has_pr=1       │
       └────────┬───────┘

STEP 5: GENERATE REPORTS
═══════════════════════════════
                │
       ┌────────┴───────────────────────┐
       │ Combine all data for each stock│
       └────────┬───────────────────────┘
                │
                ▼
       ┌────────────────┐
       │ Text Report    │  → data/reports/ABC_report.txt
       │                │
       │ [TICKER INFO]  │
       │ [FINANCIALS]   │
       │ [NEWS]         │
       │ [PRESS REL]    │
       └────────────────┘

═══════════════════════════════════════════════════════════════════════
                             FINAL OUTPUT
═══════════════════════════════════════════════════════════════════════

data/
├── listings/
│   └── all_listings_combined.json     [List of all stocks]
│
├── financials/
│   ├── ABC_yahoo.json                 [Financial data per stock]
│   ├── XYZ_yahoo.json
│   └── ...
│
├── news/
│   ├── ABC_news_pr.json               [News & PR per stock]
│   ├── XYZ_news_pr.json
│   └── ...
│
├── reports/
│   ├── ABC_report.txt                 [Final report per stock]
│   ├── XYZ_report.txt
│   └── ...
│
└── stocks.db                          [SQLite database with everything]

═══════════════════════════════════════════════════════════════════════
                          DATA RELATIONSHIPS
═══════════════════════════════════════════════════════════════════════

stocks_master (Main table)
    ↓
    ├── financial_statements  (Links via stock_id)
    ├── financial_metrics     (Links via stock_id)
    ├── news_articles         (Links via stock_id)
    ├── press_releases        (Links via stock_id)
    ├── filings               (Links via stock_id)
    ├── agm_info              (Links via stock_id)
    └── tax_disclosures       (Links via stock_id)

coverage_report (Tracks completeness)
    ↓
    Links to stocks_master via ticker

═══════════════════════════════════════════════════════════════════════

EXECUTION FLOW:

1. User runs: python main.py
2. System extracts listings from exchanges
3. System imports to database
4. System scrapes Yahoo Finance for each stock
5. System scrapes news for each stock
6. System generates reports
7. Done! All data in data/ folder

═══════════════════════════════════════════════════════════════════════
"""
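

# Illustrative sketch, not part of the original module.
# The diagram above names one output file per stock for steps 3, 4 and 5.
# The helper below shows that naming scheme in code. The function name
# `expected_outputs` and the `data_dir` parameter are hypothetical; only the
# path patterns (ABC_yahoo.json, ABC_news_pr.json, ABC_report.txt) come from
# the diagram. How the real scrapers build their paths may differ.
from pathlib import Path


def expected_outputs(ticker, data_dir="data"):
    """Return the per-stock files that the diagram says each step writes."""
    root = Path(data_dir)
    return {
        "financials": root / "financials" / f"{ticker}_yahoo.json",  # step 3
        "news": root / "news" / f"{ticker}_news_pr.json",            # step 4
        "report": root / "reports" / f"{ticker}_report.txt",         # step 5
    }


def missing_outputs(ticker, data_dir="data"):
    """List the step names whose output file does not exist on disk yet."""
    paths = expected_outputs(ticker, data_dir)
    return [step for step, path in paths.items() if not path.exists()]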


if __name__ == "__main__":
    print(FLOW_DIAGRAM)