microcap_scrapping/FLOW_DIAGRAM.py
Aherobo Ovie Victor 80ee708348 feat: Implement stock listing extraction and database population
- Added `extract_listings.py` for extracting stock listings from TSX, TSXV, CSE, and CBOE using Playwright.
- Created `main.py` to orchestrate the entire stock intelligence system, including extraction, database import, financial scraping, news scraping, and report generation.
- Developed `populate_database.py` to populate the database with existing JSON data.
- Introduced `scrape_nasdaq_tsx_only.py` for focused scraping of NASDAQ and TSX stocks.
- Added `setup.py` for initial setup and testing of the system.
- Created `watchlist.txt` template for user-defined stock tracking.
- Generated `final_test_output.txt` to log the results of the test run.
2025-11-06 12:34:01 +01:00

180 lines
8.6 KiB
Python

"""
Visual diagram of the data flow
"""
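
# Illustrative sketch (an assumption, not part of the original pipeline): the
# per-stock output files named in the diagram below, expressed as a path helper.
# The function name `output_paths` and the `base` parameter are hypothetical;
# the directory names and file-name patterns are taken from the diagram.
from pathlib import Path


def output_paths(ticker, base="data"):
    """Return the financials, news and report paths for one ticker."""
    root = Path(base)
    return {
        "financials": root / "financials" / f"{ticker}_yahoo.json",
        "news": root / "news" / f"{ticker}_news_pr.json",
        "report": root / "reports" / f"{ticker}_report.txt",
    }
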
FLOW_DIAGRAM = """
═══════════════════════════════════════════════════════════════════════
STOCK INTELLIGENCE SYSTEM - DATA FLOW
═══════════════════════════════════════════════════════════════════════
STEP 1: EXTRACT STOCK LISTINGS
═══════════════════════════════
    ┌─────────────────────┐
    │  Exchange Websites  │
    │  - tsx.com          │
    │  - thecse.com       │
    │  - cboe.com         │
    └──────────┬──────────┘
               │ (Playwright scrapes)
        ┌─────────────┐
        │ JSON Files  │  → data/listings/tsx_tsxv_listings.json
        │ with stock  │  → data/listings/cse_listings.json
        │ listings    │  → data/listings/cboe_listings.json
        └──────┬──────┘  → data/listings/all_listings_combined.json
STEP 2: IMPORT TO DATABASE
═══════════════════════════════
      ┌────────────────┐
      │ SQLite DB      │
      │ stocks.db      │  Tables:
      │                │  - stocks_master (ABC, 123 companies)
      │                │  - coverage_report
      └────────┬───────┘
STEP 3: SCRAPE FINANCIALS
═══════════════════════════════
      ┌────────┴────────────────────────┐
      │ For each stock in database:     │
      │   ticker = "ABC"                │
      └────────┬────────────────────────┘
      ┌──────────────────┐
      │ Yahoo Finance    │
      │ finance.yahoo.com│
      │ /quote/ABC.TO    │
      └────────┬─────────┘
               │ (Scrape)
      ┌──────────────────┐
      │ JSON File        │  → data/financials/ABC_yahoo.json
      │ - Price          │
      │ - Market Cap     │
      │ - Financials     │
      │ - Ratios         │
      └────────┬─────────┘
      ┌──────────────────┐
      │ Update Database  │
      │ has_financials=1 │
      └────────┬─────────┘
STEP 4: SCRAPE NEWS & PRESS RELEASES
═══════════════════════════════════════
      ┌────────┴────────────────────────┐
      │ For each stock:                 │
      │   company = "ABC Corp"          │
      └────────┬────────────────────────┘
      ┌────────┴──────────┬─────────────────────┐
      │                   │                     │
      ▼                   ▼                     ▼
 ┌──────────┐      ┌──────────────┐      ┌─────────────┐
 │  Google  │      │GlobeNewswire │      │Newswire.ca  │
 │  News    │      │.com/ABC      │      │/search/ABC  │
 └────┬─────┘      └──────┬───────┘      └──────┬──────┘
      │                   │                     │
      │ (Scrape)          │ (Scrape)            │ (Scrape)
      │                   │                     │
      └───────────────────┴─────────────────────┘
                 ┌────────────────┐
                 │ JSON File      │  → data/news/ABC_news_pr.json
                 │ - News (15)    │
                 │ - PR (8)       │
                 └────────┬───────┘
                 ┌────────────────┐
                 │ Update DB      │
                 │ has_news=1     │
                 │ has_pr=1       │
                 └────────┬───────┘
STEP 5: GENERATE REPORTS
═══════════════════════════════
                 ┌────────┴───────────────────────┐
                 │ Combine all data for each stock│
                 └────────┬───────────────────────┘
                 ┌────────────────┐
                 │ Text Report    │  → data/reports/ABC_report.txt
                 │                │
                 │ [TICKER INFO]  │
                 │ [FINANCIALS]   │
                 │ [NEWS]         │
                 │ [PRESS REL]    │
                 └────────────────┘
═══════════════════════════════════════════════════════════════════════
FINAL OUTPUT
═══════════════════════════════════════════════════════════════════════
data/
├── listings/
│   └── all_listings_combined.json    [List of all stocks]
├── financials/
│   ├── ABC_yahoo.json                [Financial data per stock]
│   ├── XYZ_yahoo.json
│   └── ...
├── news/
│   ├── ABC_news_pr.json              [News & PR per stock]
│   ├── XYZ_news_pr.json
│   └── ...
├── reports/
│   ├── ABC_report.txt                [Final report per stock]
│   ├── XYZ_report.txt
│   └── ...
└── stocks.db                         [SQLite database with everything]
═══════════════════════════════════════════════════════════════════════
DATA RELATIONSHIPS
═══════════════════════════════════════════════════════════════════════
stocks_master (Main table)
├── financial_statements  (Links via stock_id)
├── financial_metrics     (Links via stock_id)
├── news_articles         (Links via stock_id)
├── press_releases        (Links via stock_id)
├── filings               (Links via stock_id)
├── agm_info              (Links via stock_id)
└── tax_disclosures       (Links via stock_id)
coverage_report (Tracks completeness)
    Links to stocks_master via ticker
═══════════════════════════════════════════════════════════════════════
EXECUTION FLOW:
1. User runs: python main.py
2. System extracts listings from exchanges
3. System imports to database
4. System scrapes Yahoo Finance for each stock
5. System scrapes news for each stock
6. System generates reports
7. Done! All data in data/ folder
═══════════════════════════════════════════════════════════════════════
"""
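
# Illustrative sketch (not part of the original pipeline): one way the
# "Update Database" / "Update DB" boxes in steps 3 and 4 of the diagram could
# be realised with the standard-library sqlite3 module.
# Assumptions: the column layout of stocks_master shown here and the helper
# names `create_stocks_master` and `mark_scraped` are hypothetical. Only the
# table name and the has_financials / has_news / has_pr flags come from the
# diagram; the real schema may differ.
import sqlite3

_FLAG_COLUMNS = ("has_financials", "has_news", "has_pr")


def create_stocks_master(conn):
    """Create a minimal stocks_master table carrying the three flags."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stocks_master ("
        " id INTEGER PRIMARY KEY,"
        " ticker TEXT UNIQUE NOT NULL,"
        " has_financials INTEGER NOT NULL DEFAULT 0,"
        " has_news INTEGER NOT NULL DEFAULT 0,"
        " has_pr INTEGER NOT NULL DEFAULT 0)"
    )


def mark_scraped(conn, ticker, flag):
    """Set one completeness flag to 1 for a ticker; return True if a row changed."""
    # Column names cannot be bound as SQL parameters, so check against a whitelist.
    if flag not in _FLAG_COLUMNS:
        raise ValueError(f"unknown flag: {flag}")
    cur = conn.execute(
        f"UPDATE stocks_master SET {flag} = 1 WHERE ticker = ?", (ticker,)
    )
    conn.commit()
    return cur.rowcount == 1
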
if __name__ == "__main__":
    print(FLOW_DIAGRAM)