microcap_scrapping/NASDAQ_TSX_AUTOMATION_SUMMARY.md
Aherobo Ovie Victor 80ee708348 feat: Implement stock listing extraction and database population
- Added `extract_listings.py` for extracting stock listings from TSX, TSXV, CSE, and CBOE using Playwright.
- Created `main.py` to orchestrate the entire stock intelligence system, including extraction, database import, financial scraping, news scraping, and report generation.
- Developed `populate_database.py` to populate the database with existing JSON data.
- Introduced `scrape_nasdaq_tsx_only.py` for focused scraping of NASDAQ and TSX stocks.
- Added `setup.py` for initial setup and testing of the system.
- Created `watchlist.txt` template for user-defined stock tracking.
- Generated `final_test_output.txt` to log the results of the test run.
2025-11-06 12:34:01 +01:00

Stock Intelligence System - NASDAQ & TSX Focus

Summary (November 6, 2025)

Completed Tasks

  1. Fixed Quote Data Extraction

    • Corrected CSS selectors in Yahoo Finance scraper
    • Fixed whitespace handling
    • Added regex-based date extraction
    • Fixed statistics merge logic to prevent overwriting
  2. Database Enhancement

    • Added stock_quotes table to store scraped quote data (date, open, high, low, close, volume)
    • Added insert_stock_quote() function
    • Quote data now persists in database
  3. Report Generation

    • All NASDAQ and TSX stocks now show complete quote data in reports:
      • Date
      • Open
      • High
      • Low
      • Close
      • Volume
  4. Automated Daily Updates

    • Created scrape_nasdaq_tsx_only.py - focused scraper for quality data
    • Updated daily_run.sh - daily execution script
    • Set up cron job for 12:00 PM daily
    • Logs saved to logs/daily_run_YYYYMMDD_HHMMSS.log
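
The whitespace handling and regex-based date extraction from task 1 could look roughly like the sketch below. It is only an illustration: the helper names (`clean_text`, `extract_date`, `parse_number`) and the assumed "Month D, YYYY" date format are hypothetical, and the actual logic in scrape_yahoo_finance.py may differ.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical helpers for illustration; not the actual scraper code.

# Assumed date shape: "Nov 5, 2025" or "November 5, 2025".
DATE_PATTERN = re.compile(r"([A-Z][a-z]{2,8}) (\d{1,2}),? (\d{4})")


def clean_text(raw: str) -> str:
    """Collapse every run of whitespace into one space and trim the ends."""
    return re.sub(r"\s+", " ", raw).strip()


def extract_date(raw: str) -> Optional[str]:
    """Return the first date found in raw as ISO 'YYYY-MM-DD', or None."""
    match = DATE_PATTERN.search(clean_text(raw))
    if match is None:
        return None
    month, day, year = match.groups()
    # Try the abbreviated month name first, then the full name.
    for fmt in ("%b %d %Y", "%B %d %Y"):
        try:
            return datetime.strptime(f"{month} {day} {year}", fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_number(raw: str) -> Optional[float]:
    """Parse a cell such as '1,234.50'; return None for placeholders like 'N/A'."""
    text = clean_text(raw).replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None
```

Returning None instead of raising keeps a single malformed cell from aborting the whole scrape; the caller decides whether a missing field matters. Month names are parsed with `strptime`, so the sketch assumes an English locale.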

📊 Current Stock Coverage

NASDAQ Stocks (2):

  • AAPL - Apple Inc.
  • MSFT - Microsoft Corporation

TSX Stocks (1):

  • SHOP.TO - Shopify Inc.

Total: 3 stocks (CSE stocks excluded due to data quality issues on Yahoo Finance)

📁 Generated Reports

For each stock, the following files are generated:

  1. Markdown Report: data/reports/{TICKER}_full_report.md

    • Complete consolidated report
    • All financials, metrics, news, filings
    • Quote data merged into statistics section
  2. PDF Report: data/reports/{TICKER}_full_report.pdf

    • Professionally formatted PDF
    • Ready for management presentation
  3. CSV Exports: data/exports/

    • stocks_export.csv - Master stock list
    • stocks_detailed.csv - All metrics and financials
    • news_summary.csv - News articles
    • filings_summary.csv - Regulatory filings
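
Merging quote data into the statistics section without overwriting existing values can be sketched as below. This is one possible reading of "prevent overwriting": values already present in the statistics win, and quote fields only fill gaps. The function name and the dictionary shapes are assumptions, not the code in generate_company_report.py.

```python
from typing import Any, Dict


def merge_quote_into_statistics(
    statistics: Dict[str, Any], quote: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a new dict with quote fields added to statistics.

    Assumed rule: an existing non-empty statistic is never replaced,
    and an empty quote field never wipes out existing data.
    """
    merged = dict(statistics)  # copy, so the caller's dict is untouched
    for key, value in quote.items():
        if value is None or value == "":
            continue  # nothing useful to merge for this field
        if merged.get(key) in (None, ""):
            merged[key] = value  # fill only missing or empty entries
    return merged
```

If the intended precedence is the reverse (fresh quote values should win), the second condition can simply be dropped.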

🤖 Daily Automation

Schedule: Every day at 12:00 PM

What it does:

  1. Scrapes latest data from Yahoo Finance for all NASDAQ/TSX stocks
  2. Extracts the latest available quote data (date, open, high, low, close, volume)
  3. Saves quote data to database
  4. Generates consolidated Markdown and PDF reports for each stock
  5. Exports all data to CSV files
  6. Logs everything to logs/ directory
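
For reference, the log naming pattern logs/daily_run_YYYYMMDD_HHMMSS.log can be reproduced, and the most recent log located, with a few lines of Python. This is a sketch only; daily_run.sh creates the real files, and the function names here are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def log_path_for(run_started: datetime, log_dir: Path = Path("logs")) -> Path:
    """Build the log file path for a run that started at run_started."""
    return log_dir / f"daily_run_{run_started:%Y%m%d_%H%M%S}.log"


def latest_log(log_dir: Path = Path("logs")) -> Optional[Path]:
    """Return the newest daily_run log, or None if there are none.

    The timestamp in the name is zero-padded, so sorting by name
    is the same as sorting by time.
    """
    logs = sorted(log_dir.glob("daily_run_*.log"))
    return logs[-1] if logs else None
```

When a run fails, `latest_log()` gives the file to open first.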

Cron Entry:

0 12 * * * /Users/macbook/Desktop/Victor/daily_run.sh
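
The five cron fields are minute, hour, day of month, month, and day of week, so `0 12 * * *` fires at minute 0 of hour 12 every day. The next firing time for that schedule can be sketched as follows; the function name is hypothetical and the code does not parse general cron expressions.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int = 12, minute: int = 0) -> datetime:
    """Return the next time a daily job at hour:minute fires after now."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the job next runs tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

Note that cron uses the machine's local time zone, so `now` should be a local time as well.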

🔧 Manual Operations

Run immediately:

cd /Users/macbook/Desktop/Victor
./daily_run.sh

Scrape NASDAQ and TSX stocks only:

python3 scrape_nasdaq_tsx_only.py

Generate report for specific stock:

python3 generate_company_report.py --ticker AAPL

Check cron status:

crontab -l

Remove cron job:

crontab -e
# Delete the line with 'daily_run.sh'

📝 Files Created/Modified

New Scripts:

  • scrape_nasdaq_tsx_only.py - NASDAQ/TSX focused scraper
  • rescrape_all_and_generate_reports.py - Original full scraper (not used)
  • quick_batch_rescrape.py - Quick test scraper
  • daily_run.sh - Daily automation script
  • setup_daily_automation.sh - Cron job installer

Modified Scripts:

  • scrape_yahoo_finance.py - Fixed quote data extraction
  • database.py - Added stock_quotes table
  • main_robust.py - Added quote data insertion
  • generate_company_report.py - Fixed statistics merge

Documentation:

  • QUOTE_DATA_EXTRACTION_FIX.md - Technical details of the fix
  • WHY_NO_SEDAR_FOR_AAPL.md - Explanation of SEDAR+ vs SEC
  • QUOTE_DATA_FIX.md - Earlier fix attempts
  • NASDAQ_TSX_AUTOMATION_SUMMARY.md - This file

🎯 Next Steps (Optional)

  1. Add More Stocks:

    • Add more NASDAQ/TSX stocks to stocks_master table
    • They'll automatically be included in daily runs
  2. Email Notifications:

    • Uncomment the mail command in daily_run.sh
    • Configure email settings
  3. Enhanced Metrics:

    • Add custom calculations in financial_calculator.py
    • Metrics auto-update daily
  4. Dashboard:

    • Build web dashboard using the CSV exports
    • Real-time visualization
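
For step 1, adding a stock to stocks_master might look like the sketch below. It assumes a SQLite database and a three-column layout (`ticker`, `company_name`, `exchange`); the real schema in database.py is not documented here and may differ, and both function names are hypothetical.

```python
import sqlite3
from typing import List, Sequence

# Assumed schema, for illustration only.
STOCKS_MASTER_SCHEMA = """
CREATE TABLE IF NOT EXISTS stocks_master (
    ticker       TEXT PRIMARY KEY,
    company_name TEXT NOT NULL,
    exchange     TEXT NOT NULL
)
"""


def add_stock(conn: sqlite3.Connection, ticker: str, company_name: str, exchange: str) -> bool:
    """Insert a stock unless the ticker already exists; return True if a row was added."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO stocks_master (ticker, company_name, exchange) "
        "VALUES (?, ?, ?)",
        (ticker, company_name, exchange),
    )
    conn.commit()
    return cursor.rowcount == 1


def tickers_for_daily_run(
    conn: sqlite3.Connection, exchanges: Sequence[str] = ("NASDAQ", "TSX")
) -> List[str]:
    """List the tickers on the given exchanges, sorted alphabetically."""
    placeholders = ", ".join("?" for _ in exchanges)
    rows = conn.execute(
        f"SELECT ticker FROM stocks_master WHERE exchange IN ({placeholders}) "
        "ORDER BY ticker",
        tuple(exchanges),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration
conn.execute(STOCKS_MASTER_SCHEMA)
add_stock(conn, "AAPL", "Apple Inc.", "NASDAQ")
add_stock(conn, "SHOP.TO", "Shopify Inc.", "TSX")
```

Filtering by exchange at query time is what would let new NASDAQ/TSX rows be picked up by the daily run without further changes.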

⚠️ Important Notes

  1. Mac Sleep: cron does not run jobs that were missed while the Mac was asleep, so make sure it is awake at 12:00 PM
  2. CSE Stocks: Excluded due to unreliable Yahoo Finance data
  3. Logs: Check logs/ directory if something fails
  4. Quote Data: Shows previous day's closing data (Yahoo updates after market close)

📊 Database Structure

Tables:

  • stocks_master - Stock listings
  • stock_quotes - Daily quote data: date, open, high, low, close, volume (NEW!)
  • financial_metrics - Calculated ratios
  • news_articles - News and press releases
  • filings - SEC/SEDAR+ filings
  • coverage_report - Data coverage tracking
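
A minimal sketch of what the stock_quotes table and insert_stock_quote() could look like is shown below. It assumes SQLite, and the column names, primary key, and function signature are all assumptions; the real definitions are in database.py.

```python
import sqlite3
from typing import Any, Dict

# Assumed layout: one row per ticker per trading date.
STOCK_QUOTES_SCHEMA = """
CREATE TABLE IF NOT EXISTS stock_quotes (
    ticker      TEXT NOT NULL,
    quote_date  TEXT NOT NULL,
    open_price  REAL,
    high_price  REAL,
    low_price   REAL,
    close_price REAL,
    volume      INTEGER,
    PRIMARY KEY (ticker, quote_date)
)
"""


def insert_stock_quote(conn: sqlite3.Connection, ticker: str, quote: Dict[str, Any]) -> None:
    """Store one quote; re-running for the same date replaces the earlier row."""
    conn.execute(
        "INSERT OR REPLACE INTO stock_quotes "
        "(ticker, quote_date, open_price, high_price, low_price, close_price, volume) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            ticker,
            quote["date"],       # required, ISO 'YYYY-MM-DD'
            quote.get("open"),   # missing fields are stored as NULL
            quote.get("high"),
            quote.get("low"),
            quote.get("close"),
            quote.get("volume"),
        ),
    )
    conn.commit()
```

Keying on (ticker, quote_date) means a manual re-run on the same day updates the existing row instead of creating a duplicate.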

Verification

All systems tested and verified:

  • Quote data extraction working
  • Database insertion working
  • Report generation working
  • PDF generation working
  • CSV exports working
  • Cron job installed
  • Daily automation configured

Last successful run: November 6, 2025 at 11:01 AM

Next scheduled run: November 7, 2025 at 12:00 PM


Ready for Management Submission! 🚀

All NASDAQ and TSX stocks now have:

  • Complete quote data (date, open, high, low, close, volume)
  • Comprehensive consolidated reports (Markdown + PDF)
  • Automated daily updates at 12 PM
  • Full database persistence
  • CSV exports for analysis

Report files for submission:

  • data/reports/AAPL_full_report.pdf
  • data/reports/MSFT_full_report.pdf
  • data/reports/SHOP.TO_full_report.pdf
  • data/exports/stocks_detailed.csv
  • data/exports/news_summary.csv
  • data/exports/filings_summary.csv