feat: Implement stock listing extraction and database population
- Added `extract_listings.py` for extracting stock listings from TSX, TSXV, CSE, and CBOE using Playwright.
- Created `main.py` to orchestrate the entire stock intelligence system, including extraction, database import, financial scraping, news scraping, and report generation.
- Developed `populate_database.py` to populate the database with existing JSON data.
- Introduced `scrape_nasdaq_tsx_only.py` for focused scraping of NASDAQ and TSX stocks.
- Added `setup.py` for initial setup and testing of the system.
- Created `watchlist.txt` template for user-defined stock tracking.
- Generated `final_test_output.txt` to log the results of the test run.
@@ -0,0 +1,196 @@
# Stock Intelligence System - NASDAQ & TSX Focus

## Summary (November 6, 2025)

### ✅ Completed Tasks

1. **Fixed Quote Data Extraction**
   - Corrected CSS selectors in Yahoo Finance scraper
   - Fixed whitespace handling
   - Added regex-based date extraction
   - Fixed statistics merge logic to prevent overwriting

2. **Database Enhancement**
   - Added `stock_quotes` table to store real-time price data
   - Added `insert_stock_quote()` function
   - Quote data now persists in database

3. **Report Generation**
   - All NASDAQ and TSX stocks now show complete quote data in reports:
     - ✅ Date
     - ✅ Open
     - ✅ High
     - ✅ Low
     - ✅ Close
     - ✅ Volume

4. **Automated Daily Updates**
   - Created `scrape_nasdaq_tsx_only.py` - focused scraper for quality data
   - Updated `daily_run.sh` - daily execution script
   - Set up cron job for **12:00 PM daily**
   - Logs saved to `logs/daily_run_YYYYMMDD_HHMMSS.log`
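
The extraction fixes in item 1 (whitespace handling, regex-based date extraction, and a merge that does not overwrite existing statistics) are not shown in this summary. A minimal sketch of the general idea follows. The date pattern, the function names `extract_date` and `merge_statistics`, and the set of values treated as "empty" are all assumptions made for illustration, not the code in `scrape_yahoo_finance.py` or `generate_company_report.py`.

```python
import re

# Hypothetical pattern: the scraper's real regex is not shown in this summary.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Values treated as "no data" in this sketch (an assumption).
EMPTY_VALUES = (None, "", "N/A")


def extract_date(text):
    """Return the first YYYY-MM-DD date found in text, or None."""
    normalized = " ".join(text.split())  # collapse newlines and repeated spaces
    match = DATE_RE.search(normalized)
    return match.group(0) if match else None


def merge_statistics(existing, incoming):
    """Merge incoming values into a copy of existing without overwriting data.

    A key is filled only when the existing value is missing or empty, and an
    empty incoming value never replaces anything.
    """
    merged = dict(existing)
    for key, value in incoming.items():
        if value in EMPTY_VALUES:
            continue
        if merged.get(key) in EMPTY_VALUES:
            merged[key] = value
    return merged
```

With this shape, quote fields can be merged into the statistics section after the main scrape without clobbering values that were already populated.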

### 📊 Current Stock Coverage

**NASDAQ Stocks (2):**
- AAPL - Apple Inc.
- MSFT - Microsoft Corporation

**TSX Stocks (1):**
- SHOP.TO - Shopify Inc.

**Total: 3 stocks** (CSE stocks excluded due to data quality issues on Yahoo Finance)

### 📁 Generated Reports

For each stock, the following files are generated:

1. **Markdown Report**: `data/reports/{TICKER}_full_report.md`
   - Complete consolidated report
   - All financials, metrics, news, filings
   - Quote data merged into statistics section

2. **PDF Report**: `data/reports/{TICKER}_full_report.pdf`
   - Professionally formatted PDF
   - Ready for management presentation

3. **CSV Exports**: `data/exports/`
   - `stocks_export.csv` - Master stock list
   - `stocks_detailed.csv` - All metrics and financials
   - `news_summary.csv` - News articles
   - `filings_summary.csv` - Regulatory filings
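
The report naming convention above can be expressed as a small helper. This is only a sketch: `report_paths` and `REPORTS_DIR` are hypothetical names, and the real logic lives in `generate_company_report.py`, which is not shown here.

```python
from pathlib import Path

# Directory taken from the paths listed above.
REPORTS_DIR = Path("data/reports")


def report_paths(ticker):
    """Return the Markdown and PDF report paths for one ticker."""
    stem = f"{ticker}_full_report"
    return {
        # Build the file name with an f-string rather than Path.with_suffix():
        # for a ticker such as SHOP.TO, with_suffix() would treat
        # ".TO_full_report" as the suffix and drop it.
        "markdown": REPORTS_DIR / f"{stem}.md",
        "pdf": REPORTS_DIR / f"{stem}.pdf",
    }
```

For example, `report_paths("SHOP.TO")["pdf"]` resolves to `data/reports/SHOP.TO_full_report.pdf`.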

### 🤖 Daily Automation

**Schedule:** Every day at 12:00 PM

**What it does:**
1. Scrapes latest data from Yahoo Finance for all NASDAQ/TSX stocks
2. Extracts real-time quote data (date, open, high, low, close, volume)
3. Saves quote data to database
4. Generates consolidated Markdown and PDF reports for each stock
5. Exports all data to CSV files
6. Logs everything to `logs/` directory
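
The log files follow the `logs/daily_run_YYYYMMDD_HHMMSS.log` pattern. The actual naming happens inside `daily_run.sh`, which is not reproduced in this summary, so the Python helper below (`daily_log_path` is a hypothetical name) only illustrates how such a timestamped path is built.

```python
from datetime import datetime
from pathlib import Path


def daily_log_path(now=None, log_dir="logs"):
    """Build a path of the form logs/daily_run_YYYYMMDD_HHMMSS.log."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(log_dir) / f"daily_run_{stamp}.log"
```

Because the timestamp sorts lexicographically, listing `logs/` in name order also lists the runs in chronological order.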

**Cron Entry:**
```bash
0 12 * * * /Users/macbook/Desktop/Victor/daily_run.sh
```
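
For readers less familiar with crontab syntax, the five schedule fields in the entry above are minute, hour, day of month, month, and day of week, followed by the command. The sketch below splits a crontab line into those parts. `parse_cron_line` is a hypothetical helper written for illustration and is not part of the project.

```python
# Standard order of the five schedule fields in a crontab line.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron_line(line):
    """Split a crontab line into a schedule dict and the command string."""
    parts = line.split(None, 5)  # five schedule fields, then the rest is the command
    if len(parts) != 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    return schedule, parts[5]
```

Applied to the entry above, the minute field is `0` and the hour field is `12`, with `*` in the remaining fields, which is what makes the job run once a day at 12:00 PM.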

### 🔧 Manual Operations

**Run immediately:**
```bash
cd /Users/macbook/Desktop/Victor
./daily_run.sh
```

**Scrape specific exchanges only:**
```bash
python3 scrape_nasdaq_tsx_only.py
```

**Generate report for specific stock:**
```bash
python3 generate_company_report.py --ticker AAPL
```

**Check cron status:**
```bash
crontab -l
```

**Remove cron job:**
```bash
crontab -e
# Delete the line with 'daily_run.sh'
```

### 📝 Files Created/Modified

**New Scripts:**
- `scrape_nasdaq_tsx_only.py` - NASDAQ/TSX focused scraper
- `rescrape_all_and_generate_reports.py` - Original full scraper (not used)
- `quick_batch_rescrape.py` - Quick test scraper
- `daily_run.sh` - Daily automation script
- `setup_daily_automation.sh` - Cron job installer

**Modified Scripts:**
- `scrape_yahoo_finance.py` - Fixed quote data extraction
- `database.py` - Added stock_quotes table
- `main_robust.py` - Added quote data insertion
- `generate_company_report.py` - Fixed statistics merge

**Documentation:**
- `QUOTE_DATA_EXTRACTION_FIX.md` - Technical details of the fix
- `WHY_NO_SEDAR_FOR_AAPL.md` - Explanation of SEDAR+ vs SEC
- `QUOTE_DATA_FIX.md` - Earlier fix attempts
- `NASDAQ_TSX_AUTOMATION_SUMMARY.md` - This file

### 🎯 Next Steps (Optional)

1. **Add More Stocks:**
   - Add more NASDAQ/TSX stocks to `stocks_master` table
   - They'll automatically be included in daily runs

2. **Email Notifications:**
   - Uncomment the mail command in `daily_run.sh`
   - Configure email settings

3. **Enhanced Metrics:**
   - Add custom calculations in `financial_calculator.py`
   - Metrics auto-update daily

4. **Dashboard:**
   - Build web dashboard using the CSV exports
   - Real-time visualization

### ⚠️ Important Notes

1. **Mac Sleep:** Ensure your Mac is awake at 12 PM for cron to run
2. **CSE Stocks:** Excluded due to unreliable Yahoo Finance data
3. **Logs:** Check `logs/` directory if something fails
4. **Quote Data:** Shows previous day's closing data (Yahoo updates after market close)

### 📊 Database Structure

**Tables:**
- `stocks_master` - Stock listings
- `stock_quotes` - Real-time price data (NEW!)
- `financial_metrics` - Calculated ratios
- `news_articles` - News and press releases
- `filings` - SEC/SEDAR+ filings
- `coverage_report` - Data coverage tracking
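
This summary names the `stock_quotes` table and the `insert_stock_quote()` function but does not show the schema or the database engine. The sketch below assumes SQLite and a hypothetical column layout derived from the quote fields listed earlier (date, open, high, low, close, volume). The function signature is also an assumption, so the real definitions in `database.py` may differ.

```python
import sqlite3


def create_stock_quotes_table(conn):
    """Create a quotes table keyed by ticker and date (hypothetical schema)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS stock_quotes (
            ticker      TEXT NOT NULL,
            quote_date  TEXT NOT NULL,
            open_price  REAL,
            high_price  REAL,
            low_price   REAL,
            close_price REAL,
            volume      INTEGER,
            PRIMARY KEY (ticker, quote_date)
        )
        """
    )


def insert_stock_quote(conn, ticker, quote):
    """Insert one day's quote; re-running for the same day replaces the row."""
    conn.execute(
        """
        INSERT OR REPLACE INTO stock_quotes
            (ticker, quote_date, open_price, high_price,
             low_price, close_price, volume)
        VALUES (?, ?, ?, ?, ?, ?, ?)
        """,
        (
            ticker,
            quote["date"],
            quote["open"],
            quote["high"],
            quote["low"],
            quote["close"],
            quote["volume"],
        ),
    )
    conn.commit()
```

Keying on `(ticker, quote_date)` is a design choice for this sketch: it keeps a daily job idempotent, so a manual re-run on the same day updates the existing row instead of adding a duplicate.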

### ✅ Verification

All systems tested and verified:
- ✅ Quote data extraction working
- ✅ Database insertion working
- ✅ Report generation working
- ✅ PDF generation working
- ✅ CSV exports working
- ✅ Cron job installed
- ✅ Daily automation configured

**Last successful run:** November 6, 2025 at 11:01 AM
**Next scheduled run:** November 7, 2025 at 12:00 PM

---

## Ready for Management Submission! 🚀

All NASDAQ and TSX stocks now have:
- Complete quote data (date, open, high, low, close, volume)
- Comprehensive consolidated reports (Markdown + PDF)
- Automated daily updates at 12 PM
- Full database persistence
- CSV exports for analysis

**Report files for submission:**
- `data/reports/AAPL_full_report.pdf`
- `data/reports/MSFT_full_report.pdf`
- `data/reports/SHOP.TO_full_report.pdf`
- `data/exports/stocks_detailed.csv`
- `data/exports/news_summary.csv`
- `data/exports/filings_summary.csv`