# 🎯 QUICK REFERENCE CARD

## 🚀 First Time Setup

```bash
python setup.py
```

This installs everything and runs a test.

## 📋 Main Commands

### Run Everything (Test Mode - 5 stocks)

```bash
python main.py
```

### Run Full Pipeline (All Stocks)

```bash
python main.py --full
```

### Individual Steps

```bash
python extract_listings.py       # Get stock listings only
python database.py               # Setup database
python scrape_yahoo_finance.py   # Get financials only
python scrape_news_pr.py         # Get news only
python test_extraction.py        # Quick test
```

## 📂 Where Is Everything?

| What | Where |
|------|-------|
| Stock listings | `data/listings/*.json` |
| Financial data | `data/financials/*.json` |
| News & PR | `data/news/*.json` |
| Final reports | `data/reports/*.txt` |
| Database | `data/stocks.db` |
| Docs | `GUIDE.md`, `SUMMARY.md` |

## 🔍 Check Your Data

### See what stocks were found

```bash
cat data/listings/all_listings_combined.json | head -50
```

### Count how many stocks

```bash
python -c "import json; print(len(json.load(open('data/listings/all_listings_combined.json'))))"
```

### View a report

```bash
cat data/reports/ABC_report.txt
```

### Query the database

```bash
sqlite3 data/stocks.db "SELECT COUNT(*) FROM stocks_master;"
sqlite3 data/stocks.db "SELECT symbol, company_name FROM stocks_master LIMIT 10;"
```

## 🐛 Troubleshooting

### "No module named X"

```bash
pip install -r requirements.txt
```

### "playwright not found"

```bash
python3 -m playwright install chromium
```

### "No listings extracted"

- Check `data/listings/*_page.html`
- Websites may have changed
- Try updating selectors in `extract_listings.py`

### "Rate limited" or "Blocked"

- Add more delays in scripts (increase `await asyncio.sleep()` values)
- Run fewer stocks at a time
- Use a VPN

## 📊 Expected Results

| Exchange | Typical # of Stocks |
|----------|---------------------|
| TSX | ~1,500-1,700 |
| TSXV | ~1,600-1,800 |
| CSE | ~600-800 |
| CBOE | Varies |

## ⏱️ Time Estimates

| Task | Time |
|------|------|
| Setup | 5 minutes |
| Extract listings | 2-3 minutes |
| Import to DB | < 1 minute |
| Scrape 1 stock financials | 2-3 seconds |
| Scrape 1 stock news | 10-15 seconds |
| Full pipeline (all stocks) | Several hours |

## 💡 Pro Tips

1. **Always test first**: Run `python main.py` (test mode) before full run
2. **Check coverage**: Query `coverage_report` table to see completeness
3. **Run overnight**: Full pipeline takes hours - run overnight
4. **Save HTML**: Debug files saved automatically for troubleshooting
5. **Database queries**: Use SQL for efficient analysis

## 📝 Quick Database Queries

```sql
-- Total stocks
SELECT COUNT(*) FROM stocks_master;

-- Stocks by exchange
SELECT exchange, COUNT(*) FROM stocks_master GROUP BY exchange;

-- Stocks with complete data
SELECT ticker FROM coverage_report
WHERE has_financials=1 AND has_news=1 AND has_press_releases=1;

-- Recent news for a stock
SELECT title, source, published_date FROM news_articles
WHERE stock_id = (SELECT id FROM stocks_master WHERE symbol='ABC')
ORDER BY published_date DESC LIMIT 10;
```

## 🔄 Regular Updates

To keep data fresh:

```bash
# Weekly update (run every Sunday)
python main.py --full

# Or use cron:
0 2 * * 0 cd /Users/macbook/Desktop/Victor && python3 main.py --full
```

## 📞 Need Help?

1. Check `GUIDE.md` for detailed documentation
2. Check `SUMMARY.md` for what was built
3. Check `FLOW_DIAGRAM.py` to understand data flow
4. Look at individual script files for comments

## 🎯 Next Steps After Collection

1. **Analyze**: Use pandas to analyze trends
2. **Visualize**: Create charts with matplotlib
3. **Screen**: Filter by P/E, market cap, growth, etc.
4. **Monitor**: Track specific stocks
5. **Export**: Generate Excel/CSV reports

---

**Quick Start:** `python setup.py` → `python main.py` → Check `data/reports/`
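
## 🧩 Appendix: Throttling Sketch

The "Rate limited" or "Blocked" advice in Troubleshooting (longer `asyncio.sleep()` delays, fewer stocks at a time) could be sketched roughly as below. This is a minimal illustration, not the code in the scraper scripts. The names `polite_sleep`, `chunked`, `scrape_in_batches`, and `fake_fetch` are hypothetical, and the delay values are placeholders to tune for your own runs.

```python
import asyncio
import random


async def polite_sleep(base_delay: float, jitter: float = 0.5) -> float:
    """Sleep for base_delay plus a random extra of up to `jitter` seconds."""
    delay = base_delay + random.uniform(0, jitter)
    await asyncio.sleep(delay)
    return delay


def chunked(items, size):
    """Yield successive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def scrape_in_batches(symbols, fetch, batch_size=5,
                            delay=3.0, jitter=0.5, batch_pause=30.0):
    """Call `fetch(symbol)` sequentially, pausing after each call and between batches.

    `fetch` is a placeholder (assumption) for whatever coroutine scrapes one stock.
    """
    results = {}
    batches = list(chunked(symbols, batch_size))
    for index, batch in enumerate(batches):
        for symbol in batch:
            results[symbol] = await fetch(symbol)
            await polite_sleep(delay, jitter)
        # Take a longer break between batches, but not after the last one.
        if index < len(batches) - 1:
            await asyncio.sleep(batch_pause)
    return results


async def fake_fetch(symbol):
    """Hypothetical stand-in for a real scraper call; returns immediately."""
    return {"symbol": symbol}


if __name__ == "__main__":
    demo = asyncio.run(
        scrape_in_batches(["ABC", "DEF", "GHI"], fake_fetch,
                          batch_size=2, delay=0.05, jitter=0.05, batch_pause=0.1)
    )
    print(sorted(demo))
```

If blocking persists, raising `delay` and `batch_pause` or lowering `batch_size` slows the run further. The random jitter just keeps requests from arriving at perfectly regular intervals.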