🎯 QUICK REFERENCE CARD
🚀 First Time Setup
python setup.py
This installs everything and runs a test.
📋 Main Commands
Run Everything (Test Mode - 5 stocks)
python main.py
Run Full Pipeline (All Stocks)
python main.py --full
Individual Steps
python extract_listings.py # Get stock listings only
python database.py # Setup database
python scrape_yahoo_finance.py # Get financials only
python scrape_news_pr.py # Get news only
python test_extraction.py # Quick test
📂 Where Is Everything?
| What | Where |
|---|---|
| Stock listings | data/listings/*.json |
| Financial data | data/financials/*.json |
| News & PR | data/news/*.json |
| Final reports | data/reports/*.txt |
| Database | data/stocks.db |
| Docs | GUIDE.md, SUMMARY.md |
🔍 Check Your Data
See what stocks were found
head -n 50 data/listings/all_listings_combined.json
Count how many stocks
python -c "import json; print(len(json.load(open('data/listings/all_listings_combined.json'))))"
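If a per-exchange breakdown is more useful than a single total, the one-liner above can be extended along these lines. This is a minimal sketch that assumes the combined file is a JSON list of objects, each carrying an `exchange` key; that key name is a guess and may differ in the real output, and `count_by_exchange` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def count_by_exchange(path):
    """Return a Counter of listings per exchange from the combined JSON file."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: the file holds a list of dicts with an "exchange" key.
    # Records without that key are grouped under "UNKNOWN".
    return Counter(rec.get("exchange", "UNKNOWN") for rec in records)


if __name__ == "__main__":
    counts = count_by_exchange("data/listings/all_listings_combined.json")
    for exchange, n in counts.most_common():
        print(f"{exchange}: {n}")
```

Comparing these counts against the Expected Results table is a quick way to spot an exchange whose extraction silently failed.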
View a report
cat data/reports/ABC_report.txt
Query the database
sqlite3 data/stocks.db "SELECT COUNT(*) FROM stocks_master;"
sqlite3 data/stocks.db "SELECT symbol, company_name FROM stocks_master LIMIT 10;"
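The same two queries can be run from Python when the `sqlite3` command-line tool is not installed. A minimal sketch using only the standard library; `stock_summary` is a hypothetical helper name, while the table and column names are the ones used in the commands above.

```python
import sqlite3
from contextlib import closing


def stock_summary(db_path="data/stocks.db", limit=10):
    """Return (total row count, first `limit` (symbol, company_name) rows)."""
    # closing() makes sure the connection is actually closed afterwards.
    with closing(sqlite3.connect(db_path)) as conn:
        total = conn.execute("SELECT COUNT(*) FROM stocks_master").fetchone()[0]
        sample = conn.execute(
            "SELECT symbol, company_name FROM stocks_master LIMIT ?", (limit,)
        ).fetchall()
    return total, sample


if __name__ == "__main__":
    total, sample = stock_summary()
    print(f"{total} stocks in database")
    for symbol, name in sample:
        print(symbol, name)
```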
🐛 Troubleshooting
"No module named X"
pip install -r requirements.txt
"playwright not found"
python3 -m playwright install chromium
"No listings extracted"
- Check `data/listings/*_page.html` - Websites may have changed
- Try updating selectors in `extract_listings.py`
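A quick way to triage the saved pages is to look for files that are suspiciously small, which often means an error or block page was saved instead of the real listing. A rough sketch: `small_debug_pages` is a hypothetical helper and the 10,000-byte threshold is an arbitrary assumption to tune against pages known to be good.

```python
from pathlib import Path


def small_debug_pages(listings_dir="data/listings", min_bytes=10_000):
    """Return (file name, size in bytes) for saved pages below min_bytes."""
    pages = sorted(Path(listings_dir).glob("*_page.html"))
    # A tiny page is only a hint that something went wrong, not proof.
    return [(p.name, p.stat().st_size) for p in pages if p.stat().st_size < min_bytes]


if __name__ == "__main__":
    for name, size in small_debug_pages():
        print(f"{name}: only {size} bytes, open it in a browser to inspect")
```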
"Rate limited" or "Blocked"
- Add more delays in scripts (increase `await asyncio.sleep()` values)
- Run fewer stocks at a time
- Use a VPN
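Beyond raising a fixed sleep value, retries can wait progressively longer after each failure. The sketch below shows exponential backoff with random jitter. It is not taken from the project's scripts: `fetch_with_backoff` and its parameters are hypothetical, and `fetch` stands in for whatever coroutine scrapes a single stock.

```python
import asyncio
import random


async def fetch_with_backoff(fetch, symbol, retries=3, base_delay=2.0, jitter=1.0):
    """Call `await fetch(symbol)`, waiting longer after each failed attempt."""
    for attempt in range(retries):
        try:
            return await fetch(symbol)
        except Exception:
            # Illustrative only: real code should catch the specific
            # timeout or HTTP errors that signal rate limiting.
            if attempt == retries - 1:
                raise
            # Waits of base_delay, 2x, 4x, ... plus a random extra so
            # requests do not arrive on a fixed rhythm.
            delay = base_delay * 2 ** attempt + random.uniform(0, jitter)
            await asyncio.sleep(delay)
```

With the defaults, a stock that fails twice waits roughly 2 and then 4 seconds (plus jitter) before the third and final attempt.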
📊 Expected Results
| Exchange | Typical # of Stocks |
|---|---|
| TSX | ~1,500-1,700 |
| TSXV | ~1,600-1,800 |
| CSE | ~600-800 |
| CBOE | Varies |
⏱️ Time Estimates
| Task | Time |
|---|---|
| Setup | 5 minutes |
| Extract listings | 2-3 minutes |
| Import to DB | < 1 minute |
| Scrape 1 stock financials | 2-3 seconds |
| Scrape 1 stock news | 10-15 seconds |
| Full pipeline (all stocks) | Several hours |
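The "several hours" figure follows from the per-stock timings. A back-of-the-envelope sketch, assuming stocks are scraped one after another and using the TSX, TSXV, and CSE ranges from Expected Results (CBOE is left out because its count varies); `estimate_hours` is a hypothetical helper.

```python
def estimate_hours(n_stocks, financials_s, news_s):
    """Sequential scrape time in hours for n_stocks at the given per-stock seconds."""
    return n_stocks * (financials_s + news_s) / 3600


# Low end: 1,500 + 1,600 + 600 = 3,700 stocks at 2 s + 10 s each.
low = estimate_hours(3700, 2, 10)
# High end: 1,700 + 1,800 + 800 = 4,300 stocks at 3 s + 15 s each.
high = estimate_hours(4300, 3, 15)
print(f"Estimated full run: {low:.1f} to {high:.1f} hours")
```

Under these assumptions a full run lands somewhere between roughly 12 and 22 hours, before any extra delays added to avoid rate limiting.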
💡 Pro Tips
- Always test first: Run `python main.py` (test mode) before full run
- Check coverage: Query `coverage_report` table to see completeness
- Run overnight: Full pipeline takes hours
- Save HTML: Debug files saved automatically for troubleshooting
- Database queries: Use SQL for efficient analysis
📝 Quick Database Queries
-- Total stocks
SELECT COUNT(*) FROM stocks_master;
-- Stocks by exchange
SELECT exchange, COUNT(*) FROM stocks_master GROUP BY exchange;
-- Stocks with complete data
SELECT ticker FROM coverage_report
WHERE has_financials=1 AND has_news=1 AND has_press_releases=1;
-- Recent news for a stock
SELECT title, source, published_date FROM news_articles
WHERE stock_id = (SELECT id FROM stocks_master WHERE symbol='ABC')
ORDER BY published_date DESC LIMIT 10;
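To run the recent-news query for any symbol without editing the SQL by hand, it can be wrapped in a small function with bound parameters. A minimal sketch: `recent_news` is a hypothetical helper, the table and column names come from the query above, and the `DESC` ordering is assumed to work because dates are stored in a sortable form such as ISO `YYYY-MM-DD`.

```python
import sqlite3
from contextlib import closing

RECENT_NEWS_SQL = """
SELECT title, source, published_date FROM news_articles
WHERE stock_id = (SELECT id FROM stocks_master WHERE symbol = ?)
ORDER BY published_date DESC LIMIT ?
"""


def recent_news(db_path, symbol, limit=10):
    """Return up to `limit` (title, source, published_date) rows, newest first."""
    with closing(sqlite3.connect(db_path)) as conn:
        # Bound parameters avoid pasting the symbol into the SQL string.
        return conn.execute(RECENT_NEWS_SQL, (symbol, limit)).fetchall()


if __name__ == "__main__":
    for title, source, published in recent_news("data/stocks.db", "ABC"):
        print(published, source, title)
```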
🔄 Regular Updates
To keep data fresh:
# Weekly update (run every Sunday)
python main.py --full
# Or use cron:
0 2 * * 0 cd /Users/macbook/Desktop/Victor && python3 main.py --full
📞 Need Help?
- Check `GUIDE.md` for detailed documentation
- Check `SUMMARY.md` for what was built
- Check `FLOW_DIAGRAM.py` to understand data flow
- Look at individual script files for comments
🎯 Next Steps After Collection
- Analyze: Use pandas to analyze trends
- Visualize: Create charts with matplotlib
- Screen: Filter by P/E, market cap, growth, etc.
- Monitor: Track specific stocks
- Export: Generate Excel/CSV reports
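As a starting point for the Export step, a table can be dumped to CSV with the standard library alone. This is a sketch, not part of the project: `export_table_to_csv` is a hypothetical helper and the output path in the usage line is an assumption.

```python
import csv
import sqlite3
from contextlib import closing


def export_table_to_csv(db_path, csv_path, table="stocks_master"):
    """Write every row of `table` to csv_path with a header row; return the row count."""
    with closing(sqlite3.connect(db_path)) as conn:
        # The table name is interpolated, so pass only trusted names here.
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        headers = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    n = export_table_to_csv("data/stocks.db", "data/reports/stocks_master.csv")
    print(f"Exported {n} rows")
```

The resulting file opens directly in Excel, or can be loaded with pandas for the Analyze and Screen steps.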
Quick Start: python setup.py → python main.py → Check data/reports/