# 🎯 QUICK REFERENCE CARD

## 🚀 First Time Setup

```bash
python setup.py
```

This installs everything and runs a test.

## 📋 Main Commands

### Run Everything (Test Mode - 5 stocks)

```bash
python main.py
```

### Run Full Pipeline (All Stocks)

```bash
python main.py --full
```

### Individual Steps

```bash
python extract_listings.py        # Get stock listings only
python database.py                # Set up the database
python scrape_yahoo_finance.py    # Get financials only
python scrape_news_pr.py          # Get news only
python test_extraction.py         # Quick test
```

## 📂 Where Is Everything?

| What | Where |
|------|-------|
| Stock listings | `data/listings/*.json` |
| Financial data | `data/financials/*.json` |
| News & PR | `data/news/*.json` |
| Final reports | `data/reports/*.txt` |
| Database | `data/stocks.db` |
| Docs | `GUIDE.md`, `SUMMARY.md` |

## 🔍 Check Your Data

### See what stocks were found

```bash
head -50 data/listings/all_listings_combined.json
```

### Count how many stocks

```bash
python -c "import json; print(len(json.load(open('data/listings/all_listings_combined.json'))))"
```

### View a report

```bash
cat data/reports/ABC_report.txt
```

### Query the database

```bash
sqlite3 data/stocks.db "SELECT COUNT(*) FROM stocks_master;"
sqlite3 data/stocks.db "SELECT symbol, company_name FROM stocks_master LIMIT 10;"
```

## 🐛 Troubleshooting

### "No module named X"

```bash
pip install -r requirements.txt
```

### "playwright not found"

```bash
python3 -m playwright install chromium
```

### "No listings extracted"

- Check the saved pages in `data/listings/*_page.html`
- The source websites may have changed
- Try updating the selectors in `extract_listings.py`

### "Rate limited" or "Blocked"

- Add longer delays in the scripts (increase the `await asyncio.sleep()` values)
- Run fewer stocks at a time
- Use a VPN
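
Instead of raising every fixed delay, the wait can grow only when a request fails. The following is a minimal sketch of exponential backoff with jitter, not the project's actual code: `backoff_delay`, `fetch_with_retry`, and the `fetch` coroutine are hypothetical names, and the base delay, cap, and retry count are assumed values to tune.

```python
import asyncio
import random


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Return the wait in seconds before retry number `attempt` (0-based).

    The delay doubles on every attempt and is capped at `cap`.
    """
    return min(cap, base * (2 ** attempt))


async def fetch_with_retry(fetch, symbol, retries=4, sleep=asyncio.sleep):
    """Call the hypothetical coroutine `fetch(symbol)`, backing off on failure."""
    for attempt in range(retries):
        try:
            return await fetch(symbol)
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            # Add up to 1 s of random jitter so requests do not align.
            await sleep(backoff_delay(attempt) + random.uniform(0, 1))
```

With the defaults, the waits between attempts are roughly 2, 4, and 8 seconds. The `sleep` parameter exists only so the delay can be replaced in tests.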

## 📊 Expected Results

| Exchange | Typical # of Stocks |
|----------|---------------------|
| TSX | ~1,500-1,700 |
| TSXV | ~1,600-1,800 |
| CSE | ~600-800 |
| CBOE | Varies |

## ⏱️ Time Estimates

| Task | Time |
|------|------|
| Setup | 5 minutes |
| Extract listings | 2-3 minutes |
| Import to DB | < 1 minute |
| Scrape 1 stock financials | 2-3 seconds |
| Scrape 1 stock news | 10-15 seconds |
| Full pipeline (all stocks) | Several hours |
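
The "Several hours" figure can be sanity-checked from the two tables above. This is a rough sketch that assumes stocks are scraped one at a time with no concurrency, and it leaves out CBOE because its count varies. The constant and function names are illustrative only.

```python
# Per-stock scrape times (seconds) and exchange sizes from the tables above.
PER_STOCK_SECONDS = (2 + 10, 3 + 15)  # financials + news: (low, high)
STOCK_COUNTS = {"TSX": (1500, 1700), "TSXV": (1600, 1800), "CSE": (600, 800)}


def estimate_hours(counts=STOCK_COUNTS, per_stock=PER_STOCK_SECONDS):
    """Return (low, high) hours for scraping every stock sequentially."""
    low_n = sum(lo for lo, _ in counts.values())
    high_n = sum(hi for _, hi in counts.values())
    return low_n * per_stock[0] / 3600, high_n * per_stock[1] / 3600


low, high = estimate_hours()
print(f"{low:.1f} to {high:.1f} hours")  # prints "12.3 to 21.5 hours"
```

Under these assumptions a full sequential run is closer to half a day or more, so plan for an overnight run at minimum.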

## 💡 Pro Tips

1. **Always test first**: Run `python main.py` (test mode) before a full run
2. **Check coverage**: Query the `coverage_report` table to see how complete the data is
3. **Run overnight**: The full pipeline takes several hours
4. **Saved HTML**: Debug HTML files are saved automatically for troubleshooting
5. **Database queries**: Use SQL for efficient analysis
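
The coverage check in tip 2 can be scripted. This is a minimal sketch that assumes `coverage_report` has one row per stock and stores the `has_financials`, `has_news`, and `has_press_releases` flags as 0/1 integers, as the queries in the next section suggest. The helper name `coverage_summary` is hypothetical.

```python
import sqlite3


def coverage_summary(conn: sqlite3.Connection) -> dict:
    """Return the share of rows in coverage_report that have each data type."""
    row = conn.execute(
        """
        SELECT COUNT(*),
               SUM(has_financials),
               SUM(has_news),
               SUM(has_press_releases)
        FROM coverage_report
        """
    ).fetchone()
    total = row[0]
    if total == 0:
        return {}  # empty table: nothing to report
    labels = ("financials", "news", "press_releases")
    # SUM() yields NULL (None) when there are no rows, hence the `or 0`.
    return {label: (count or 0) / total for label, count in zip(labels, row[1:])}


# Usage (database path from the "Where Is Everything?" table):
#   conn = sqlite3.connect("data/stocks.db")
#   print(coverage_summary(conn))
```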

## 📝 Quick Database Queries

```sql
-- Total stocks
SELECT COUNT(*) FROM stocks_master;

-- Stocks by exchange
SELECT exchange, COUNT(*) FROM stocks_master GROUP BY exchange;

-- Stocks with complete data
SELECT ticker FROM coverage_report
WHERE has_financials=1 AND has_news=1 AND has_press_releases=1;

-- Recent news for a stock
SELECT title, source, published_date FROM news_articles
WHERE stock_id = (SELECT id FROM stocks_master WHERE symbol='ABC')
ORDER BY published_date DESC LIMIT 10;
```
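
The same queries can be run from Python. Below is a minimal sketch of the last query with bound parameters instead of a hard-coded symbol. `recent_news` is a hypothetical helper, and the sketch assumes `published_date` is stored in a format that sorts chronologically as text, such as ISO 8601.

```python
import sqlite3

RECENT_NEWS_SQL = """
SELECT title, source, published_date
FROM news_articles
WHERE stock_id = (SELECT id FROM stocks_master WHERE symbol = ?)
ORDER BY published_date DESC
LIMIT ?
"""


def recent_news(conn: sqlite3.Connection, symbol: str, limit: int = 10) -> list:
    """Return up to `limit` (title, source, published_date) rows for `symbol`."""
    # Placeholders (?) let sqlite3 bind the values, avoiding manual quoting.
    return conn.execute(RECENT_NEWS_SQL, (symbol, limit)).fetchall()


# Usage:
#   conn = sqlite3.connect("data/stocks.db")
#   for title, source, published in recent_news(conn, "ABC"):
#       print(published, source, title)
```

An unknown symbol returns an empty list, because the subquery finds no matching `id`.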

## 🔄 Regular Updates

To keep data fresh:

```bash
# Weekly update (run manually every Sunday)
python main.py --full

# Or schedule it: add this line with `crontab -e` (runs Sundays at 02:00)
0 2 * * 0 cd /Users/macbook/Desktop/Victor && python3 main.py --full
```

## 📞 Need Help?

1. Check `GUIDE.md` for detailed documentation
2. Check `SUMMARY.md` for what was built
3. Check `FLOW_DIAGRAM.py` to understand data flow
4. Look at individual script files for comments

## 🎯 Next Steps After Collection

1. **Analyze**: Use pandas to analyze trends
2. **Visualize**: Create charts with matplotlib
3. **Screen**: Filter by P/E, market cap, growth, etc.
4. **Monitor**: Track specific stocks
5. **Export**: Generate Excel/CSV reports
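
As a starting point for the export step, here is a minimal sketch that dumps the stock list to CSV text using only the standard library. It assumes `stocks_master` has the `symbol`, `company_name`, and `exchange` columns used in the queries above. `stocks_to_csv` is a hypothetical helper, and the output can be written anywhere, for example with `Path("data/reports/stocks.csv").write_text(...)` (the file name is illustrative).

```python
import csv
import io
import sqlite3


def stocks_to_csv(conn: sqlite3.Connection) -> str:
    """Return symbol, company_name and exchange for every stock as CSV text."""
    cursor = conn.execute(
        "SELECT symbol, company_name, exchange FROM stocks_master ORDER BY symbol"
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor)  # remaining rows stream straight from the cursor
    return buffer.getvalue()
```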

---

**Quick Start:** `python setup.py` → `python main.py` → Check `data/reports/`