
# 🎯 QUICK REFERENCE CARD

## 🚀 First Time Setup
```bash
python setup.py
```
This installs everything and runs a test.

## 📋 Main Commands

### Run Everything (Test Mode - 5 stocks)
```bash
python main.py
```

### Run Full Pipeline (All Stocks)
```bash
python main.py --full
```

### Individual Steps
```bash
python extract_listings.py      # Get stock listings only
python database.py              # Set up the database
python scrape_yahoo_finance.py  # Get financials only
python scrape_news_pr.py        # Get news & press releases only
python test_extraction.py       # Quick test
```

## 📂 Where Is Everything?

| What | Where |
|------|-------|
| Stock listings | `data/listings/*.json` |
| Financial data | `data/financials/*.json` |
| News & PR | `data/news/*.json` |
| Final reports | `data/reports/*.txt` |
| Database | `data/stocks.db` |
| Docs | `GUIDE.md`, `SUMMARY.md` |

## 🔍 Check Your Data

### See what stocks were found
```bash
python -m json.tool data/listings/all_listings_combined.json | head -50
```

### Count how many stocks
```bash
python -c "import json; print(len(json.load(open('data/listings/all_listings_combined.json'))))"
```
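To see how that total splits across exchanges, a short script can tally the combined file. This is a sketch, not one of the project's scripts: it assumes `all_listings_combined.json` is a JSON list of objects and that each object has an `exchange` field (an assumption; check a few records first and change `key` if your file uses another name).

```python
import json
from collections import Counter
from pathlib import Path

# Path from this guide; the "exchange" key is an ASSUMPTION about the record layout.
LISTINGS = Path("data/listings/all_listings_combined.json")


def count_by_exchange(path: Path = LISTINGS, key: str = "exchange") -> Counter:
    """Tally listings per exchange; records without the key count as 'UNKNOWN'."""
    with path.open(encoding="utf-8") as f:
        records = json.load(f)  # assumed: a JSON list of objects (dicts)
    return Counter(record.get(key, "UNKNOWN") for record in records)


if __name__ == "__main__":
    if LISTINGS.exists():
        counts = count_by_exchange()
        for exchange, n in counts.most_common():
            print(f"{str(exchange):<8}{n:>6,}")
        print(f"{'TOTAL':<8}{sum(counts.values()):>6,}")
    else:
        print(f"{LISTINGS} not found; run extract_listings.py first")
```

Run it from the project root so the relative path resolves.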

### View a report (replace `ABC` with a ticker)
```bash
cat data/reports/ABC_report.txt
```

### Query the database
```bash
sqlite3 data/stocks.db "SELECT COUNT(*) FROM stocks_master;"
sqlite3 data/stocks.db "SELECT symbol, company_name FROM stocks_master LIMIT 10;"
```

## 🐛 Troubleshooting

### "No module named X"
```bash
pip install -r requirements.txt
```

### "playwright not found"
```bash
python3 -m playwright install chromium
```

### "No listings extracted"
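- Inspect the saved pages in `data/listings/*_page.html` to see what the scraper actually received
- The exchange websites may have changed their layout
- If so, update the selectors in `extract_listings.py` to match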

### "Rate limited" or "Blocked"
- Add longer delays between requests (increase the `await asyncio.sleep()` values in the scripts)
- Run fewer stocks at a time
- Use a VPN
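If fixed delays are not enough, retrying with exponential backoff is a common way to ease off a rate limiter. The helper below is a generic sketch, not code from these scripts: `with_backoff` is a hypothetical name, `fetch` stands for whatever coroutine a scraper awaits (for example a page load), and the retry count and delays are illustrative values to tune.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_backoff(
    fetch: Callable[[], Awaitable[T]],
    retries: int = 4,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
) -> T:
    """Await fetch(), retrying failures with exponentially growing, jittered sleeps."""
    for attempt in range(retries + 1):
        try:
            return await fetch()
        except Exception:  # narrow this to the errors your scraper actually raises
            if attempt == retries:
                raise  # out of retries: re-raise the last error
            # base_delay * 2**attempt (2 s, 4 s, 8 s, ... by default), capped at max_delay,
            # then stretched by up to 50% random jitter so retries don't line up
            delay = min(base_delay * 2 ** attempt, max_delay)
            await asyncio.sleep(delay * (1 + 0.5 * random.random()))
    raise ValueError("retries must be >= 0")  # only reached if the loop never ran


if __name__ == "__main__":
    attempts = 0

    async def flaky_fetch() -> str:
        # Hypothetical stand-in for a page load: fails twice, then succeeds.
        global attempts
        attempts += 1
        if attempts < 3:
            raise ConnectionError("simulated rate limit")
        return "ok"

    print(asyncio.run(with_backoff(flaky_fetch, base_delay=0.01)))  # → ok
```

Jitter matters when several stocks are scraped concurrently: without it, all failed requests would retry at the same moment and likely trip the limit again.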

## 📊 Expected Results

| Exchange | Typical # of Stocks |
|----------|---------------------|
| TSX      | ~1,500-1,700       |
| TSXV     | ~1,600-1,800       |
| CSE      | ~600-800           |
| CBOE     | Varies             |
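A quick way to spot a broken scrape is to compare per-exchange counts with the ranges in the table above. This sketch hard-codes those approximate ranges (CBOE is skipped because it varies) and assumes the `exchange` column of `stocks_master` stores the short codes `TSX`, `TSXV` and `CSE`; adjust the keys if your data uses other labels. A count outside the range is a hint to investigate, not proof of a bug.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# Approximate ranges from the "Expected Results" table; CBOE is omitted because it varies.
EXPECTED = {"TSX": (1500, 1700), "TSXV": (1600, 1800), "CSE": (600, 800)}


def counts_from_db(db_path: str = "data/stocks.db") -> dict[str, int]:
    """Per-exchange totals, using the same GROUP BY query as in Quick Database Queries."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT exchange, COUNT(*) FROM stocks_master GROUP BY exchange"
        ).fetchall()
    return dict(rows)


def check_counts(counts: dict[str, int]) -> list[str]:
    """Return one warning per exchange whose count is outside its typical range."""
    warnings = []
    for exchange, (low, high) in EXPECTED.items():
        n = counts.get(exchange, 0)  # a missing exchange counts as zero listings
        if not low <= n <= high:
            warnings.append(f"{exchange}: {n:,} listings (typical ~{low:,}-{high:,})")
    return warnings


if __name__ == "__main__":
    if Path("data/stocks.db").exists():
        problems = check_counts(counts_from_db())
        print("\n".join(problems) if problems else "All exchange counts look typical.")
```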

## ⏱️ Time Estimates

| Task | Time |
|------|------|
| Setup | 5 minutes |
| Extract listings | 2-3 minutes |
| Import to DB | < 1 minute |
| Scrape financials (per stock) | 2-3 seconds |
| Scrape news (per stock) | 10-15 seconds |
| Full pipeline (all stocks) | Several hours |
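The full-pipeline figure can be sanity-checked against the per-stock timings above. This back-of-the-envelope sketch assumes stocks are processed one at a time (if the scripts overlap work, the real run is shorter), uses the listing counts from the Expected Results table with CBOE left out, and ignores the listing-extraction and DB-import steps.

```python
# Rough runtime estimate for a sequential full run, using this guide's own figures.
# ASSUMPTIONS: one stock at a time, CBOE excluded, listing/DB steps ignored.
STOCKS = {"TSX": (1500, 1700), "TSXV": (1600, 1800), "CSE": (600, 800)}
FINANCIALS_S = (2, 3)  # seconds per stock (low, high)
NEWS_S = (10, 15)      # seconds per stock (low, high)


def estimate_hours() -> tuple[float, float]:
    """Return (low, high) estimated hours for scraping every listed stock."""
    low_n = sum(lo for lo, _ in STOCKS.values())   # 3,700 stocks
    high_n = sum(hi for _, hi in STOCKS.values())  # 4,300 stocks
    per_stock_low = FINANCIALS_S[0] + NEWS_S[0]    # 12 s
    per_stock_high = FINANCIALS_S[1] + NEWS_S[1]   # 18 s
    return low_n * per_stock_low / 3600, high_n * per_stock_high / 3600


low, high = estimate_hours()
print(f"~{low:.1f} to {high:.1f} hours")  # → ~12.3 to 21.5 hours
```

So if stocks are scraped sequentially, "several hours" can stretch to most of a day. By the same arithmetic, test mode's 5 stocks should need roughly 1 to 1.5 minutes of scraping.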

## 💡 Pro Tips

1. **Always test first**: Run `python main.py` (test mode) before a full run
2. **Check coverage**: Query the `coverage_report` table to see which stocks have complete data
3. **Run overnight**: The full pipeline takes hours, so start it in the evening
4. **Save HTML**: Debug files, such as the raw HTML pages, are written automatically for troubleshooting
5. **Database queries**: Use SQL for efficient analysis
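To put numbers on the coverage check, the flags in `coverage_report` can be averaged. This sketch relies only on the columns that appear in this guide's queries (`has_financials`, `has_news`, `has_press_releases`) and assumes they hold 0/1 integers; `coverage_percentages` is a hypothetical helper name.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# Flag columns as they appear in this guide's coverage_report query.
FLAGS = ("has_financials", "has_news", "has_press_releases")


def coverage_percentages(db_path: str = "data/stocks.db") -> dict[str, float]:
    """Share of coverage_report rows with each flag set, as a percentage (0-100)."""
    # AVG over a 0/1 column equals the fraction of rows where it is 1.
    # Column names come from the fixed FLAGS tuple, so the f-string is not user input.
    columns = ", ".join(f"AVG({flag})" for flag in FLAGS)
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(f"SELECT {columns} FROM coverage_report").fetchone()
    # AVG is NULL on an empty table; treat that as 0%.
    return {flag: round((value or 0) * 100, 1) for flag, value in zip(FLAGS, row)}


if __name__ == "__main__":
    if Path("data/stocks.db").exists():
        for flag, pct in coverage_percentages().items():
            print(f"{flag:<20}{pct:5.1f}%")
```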

## 📝 Quick Database Queries

```sql
-- Total stocks
SELECT COUNT(*) FROM stocks_master;

-- Stocks by exchange
SELECT exchange, COUNT(*) FROM stocks_master GROUP BY exchange;

-- Stocks with complete data
SELECT ticker FROM coverage_report 
WHERE has_financials=1 AND has_news=1 AND has_press_releases=1;

-- Recent news for a stock
SELECT title, source, published_date FROM news_articles 
WHERE stock_id = (SELECT id FROM stocks_master WHERE symbol='ABC')
ORDER BY published_date DESC LIMIT 10;
```
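The same queries can be run from Python with the standard-library `sqlite3` module. Binding the ticker as a `?` parameter avoids editing (and quoting) the SQL for each stock. This sketch reuses the table and column names from the queries above; `recent_news` is a hypothetical helper name.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# Same query as above, with the ticker and row limit bound as parameters.
# ORDER BY on published_date sorts correctly if dates are stored as ISO-8601 text.
RECENT_NEWS_SQL = """
    SELECT title, source, published_date FROM news_articles
    WHERE stock_id = (SELECT id FROM stocks_master WHERE symbol = ?)
    ORDER BY published_date DESC LIMIT ?
"""


def recent_news(symbol: str, limit: int = 10, db_path: str = "data/stocks.db") -> list[tuple]:
    """Return (title, source, published_date) rows for one ticker, newest first."""
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(RECENT_NEWS_SQL, (symbol, limit)).fetchall()


if __name__ == "__main__":
    if Path("data/stocks.db").exists():
        for title, source, published in recent_news("ABC"):
            print(published, source, title, sep=" | ")
```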

## 🔄 Regular Updates

To keep data fresh:

```bash
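# Weekly update (run every Sunday)
python main.py --full

# Or automate it: run `crontab -e` and add the line below
# (runs at 02:00 every Sunday; adjust the path to your project directory,
# and use the full path to python3 if cron can't find it)
0 2 * * 0 cd /Users/macbook/Desktop/Victor && python3 main.py --full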
```

## 📞 Need Help?

1. Check `GUIDE.md` for detailed documentation
2. Check `SUMMARY.md` for what was built
3. Check `FLOW_DIAGRAM.py` to understand data flow
4. Read the comments in the individual scripts

## 🎯 Next Steps After Collection

1. **Analyze**: Use pandas to analyze trends
2. **Visualize**: Create charts with matplotlib
3. **Screen**: Filter by P/E, market cap, growth, etc.
4. **Monitor**: Track specific stocks
5. **Export**: Generate Excel/CSV reports
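For the export step, here is a stdlib-only sketch that writes a query result to CSV (with pandas, `pandas.read_sql_query(...)` followed by `DataFrame.to_csv(...)` does the same). The default query uses the `stocks_master` columns that appear in this guide; `export_csv` and the output file name `stocks_master.csv` are illustrative, not part of the project.

```python
import csv
import sqlite3
from contextlib import closing
from pathlib import Path


def export_csv(
    out_path: str,
    query: str = "SELECT symbol, company_name, exchange FROM stocks_master ORDER BY symbol",
    db_path: str = "data/stocks.db",
) -> int:
    """Write the query's rows (plus a header row) to out_path; return the data-row count."""
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.execute(query)
        header = [col[0] for col in cursor.description]  # column names from the query
        rows = cursor.fetchall()
    # newline="" lets the csv module control line endings (avoids blank rows on Windows)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    if Path("data/stocks.db").exists():
        print(export_csv("stocks_master.csv"), "rows written")
```

Pass any of the SQL queries above as `query` to export that result instead; `csv.writer` quotes values that contain commas, such as company names.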

---

**Quick Start:** `python setup.py` → `python main.py` → Check `data/reports/`