microcap_scrapping/QUICKREF.md
Aherobo Ovie Victor 80ee708348 feat: Implement stock listing extraction and database population
- Added `extract_listings.py` for extracting stock listings from TSX, TSXV, CSE, and CBOE using Playwright.
- Created `main.py` to orchestrate the entire stock intelligence system, including extraction, database import, financial scraping, news scraping, and report generation.
- Developed `populate_database.py` to populate the database with existing JSON data.
- Introduced `scrape_nasdaq_tsx_only.py` for focused scraping of NASDAQ and TSX stocks.
- Added `setup.py` for initial setup and testing of the system.
- Created `watchlist.txt` template for user-defined stock tracking.
- Generated `final_test_output.txt` to log the results of the test run.
2025-11-06 12:34:01 +01:00

🎯 QUICK REFERENCE CARD

🚀 First Time Setup

python setup.py

This installs everything and runs a test.

📋 Main Commands

Run Everything (Test Mode - 5 stocks)

python main.py

Run Full Pipeline (All Stocks)

python main.py --full

Individual Steps

python extract_listings.py      # Get stock listings only
python database.py              # Setup database
python scrape_yahoo_finance.py  # Get financials only
python scrape_news_pr.py        # Get news only
python test_extraction.py       # Quick test

📂 Where Is Everything?

What            Where
Stock listings  data/listings/*.json
Financial data  data/financials/*.json
News & PR       data/news/*.json
Final reports   data/reports/*.txt
Database        data/stocks.db
Docs            GUIDE.md, SUMMARY.md

🔍 Check Your Data

See what stocks were found

head -n 50 data/listings/all_listings_combined.json

Count how many stocks

python -c "import json; print(len(json.load(open('data/listings/all_listings_combined.json'))))"
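
For a per-exchange breakdown rather than a single total, the same file can be summarized with a short script. This is a minimal sketch, assuming the combined file is a JSON array of objects and that each object carries an "exchange" field; neither is confirmed here, so adjust the key to match the real output of extract_listings.py. The helper names are illustrative only.

```python
import json
from collections import Counter


def load_listings(path="data/listings/all_listings_combined.json"):
    """Load the combined listings file (assumed to be a JSON array of objects)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def count_by_exchange(listings):
    """Count listings per exchange.

    Assumption: each entry has an "exchange" key. Entries without it are
    grouped under "UNKNOWN" so they still show up in the totals.
    """
    return Counter(item.get("exchange", "UNKNOWN") for item in listings)


# Example usage (run from the project root):
# for exchange, count in count_by_exchange(load_listings()).most_common():
#     print(f"{exchange}: {count}")
```

A large "UNKNOWN" bucket would suggest the field has a different name in the actual data.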

View a report

cat data/reports/ABC_report.txt

Query the database

sqlite3 data/stocks.db "SELECT COUNT(*) FROM stocks_master;"
sqlite3 data/stocks.db "SELECT symbol, company_name FROM stocks_master LIMIT 10;"

🐛 Troubleshooting

"No module named X"

pip install -r requirements.txt

"playwright not found"

python3 -m playwright install chromium

"No listings extracted"

  • Check data/listings/*_page.html
  • Websites may have changed
  • Try updating selectors in extract_listings.py
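
A quick way to act on the first check is to list the saved debug pages with their sizes, since an empty or very small file usually means the page never loaded. The sketch below assumes only the data/listings/*_page.html pattern shown above; the function names and the 1,000-byte threshold are arbitrary illustrations, not part of the project.

```python
from pathlib import Path


def debug_page_sizes(listings_dir="data/listings"):
    """Map each saved *_page.html debug file to its size in bytes."""
    pages = sorted(Path(listings_dir).glob("*_page.html"))
    return {page.name: page.stat().st_size for page in pages}


def suspicious_pages(listings_dir="data/listings", min_bytes=1000):
    """Return debug pages smaller than min_bytes (an arbitrary threshold)."""
    sizes = debug_page_sizes(listings_dir)
    return [name for name, size in sizes.items() if size < min_bytes]


# Example usage:
# for name, size in debug_page_sizes().items():
#     print(f"{name}: {size} bytes")
```

If every page is a healthy size but no listings come out, the selectors are the more likely culprit.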

"Rate limited" or "Blocked"

  • Add more delays in scripts (increase await asyncio.sleep() values)
  • Run fewer stocks at a time
  • Use a VPN
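
Beyond raising a fixed asyncio.sleep() value, one common option is to wait longer after each failed request. The following is a sketch of exponential backoff with jitter, not the approach the existing scripts take: `fetch` stands in for whatever coroutine a script uses to load one stock, and all names and default values here are hypothetical.

```python
import asyncio
import random


def backoff_delay(attempt, base=2.0, cap=60.0):
    """Seconds to wait before retrying: base * 2**attempt, limited to cap."""
    return min(cap, base * (2 ** attempt))


async def fetch_with_backoff(fetch, symbol, max_attempts=4, base=2.0,
                             sleep=asyncio.sleep):
    """Await the hypothetical coroutine fetch(symbol), backing off on failure.

    The last failure is re-raised so the caller can log or skip the stock.
    """
    for attempt in range(max_attempts):
        try:
            return await fetch(symbol)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Up to one second of random jitter avoids a fixed retry rhythm.
            await sleep(backoff_delay(attempt, base) + random.uniform(0, 1))


# Example usage inside an async scraper (scrape_one is a placeholder):
# data = await fetch_with_backoff(scrape_one, "ABC")
```

With the defaults, retries wait roughly 2, 4, and 8 seconds, which slows a blocked run down without stalling a healthy one.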

📊 Expected Results

Exchange  Typical # of Stocks
TSX       ~1,500-1,700
TSXV      ~1,600-1,800
CSE       ~600-800
CBOE      Varies

⏱️ Time Estimates

Task                        Time
Setup                       5 minutes
Extract listings            2-3 minutes
Import to DB                < 1 minute
Scrape 1 stock financials   2-3 seconds
Scrape 1 stock news         10-15 seconds
Full pipeline (all stocks)  Several hours

💡 Pro Tips

  1. Always test first: Run python main.py (test mode) before a full run
  2. Check coverage: Query the coverage_report table to see completeness
  3. Run overnight: The full pipeline takes several hours, so start it in the evening
  4. Save HTML: Debug HTML files are saved automatically for troubleshooting
  5. Database queries: Use SQL for efficient analysis

📝 Quick Database Queries

-- Total stocks
SELECT COUNT(*) FROM stocks_master;

-- Stocks by exchange
SELECT exchange, COUNT(*) FROM stocks_master GROUP BY exchange;

-- Stocks with complete data
SELECT ticker FROM coverage_report 
WHERE has_financials=1 AND has_news=1 AND has_press_releases=1;

-- Recent news for a stock
SELECT title, source, published_date FROM news_articles 
WHERE stock_id = (SELECT id FROM stocks_master WHERE symbol='ABC')
ORDER BY published_date DESC LIMIT 10;
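
The same queries can be run from Python with the standard-library sqlite3 module. This is a minimal sketch that reuses only the table and column names appearing in the queries above; the function names are illustrative, and the code assumes the schema matches those queries.

```python
import sqlite3


def stocks_by_exchange(conn):
    """Return {exchange: count} from stocks_master."""
    rows = conn.execute(
        "SELECT exchange, COUNT(*) FROM stocks_master "
        "GROUP BY exchange ORDER BY exchange"
    ).fetchall()
    return dict(rows)


def recent_news(conn, symbol, limit=10):
    """Return (title, source, published_date) rows for one symbol, newest first.

    Values are passed as parameters instead of being pasted into the SQL.
    """
    return conn.execute(
        "SELECT title, source, published_date FROM news_articles "
        "WHERE stock_id = (SELECT id FROM stocks_master WHERE symbol = ?) "
        "ORDER BY published_date DESC LIMIT ?",
        (symbol, limit),
    ).fetchall()


# Example usage:
# conn = sqlite3.connect("data/stocks.db")
# print(stocks_by_exchange(conn))
# for title, source, published in recent_news(conn, "ABC"):
#     print(published, source, title)
# conn.close()
```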

🔄 Regular Updates

To keep data fresh:

# Weekly update (run every Sunday)
python main.py --full

# Or use cron:
0 2 * * 0 cd /Users/macbook/Desktop/Victor && python3 main.py --full

📞 Need Help?

  1. Check GUIDE.md for detailed documentation
  2. Check SUMMARY.md for what was built
  3. Check FLOW_DIAGRAM.py to understand data flow
  4. Look at individual script files for comments

🎯 Next Steps After Collection

  1. Analyze: Use pandas to analyze trends
  2. Visualize: Create charts with matplotlib
  3. Screen: Filter by P/E, market cap, growth, etc.
  4. Monitor: Track specific stocks
  5. Export: Generate Excel/CSV reports
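
As a starting point for the export step, the master table can be written to CSV using only the standard library (pandas would also work, but this sketch avoids the extra dependency). It assumes stocks_master has the symbol, company_name, and exchange columns used in the queries in this card; the function name and output path are placeholders.

```python
import csv
import sqlite3


def export_stocks_csv(conn, out_path):
    """Write symbol, company_name, exchange from stocks_master to a CSV file.

    Returns the number of data rows written (the header is not counted).
    """
    cursor = conn.execute(
        "SELECT symbol, company_name, exchange FROM stocks_master ORDER BY symbol"
    )
    headers = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(headers)
        writer.writerows(rows)
    return len(rows)


# Example usage (the output path is a placeholder):
# conn = sqlite3.connect("data/stocks.db")
# print(export_stocks_csv(conn, "data/reports/stocks_master.csv"), "rows exported")
# conn.close()
```

The resulting file opens directly in Excel or loads into pandas for the analysis and screening steps.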

Quick Start: python setup.py → python main.py → Check data/reports/