feat: Implement stock listing extraction and database population
- Added `extract_listings.py` for extracting stock listings from TSX, TSXV, CSE, and CBOE using Playwright.
- Created `main.py` to orchestrate the entire stock intelligence system, including extraction, database import, financial scraping, news scraping, and report generation.
- Developed `populate_database.py` to populate the database with existing JSON data.
- Introduced `scrape_nasdaq_tsx_only.py` for focused scraping of NASDAQ and TSX stocks.
- Added `setup.py` for initial setup and testing of the system.
- Created `watchlist.txt` template for user-defined stock tracking.
- Generated `final_test_output.txt` to log the results of the test run.
@@ -0,0 +1,345 @@
# 🚀 PRODUCTION-READY Stock Intelligence System

## ✅ COMPLETE IMPLEMENTATION

Your boss's requirements have been fully implemented:

### What's Included:
- ✅ **Annual General Meeting Reports** - Scraped from SEDAR+ and SEC filings
- ✅ **Tax Filings** - Extracted from annual reports and 10-K filings
- ✅ **SEC Filings** - 10-K, 10-Q, 8-K, DEF 14A, ownership forms (3, 4, 5, 13D, 13G)
- ✅ **SEDAR+ Filings** - All Canadian regulatory filings
- ✅ **Founder/Insider Ownership** - Extracted from proxy statements and ownership filings
- ✅ **Calculated Financial Metrics** - All ratios computed from base numbers (Step 4 formulas)
- ✅ **Daily Updates** - Can run daily on any stock or full universe
- ✅ **CSV Export** - Complete data export in CSV format
- ✅ **SerpAPI Integration** - Robust news/PR scraping; the API key is configured in `config.py`

## 📦 Installation

```bash
cd /Users/macbook/Desktop/Victor

# Install all dependencies
pip install -r requirements.txt

# Install Playwright browser
python3 -m playwright install chromium
```

## 🎯 How To Use

### 1. Initial Full Extraction (Run Once)
```bash
# Extract all stocks and complete data
python main_robust.py --full
```

### 2. Test Mode (Recommended First)
```bash
# Test with 5 stocks
python main_robust.py --test 5

# Test with 10 stocks
python main_robust.py --test 10
```

### 3. Daily Update (Single Stock)
```bash
# Update specific stock
python main_robust.py --ticker AAPL
python main_robust.py --ticker SHOP
python main_robust.py --ticker CVV
```

### 4. Daily Automation (All Stocks)
```bash
# Run daily update for all stocks
python daily_automation.py --daily
```

### 5. Watchlist Mode
```bash
# Create watchlist.txt with tickers (one per line)
echo "AAPL" > watchlist.txt
echo "MSFT" >> watchlist.txt
echo "TSLA" >> watchlist.txt

# Update only watchlist
python daily_automation.py --watchlist
```
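
The watchlist format above (one ticker per line) is simple enough to parse in a few lines. The following is a minimal sketch of how such a file could be read, not the code in `daily_automation.py`; the function name `load_watchlist`, the upper-casing, the de-duplication, and the `#` comment handling are all assumptions made for illustration.

```python
from pathlib import Path


def load_watchlist(path="watchlist.txt"):
    """Return the tickers listed in a watchlist file, one per line.

    Hypothetical helper: the real parsing rules may differ.
    """
    tickers = []
    seen = set()
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        ticker = raw_line.strip().upper()
        # Skip blank lines and (assumed) '#' comment lines.
        if not ticker or ticker.startswith("#"):
            continue
        # Keep the first occurrence of each ticker, preserving file order.
        if ticker not in seen:
            seen.add(ticker)
            tickers.append(ticker)
    return tickers
```

Normalizing case and dropping duplicates up front keeps a hand-edited file from triggering the same update twice.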

### 6. Export to CSV
```bash
# Export all data to CSV files
python export_csv.py
```

## 📁 Complete File Structure

```
Victor/
├── 🎯 MAIN SCRIPTS
│   ├── main_robust.py              # Production-ready main orchestrator
│   ├── daily_automation.py         # Daily update automation
│   ├── config.py                   # Configuration (includes SerpAPI key)
│
├── 📊 DATA COLLECTION MODULES
│   ├── extract_listings.py         # Extract stock listings from exchanges
│   ├── scrape_yahoo_finance.py     # Financial data from Yahoo Finance
│   ├── scrape_news_pr.py           # News & PR (direct scraping)
│   ├── scrape_serpapi.py           # News & PR (using SerpAPI - ROBUST)
│   ├── scrape_sec_filings.py       # SEC EDGAR filings + ownership
│   ├── scrape_sedar.py             # SEDAR+ filings + AGM + tax
│
├── 💰 FINANCIAL ANALYSIS
│   ├── financial_calculator.py     # Calculate ALL metrics from base numbers
│   ├── database.py                 # SQLite database operations
│   ├── export_csv.py               # Export to CSV format
│
├── 📚 DOCUMENTATION
│   ├── PRODUCTION_READY.md         # This file
│   ├── GUIDE.md                    # Detailed usage guide
│   ├── SUMMARY.md                  # What was built
│   ├── QUICKREF.md                 # Quick reference card
│   ├── README.md                   # Technical plan
│
├── 📂 DATA (Created automatically)
│   ├── listings/                   # Stock listings (JSON)
│   ├── financials/                 # Yahoo Finance data (JSON)
│   ├── metrics/                    # Calculated metrics (JSON)
│   ├── news/                       # Direct scraped news (JSON)
│   ├── serpapi_news/               # SerpAPI news (JSON)
│   ├── sec_filings/                # SEC filings + ownership (JSON)
│   ├── sedar_filings/              # SEDAR+ filings + AGM + tax (JSON)
│   ├── reports/                    # Comprehensive text reports
│   ├── exports/                    # CSV exports
│   └── stocks.db                   # SQLite database
```

## 🔥 Key Features

### 1. Complete Regulatory Filings
- **SEC EDGAR**: 10-K, 10-Q, 8-K, DEF 14A
- **Ownership Forms**: Forms 3, 4, 5, 13D, 13G (insider/founder shares)
- **SEDAR+**: Annual reports, financials, MD&A, circulars
- **AGM Information**: Date, location, agenda from circulars
- **Tax Disclosures**: Extracted from financial statement notes

### 2. Calculated Financial Metrics
All metrics from Step 4 of README:
- **Valuation**: P/E, PEG, P/B, P/S, EV/EBITDA, Dividend Yield
- **Profitability**: Margins, ROE, ROA, ROIC
- **Leverage**: Debt/Equity, Interest Coverage
- **Liquidity**: Current, Quick, Cash ratios
- **Efficiency**: Turnover ratios, Days metrics
- **Growth**: YoY growth rates
- **Cash Flow**: FCF Yield, Operating CF ratio
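
To illustrate what "computed from base numbers" means for a handful of the ratios listed above, here is a minimal sketch using the standard textbook definitions. It is not the code in `financial_calculator.py`, and the Step 4 formulas in the README may differ in detail; the function names and the dictionary keys (`price`, `eps`, `total_debt`, and so on) are hypothetical.

```python
def safe_div(numerator, denominator):
    """Divide two numbers, returning None when the ratio is undefined."""
    if numerator is None or denominator in (None, 0):
        return None
    return numerator / denominator


def yoy_growth(current, previous):
    """Year-over-year growth as a fraction, e.g. 0.25 for +25%."""
    if current is None or previous in (None, 0):
        return None
    return (current - previous) / abs(previous)


def basic_ratios(base):
    """Compute a few standard ratios from a dict of base figures.

    All keys are assumed names for illustration only.
    """
    return {
        # Valuation: price per share over earnings per share.
        "pe_ratio": safe_div(base.get("price"), base.get("eps")),
        # Leverage: total debt over shareholders' equity.
        "debt_to_equity": safe_div(base.get("total_debt"), base.get("total_equity")),
        # Liquidity: current assets over current liabilities.
        "current_ratio": safe_div(base.get("current_assets"), base.get("current_liabilities")),
        # Profitability: net income over equity and over revenue.
        "roe": safe_div(base.get("net_income"), base.get("total_equity")),
        "net_margin": safe_div(base.get("net_income"), base.get("revenue")),
        # Growth: revenue versus the prior year.
        "revenue_growth": yoy_growth(base.get("revenue"), base.get("prior_revenue")),
    }
```

Returning `None` for an undefined ratio (a zero or missing denominator) keeps a single incomplete statement from aborting the whole calculation.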

### 3. Ownership Data
- Founder shareholdings
- Insider ownership
- Major shareholders (13D/13G filings)
- Director and officer holdings
- Recent transactions (Form 4)

### 4. Robust Data Collection
- **Primary**: Direct web scraping
- **Fallback**: SerpAPI, for news/PR collection when direct scraping fails
- **API Key Included**: Already configured in `config.py`
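
The primary/fallback arrangement described above can be sketched as a small wrapper. This is an illustration of the pattern only, assuming each collector is a callable that takes a ticker and returns a list of articles; `collect_news` and its parameters are hypothetical names, not functions from `scrape_news_pr.py` or `scrape_serpapi.py`.

```python
def collect_news(ticker, primary, fallback):
    """Try the primary collector, then the fallback if it fails or returns nothing.

    Returns a (articles, source_label) tuple so callers can log which path ran.
    """
    try:
        articles = primary(ticker)
        if articles:
            return articles, "primary"
    except Exception as exc:
        # Broad on purpose: any scraper failure should trigger the fallback.
        print(f"primary source failed for {ticker}: {exc}")
    return fallback(ticker), "fallback"
```

Treating an empty result the same as an exception matters here, because a scraper whose selectors have gone stale usually returns nothing rather than raising.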

### 5. Daily Automation Ready
```bash
# Setup cron job for daily 2 AM updates
python daily_automation.py --setup-cron
```
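
For reference, a daily 2 AM schedule in standard crontab syntax is `0 2 * * *`. The sketch below builds such an entry as a string; it shows the general shape of a cron line and is not a description of what `--setup-cron` actually writes. The function name `daily_cron_line`, the `cd ... &&` form, and the `python3` interpreter name are assumptions.

```python
import shlex


def daily_cron_line(project_dir, python="python3", hour=2, minute=0):
    """Build a crontab entry that runs the daily update once per day.

    Hypothetical helper for illustration; adjust paths to your install.
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Quote the directory so paths containing spaces survive the shell.
    command = f"cd {shlex.quote(project_dir)} && {python} daily_automation.py --daily"
    # Crontab fields: minute, hour, day-of-month, month, day-of-week.
    return f"{minute} {hour} * * * {command}"
```

The resulting line can be pasted into `crontab -e` by hand if you prefer to review the schedule before installing it.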

## 📊 CSV Exports

The system creates these CSV files:

1. **stocks_export.csv** - Basic stock list with coverage status
2. **stocks_detailed.csv** - All financial metrics
3. **news_summary.csv** - All news articles
4. **filings_summary.csv** - All regulatory filings
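
Exports like the ones listed above can be produced with the standard library alone. The following is a minimal sketch of writing the result of a SQLite query to a CSV file with a header row; it is not the code in `export_csv.py`, and `export_query_to_csv` is a hypothetical name.

```python
import csv
import sqlite3


def export_query_to_csv(conn, query, csv_path):
    """Write the rows returned by `query` to `csv_path`; return the row count."""
    cursor = conn.execute(query)
    # cursor.description holds one tuple per result column; item 0 is the name.
    headers = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(headers)
        writer.writerows(rows)
    return len(rows)
```

A call such as `export_query_to_csv(sqlite3.connect("data/stocks.db"), "SELECT symbol, company_name FROM stocks_master", "stocks.csv")` would use the table and column names shown in the database example in this document.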

## 🎓 Usage Examples

### Example 1: Initial Setup
```bash
# Install
pip install -r requirements.txt
python3 -m playwright install chromium

# Test with 3 stocks
python main_robust.py --test 3

# If successful, run full extraction
python main_robust.py --full
```

### Example 2: Daily Updates
```bash
# Update a specific stock
python main_robust.py --ticker AAPL

# Or update all stocks
python daily_automation.py --daily
```

### Example 3: Analyze Results
```bash
# Export to CSV
python export_csv.py

# Open CSV in Excel/Numbers
open data/exports/stocks_detailed.csv

# Or analyze in Python
python analyze.py
```

### Example 4: Query Database
```python
import sqlite3

conn = sqlite3.connect('data/stocks.db')
cursor = conn.cursor()

# Find all tech stocks
cursor.execute("SELECT symbol, company_name FROM stocks_master WHERE sector='Technology'")
print(cursor.fetchall())

# Get stocks with a positive P/E below 15, cheapest first
cursor.execute("""
    SELECT s.symbol, m.pe_ratio
    FROM stocks_master s
    JOIN financial_metrics m ON s.id = m.stock_id
    WHERE m.pe_ratio < 15 AND m.pe_ratio > 0
    ORDER BY m.pe_ratio
""")
print(cursor.fetchall())

# Release the database file when done
conn.close()
```

## 🔄 Update Frequencies

| Data Type | Frequency | Command |
|-----------|-----------|---------|
| Listings | Quarterly | `python main_robust.py --full` |
| Financials | Daily | `python daily_automation.py --daily` |
| News | Daily | `python daily_automation.py --daily` |
| Filings | Daily | `python daily_automation.py --daily` |
| Metrics | Daily | Auto-calculated after financials |
| CSV Exports | Daily | Auto-generated after updates |

## 🎯 What Gets Collected Per Stock

For each stock, the system collects:

### Financial Data
- Current price, market cap
- 3 years of financial statements
- TTM (trailing twelve months) data
- All calculated metrics (40+ ratios)

### News & Press Releases
- Last 12 months of news articles
- Official press releases
- Source, date, URL, snippet for each

### Regulatory Filings
- **US Stocks**: 10-K, 10-Q, 8-K, proxies
- **Canadian Stocks**: Annual reports, financials, MD&A
- AGM date, location, agenda
- Tax disclosure details

### Ownership Information
- Founder shareholdings
- Insider ownership (directors, officers)
- Major shareholders (>5%)
- Recent buying/selling activity

### Comprehensive Report
- Text file combining all data
- Human-readable format
- Updated daily

## 💡 Pro Tips

1. **Start Small**: Test with 5-10 stocks first
2. **Check Coverage**: Query `coverage_report` table to see completeness
3. **Use SerpAPI**: More reliable than direct scraping for news
4. **Schedule Wisely**: Run during off-peak hours (2-4 AM)
5. **Monitor Logs**: Check for errors and missing data
6. **Export Daily**: CSV exports make analysis easier

## 🐛 Troubleshooting

### "No CIK found" (SEC)
- Stock may not be US-listed
- Try alternative ticker format

### "No SEDAR results"
- SEDAR+ structure may have changed
- Check saved HTML files for debugging

### "SerpAPI limit exceeded"
- Check credit balance on SerpAPI dashboard
- Reduce frequency of updates

### "Rate limited"
- Increase delays in scripts
- Spread updates throughout the day
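
One common way to "increase delays" when a source rate-limits requests is to retry with exponential backoff. The sketch below shows the idea under the assumption that a request is wrapped in a zero-argument callable that raises on failure; `retry_with_backoff` and its parameters are hypothetical names and are not part of the scripts described here.

```python
import random
import time


def retry_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `fetch()` until it succeeds, doubling the wait after each failure.

    The last failure is re-raised once `max_attempts` is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay, 2x, 4x, ... plus random jitter so that
            # parallel workers do not all retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default `time.sleep` applies.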

## 📞 Support & Customization

All scripts are well-documented and can be customized:

- **Modify scrapers**: Update selectors in scraper files
- **Add exchanges**: Extend `extract_listings.py`
- **Change frequencies**: Edit `config.py`
- **Custom metrics**: Add to `financial_calculator.py`
- **Different exports**: Modify `export_csv.py`

## ✅ Verification Checklist

After running, verify:

- [ ] Stock listings extracted (`data/listings/`)
- [ ] Database populated (`data/stocks.db`)
- [ ] Financials scraped (`data/financials/`)
- [ ] Metrics calculated (`data/metrics/`)
- [ ] News collected (`data/serpapi_news/`)
- [ ] Filings downloaded (`data/sec_filings/`, `data/sedar_filings/`)
- [ ] Reports generated (`data/reports/`)
- [ ] CSV files created (`data/exports/`)
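
The checklist above can be partly automated by checking that each output location exists and is non-empty. This is a minimal sketch using the paths named in the checklist; `verify_outputs` is a hypothetical helper, and an existing, non-empty folder only shows that something was written, not that the data is complete.

```python
from pathlib import Path

# Output folders named in the verification checklist.
EXPECTED_DIRS = [
    "listings", "financials", "metrics", "serpapi_news",
    "sec_filings", "sedar_filings", "reports", "exports",
]


def verify_outputs(data_dir="data"):
    """Map each expected output to True if it exists (and, for folders, has content)."""
    root = Path(data_dir)
    status = {}
    for name in EXPECTED_DIRS:
        folder = root / name
        # A folder passes only if it exists and contains at least one entry.
        status[name] = folder.is_dir() and any(folder.iterdir())
    status["stocks.db"] = (root / "stocks.db").is_file()
    return status
```

Printing the entries whose value is `False` after a test run gives a quick list of the steps that produced no output.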

## 🚀 Ready to Go!

Your system is production-ready and includes everything your boss requested:

- ✅ AGM reports
- ✅ Tax filings
- ✅ SEC filings
- ✅ SEDAR+ filings
- ✅ Founder/insider ownership
- ✅ All financial metrics calculated
- ✅ Daily automation capability
- ✅ CSV exports
- ✅ Robust data collection with SerpAPI

**Start with:**
```bash
python main_robust.py --test 5
```

**Then run daily:**
```bash
python daily_automation.py --daily
```

---

**Last Updated:** November 6, 2025
**System Status:** ✅ Production Ready
**API Key:** Configured in `config.py`