# 🚀 PRODUCTION-READY Stock Intelligence System

## ✅ COMPLETE IMPLEMENTATION

Your boss's requirements have been fully implemented:

### What's Included:
- ✅ **Annual General Meeting Reports** - Scraped from SEDAR+ and SEC filings
- ✅ **Tax Filings** - Extracted from annual reports and 10-K filings
- ✅ **SEC Filings** - 10-K, 10-Q, 8-K, DEF 14A, ownership forms (3, 4, 5, 13D, 13G)
- ✅ **SEDAR+ Filings** - All Canadian regulatory filings
- ✅ **Founder/Insider Ownership** - Extracted from proxy statements and ownership filings
- ✅ **Calculated Financial Metrics** - All ratios computed from base numbers (Step 4 formulas in `README.md`)
- ✅ **Daily Updates** - Can run daily on any stock or full universe
- ✅ **CSV Export** - Complete data export in CSV format
- ✅ **SerpAPI Integration** - Robust news/PR scraping (API key configured in `config.py`)

## 📦 Installation

```bash
cd /Users/macbook/Desktop/Victor

# Install all dependencies
pip install -r requirements.txt

# Install Playwright browser
python3 -m playwright install chromium
```

## 🎯 How To Use

### 1. Initial Full Extraction (Run Once)
```bash
# Extract all stocks and complete data
python main_robust.py --full
```

### 2. Test Mode (Recommended First)
```bash
# Test with 5 stocks
python main_robust.py --test 5

# Test with 10 stocks
python main_robust.py --test 10
```

### 3. Daily Update (Single Stock)
```bash
# Update specific stock
python main_robust.py --ticker AAPL
python main_robust.py --ticker SHOP
python main_robust.py --ticker CVV
```

### 4. Daily Automation (All Stocks)
```bash
# Run daily update for all stocks
python daily_automation.py --daily
```

### 5. Watchlist Mode
```bash
# Create watchlist.txt with tickers (one per line)
echo "AAPL" > watchlist.txt
echo "MSFT" >> watchlist.txt
echo "TSLA" >> watchlist.txt

# Update only watchlist
python daily_automation.py --watchlist
```

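The watchlist format above (one ticker per line) is simple enough to parse by hand. The following is a minimal sketch of how such a file could be read, not the code used by `daily_automation.py`. The function name `parse_watchlist`, the upper-casing, the de-duplication, and the skipping of `#` comment lines are all assumptions made for illustration.

```python
from typing import List


def parse_watchlist(text: str) -> List[str]:
    """Return unique, upper-cased tickers in file order (hypothetical helper)."""
    tickers: List[str] = []
    seen = set()
    for line in text.splitlines():
        symbol = line.strip().upper()
        # Skip blank lines and (assumed) comment lines starting with '#'
        if not symbol or symbol.startswith("#"):
            continue
        if symbol not in seen:
            seen.add(symbol)
            tickers.append(symbol)
    return tickers


# Example input mimicking the contents of watchlist.txt
sample = "AAPL\nMSFT\n\ntsla\nAAPL\n"
print(parse_watchlist(sample))
```

To use it on the real file, pass the result of `open("watchlist.txt").read()` instead of the sample string.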
### 6. Export to CSV
```bash
# Export all data to CSV files
python export_csv.py
```

## 📁 Complete File Structure

```
Victor/
├── 🎯 MAIN SCRIPTS
│   ├── main_robust.py              # Production-ready main orchestrator
│   ├── daily_automation.py         # Daily update automation
│   ├── config.py                   # Configuration (includes SerpAPI key)
│
├── 📊 DATA COLLECTION MODULES
│   ├── extract_listings.py         # Extract stock listings from exchanges
│   ├── scrape_yahoo_finance.py     # Financial data from Yahoo Finance
│   ├── scrape_news_pr.py           # News & PR (direct scraping)
│   ├── scrape_serpapi.py           # News & PR (using SerpAPI - ROBUST)
│   ├── scrape_sec_filings.py       # SEC EDGAR filings + ownership
│   ├── scrape_sedar.py             # SEDAR+ filings + AGM + tax
│
├── 💰 FINANCIAL ANALYSIS
│   ├── financial_calculator.py     # Calculate ALL metrics from base numbers
│   ├── database.py                 # SQLite database operations
│   ├── export_csv.py               # Export to CSV format
│
├── 📚 DOCUMENTATION
│   ├── PRODUCTION_READY.md         # This file
│   ├── GUIDE.md                    # Detailed usage guide
│   ├── SUMMARY.md                  # What was built
│   ├── QUICKREF.md                 # Quick reference card
│   ├── README.md                   # Technical plan
│
├── 📂 DATA (Created automatically)
│   ├── listings/                   # Stock listings (JSON)
│   ├── financials/                 # Yahoo Finance data (JSON)
│   ├── metrics/                    # Calculated metrics (JSON)
│   ├── news/                       # Direct scraped news (JSON)
│   ├── serpapi_news/               # SerpAPI news (JSON)
│   ├── sec_filings/                # SEC filings + ownership (JSON)
│   ├── sedar_filings/              # SEDAR+ filings + AGM + tax (JSON)
│   ├── reports/                    # Comprehensive text reports
│   ├── exports/                    # CSV exports
│   └── stocks.db                   # SQLite database
```

## 🔥 Key Features

### 1. Complete Regulatory Filings
- **SEC EDGAR**: 10-K, 10-Q, 8-K, DEF 14A
- **Ownership Forms**: Forms 3, 4, 5, 13D, 13G (insider/founder shares)
- **SEDAR+**: Annual reports, financials, MD&A, circulars
- **AGM Information**: Date, location, agenda from circulars
- **Tax Disclosures**: Extracted from financial statement notes

### 2. Calculated Financial Metrics
All metrics from Step 4 of README:
- **Valuation**: P/E, PEG, P/B, P/S, EV/EBITDA, Dividend Yield
- **Profitability**: Margins, ROE, ROA, ROIC
- **Leverage**: Debt/Equity, Interest Coverage
- **Liquidity**: Current, Quick, Cash ratios
- **Efficiency**: Turnover ratios, Days metrics
- **Growth**: YoY growth rates
- **Cash Flow**: FCF Yield, Operating CF ratio

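To make "computed from base numbers" concrete, here is a small sketch of three of the ratios listed above using their standard textbook definitions. It is not taken from `financial_calculator.py`; the names `safe_div` and `basic_ratios` and the input values are hypothetical, and the real module may handle edge cases differently.

```python
from typing import Dict, Optional


def safe_div(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    if denominator == 0:
        return None
    return numerator / denominator


def basic_ratios(
    price: float,
    eps: float,
    total_debt: float,
    total_equity: float,
    current_assets: float,
    current_liabilities: float,
) -> Dict[str, Optional[float]]:
    """Compute a few ratios from base figures (hypothetical helper)."""
    return {
        "pe_ratio": safe_div(price, eps),                       # P/E = price / EPS
        "debt_to_equity": safe_div(total_debt, total_equity),   # Debt/Equity
        "current_ratio": safe_div(current_assets, current_liabilities),
    }


# Made-up inputs purely for illustration
ratios = basic_ratios(
    price=50.0,
    eps=2.5,
    total_debt=400.0,
    total_equity=800.0,
    current_assets=300.0,
    current_liabilities=150.0,
)
print(ratios)
```

Returning `None` for an undefined ratio, rather than raising, keeps one missing base number from aborting the calculation of the other metrics for that stock.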
### 3. Ownership Data
- Founder shareholdings
- Insider ownership
- Major shareholders (13D/13G filings)
- Director and officer holdings
- Recent transactions (Form 4)

### 4. Robust Data Collection
- **Primary**: Direct web scraping
- **Fallback**: SerpAPI for guaranteed news/PR collection
- **API Key Included**: Already configured in `config.py`

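If you would rather not keep the SerpAPI key in a tracked file, one common alternative is to read it from an environment variable. The sketch below is an assumption about how that could look, not a description of the existing `config.py`; the variable name `SERPAPI_API_KEY` and the function `load_serpapi_key` are hypothetical.

```python
import os
from typing import Mapping

# Hypothetical environment variable name; choose whatever fits your setup
ENV_VAR_NAME = "SERPAPI_API_KEY"


def load_serpapi_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the environment, failing loudly if it is missing."""
    key = env.get(ENV_VAR_NAME, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR_NAME} is not set")
    return key


# Demonstration with a stand-in mapping instead of the real environment
demo_key = load_serpapi_key({"SERPAPI_API_KEY": "  example-key  "})
print(len(demo_key))
```

Failing at startup when the key is absent tends to be easier to debug than a scraper that silently returns no news.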
### 5. Daily Automation Ready
```bash
# Setup cron job for daily 2 AM updates
python daily_automation.py --setup-cron
```

## 📊 CSV Exports

The system creates these CSV files:

1. **stocks_export.csv** - Basic stock list with coverage status
2. **stocks_detailed.csv** - All financial metrics
3. **news_summary.csv** - All news articles
4. **filings_summary.csv** - All regulatory filings

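A quick way to sanity-check any of the exports listed above is to read the header and count the rows. This sketch uses only the standard `csv` module; the helper name `summarize_csv` and the sample columns are assumptions, since the actual column layout of the exports is not documented here.

```python
import csv
import io
from typing import List, TextIO, Tuple


def summarize_csv(handle: TextIO) -> Tuple[List[str], int]:
    """Return (header columns, number of data rows) for an open CSV file."""
    reader = csv.reader(handle)
    header = next(reader, [])          # empty list if the file has no rows at all
    row_count = sum(1 for _ in reader)
    return header, row_count


# Stand-in data; the real export columns may differ
sample_csv = io.StringIO("symbol,company_name,pe_ratio\nAAPL,Apple Inc.,30.1\nSHOP,Shopify Inc.,\n")
columns, rows = summarize_csv(sample_csv)
print(columns, rows)
```

For a real export, open the file with `open("data/exports/stocks_detailed.csv", newline="")` and pass the handle in.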
## 🎓 Usage Examples

### Example 1: Initial Setup
```bash
# Install
pip install -r requirements.txt
python3 -m playwright install chromium

# Test with 3 stocks
python main_robust.py --test 3

# If successful, run full extraction
python main_robust.py --full
```

### Example 2: Daily Updates
```bash
# Update a specific stock
python main_robust.py --ticker AAPL

# Or update all stocks
python daily_automation.py --daily
```

### Example 3: Analyze Results
```bash
# Export to CSV
python export_csv.py

# Open CSV in Excel/Numbers
open data/exports/stocks_detailed.csv

# Or analyze in Python
python analyze.py
```

### Example 4: Query Database
```python
import sqlite3

conn = sqlite3.connect('data/stocks.db')
cursor = conn.cursor()

# Find all tech stocks
cursor.execute("SELECT symbol, company_name FROM stocks_master WHERE sector='Technology'")
print(cursor.fetchall())

# Get stocks with a positive P/E below 15, cheapest first
cursor.execute("""
    SELECT s.symbol, m.pe_ratio
    FROM stocks_master s
    JOIN financial_metrics m ON s.id = m.stock_id
    WHERE m.pe_ratio < 15 AND m.pe_ratio > 0
    ORDER BY m.pe_ratio
""")
print(cursor.fetchall())

# Release the database handle when done
conn.close()
```

## 🔄 Update Frequencies

| Data Type | Frequency | Command |
|-----------|-----------|---------|
| Listings | Quarterly | `python main_robust.py --full` |
| Financials | Daily | `python daily_automation.py --daily` |
| News | Daily | `python daily_automation.py --daily` |
| Filings | Daily | `python daily_automation.py --daily` |
| Metrics | Daily | Auto-calculated after financials |
| CSV Exports | Daily | Auto-generated after updates |

## 🎯 What Gets Collected Per Stock

For each stock, the system collects:

### Financial Data
- Current price, market cap
- 3 years of financial statements
- TTM (trailing twelve months) data
- All calculated metrics (40+ ratios)

### News & Press Releases
- Last 12 months of news articles
- Official press releases
- Source, date, URL, snippet for each

### Regulatory Filings
- **US Stocks**: 10-K, 10-Q, 8-K, proxies
- **Canadian Stocks**: Annual reports, financials, MD&A
- AGM date, location, agenda
- Tax disclosure details

### Ownership Information
- Founder shareholdings
- Insider ownership (directors, officers)
- Major shareholders (>5%)
- Recent buying/selling activity

### Comprehensive Report
- Text file combining all data
- Human-readable format
- Updated daily

## 💡 Pro Tips

1. **Start Small**: Test with 5-10 stocks first
2. **Check Coverage**: Query `coverage_report` table to see completeness
3. **Use SerpAPI**: More reliable than direct scraping for news
4. **Schedule Wisely**: Run during off-peak hours (2-4 AM)
5. **Monitor Logs**: Check for errors and missing data
6. **Export Daily**: CSV exports make analysis easier

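For the coverage check in tip 2, the columns of `coverage_report` are not documented here, so the sketch below avoids guessing them: it verifies the table exists and prints whatever columns and first rows it finds. The helper name `preview_table` and the demo table contents are hypothetical.

```python
import sqlite3
from typing import Any, List, Tuple


def preview_table(conn: sqlite3.Connection, table: str, limit: int = 5) -> Tuple[List[str], List[Any]]:
    """Return (column names, first rows) of a table or view, without assuming its schema."""
    # Table names cannot be bound as parameters, so confirm the name exists first
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type IN ('table', 'view') AND name = ?",
        (table,),
    ).fetchone()
    if exists is None:
        raise ValueError(f"no table or view named {table!r}")
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,))
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


# In-memory stand-in with made-up columns; point this at data/stocks.db in practice
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE coverage_report (symbol TEXT, has_financials INTEGER, has_news INTEGER)")
demo_conn.execute("INSERT INTO coverage_report VALUES ('AAPL', 1, 0)")
demo_columns, demo_rows = preview_table(demo_conn, "coverage_report")
print(demo_columns)
print(demo_rows)
demo_conn.close()
```

Against the real database, replace the in-memory connection with `sqlite3.connect("data/stocks.db")`.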
## 🐛 Troubleshooting

### "No CIK found" (SEC)
- Stock may not be US-listed
- Try alternative ticker format

### "No SEDAR results"
- SEDAR+ structure may have changed
- Check saved HTML files for debugging

### "SerpAPI limit exceeded"
- Check credit balance on SerpAPI dashboard
- Reduce frequency of updates

### "Rate limited"
- Increase delays in scripts
- Spread updates throughout the day

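One way to "increase delays" without slowing every request is to retry with exponential backoff only after a failure. The following is a generic sketch of that technique, not code from the scrapers in this project; `with_backoff` and `flaky_fetch` are hypothetical names, and a real scraper should catch its specific rate-limit error rather than every `Exception`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call fetch(), doubling the wait after each failure (plus a little jitter)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:  # assumption: narrow this to the scraper's own error type
            if attempt == max_attempts - 1:
                raise  # out of retries, surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)
    raise RuntimeError("unreachable")


# Simulated request that fails twice before succeeding
attempts = {"count": 0}


def flaky_fetch() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated rate limit")
    return "ok"


delays = []  # record the waits instead of actually sleeping in this demo
result = with_backoff(flaky_fetch, sleep=delays.append)
print(result, len(delays))
```

The `sleep` parameter exists only so the demo can run instantly; in normal use, leave it at the default `time.sleep`.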
## 📞 Support & Customization

All scripts are well-documented and can be customized:

- **Modify scrapers**: Update selectors in scraper files
- **Add exchanges**: Extend `extract_listings.py`
- **Change frequencies**: Edit `config.py`
- **Custom metrics**: Add to `financial_calculator.py`
- **Different exports**: Modify `export_csv.py`

## ✅ Verification Checklist

After running, verify:

- [ ] Stock listings extracted (`data/listings/`)
- [ ] Database populated (`data/stocks.db`)
- [ ] Financials scraped (`data/financials/`)
- [ ] Metrics calculated (`data/metrics/`)
- [ ] News collected (`data/serpapi_news/`)
- [ ] Filings downloaded (`data/sec_filings/`, `data/sedar_filings/`)
- [ ] Reports generated (`data/reports/`)
- [ ] CSV files created (`data/exports/`)

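The checklist above can also be walked automatically. This is a minimal sketch, assuming "populated" means a non-empty directory or a non-empty file; it only checks that outputs exist, not that their contents are correct. The helper name `check_outputs` is hypothetical, and the demo runs against a temporary directory so it works anywhere.

```python
import tempfile
from pathlib import Path
from typing import Dict

# Paths taken from the checklist, relative to the data directory
EXPECTED_OUTPUTS = [
    "listings", "stocks.db", "financials", "metrics", "serpapi_news",
    "sec_filings", "sedar_filings", "reports", "exports",
]


def check_outputs(data_dir: str) -> Dict[str, bool]:
    """Map each expected output to True if it exists and is non-empty."""
    root = Path(data_dir)
    results: Dict[str, bool] = {}
    for name in EXPECTED_OUTPUTS:
        path = root / name
        if path.is_dir():
            results[name] = any(path.iterdir())   # directory holds at least one entry
        else:
            results[name] = path.is_file() and path.stat().st_size > 0
    return results


# Demo on a throwaway directory: one populated folder, one empty database file
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "listings").mkdir()
    (Path(tmp) / "listings" / "tsx.json").write_text("[]")
    (Path(tmp) / "stocks.db").write_bytes(b"")
    demo = check_outputs(tmp)

for name, ok in demo.items():
    print(f"[{'x' if ok else ' '}] {name}")
```

After a real run, call `check_outputs("data")` from the project root instead.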
## 🚀 Ready to Go!

Your system is production-ready and includes everything your boss requested:

- ✅ AGM reports
- ✅ Tax filings
- ✅ SEC filings
- ✅ SEDAR+ filings
- ✅ Founder/insider ownership
- ✅ All financial metrics calculated
- ✅ Daily automation capability
- ✅ CSV exports
- ✅ Robust data collection with SerpAPI

**Start with:**
```bash
python main_robust.py --test 5
```

**Then run daily:**
```bash
python daily_automation.py --daily
```

---

**Last Updated:** November 6, 2025
**System Status:** ✅ Production Ready
**API Key:** Configured in `config.py`