1. ALWAYS return ONLY company IDs (companies.id) - use SELECT DISTINCT c.id
2. For industry: Check BOTH industry field AND sectors table with synonyms
- Use LEFT JOIN for sectors so companies without sector tags still match
- Include related terms: 'Fintech' → c.industry LIKE '%Fintech%' OR c.industry LIKE '%Finance%' OR sec.name LIKE '%Fintech%' OR sec.name LIKE '%Financial%'
- 'AI' → c.industry LIKE '%AI%' OR c.industry LIKE '%Artificial Intelligence%' OR c.industry LIKE '%Machine Learning%' OR sec.name LIKE '%AI%' OR sec.name LIKE '%ML%'
3. For location: Be FLEXIBLE with variations and abbreviations
- 'San Francisco' → c.location LIKE '%San Francisco%' OR c.location LIKE '%SF%' OR c.location LIKE '%Bay Area%'
- 'New York' → c.location LIKE '%New York%' OR c.location LIKE '%NYC%' OR c.location LIKE '%NY%'
- 'Europe' → c.location LIKE '%Europe%' OR c.location LIKE '%UK%' OR c.location LIKE '%London%' OR c.location LIKE '%Berlin%' OR c.location LIKE '%Paris%'
4. For sectors: Use LEFT JOIN and include multiple synonyms
- 'Healthcare' → sec.name LIKE '%Healthcare%' OR sec.name LIKE '%Health%' OR sec.name LIKE '%Medical%' OR sec.name LIKE '%Biotech%' OR c.industry LIKE '%Health%'
5. For founding year filters (include NULL in range filters so companies with an unknown founding year are not excluded):
- "founded after 2020" → WHERE (c.founded_year > 2020 OR c.founded_year IS NULL)
- "founded before 2018" → WHERE (c.founded_year < 2018 OR c.founded_year IS NULL)
- "founded in 2020" → WHERE c.founded_year = 2020 (exact year, no NULL check)
6. For investor-related queries: JOIN investor_companies and investors, then filter on the investor name (i.name)
7. Use LEFT JOIN for sectors so companies without tags still match
8. Use DISTINCT to avoid duplicates from joins
9. Be INCLUSIVE - use OR conditions with synonyms and variations
10. Return a single, complete SELECT query
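The effect of rules 2, 5 and 7 can be checked with a small sketch. It assumes SQLite and a toy schema that reuses the table and column names from the example queries; the rows are invented for illustration, and "after" is read as a strict comparison.

```python
import sqlite3

# Toy schema mirroring the names used in the example queries.
# All rows are hypothetical and exist only to illustrate the rules.
conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE companies (id INTEGER PRIMARY KEY, industry TEXT,
                            location TEXT, founded_year INTEGER);
    CREATE TABLE sectors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE company_sector (company_id INTEGER, sector_id INTEGER);
    INSERT INTO companies VALUES
        (1, 'Fintech',  'London', 2021),   -- no sector tag
        (2, 'Payments', 'Berlin', NULL),   -- tagged, founding year unknown
        (3, 'Retail',   'Paris',  2015),   -- should never match
        (4, 'Finance',  'Madrid', 2018);   -- no sector tag, older company
    INSERT INTO sectors VALUES (10, 'Financial Services');
    INSERT INTO company_sector VALUES (2, 10);
''')

# Industry field and sector name are both checked, with synonyms (rule 2).
MATCH = ("(c.industry LIKE '%Fintech%' OR c.industry LIKE '%Finance%' "
         "OR sec.name LIKE '%Fintech%' OR sec.name LIKE '%Financial%')")

def ids(join_kind, extra=''):
    # Build the query with the requested join type and optional extra filter.
    sql = ('SELECT DISTINCT c.id FROM companies c '
           + join_kind + ' JOIN company_sector cs ON c.id = cs.company_id '
           + join_kind + ' JOIN sectors sec ON cs.sector_id = sec.id '
           + 'WHERE ' + MATCH + extra + ' ORDER BY c.id')
    return [row[0] for row in conn.execute(sql)]

left_ids = ids('LEFT')      # [1, 2, 4]: untagged companies match via industry
inner_ids = ids('INNER')    # [2]: untagged companies are silently dropped
inclusive_ids = ids('LEFT', ' AND (c.founded_year > 2020 OR c.founded_year IS NULL)')  # [1, 2]
strict_ids = ids('LEFT', ' AND c.founded_year > 2020')                                 # [1]
```

With an INNER JOIN, companies 1 and 4 disappear even though their industry field matches, and without the IS NULL branch company 2 is lost to the year filter.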
Example Queries:
Q: "Fintech companies founded in 2020"
A: SELECT DISTINCT c.id FROM companies c
LEFT JOIN company_sector cs ON c.id = cs.company_id
LEFT JOIN sectors sec ON cs.sector_id = sec.id
WHERE (c.industry LIKE '%Fintech%' OR c.industry LIKE '%Finance%' OR c.industry LIKE '%Financial%' OR sec.name LIKE '%Fintech%' OR sec.name LIKE '%Financial Services%')
AND c.founded_year = 2020
Q: "AI companies in San Francisco"
A: SELECT DISTINCT c.id FROM companies c
LEFT JOIN company_sector cs ON c.id = cs.company_id
LEFT JOIN sectors sec ON cs.sector_id = sec.id
WHERE (c.industry LIKE '%AI%' OR c.industry LIKE '%Artificial Intelligence%' OR c.industry LIKE '%Machine Learning%' OR sec.name LIKE '%AI%' OR sec.name LIKE '%Machine Learning%' OR sec.name LIKE '%ML%')
AND (c.location LIKE '%San Francisco%' OR c.location LIKE '%SF%' OR c.location LIKE '%Bay Area%')
Q: "Healthcare companies"
A: SELECT DISTINCT c.id FROM companies c
LEFT JOIN company_sector cs ON c.id = cs.company_id
LEFT JOIN sectors sec ON cs.sector_id = sec.id
WHERE (c.industry LIKE '%Healthcare%' OR c.industry LIKE '%Health%' OR c.industry LIKE '%Medical%' OR sec.name LIKE '%Healthcare%' OR sec.name LIKE '%Medical%' OR sec.name LIKE '%Biotech%' OR sec.name LIKE '%Pharma%')
Q: "Companies funded by Sequoia"
A: SELECT DISTINCT c.id FROM companies c
JOIN investor_companies ic ON c.id = ic.company_id
JOIN investors i ON ic.investor_id = i.id
WHERE i.name LIKE '%Sequoia%'
Q: "European startups founded after 2019"
A: SELECT DISTINCT c.id FROM companies c
WHERE (c.location LIKE '%Europe%' OR c.location LIKE '%UK%' OR c.location LIKE '%London%' OR c.location LIKE '%Germany%' OR c.location LIKE '%Berlin%' OR c.location LIKE '%France%' OR c.location LIKE '%Paris%')
AND (c.founded_year > 2019 OR c.founded_year IS NULL)
Q: "SaaS companies"
A: SELECT DISTINCT c.id FROM companies c
LEFT JOIN company_sector cs ON c.id = cs.company_id
LEFT JOIN sectors sec ON cs.sector_id = sec.id
WHERE (c.industry LIKE '%SaaS%' OR c.industry LIKE '%Software%' OR c.industry LIKE '%Cloud%' OR sec.name LIKE '%SaaS%' OR sec.name LIKE '%Software%')
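Why DISTINCT matters in the investor example can be seen in a short sketch. It assumes SQLite and reuses the companies, investors and investor_companies names from the Sequoia query above; the investor and company rows are hypothetical.

```python
import sqlite3

# Hypothetical data: company 100 is backed by two different Sequoia entities.
conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE companies (id INTEGER PRIMARY KEY);
    CREATE TABLE investors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE investor_companies (investor_id INTEGER, company_id INTEGER);
    INSERT INTO companies VALUES (100), (101), (102);
    INSERT INTO investors VALUES
        (1, 'Sequoia Capital'), (2, 'Sequoia Capital India'), (3, 'Accel');
    INSERT INTO investor_companies VALUES
        (1, 100), (2, 100), (3, 100), (1, 101), (3, 102);
''')

# Same joins and filter as the Sequoia example; only the SELECT list differs.
JOINS = ('FROM companies c '
         'JOIN investor_companies ic ON c.id = ic.company_id '
         'JOIN investors i ON ic.investor_id = i.id '
         "WHERE i.name LIKE '%Sequoia%' ORDER BY c.id")

with_duplicates = [row[0] for row in conn.execute('SELECT c.id ' + JOINS)]
distinct_ids = [row[0] for row in conn.execute('SELECT DISTINCT c.id ' + JOINS)]
# with_duplicates is [100, 100, 101]; distinct_ids is [100, 101]
```

Each matching investor row produces one output row, so a company with several matching investors repeats unless DISTINCT is used.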
IMPORTANT:
- Use LEFT JOIN so companies without sector tags still match via industry field
- Use OR conditions with related keywords/synonyms to cast a wider net
- Include NULL checks for optional filters to avoid excluding companies with missing data
Return ONLY the SQL query, no explanations or markdown.""",