# DS Task AI News - API Documentation

## Base URL

```
http://localhost:8000
```

## Authentication

Currently, no authentication is required. In production, consider implementing API keys or OAuth.

## Rate Limiting

- **Limit**: 100 requests per minute per IP address
- **Response**: HTTP 429 when limit exceeded
- **Headers**: No rate limit headers currently implemented

## Response Format

Responses generally follow this structure. The payload field name varies by endpoint (for example, `recommendations` or `articles`), as the endpoint examples show:

```json
{
  "success": true,
  "message": "Optional message",
  "data": {},
  "count": 0
}
```

## Error Handling

Error responses include:

```json
{
  "detail": "Error description",
  "status_code": 400
}
```

## Caching

- **Articles endpoint**: 3-minute cache for improved performance
- **Search results**: In-memory caching with 5-minute TTL
- **Vector operations**: Cached for frequent similarity searches

---

## Endpoints

### 1. Health Check

**GET** `/`

Check if the API is running.

**Response:**
```json
{
  "message": "DS Task AI News API is running!",
  "version": "1.0.0",
  "status": "healthy"
}
```

---

### 2. Detailed Health Check

**GET** `/health`

Get detailed system status and statistics.

**Response:**
```json
{
  "status": "healthy",
  "vector_store": {
    "total_articles": 150,
    "index_dimension": 384,
    "index_exists": true,
    "last_updated": "2025-07-07T16:00:00"
  },
  "settings": {
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "vector_db_type": "faiss",
    "rss_feeds_count": 3
  }
}
```

---

### 3. Fetch News

**POST** `/fetch-news`

Fetch news from the configured RSS feeds and add the articles to the vector store.

**Response:**
```json
{
  "success": true,
  "message": "News fetched and processed successfully",
  "articles_fetched": 45,
  "articles_stored": 45,
  "total_articles": 195
}
```

**Error Response:**
```json
{
  "detail": "Error fetching news: Connection timeout"
}
```

---

### 4. Get Recommendations by Article ID

**GET** `/recommend-news`

Get similar articles based on an existing article ID.

**Parameters:**
- `article_id` (required): ID of the reference article
- `top_k` (optional, default=5): Number of recommendations

**Example:**
```
GET /recommend-news?article_id=abc123&top_k=10
```

**Response:**
```json
{
  "success": true,
  "article_id": "abc123",
  "recommendations": [
    {
      "id": "def456",
      "title": "AI Breakthrough in Healthcare",
      "content": "Recent developments in artificial intelligence...",
      "url": "https://example.com/article",
      "source": "TechNews",
      "published_date": "2025-07-07T10:00:00",
      "similarity_score": 0.89
    }
  ],
  "count": 1
}
```

---

### 5. Get Recommendations by Query

**POST** `/recommend-by-query`

Get article recommendations based on a text query.

**Request Body:**
```json
{
  "query": "artificial intelligence healthcare",
  "top_k": 5
}
```

**Response:**
```json
{
  "success": true,
  "query": "artificial intelligence healthcare",
  "recommendations": [
    {
      "id": "xyz789",
      "title": "AI Transforms Medical Diagnosis",
      "content": "Machine learning algorithms are revolutionizing...",
      "url": "https://example.com/ai-medical",
      "source": "HealthTech",
      "published_date": "2025-07-07T14:30:00",
      "similarity_score": 0.92
    }
  ],
  "count": 1
}
```

---

### 6. Get Recommendations by Interests

**POST** `/recommend-by-interests`

Get recommendations based on user interests.

**Request Body:**
```json
{
  "interests": ["artificial intelligence", "machine learning", "healthcare"],
  "top_k": 10
}
```

**Response:**
```json
{
  "success": true,
  "interests": ["artificial intelligence", "machine learning", "healthcare"],
  "recommendations": [...],
  "count": 8
}
```

---

### 7. Get Trending Articles

**GET** `/trending`

Get trending (most recent) articles.

**Parameters:**
- `top_k` (optional, default=10): Number of articles to return

**Example:**
```
GET /trending?top_k=20
```

**Response:**
```json
{
  "success": true,
  "trending_articles": [
    {
      "id": "trend1",
      "title": "Breaking: New AI Model Released",
      "content": "A groundbreaking AI model has been announced...",
      "url": "https://example.com/breaking-ai",
      "source": "AI Weekly",
      "published_date": "2025-07-07T16:00:00"
    }
  ],
  "count": 1
}
```

---

### 8. Get All Articles

**GET** `/articles`

Get all articles with optional filtering.

**Parameters:**
- `source` (optional): Filter by news source
- `limit` (optional, default=50): Maximum articles to return

**Example:**
```
GET /articles?source=BBC%20News&limit=25
```

**Response:**
```json
{
  "success": true,
  "articles": [...],
  "count": 25,
  "source_filter": "BBC News"
}
```

---

### 9. Advanced Search

**POST** `/search`

Advanced search with filters.

**Request Body:**
```json
{
  "query": "climate change technology",
  "source": "BBC News",
  "top_k": 15
}
```

**Response:**
```json
{
  "success": true,
  "query": "climate change technology",
  "filters": {
    "source": "BBC News"
  },
  "results": [...],
  "count": 12
}
```

---

### 10. Get Statistics

**GET** `/stats`

Get system statistics and information.

**Response:**
```json
{
  "success": true,
  "statistics": {
    "total_articles": 200,
    "index_dimension": 384,
    "index_exists": true,
    "rss_feeds": [
      "https://feeds.bbci.co.uk/news/rss.xml",
      "https://rss.cnn.com/rss/edition.rss"
    ],
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2"
  }
}
```

---

### 11. Test RSS Feeds

**GET** `/test-rss`

Test RSS feed connectivity and parsing.

**Response:**
```json
{
  "results": [
    {
      "url": "https://feeds.bbci.co.uk/news/rss.xml",
      "title": "BBC News",
      "entries_count": 32,
      "success": true,
      "sample_article": {
        "title": "Tech Giants Announce AI Partnership",
        "published": "Mon, 07 Jul 2025 16:00:00 GMT",
        "link": "https://bbc.com/news/tech-partnership"
      }
    }
  ],
  "timestamp": "2025-07-07T16:15:00"
}
```

---

## Interactive Documentation

FastAPI automatically generates interactive API documentation:
- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc

## Rate Limiting in Production

The only limit currently enforced is the per-IP limit described under Rate Limiting above. In production, consider tuning or extending it:
- Per IP: 100 requests/minute
- Per endpoint: Varies based on computational cost

## CORS

CORS is enabled for all origins in development. In production, configure specific allowed origins.

## Error Codes

- **200**: Success
- **400**: Bad Request (invalid parameters)
- **404**: Not Found (article ID not found)
- **429**: Too Many Requests (rate limit exceeded)
- **500**: Internal Server Error (system error)

## Data Models

### Article Object
```json
{
  "id": "string",
  "title": "string",
  "content": "string",
  "url": "string",
  "source": "string",
  "published_date": "ISO 8601 datetime",
  "similarity_score": "float (0-1, only in recommendations)"
}
```

### Query Object
```json
{
  "query": "string",
  "top_k": "integer (1-100)"
}
```

## SDK Examples

### Python
```python
import requests

# Fetch news
response = requests.post("http://localhost:8000/fetch-news")
print(response.json())

# Get recommendations
response = requests.post(
    "http://localhost:8000/recommend-by-query",
    json={"query": "artificial intelligence", "top_k": 5}
)
recommendations = response.json()["recommendations"]
```

### JavaScript
```javascript
// Fetch news
fetch('http://localhost:8000/fetch-news', {method: 'POST'})
  .then(response => response.json())
  .then(data => console.log(data));

// Get recommendations
fetch('http://localhost:8000/recommend-by-query', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body:
    JSON.stringify({
      query: 'artificial intelligence',
      top_k: 5
    })
})
  .then(response => response.json())
  .then(data => console.log(data.recommendations));
```

---

## Deployment Guide

### Prerequisites

- Python 3.10+
- 4GB+ RAM (for Sentence Transformers model)
- 2GB+ disk space

### Local Development Setup

1. **Clone and Setup**
   ```bash
   git clone <repository-url>
   cd ds_task_ai_news
   ```

2. **Install Dependencies**
   ```bash
   pip install -r backend/requirements.txt
   ```

3. **Environment Configuration**

   Create a `.env` file in the root directory:
   ```env
   # Optional API Keys
   GROQ_API_KEY=your_groq_api_key_here
   COHERE_API_KEY=your_cohere_api_key_here

   # Server Settings
   HOST=0.0.0.0
   PORT=8000
   DEBUG=true

   # RSS Feeds (comma-separated)
   RSS_FEEDS=https://feeds.bbci.co.uk/news/technology/rss.xml,https://techcrunch.com/feed/,https://www.wired.com/feed/rss

   # Vector Database
   VECTOR_DIMENSION=384
   VECTOR_DB_TYPE=faiss
   ```

4. **Run the Application**
   ```bash
   cd backend
   python main.py
   ```

### Production Deployment

#### Docker Deployment
```dockerfile
FROM python:3.10-slim

WORKDIR /app

COPY backend/requirements.txt .
RUN pip install -r requirements.txt

COPY . .

WORKDIR /app/backend

EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

#### Docker Compose
```yaml
version: '3.8'
services:
  ai-news-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - GROQ_API_KEY=${GROQ_API_KEY}
      - COHERE_API_KEY=${COHERE_API_KEY}
    volumes:
      - ./data:/app/data
      - ./models:/app/models
    restart: unless-stopped
```

#### Nginx Configuration
```nginx
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

### Performance Optimization

#### Memory Management

- **Sentence Transformers**: Uses ~1GB RAM when loaded
- **FAISS Index**: Memory usage scales with article count
- **Caching**: In-memory cache uses ~50MB for typical workloads

#### Scaling Recommendations

- **Horizontal**: Use load balancer with multiple API instances
- **Vertical**: Increase RAM for larger article databases
- **Database**: Consider PostgreSQL for metadata storage at scale

### Monitoring and Maintenance

#### Health Checks
```bash
# Basic health check
curl http://localhost:8000/health

# System statistics
curl http://localhost:8000/stats

# AI analyzer status
curl http://localhost:8000/ai-status
```

#### Log Monitoring
```bash
# Application logs
tail -f /var/log/ai-news/app.log

# Error tracking
grep "ERROR" /var/log/ai-news/app.log
```

#### Backup Strategy
```bash
# Backup vector database
cp data/news_vectors.faiss backup/
cp data/news_vectors_metadata.pkl backup/

# Backup processed articles
tar -czf backup/articles_$(date +%Y%m%d).tar.gz data/processed_news/
```

### Troubleshooting

#### Common Issues

1. **Sentence Transformers Model Loading**
   ```bash
   # Verify model exists
   ls -la models/all-MiniLM-L6-v2/

   # Test model loading
   python -c "from sentence_transformers import SentenceTransformer; model = SentenceTransformer('./models/all-MiniLM-L6-v2'); print('Model loaded successfully')"
   ```

2.
   **FAISS Index Issues**
   ```bash
   # Rebuild index
   rm data/news_vectors.faiss data/news_vectors_metadata.pkl
   # Restart application to rebuild
   ```

3. **Memory Issues**
   ```bash
   # Check memory usage
   free -h

   # Monitor process memory
   ps aux | grep python
   ```

#### Performance Tuning

- Adjust `RATE_LIMIT_REQUESTS` in main.py for your needs
- Modify cache TTL in vector_store.py
- Optimize `max_articles_per_feed` in config.py

### Security Considerations

#### Production Security

- Use HTTPS in production
- Implement proper API authentication
- Set up firewall rules
- Regular security updates
- Monitor for unusual traffic patterns

#### Environment Variables

Never commit sensitive data to version control:
```bash
# Use environment-specific .env files
.env.production
.env.staging
.env.development
```
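
Keeping secrets out of version control means the application has to read its configuration from the environment at startup. The following is a minimal sketch of that idea using only the Python standard library. The variable names come from the `.env` example in this guide, but the `Settings` class, the `load_settings` helper, and the fallback defaults are illustrative assumptions, not the project's actual `config.py`.

```python
from dataclasses import dataclass, field
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; defaults are assumptions for illustration."""
    host: str = "0.0.0.0"
    port: int = 8000
    debug: bool = False
    rss_feeds: List[str] = field(default_factory=list)
    vector_dimension: int = 384
    vector_db_type: str = "faiss"


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from a mapping such as os.environ (hypothetical helper)."""
    # RSS_FEEDS is comma-separated; drop surrounding whitespace and empty entries.
    feeds = [url.strip() for url in env.get("RSS_FEEDS", "").split(",") if url.strip()]
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        # Treat "1", "true", and "yes" (any letter case) as enabled.
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        rss_feeds=feeds,
        vector_dimension=int(env.get("VECTOR_DIMENSION", "384")),
        vector_db_type=env.get("VECTOR_DB_TYPE", "faiss"),
    )


if __name__ == "__main__":
    # A literal mapping stands in for os.environ so the example is reproducible.
    sample_env = {
        "PORT": "8000",
        "DEBUG": "true",
        "RSS_FEEDS": "https://techcrunch.com/feed/,https://www.wired.com/feed/rss",
    }
    print(load_settings(sample_env))
```

In a real deployment the same function could be called with `os.environ`, so that `.env.production` and `.env.development` differ only in the values they export and no secret ever needs to live in the repository.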