# DS Task AI News - API Documentation

## Base URL
```
http://localhost:8000
```

## Authentication
Currently, no authentication is required. In production, consider implementing API keys or OAuth.

## Rate Limiting
- **Limit**: 100 requests per minute per IP address
- **Response**: HTTP 429 when limit exceeded
- **Headers**: Rate-limit headers are not currently returned in responses
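A per-IP limit like this can be enforced with a sliding window of request timestamps. The following is a minimal sketch under stated assumptions: limits are keyed by client IP, counted in process memory, and `SlidingWindowLimiter` is a hypothetical class, not the service's actual middleware.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key (e.g. a client IP).

    Hypothetical sketch; not the service's real rate-limiting code.
    """

    def __init__(self, limit: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record a request for `key`; return False if it should receive HTTP 429."""
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the current window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected requests are not recorded
        hits.append(now)
        return True
```

Because state lives in a single process, each API instance behind a load balancer would count requests separately, and idle IPs are never evicted in this sketch; a shared store would be needed for one global limit.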

## Response Format
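Most successful responses share the general envelope below. In practice, the payload sits under an endpoint-specific key (such as `recommendations`, `articles`, or `statistics`) rather than `data`, and a few endpoints (`/`, `/health`, `/test-rss`) return their own shapes, shown in their sections: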
```json
{
    "success": true,
    "message": "Optional message",
    "data": {},
    "count": 0
}
```

## Error Handling
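Errors are returned with a non-2xx HTTP status code (see Error Codes) and a JSON body containing a `detail` message, for example: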
```json
{
    "detail": "Error description",
    "status_code": 400
}
```
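A client can handle both shapes uniformly: check the HTTP status first, then read the endpoint-specific payload key. The sketch below uses hypothetical names (`unwrap`, `APIError`), works on an already-decoded JSON body so it pairs with any HTTP client, and assumes a `"success": false` body should also count as a failure.

```python
from typing import Any, Dict


class APIError(Exception):
    """Raised for a non-2xx status or a body with `success: false` (hypothetical)."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


def unwrap(status: int, body: Dict[str, Any], key: str) -> Any:
    """Return body[key] from a successful response, or raise APIError."""
    if not 200 <= status < 300:
        # Error bodies carry a human-readable `detail` field.
        raise APIError(status, str(body.get("detail", "unknown error")))
    if body.get("success") is False:
        raise APIError(status, str(body.get("message", "request failed")))
    return body[key]
```

For example, `unwrap(resp.status_code, resp.json(), "recommendations")` returns the recommendation list or raises with the server's `detail` text.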

## Caching
- **Articles endpoint**: 3-minute cache for improved performance
- **Search results**: In-memory caching with 5-minute TTL
- **Vector operations**: Cached for frequent similarity searches
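A time-to-live (TTL) cache of this kind stores each result together with an expiry time and recomputes it once that time has passed. The following is a minimal sketch, assuming cache keys are built from request parameters (for `/articles`, the `source` and `limit` values); `TTLCache` and `get_or_compute` are hypothetical names, and the service's real caches may be structured differently.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """In-memory cache whose entries expire `ttl` seconds after being stored (hypothetical)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # fresh hit
        value = compute()  # miss or expired: recompute and store
        self._store[key] = (now + self.ttl, value)
        return value


# e.g. a 3-minute cache for /articles keyed by (source, limit)
articles_cache = TTLCache(ttl=180)
```

Expired entries are only replaced when the same key is requested again; a long-running service would also want periodic eviction or a size bound.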

---

## Endpoints

### 1. Health Check

**GET** `/`

Check if the API is running.

**Response:**
```json
{
    "message": "DS Task AI News API is running!",
    "version": "1.0.0",
    "status": "healthy"
}
```

---

### 2. Detailed Health Check

**GET** `/health`

Get detailed system status and statistics.

**Response:**
```json
{
    "status": "healthy",
    "vector_store": {
        "total_articles": 150,
        "index_dimension": 384,
        "index_exists": true,
        "last_updated": "2025-07-07T16:00:00"
    },
    "settings": {
        "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
        "vector_db_type": "faiss",
        "rss_feeds_count": 3
    }
}
```
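The fields above are enough for a simple readiness probe. The sketch below is a hypothetical `is_ready` helper, assuming that an empty index (for example, before the first `/fetch-news` call) should count as not ready to serve recommendations.

```python
from typing import Any, Dict


def is_ready(health: Dict[str, Any]) -> bool:
    """True when /health reports healthy and the vector index holds articles."""
    store = health.get("vector_store", {})
    return (
        health.get("status") == "healthy"
        and bool(store.get("index_exists"))
        and store.get("total_articles", 0) > 0
    )
```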

---

### 3. Fetch News

**POST** `/fetch-news`

Fetch articles from the configured RSS feeds and add them to the vector store.

**Response:**
```json
{
    "success": true,
    "message": "News fetched and processed successfully",
    "articles_fetched": 45,
    "articles_stored": 45,
    "total_articles": 195
}
```

**Error Response:**
```json
{
    "detail": "Error fetching news: Connection timeout"
}
```

---

### 4. Get Recommendations by Article ID

**GET** `/recommend-news`

Get similar articles based on an existing article ID.

**Parameters:**
- `article_id` (required): ID of the reference article
- `top_k` (optional, default=5): Number of recommendations

**Example:**
```
GET /recommend-news?article_id=abc123&top_k=10
```

**Response:**
```json
{
    "success": true,
    "article_id": "abc123",
    "recommendations": [
        {
            "id": "def456",
            "title": "AI Breakthrough in Healthcare",
            "content": "Recent developments in artificial intelligence...",
            "url": "https://example.com/article",
            "source": "TechNews",
            "published_date": "2025-07-07T10:00:00",
            "similarity_score": 0.89
        }
    ],
    "count": 1
}
```
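Conceptually, this endpoint looks up the reference article's embedding and ranks the other stored articles by vector similarity. The sketch below assumes cosine similarity over plain Python lists; the service delegates the search to its FAISS index, whose exact metric is not documented here, and `recommend_by_id` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_by_id(article_id: str, vectors: Dict[str, Sequence[float]],
                    top_k: int = 5) -> List[Tuple[str, float]]:
    """Rank other articles by cosine similarity to `article_id`'s embedding."""
    if article_id not in vectors:
        raise KeyError(article_id)  # maps to HTTP 404 (article ID not found)
    query = vectors[article_id]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != article_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:top_k]


vectors = {"abc123": [1.0, 0.0], "def456": [0.9, 0.1], "ghi789": [0.0, 1.0]}
print([aid for aid, _ in recommend_by_id("abc123", vectors, top_k=2)])  # ['def456', 'ghi789']
```

The reference article is excluded from its own results, and scoring every stored vector is a brute-force scan, which is adequate for small collections.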

---

### 5. Get Recommendations by Query

**POST** `/recommend-by-query`

Get article recommendations based on a text query.

**Request Body:**
```json
{
    "query": "artificial intelligence healthcare",
    "top_k": 5
}
```

**Response:**
```json
{
    "success": true,
    "query": "artificial intelligence healthcare",
    "recommendations": [
        {
            "id": "xyz789",
            "title": "AI Transforms Medical Diagnosis",
            "content": "Machine learning algorithms are revolutionizing...",
            "url": "https://example.com/ai-medical",
            "source": "HealthTech",
            "published_date": "2025-07-07T14:30:00",
            "similarity_score": 0.92
        }
    ],
    "count": 1
}
```

---

### 6. Get Recommendations by Interests

**POST** `/recommend-by-interests`

Get recommendations based on user interests.

**Request Body:**
```json
{
    "interests": ["artificial intelligence", "machine learning", "healthcare"],
    "top_k": 10
}
```

**Response:**
```json
{
    "success": true,
    "interests": ["artificial intelligence", "machine learning", "healthcare"],
    "recommendations": [...],
    "count": 8
}
```

---

### 7. Get Trending Articles

**GET** `/trending`

Get the most recently published articles. "Trending" here is based on recency, not popularity.

**Parameters:**
- `top_k` (optional, default=10): Number of articles to return

**Example:**
```
GET /trending?top_k=20
```

**Response:**
```json
{
    "success": true,
    "trending_articles": [
        {
            "id": "trend1",
            "title": "Breaking: New AI Model Released",
            "content": "A groundbreaking AI model has been announced...",
            "url": "https://example.com/breaking-ai",
            "source": "AI Weekly",
            "published_date": "2025-07-07T16:00:00"
        }
    ],
    "count": 1
}
```
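Because "trending" means most recent, selecting these articles reduces to sorting by `published_date`. A minimal sketch, assuming every article carries an ISO 8601 `published_date` without a timezone offset, as in the examples in this document (mixing naive and offset-aware timestamps would make the comparison fail); `trending` is a hypothetical helper.

```python
from datetime import datetime
from typing import Any, Dict, List


def trending(articles: List[Dict[str, Any]], top_k: int = 10) -> List[Dict[str, Any]]:
    """Return the `top_k` most recently published articles, newest first."""
    return sorted(
        articles,
        key=lambda a: datetime.fromisoformat(a["published_date"]),
        reverse=True,
    )[:top_k]
```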

---

### 8. Get All Articles

**GET** `/articles`

List stored articles, optionally filtered by source. Results are capped by `limit`, so this does not necessarily return every article.

**Parameters:**
- `source` (optional): Filter by news source
- `limit` (optional, default=50): Maximum articles to return

**Example:**
```
GET /articles?source=BBC%20News&limit=25
```

**Response:**
```json
{
    "success": true,
    "articles": [...],
    "count": 25,
    "source_filter": "BBC News"
}
```

---

### 9. Advanced Search

**POST** `/search`

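Search with a text query and an optional `source` filter. `count` can be lower than `top_k` when fewer matching articles are found (12 of a requested 15 in the example).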

**Request Body:**
```json
{
    "query": "climate change technology",
    "source": "BBC News",
    "top_k": 15
}
```

**Response:**
```json
{
    "success": true,
    "query": "climate change technology",
    "filters": {
        "source": "BBC News"
    },
    "results": [...],
    "count": 12
}
```
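One common way to combine vector search with a metadata filter is post-filtering: rank candidates by similarity, drop those from other sources, and stop at `top_k`. If too few candidates match, the result holds fewer than `top_k` items, as with 12 results for `top_k: 15` in the example. This is a hypothetical sketch (`filtered_search`); whether the service filters before or after the vector search is not documented here.

```python
from typing import Any, Dict, List, Optional, Tuple


def filtered_search(ranked: List[Tuple[Dict[str, Any], float]],
                    source: Optional[str], top_k: int) -> List[Dict[str, Any]]:
    """Keep hits from `source` (if given), in ranked order, up to `top_k` results."""
    results: List[Dict[str, Any]] = []
    for article, score in ranked:  # `ranked` is sorted by descending similarity
        if len(results) >= top_k:
            break
        if source is not None and article["source"] != source:
            continue
        results.append({**article, "similarity_score": score})
    return results
```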

---

### 10. Get Statistics

**GET** `/stats`

Get system statistics and information.

**Response:**
```json
{
    "success": true,
    "statistics": {
        "total_articles": 200,
        "index_dimension": 384,
        "index_exists": true,
        "rss_feeds": [
            "https://feeds.bbci.co.uk/news/rss.xml",
            "https://rss.cnn.com/rss/edition.rss"
        ],
        "embedding_model": "sentence-transformers/all-MiniLM-L6-v2"
    }
}
```

---

### 11. Test RSS Feeds

**GET** `/test-rss`

Test RSS feed connectivity and parsing.

**Response:**
```json
{
    "results": [
        {
            "url": "https://feeds.bbci.co.uk/news/rss.xml",
            "title": "BBC News",
            "entries_count": 32,
            "success": true,
            "sample_article": {
                "title": "Tech Giants Announce AI Partnership",
                "published": "Mon, 07 Jul 2025 16:00:00 GMT",
                "link": "https://bbc.com/news/tech-partnership"
            }
        }
    ],
    "timestamp": "2025-07-07T16:15:00"
}
```
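A per-feed check like this needs only the channel title, the item count, and the first item. The sketch below uses the standard library's `xml.etree.ElementTree` and assumes RSS 2.0 documents (`<channel>` containing `<item>` elements); the service may use a different parser, Atom feeds are not handled, and the `error` field on failure is a hypothetical addition. For untrusted feeds, Python's documentation recommends a hardened parser such as `defusedxml`.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def summarize_feed(url: str, xml_text: str) -> Dict[str, Any]:
    """Build a /test-rss style result for one RSS 2.0 document."""
    try:
        channel = ET.fromstring(xml_text).find("channel")
    except ET.ParseError as exc:
        return {"url": url, "success": False, "error": str(exc)}
    if channel is None:
        return {"url": url, "success": False, "error": "no <channel> element"}
    items = channel.findall("item")
    result: Dict[str, Any] = {
        "url": url,
        "title": channel.findtext("title", default=""),
        "entries_count": len(items),
        "success": True,
    }
    if items:
        first = items[0]  # report the first entry as a sample
        result["sample_article"] = {
            "title": first.findtext("title", default=""),
            "published": first.findtext("pubDate", default=""),
            "link": first.findtext("link", default=""),
        }
    return result
```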

---

## Interactive Documentation

FastAPI automatically generates interactive API documentation:

- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc

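## Production Rate Limiting

The Rate Limiting section above documents a per-IP limit of 100 requests per minute. For production, also consider per-endpoint limits that vary with computational cost; for example, `/fetch-news` and the recommendation and search endpoints do far more work per request than `/` or `/health`.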

## CORS

CORS is enabled for all origins in development. In production, configure specific allowed origins.

## Error Codes

- **200**: Success
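- **400**: Bad Request (invalid parameters)
- **404**: Not Found (article ID not found)
- **422**: Unprocessable Entity (FastAPI's default for request validation failures, such as a missing or mistyped field)
- **429**: Too Many Requests (rate limit exceeded)
- **500**: Internal Server Error (system error)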

## Data Models

### Article Object
```json
{
    "id": "string",
    "title": "string",
    "content": "string",
    "url": "string",
    "source": "string",
    "published_date": "ISO 8601 datetime",
    "similarity_score": "float (0-1, only in recommendations)"
}
```

### Query Object
```json
{
    "query": "string",
    "top_k": "integer (1-100)"
}
```
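Client or test code can mirror these models directly. The stdlib-only sketch below uses dataclasses; the class names are hypothetical, the service's own request models may differ, and only the documented `top_k` range (1-100) is enforced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryRequest:
    """Body of POST /recommend-by-query (hypothetical client-side model)."""
    query: str
    top_k: int

    def __post_init__(self) -> None:
        # Enforce the documented range for top_k.
        if not 1 <= self.top_k <= 100:
            raise ValueError("top_k must be between 1 and 100")


@dataclass
class Article:
    id: str
    title: str
    content: str
    url: str
    source: str
    published_date: str  # ISO 8601 datetime
    similarity_score: Optional[float] = None  # only present in recommendations
```

A decoded article dictionary can then be turned into a typed object with `Article(**item)`.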

## SDK Examples

### Python
```python
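import requests

BASE_URL = "http://localhost:8000"

# Fetch news; fetching and processing feeds can be slow, so allow a generous timeout
response = requests.post(f"{BASE_URL}/fetch-news", timeout=120)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body as data
print(response.json())

# Get recommendations
response = requests.post(
    f"{BASE_URL}/recommend-by-query",
    json={"query": "artificial intelligence", "top_k": 5},
    timeout=30,
)
response.raise_for_status()
recommendations = response.json()["recommendations"]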
```

### JavaScript
```javascript
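const BASE_URL = 'http://localhost:8000';

// Throw on non-2xx responses so error bodies are not treated as data
async function checkedJson(response) {
    if (!response.ok) {
        const body = await response.json().catch(() => ({}));
        throw new Error(`HTTP ${response.status}: ${body.detail ?? response.statusText}`);
    }
    return response.json();
}

// Fetch news
fetch(`${BASE_URL}/fetch-news`, {method: 'POST'})
    .then(checkedJson)
    .then(data => console.log(data))
    .catch(console.error);

// Get recommendations
fetch(`${BASE_URL}/recommend-by-query`, {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({query: 'artificial intelligence', top_k: 5})
})
    .then(checkedJson)
    .then(data => console.log(data.recommendations))
    .catch(console.error);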
```

---

## Deployment Guide

### Prerequisites
- Python 3.10+
- 4GB+ RAM (for Sentence Transformers model)
- 2GB+ disk space

### Local Development Setup

1. **Clone and Setup**
```bash
git clone <repository-url>
cd ds_task_ai_news
```

2. **Install Dependencies**
```bash
pip install -r backend/requirements.txt
```

3. **Environment Configuration**
Create a `.env` file in the project root:
```env
# Optional API Keys
GROQ_API_KEY=your_groq_api_key_here
COHERE_API_KEY=your_cohere_api_key_here

# Server Settings
HOST=0.0.0.0
PORT=8000
DEBUG=true

# RSS Feeds (comma-separated)
RSS_FEEDS=https://feeds.bbci.co.uk/news/technology/rss.xml,https://techcrunch.com/feed/,https://www.wired.com/feed/rss

# Vector Database
VECTOR_DIMENSION=384
VECTOR_DB_TYPE=faiss
```

4. **Run the Application**
```bash
cd backend
python main.py
```
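The settings from step 3 are plain strings, so the application has to parse them, for example splitting `RSS_FEEDS` on commas. The sketch below is a hypothetical `load_settings` helper, assuming the `.env` values have already been loaded into the process environment; its defaults mirror the sample `.env`, except `DEBUG`, which it leaves off unless set. The project's own `config.py` may read settings differently.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping


@dataclass
class Settings:
    host: str = "0.0.0.0"
    port: int = 8000
    debug: bool = False
    rss_feeds: List[str] = field(default_factory=list)
    vector_dimension: int = 384
    vector_db_type: str = "faiss"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables from step 3, splitting RSS_FEEDS on commas."""
    feeds = [u.strip() for u in env.get("RSS_FEEDS", "").split(",") if u.strip()]
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        debug=env.get("DEBUG", "false").lower() in {"1", "true", "yes"},
        rss_feeds=feeds,
        vector_dimension=int(env.get("VECTOR_DIMENSION", "384")),
        vector_db_type=env.get("VECTOR_DB_TYPE", "faiss"),
    )
```

`VECTOR_DIMENSION` must match the embedding model's output size (384 for `all-MiniLM-L6-v2`), otherwise vectors cannot be added to the index.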

### Production Deployment

#### Docker Deployment
```dockerfile
FROM python:3.10-slim

WORKDIR /app
COPY backend/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
WORKDIR /app/backend

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

#### Docker Compose
```yaml
version: '3.8'
services:
  ai-news-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - GROQ_API_KEY=${GROQ_API_KEY}
      - COHERE_API_KEY=${COHERE_API_KEY}
    volumes:
      - ./data:/app/data
      - ./models:/app/models
    restart: unless-stopped
```

#### Nginx Configuration
```nginx
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

### Performance Optimization

#### Memory Management
- **Sentence Transformers**: Uses ~1GB RAM when loaded
- **FAISS Index**: Memory usage scales with article count
- **Caching**: In-memory cache uses ~50MB for typical workloads
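For a rough estimate of index size, assume a flat (uncompressed) float32 index such as FAISS's `IndexFlatL2` or `IndexFlatIP`: each vector then costs `dimension × 4` bytes, and article metadata comes on top. Which FAISS index type the service actually uses is not stated here, and `flat_index_bytes` is a hypothetical helper.

```python
def flat_index_bytes(num_articles: int, dimension: int = 384) -> int:
    """Approximate vector storage for a flat float32 index (4 bytes per value)."""
    return num_articles * dimension * 4


for n in (200, 10_000, 1_000_000):
    print(f"{n:>9,} articles ~ {flat_index_bytes(n) / 2**20:,.1f} MiB")
```

At 384 dimensions each vector takes 1,536 bytes, so 10,000 articles need about 15 MB (14.6 MiB) for the vectors alone.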

#### Scaling Recommendations
- **Horizontal**: Run multiple API instances behind a load balancer
- **Vertical**: Increase RAM for larger article databases
- **Database**: Consider PostgreSQL for metadata storage at scale

### Monitoring and Maintenance

#### Health Checks
```bash
# Detailed health check (GET / is the basic liveness check)
curl http://localhost:8000/health

# System statistics
curl http://localhost:8000/stats

# AI analyzer status
curl http://localhost:8000/ai-status
```

#### Log Monitoring
```bash
# Application logs
tail -f /var/log/ai-news/app.log

# Error tracking
grep "ERROR" /var/log/ai-news/app.log
```

#### Backup Strategy
```bash
# Backup vector database (create the backup directory first)
mkdir -p backup
cp data/news_vectors.faiss backup/
cp data/news_vectors_metadata.pkl backup/

# Backup processed articles
tar -czf backup/articles_$(date +%Y%m%d).tar.gz data/processed_news/
```

### Troubleshooting

#### Common Issues

1. **Sentence Transformers Model Loading**
```bash
# Verify model exists
ls -la models/all-MiniLM-L6-v2/

# Test model loading
python -c "from sentence_transformers import SentenceTransformer; model = SentenceTransformer('./models/all-MiniLM-L6-v2'); print('Model loaded successfully')"
```

2. **FAISS Index Issues**
```bash
# Rebuild index
rm data/news_vectors.faiss data/news_vectors_metadata.pkl
# Restart application to rebuild
```

3. **Memory Issues**
```bash
# Check memory usage
free -h
# Monitor process memory
ps aux | grep python
```

#### Performance Tuning
- Adjust `RATE_LIMIT_REQUESTS` in main.py for your needs
- Modify cache TTL in vector_store.py
- Tune `max_articles_per_feed` in config.py to control how many articles each feed contributes

### Security Considerations

#### Production Security
- Use HTTPS in production
- Implement proper API authentication
- Set up firewall rules
- Apply security updates regularly
- Monitor for unusual traffic patterns

#### Environment Variables
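Never commit sensitive data to version control. Keep environment-specific `.env` files out of the repository, for example by listing them in `.gitignore`:
```gitignore
# Environment-specific settings and secrets
.env
.env.production
.env.staging
.env.development
```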