DS_TASK_AI_VIEWS/docs/API_Documentation.md
Aherobo Ovie Victor beed04d05c feat: Complete all 4 major optimization tasks
 Network & Model Optimization:
- Fixed Sentence Transformers path to use local model
- Configured real semantic embeddings (384-dimensional)
- Replaced hash-based fallback with AI-powered similarity

 Advanced AI Features Integration:
- Added ai_analyzer.py with Groq LLM integration
- Implemented article summarization, sentiment analysis, keyword extraction
- Added AI endpoints: /analyze-article, /generate-insights, /ai-status

 API Enhancement & User Experience:
- Enhanced articles endpoint with pagination (offset/limit, metadata)
- Added advanced filtering (date ranges, source, category)
- Improved search with semantic similarity + multi-parameter filters

 Production Polish & Performance:
- Implemented in-memory caching system in vector_store.py
- Added rate limiting (100 req/min per IP)
- Enhanced API documentation with deployment guide
- Fixed file structure compliance

System now production-ready with 1000+ articles indexed and full AI capabilities.
2025-07-08 16:45:38 +01:00


# DS Task AI News - API Documentation
## Base URL
```
http://localhost:8000
```
## Authentication
Currently, no authentication is required. In production, consider implementing API keys or OAuth.
## Rate Limiting
- **Limit**: 100 requests per minute per IP address
- **Response**: HTTP 429 when limit exceeded
- **Headers**: No rate limit headers currently implemented
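
Because the API returns HTTP 429 without any rate limit headers, a client cannot read a retry delay from the response and has to pick one itself. The following is a minimal client-side sketch, assuming exponential backoff is acceptable for your use case; `call_with_backoff`, `send`, and the delay values are illustrative names and numbers, not part of the API:

```python
import time
from typing import Any, Callable


def call_with_backoff(
    send: Callable[[], Any],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it returns a non-429 response or attempts run out.

    send is any zero-argument callable that returns an object with a
    .status_code attribute (for example a requests.Response).
    """
    response = send()
    for attempt in range(1, max_attempts):
        if response.status_code != 429:
            break
        # Wait base_delay, then 2x, then 4x ... before the next attempt.
        sleep(base_delay * 2 ** (attempt - 1))
        response = send()
    return response
```

With `requests` installed, a call could look like `call_with_backoff(lambda: requests.get("http://localhost:8000/articles"))`. If every attempt is rejected, the last 429 response is returned for the caller to handle.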
## Response Format
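Successful responses from most endpoints follow this general structure. Individual endpoints replace `data` with a named field such as `recommendations` or `articles`, and the health endpoints return their own shape: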
```json
{
"success": true,
"message": "Optional message",
"data": {},
"count": 0
}
```
## Error Handling
Error responses include:
```json
{
"detail": "Error description",
"status_code": 400
}
```
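A client can branch on the success and error shapes shown above. The sketch below is one way to do that, assuming error bodies carry a `detail` field as in the example; `ApiError` and `unwrap` are hypothetical helper names:

```python
class ApiError(Exception):
    """Raised when the API reports a failure."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(f"{status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail


def unwrap(status_code: int, payload: dict) -> dict:
    """Return the payload on success, raise ApiError otherwise."""
    if status_code >= 400 or payload.get("success") is False:
        # Prefer the error "detail", fall back to "message" if present.
        detail = payload.get("detail") or payload.get("message") or "Unknown error"
        raise ApiError(status_code, detail)
    return payload
```

With `requests`, this would typically be called as `unwrap(response.status_code, response.json())`.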
## Caching
- **Articles endpoint**: 3-minute cache for improved performance
- **Search results**: In-memory caching with 5-minute TTL
- **Vector operations**: Cached for frequent similarity searches
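
The time-to-live behavior listed above can be illustrated with a small in-memory cache. This is a sketch of the general technique only; it is not the code in `vector_store.py`, and `TTLCache` and its methods are hypothetical names:

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


# 5-minute TTL, matching the search-result cache described above.
search_cache = TTLCache(ttl_seconds=300)
```

A cache like this is per-process, so running several API instances behind a load balancer would give each instance its own copy.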
---
## Endpoints
### 1. Health Check
**GET** `/`
Check if the API is running.
**Response:**
```json
{
"message": "DS Task AI News API is running!",
"version": "1.0.0",
"status": "healthy"
}
```
---
### 2. Detailed Health Check
**GET** `/health`
Get detailed system status and statistics.
**Response:**
```json
{
"status": "healthy",
"vector_store": {
"total_articles": 150,
"index_dimension": 384,
"index_exists": true,
"last_updated": "2025-07-07T16:00:00"
},
"settings": {
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"vector_db_type": "faiss",
"rss_feeds_count": 3
}
}
```
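A deployment script might use this payload as a readiness gate. The following is a minimal sketch, assuming the field names shown in the response above; `is_ready` and the `min_articles` threshold are illustrative:

```python
def is_ready(health: dict, min_articles: int = 1) -> bool:
    """Return True if the service reports healthy and the index is usable."""
    store = health.get("vector_store", {})
    return (
        health.get("status") == "healthy"
        and store.get("index_exists") is True
        and store.get("total_articles", 0) >= min_articles
    )
```

The payload would come from `GET /health`, for example `is_ready(requests.get("http://localhost:8000/health").json())` when `requests` is available.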
---
### 3. Fetch News
**POST** `/fetch-news`
Fetch news from configured RSS feeds and add to vector store.
**Response:**
```json
{
"success": true,
"message": "News fetched and processed successfully",
"articles_fetched": 45,
"articles_stored": 45,
"total_articles": 195
}
```
**Error Response:**
```json
{
"detail": "Error fetching news: Connection timeout"
}
```
---
### 4. Get Recommendations by Article ID
**GET** `/recommend-news`
Get similar articles based on an existing article ID.
**Parameters:**
- `article_id` (required): ID of the reference article
- `top_k` (optional, default=5): Number of recommendations
**Example:**
```
GET /recommend-news?article_id=abc123&top_k=10
```
**Response:**
```json
{
"success": true,
"article_id": "abc123",
"recommendations": [
{
"id": "def456",
"title": "AI Breakthrough in Healthcare",
"content": "Recent developments in artificial intelligence...",
"url": "https://example.com/article",
"source": "TechNews",
"published_date": "2025-07-07T10:00:00",
"similarity_score": 0.89
}
],
"count": 1
}
```
---
### 5. Get Recommendations by Query
**POST** `/recommend-by-query`
Get article recommendations based on a text query.
**Request Body:**
```json
{
"query": "artificial intelligence healthcare",
"top_k": 5
}
```
**Response:**
```json
{
"success": true,
"query": "artificial intelligence healthcare",
"recommendations": [
{
"id": "xyz789",
"title": "AI Transforms Medical Diagnosis",
"content": "Machine learning algorithms are revolutionizing...",
"url": "https://example.com/ai-medical",
"source": "HealthTech",
"published_date": "2025-07-07T14:30:00",
"similarity_score": 0.92
}
],
"count": 1
}
```
---
### 6. Get Recommendations by Interests
**POST** `/recommend-by-interests`
Get recommendations based on user interests.
**Request Body:**
```json
{
"interests": ["artificial intelligence", "machine learning", "healthcare"],
"top_k": 10
}
```
**Response:**
```json
{
"success": true,
"interests": ["artificial intelligence", "machine learning", "healthcare"],
"recommendations": [...],
"count": 8
}
```
---
### 7. Get Trending Articles
**GET** `/trending`
Get trending (most recent) articles.
**Parameters:**
- `top_k` (optional, default=10): Number of articles to return
**Example:**
```
GET /trending?top_k=20
```
**Response:**
```json
{
"success": true,
"trending_articles": [
{
"id": "trend1",
"title": "Breaking: New AI Model Released",
"content": "A groundbreaking AI model has been announced...",
"url": "https://example.com/breaking-ai",
"source": "AI Weekly",
"published_date": "2025-07-07T16:00:00"
}
],
"count": 1
}
```
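The description above equates trending with recency. Assuming that is the whole rule (the server may apply other criteria), the same ordering can be reproduced client-side from `published_date`; `most_recent` is a hypothetical helper:

```python
from datetime import datetime
from typing import Dict, List


def most_recent(articles: List[Dict], top_k: int = 10) -> List[Dict]:
    """Return the top_k articles, newest first, by ISO 8601 published_date."""
    return sorted(
        articles,
        key=lambda a: datetime.fromisoformat(a["published_date"]),
        reverse=True,
    )[:top_k]
```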
---
### 8. Get All Articles
**GET** `/articles`
Get all articles with optional filtering.
**Parameters:**
- `source` (optional): Filter by news source
- `limit` (optional, default=50): Maximum articles to return
**Example:**
```
GET /articles?source=BBC%20News&limit=25
```
**Response:**
```json
{
"success": true,
"articles": [...],
"count": 25,
"source_filter": "BBC News"
}
```
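Filter values that contain spaces must be percent-encoded, as in the example request above. Here is a sketch using only the standard library; `articles_url` is a hypothetical helper and the base URL is the development default:

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"


def articles_url(source: Optional[str] = None, limit: int = 50) -> str:
    """Build a GET /articles URL with percent-encoded query parameters."""
    params = {}
    if source is not None:
        params["source"] = source
    params["limit"] = limit
    # quote_via=quote encodes spaces as %20 rather than the default "+".
    return f"{BASE_URL}/articles?{urlencode(params, quote_via=quote)}"
```

HTTP libraries such as `requests` perform this encoding themselves when given a `params` dictionary, so the helper mainly matters when URLs are built by hand.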
---
### 9. Advanced Search
**POST** `/search`
Advanced search with filters.
**Request Body:**
```json
{
"query": "climate change technology",
"source": "BBC News",
"top_k": 15
}
```
**Response:**
```json
{
"success": true,
"query": "climate change technology",
"filters": {
"source": "BBC News"
},
"results": [...],
"count": 12
}
```
---
### 10. Get Statistics
**GET** `/stats`
Get system statistics and information.
**Response:**
```json
{
"success": true,
"statistics": {
"total_articles": 200,
"index_dimension": 384,
"index_exists": true,
"rss_feeds": [
"https://feeds.bbci.co.uk/news/rss.xml",
"https://rss.cnn.com/rss/edition.rss"
],
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2"
}
}
```
---
### 11. Test RSS Feeds
**GET** `/test-rss`
Test RSS feed connectivity and parsing.
**Response:**
```json
{
"results": [
{
"url": "https://feeds.bbci.co.uk/news/rss.xml",
"title": "BBC News",
"entries_count": 32,
"success": true,
"sample_article": {
"title": "Tech Giants Announce AI Partnership",
"published": "Mon, 07 Jul 2025 16:00:00 GMT",
"link": "https://bbc.com/news/tech-partnership"
}
}
],
"timestamp": "2025-07-07T16:15:00"
}
```
---
## Interactive Documentation
FastAPI automatically generates interactive API documentation:
- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc
## Rate Limiting
As described under Rate Limiting above, a per-IP limit is already enforced. For production, consider refining it:
- Per IP: 100 requests/minute (the current limit)
- Per endpoint: consider adding limits that vary with computational cost
## CORS
CORS is enabled for all origins in development. In production, configure specific allowed origins.
## Error Codes
- **200**: Success
- **400**: Bad Request (invalid parameters)
- **404**: Not Found (article ID not found)
- **429**: Too Many Requests (rate limit exceeded)
- **500**: Internal Server Error (system error)
## Data Models
### Article Object
```json
{
"id": "string",
"title": "string",
"content": "string",
"url": "string",
"source": "string",
"published_date": "ISO 8601 datetime",
"similarity_score": "float (0-1, only in recommendations)"
}
```
### Query Object
```json
{
"query": "string",
"top_k": "integer (1-100)"
}
```
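Client code can mirror these models and validate `top_k` before sending a request. This is a sketch using dataclasses; the class names are assumptions, and the 1-100 range is taken from the Query Object above:

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Article:
    id: str
    title: str
    content: str
    url: str
    source: str
    published_date: str  # ISO 8601 datetime string
    similarity_score: Optional[float] = None  # only present in recommendations


@dataclass
class Query:
    query: str
    top_k: int = 5

    def __post_init__(self) -> None:
        # Reject values outside the documented 1-100 range.
        if not 1 <= self.top_k <= 100:
            raise ValueError("top_k must be between 1 and 100")


# Build a JSON-serializable request body from a validated query.
body = asdict(Query(query="artificial intelligence", top_k=5))
```

An `Article` can be built from a response item with `Article(**item)`, provided the item contains only the fields listed above.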
## SDK Examples
### Python
```python
import requests

# Fetch news
response = requests.post("http://localhost:8000/fetch-news")
print(response.json())

# Get recommendations
response = requests.post(
    "http://localhost:8000/recommend-by-query",
    json={"query": "artificial intelligence", "top_k": 5},
)
recommendations = response.json()["recommendations"]
```
### JavaScript
```javascript
// Fetch news
fetch('http://localhost:8000/fetch-news', {method: 'POST'})
  .then(response => response.json())
  .then(data => console.log(data));

// Get recommendations
fetch('http://localhost:8000/recommend-by-query', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({
    query: 'artificial intelligence',
    top_k: 5
  })
})
  .then(response => response.json())
  .then(data => console.log(data.recommendations));
```
---
## Deployment Guide
### Prerequisites
- Python 3.10+
- 4GB+ RAM (for Sentence Transformers model)
- 2GB+ disk space
### Local Development Setup
1. **Clone and Setup**
```bash
git clone <repository-url>
cd ds_task_ai_news
```
2. **Install Dependencies**
```bash
pip install -r backend/requirements.txt
```
3. **Environment Configuration**
Create a `.env` file in the root directory:
```env
# Optional API Keys
GROQ_API_KEY=your_groq_api_key_here
COHERE_API_KEY=your_cohere_api_key_here
# Server Settings
HOST=0.0.0.0
PORT=8000
DEBUG=true
# RSS Feeds (comma-separated)
RSS_FEEDS=https://feeds.bbci.co.uk/news/technology/rss.xml,https://techcrunch.com/feed/,https://www.wired.com/feed/rss
# Vector Database
VECTOR_DIMENSION=384
VECTOR_DB_TYPE=faiss
```
4. **Run the Application**
```bash
cd backend
python main.py
```
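The `RSS_FEEDS` value from step 3 is a single comma-separated string, so it has to be split into a list at startup. How `config.py` actually does this is not shown here; the following is a sketch with a hypothetical `parse_feeds` helper:

```python
import os
from typing import List


def parse_feeds(raw: str) -> List[str]:
    """Split a comma-separated RSS_FEEDS value into a clean list of URLs."""
    return [url.strip() for url in raw.split(",") if url.strip()]


# Read the variable from the environment; an unset variable yields [].
feeds = parse_feeds(os.environ.get("RSS_FEEDS", ""))
```

Stripping whitespace and dropping empty entries keeps a trailing comma or a space after a comma in `.env` from producing invalid feed URLs.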
### Production Deployment
#### Docker Deployment
```dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY backend/requirements.txt .
RUN pip install -r requirements.txt
COPY . .
WORKDIR /app/backend
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
#### Docker Compose
```yaml
version: '3.8'
services:
  ai-news-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - GROQ_API_KEY=${GROQ_API_KEY}
      - COHERE_API_KEY=${COHERE_API_KEY}
    volumes:
      - ./data:/app/data
      - ./models:/app/models
    restart: unless-stopped
```
#### Nginx Configuration
```nginx
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
### Performance Optimization
#### Memory Management
- **Sentence Transformers**: Uses ~1GB RAM when loaded
- **FAISS Index**: Memory usage scales with article count
- **Caching**: In-memory cache uses ~50MB for typical workloads
#### Scaling Recommendations
- **Horizontal**: Use load balancer with multiple API instances
- **Vertical**: Increase RAM for larger article databases
- **Database**: Consider PostgreSQL for metadata storage at scale
### Monitoring and Maintenance
#### Health Checks
```bash
# Basic health check
curl http://localhost:8000/health
# System statistics
curl http://localhost:8000/stats
# AI analyzer status
curl http://localhost:8000/ai-status
```
#### Log Monitoring
```bash
# Application logs
tail -f /var/log/ai-news/app.log
# Error tracking
grep "ERROR" /var/log/ai-news/app.log
```
#### Backup Strategy
```bash
# Make sure the backup directory exists
mkdir -p backup
# Backup vector database
cp data/news_vectors.faiss backup/
cp data/news_vectors_metadata.pkl backup/
# Backup processed articles
tar -czf backup/articles_$(date +%Y%m%d).tar.gz data/processed_news/
```
### Troubleshooting
#### Common Issues
1. **Sentence Transformers Model Loading**
```bash
# Verify model exists
ls -la models/all-MiniLM-L6-v2/
# Test model loading
python -c "from sentence_transformers import SentenceTransformer; model = SentenceTransformer('./models/all-MiniLM-L6-v2'); print('Model loaded successfully')"
```
2. **FAISS Index Issues**
```bash
# Rebuild index
rm data/news_vectors.faiss data/news_vectors_metadata.pkl
# Restart application to rebuild
```
3. **Memory Issues**
```bash
# Check memory usage
free -h
# Monitor process memory
ps aux | grep python
```
#### Performance Tuning
- Adjust `RATE_LIMIT_REQUESTS` in `main.py` for your needs
- Modify cache TTL in `vector_store.py`
- Optimize `max_articles_per_feed` in `config.py`
### Security Considerations
#### Production Security
- Use HTTPS in production
- Implement proper API authentication
- Set up firewall rules
- Regular security updates
- Monitor for unusual traffic patterns
#### Environment Variables
Never commit sensitive data to version control:
```bash
# Use environment-specific .env files
.env.production
.env.staging
.env.development
```