# DS Task AI News - API Documentation

## Base URL

```
http://localhost:8000
```

## Authentication

Currently, no authentication is required. In production, consider implementing API keys or OAuth.

## Rate Limiting

- **Limit**: 100 requests per minute per IP address
- **Response**: HTTP 429 (Too Many Requests) when the limit is exceeded
- **Headers**: No rate-limit headers are currently returned
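
Because the API returns no rate-limit headers, a client that receives HTTP 429 has to pick its own wait time. A minimal sketch, assuming exponential backoff suits your workload; `call_with_backoff` and its parameters are hypothetical names, and `send` is any zero-argument callable returning an object with a `status_code` attribute (for example, a wrapped `requests.get` call):

```python
import time
from collections.abc import Callable
from typing import Any


def call_with_backoff(
    send: Callable[[], Any],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `send()` and retry with exponential backoff while it returns HTTP 429."""
    attempt = 0
    while True:
        response = send()
        if response.status_code != 429 or attempt >= max_retries:
            return response
        # The API sends no rate-limit headers, so wait base_delay * 1, 2, 4, ... seconds.
        sleep(base_delay * 2 ** attempt)
        attempt += 1


# Hypothetical usage with the `requests` library:
# resp = call_with_backoff(lambda: requests.get("http://localhost:8000/trending", timeout=10))
```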
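
## Response Format

Responses are JSON objects. Most endpoints include a `success` flag, and list endpoints include a `count`. The general structure is shown below; note that the health-check endpoints return their own fields, and most list endpoints put their payload under an endpoint-specific key (`recommendations`, `trending_articles`, `articles`, or `results`) instead of `data`:

```json
{
  "success": true,
  "message": "Optional message",
  "data": {},
  "count": 0
}
```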
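
Since the key holding the list payload differs between the endpoints documented in this file, a client may want a single helper that locates it. A minimal sketch; `extract_items` is a hypothetical helper, and its key list reflects only the endpoint examples in this document:

```python
from typing import Any

# Keys that carry a list of articles in the documented endpoint responses.
LIST_KEYS = ("recommendations", "trending_articles", "articles", "results", "data")


def extract_items(body: dict[str, Any]) -> list[Any]:
    """Return the list payload of a response body, or an empty list if none is found."""
    for key in LIST_KEYS:
        value = body.get(key)
        if isinstance(value, list):
            return value
    return []
```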

## Error Handling

Errors are returned with a non-2xx HTTP status code (see **Error Codes**) and a JSON body containing a human-readable `detail` message:

```json
{
  "detail": "Error description",
  "status_code": 400
}
```
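
A client can turn these error bodies into exceptions so that callers see the server's `detail` message. A minimal sketch using only the standard library; `APIError` and `check_response` are hypothetical names, and the fallback for non-JSON error bodies (for example, a proxy error page) is an assumption:

```python
import json
from typing import Any


class APIError(Exception):
    """Raised for non-2xx responses; carries the server's `detail` message."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(f"HTTP {status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail


def check_response(status_code: int, raw_body: str) -> Any:
    """Decode a successful JSON body, or raise APIError with the error detail."""
    if 200 <= status_code < 300:
        return json.loads(raw_body)
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        body = raw_body  # e.g. a plain-text or HTML error page from a proxy
    detail = body.get("detail", "Unknown error") if isinstance(body, dict) else body
    raise APIError(status_code, str(detail))
```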

## Caching

- **`/articles` endpoint**: Responses are cached for 3 minutes
- **Search results**: Cached in memory with a 5-minute TTL
- **Vector operations**: Results of frequently repeated similarity searches are cached
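
The TTL behaviour described above can be illustrated with a small in-memory cache. This is a sketch of the general technique, not the project's actual cache implementation; `TTLCache` is a hypothetical class, and its clock is injectable so expiry can be tested without waiting:

```python
import time
from collections.abc import Callable, Hashable
from typing import Any


class TTLCache:
    """Minimal in-memory cache whose entries expire `ttl_seconds` after they are set."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[Hashable, tuple[float, Any]] = {}

    def get(self, key: Hashable, default: Any = None) -> Any:
        """Return the cached value, or `default` if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # evict lazily on read
            return default
        return value

    def set(self, key: Hashable, value: Any) -> None:
        """Store `value` under `key` until now + ttl."""
        self._store[key] = (self.clock() + self.ttl, value)


# Illustrative instances mirroring the documented TTLs
articles_cache = TTLCache(ttl_seconds=180)  # 3 minutes
search_cache = TTLCache(ttl_seconds=300)    # 5 minutes
```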

---

## Endpoints

### 1. Health Check

**GET** `/`

Check if the API is running.

**Response:**

```json
{
  "message": "DS Task AI News API is running!",
  "version": "1.0.0",
  "status": "healthy"
}
```

---

### 2. Detailed Health Check

**GET** `/health`

Get detailed system status and statistics.

**Response:**

```json
{
  "status": "healthy",
  "vector_store": {
    "total_articles": 150,
    "index_dimension": 384,
    "index_exists": true,
    "last_updated": "2025-07-07T16:00:00"
  },
  "settings": {
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "vector_db_type": "faiss",
    "rss_feeds_count": 3
  }
}
```
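
For deployment readiness probes, it can be useful to treat the service as ready only once the index exists and holds articles. A minimal sketch, assuming the `/health` body has the shape shown above; `is_ready`, `fetch_health`, and the readiness criteria are illustrative choices, not part of the API:

```python
import json
import urllib.request
from typing import Any


def is_ready(health: dict[str, Any]) -> bool:
    """Return True if /health reports a healthy service with a populated index."""
    store = health.get("vector_store", {})
    return (
        health.get("status") == "healthy"
        and bool(store.get("index_exists"))
        and store.get("total_articles", 0) > 0
    )


def fetch_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict[str, Any]:
    """GET /health and decode the JSON body (raises urllib.error.URLError if unreachable)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)
```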

---

### 3. Fetch News

**POST** `/fetch-news`

Fetch news from the configured RSS feeds and add the articles to the vector store.

**Response:**

```json
{
  "success": true,
  "message": "News fetched and processed successfully",
  "articles_fetched": 45,
  "articles_stored": 45,
  "total_articles": 195
}
```

**Error Response:**

```json
{
  "detail": "Error fetching news: Connection timeout"
}
```

---

### 4. Get Recommendations by Article ID

**GET** `/recommend-news`

Get similar articles based on an existing article ID.

**Parameters:**

- `article_id` (required): ID of the reference article
- `top_k` (optional, default=5): Number of recommendations

**Example:**

```
GET /recommend-news?article_id=abc123&top_k=10
```

**Response:**

```json
{
  "success": true,
  "article_id": "abc123",
  "recommendations": [
    {
      "id": "def456",
      "title": "AI Breakthrough in Healthcare",
      "content": "Recent developments in artificial intelligence...",
      "url": "https://example.com/article",
      "source": "TechNews",
      "published_date": "2025-07-07T10:00:00",
      "similarity_score": 0.89
    }
  ],
  "count": 1
}
```
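
`top_k` caps the number of results but does not necessarily guarantee that each one is a close match. If you only want strong matches, one option is to filter on `similarity_score` client-side. A minimal sketch; `strong_matches` is a hypothetical helper, and the 0.5 default threshold is an arbitrary assumption to tune against your own data:

```python
from typing import Any


def strong_matches(recommendations: list[dict[str, Any]], min_score: float = 0.5) -> list[dict[str, Any]]:
    """Keep recommendations scoring at least `min_score`, highest score first."""
    kept = [r for r in recommendations if r.get("similarity_score", 0.0) >= min_score]
    return sorted(kept, key=lambda r: r["similarity_score"], reverse=True)
```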

---

### 5. Get Recommendations by Query

**POST** `/recommend-by-query`

Get article recommendations based on a text query.

**Request Body:**

```json
{
  "query": "artificial intelligence healthcare",
  "top_k": 5
}
```

**Response:**

```json
{
  "success": true,
  "query": "artificial intelligence healthcare",
  "recommendations": [
    {
      "id": "xyz789",
      "title": "AI Transforms Medical Diagnosis",
      "content": "Machine learning algorithms are revolutionizing...",
      "url": "https://example.com/ai-medical",
      "source": "HealthTech",
      "published_date": "2025-07-07T14:30:00",
      "similarity_score": 0.92
    }
  ],
  "count": 1
}
```
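
Using only the standard library, the request above can be built and sent as follows. A minimal sketch; `build_json_post` and `post_json` are hypothetical helpers, and the same pattern should apply to the other POST endpoints with their respective bodies. A typical call is `post_json(build_json_post("http://localhost:8000", "/recommend-by-query", {"query": "artificial intelligence healthcare", "top_k": 5}))["recommendations"]`.

```python
import json
import urllib.request
from typing import Any


def build_json_post(base_url: str, path: str, payload: dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON body."""
    return urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(req: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```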

---

### 6. Get Recommendations by Interests

**POST** `/recommend-by-interests`

Get recommendations based on user interests.

**Request Body:**

```json
{
  "interests": ["artificial intelligence", "machine learning", "healthcare"],
  "top_k": 10
}
```

**Response:**

```json
{
  "success": true,
  "interests": ["artificial intelligence", "machine learning", "healthcare"],
  "recommendations": [...],
  "count": 8
}
```

---

### 7. Get Trending Articles

**GET** `/trending`

Get trending (most recent) articles.

**Parameters:**

- `top_k` (optional, default=10): Number of articles to return

**Example:**

```
GET /trending?top_k=20
```

**Response:**

```json
{
  "success": true,
  "trending_articles": [
    {
      "id": "trend1",
      "title": "Breaking: New AI Model Released",
      "content": "A groundbreaking AI model has been announced...",
      "url": "https://example.com/breaking-ai",
      "source": "AI Weekly",
      "published_date": "2025-07-07T16:00:00"
    }
  ],
  "count": 1
}
```

---

### 8. Get All Articles

**GET** `/articles`

Get all articles with optional filtering.

**Parameters:**

- `source` (optional): Filter by news source
- `limit` (optional, default=50): Maximum articles to return

**Example:**

```
GET /articles?source=BBC%20News&limit=25
```

**Response:**

```json
{
  "success": true,
  "articles": [...],
  "count": 25,
  "source_filter": "BBC News"
}
```
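
Query parameters such as `source` can contain spaces, which must be percent-encoded (as in `BBC%20News` in the example above). A minimal sketch using `urllib.parse`; `build_get_url` is a hypothetical helper, and it drops `None` values so optional parameters can simply be left out:

```python
from typing import Any, Optional
from urllib.parse import quote, urlencode


def build_get_url(base_url: str, path: str, params: Optional[dict[str, Any]] = None) -> str:
    """Return base_url + path with optional, percent-encoded query parameters."""
    query = {k: v for k, v in (params or {}).items() if v is not None}
    if not query:
        return f"{base_url}{path}"
    # quote (not urlencode's default quote_plus) encodes spaces as %20 rather than +
    return f"{base_url}{path}?{urlencode(query, quote_via=quote)}"
```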

---

### 9. Advanced Search

**POST** `/search`

Search articles by text query, with filters such as `source`.

**Request Body:**

```json
{
  "query": "climate change technology",
  "source": "BBC News",
  "top_k": 15
}
```

**Response:**

```json
{
  "success": true,
  "query": "climate change technology",
  "filters": {
    "source": "BBC News"
  },
  "results": [...],
  "count": 12
}
```
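
When the source filter is not wanted, a client can omit the key entirely rather than sending an empty value. A minimal sketch, assuming the server accepts a `/search` body without `source` (this document does not say explicitly); `build_search_body` is a hypothetical helper:

```python
from typing import Any, Optional


def build_search_body(query: str, top_k: int, source: Optional[str] = None) -> dict[str, Any]:
    """Build a /search request body, including `source` only when a filter is wanted."""
    body: dict[str, Any] = {"query": query, "top_k": top_k}
    if source is not None:
        body["source"] = source
    return body
```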

---

### 10. Get Statistics

**GET** `/stats`

Get system statistics and information.

**Response:**

```json
{
  "success": true,
  "statistics": {
    "total_articles": 200,
    "index_dimension": 384,
    "index_exists": true,
    "rss_feeds": [
      "https://feeds.bbci.co.uk/news/rss.xml",
      "https://rss.cnn.com/rss/edition.rss"
    ],
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2"
  }
}
```

---

### 11. Test RSS Feeds

**GET** `/test-rss`

Test RSS feed connectivity and parsing.

**Response:**

```json
{
  "results": [
    {
      "url": "https://feeds.bbci.co.uk/news/rss.xml",
      "title": "BBC News",
      "entries_count": 32,
      "success": true,
      "sample_article": {
        "title": "Tech Giants Announce AI Partnership",
        "published": "Mon, 07 Jul 2025 16:00:00 GMT",
        "link": "https://bbc.com/news/tech-partnership"
      }
    }
  ],
  "timestamp": "2025-07-07T16:15:00"
}
```
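
The `/test-rss` output lends itself to a simple monitoring check. A minimal sketch that lists feeds which failed or returned no entries; `failing_feeds` is a hypothetical helper, and treating `entries_count == 0` as a failure is an assumption:

```python
from typing import Any


def failing_feeds(report: dict[str, Any]) -> list[str]:
    """Return URLs of feeds that failed to parse or returned no entries."""
    return [
        feed.get("url", "<unknown>")
        for feed in report.get("results", [])
        if not feed.get("success") or feed.get("entries_count", 0) == 0
    ]
```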

---

## Interactive Documentation

FastAPI automatically generates interactive API documentation:

- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc

## Rate Limiting Recommendations

The API applies a single limit of 100 requests per minute per IP address (see **Rate Limiting** above). For production, consider refining it:

- Per IP: tune the 100 requests/minute default to your expected traffic
- Per endpoint: set limits according to computational cost (for example, stricter limits for `/fetch-news`)

## CORS

CORS is enabled for all origins in development. In production, configure specific allowed origins.

## Error Codes

- **200**: Success
- **400**: Bad Request (invalid parameters)
- **404**: Not Found (article ID not found)
- **422**: Unprocessable Entity (request validation failed; FastAPI's default for malformed bodies or parameters)
- **429**: Too Many Requests (rate limit exceeded)
- **500**: Internal Server Error (system error)

## Data Models

### Article Object

```json
{
  "id": "string",
  "title": "string",
  "content": "string",
  "url": "string",
  "source": "string",
  "published_date": "ISO 8601 datetime",
  "similarity_score": "float (0-1, only in recommendations)"
}
```

### Query Object

```json
{
  "query": "string",
  "top_k": "integer (1-100)"
}
```
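
The `top_k` range above can be checked client-side before a request is sent, which avoids a round trip for an obviously invalid value. A minimal sketch; `make_query` is a hypothetical helper, its default of 5 mirrors the query example earlier in this document, and rejecting blank queries is an assumption about what the server would accept:

```python
from typing import Any


def make_query(query: str, top_k: int = 5) -> dict[str, Any]:
    """Validate a query object client-side and return it as a request body."""
    if not query.strip():
        raise ValueError("query must be a non-empty string")
    if not 1 <= top_k <= 100:
        raise ValueError("top_k must be between 1 and 100")
    return {"query": query, "top_k": top_k}
```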

## SDK Examples

### Python

```python
import requests

BASE_URL = "http://localhost:8000"

# Fetch news; raise_for_status() surfaces HTTP errors such as 429 or 500
response = requests.post(f"{BASE_URL}/fetch-news", timeout=120)
response.raise_for_status()
print(response.json())

# Get recommendations
response = requests.post(
    f"{BASE_URL}/recommend-by-query",
    json={"query": "artificial intelligence", "top_k": 5},
    timeout=30,
)
response.raise_for_status()
recommendations = response.json()["recommendations"]
```

### JavaScript

```javascript
// Fetch news
fetch('http://localhost:8000/fetch-news', {method: 'POST'})
  .then(response => response.json())
  .then(data => console.log(data));

// Get recommendations
fetch('http://localhost:8000/recommend-by-query', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({
    query: 'artificial intelligence',
    top_k: 5
  })
})
  .then(response => {
    // fetch() only rejects on network errors, so check the HTTP status explicitly
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  })
  .then(data => console.log(data.recommendations))
  .catch(err => console.error(err));
```

---

## Deployment Guide

### Prerequisites

- Python 3.10+
- 4GB+ RAM (for the Sentence Transformers model)
- 2GB+ disk space

### Local Development Setup

1. **Clone and Setup**

```bash
git clone <repository-url>
cd ds_task_ai_news
```

2. **Install Dependencies**

```bash
pip install -r backend/requirements.txt
```

3. **Environment Configuration**

Create a `.env` file in the root directory:

```env
# Optional API Keys
GROQ_API_KEY=your_groq_api_key_here
COHERE_API_KEY=your_cohere_api_key_here

# Server Settings
HOST=0.0.0.0
PORT=8000
DEBUG=true

# RSS Feeds (comma-separated)
RSS_FEEDS=https://feeds.bbci.co.uk/news/technology/rss.xml,https://techcrunch.com/feed/,https://www.wired.com/feed/rss

# Vector Database
VECTOR_DIMENSION=384
VECTOR_DB_TYPE=faiss
```

4. **Run the Application**

```bash
cd backend
python main.py
```
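
How the application itself reads the `.env` settings from step 3 is not documented here. As an illustration, a minimal sketch that parses them in Python, assuming the variables are already in the process environment (a `.env` file is not read automatically; a tool such as python-dotenv has to load it first). `load_settings` and its defaults are hypothetical and simply mirror the example `.env`:

```python
import os
from collections.abc import Mapping
from typing import Any


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Read server, feed and vector-store settings from environment variables."""
    # RSS_FEEDS is comma-separated; strip whitespace and skip empty entries
    feeds = [url.strip() for url in env.get("RSS_FEEDS", "").split(",") if url.strip()]
    return {
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "8000")),
        "debug": env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        "rss_feeds": feeds,
        "vector_dimension": int(env.get("VECTOR_DIMENSION", "384")),
        "vector_db_type": env.get("VECTOR_DB_TYPE", "faiss"),
    }
```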

### Production Deployment

#### Docker Deployment

```dockerfile
FROM python:3.10-slim

WORKDIR /app
COPY backend/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
WORKDIR /app/backend

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

#### Docker Compose

```yaml
version: '3.8'
services:
  ai-news-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - GROQ_API_KEY=${GROQ_API_KEY}
      - COHERE_API_KEY=${COHERE_API_KEY}
    volumes:
      - ./data:/app/data
      - ./models:/app/models
    restart: unless-stopped
```

#### Nginx Configuration

```nginx
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

### Performance Optimization

#### Memory Management

- **Sentence Transformers**: Uses ~1GB RAM when loaded
- **FAISS Index**: Memory usage scales with article count
- **Caching**: In-memory cache uses ~50MB for typical workloads

#### Scaling Recommendations

- **Horizontal**: Use a load balancer with multiple API instances (in-memory caches are then kept separately per instance)
- **Vertical**: Increase RAM for larger article databases
- **Database**: Consider PostgreSQL for metadata storage at scale

### Monitoring and Maintenance

#### Health Checks

```bash
# Detailed health check
curl http://localhost:8000/health

# System statistics
curl http://localhost:8000/stats

# AI analyzer status
curl http://localhost:8000/ai-status
```

#### Log Monitoring

```bash
# Application logs
tail -f /var/log/ai-news/app.log

# Error tracking
grep "ERROR" /var/log/ai-news/app.log
```

#### Backup Strategy

```bash
# Make sure the backup directory exists
mkdir -p backup

# Backup vector database
cp data/news_vectors.faiss backup/
cp data/news_vectors_metadata.pkl backup/

# Backup processed articles
tar -czf backup/articles_$(date +%Y%m%d).tar.gz data/processed_news/
```

### Troubleshooting

#### Common Issues

1. **Sentence Transformers Model Loading**

```bash
# Verify the model files exist
ls -la models/all-MiniLM-L6-v2/

# Test model loading
python -c "from sentence_transformers import SentenceTransformer; model = SentenceTransformer('./models/all-MiniLM-L6-v2'); print('Model loaded successfully')"
```

2. **FAISS Index Issues**

```bash
# Rebuild the index: delete the index file and its metadata
rm data/news_vectors.faiss data/news_vectors_metadata.pkl

# Restart the application to rebuild the index; if it comes back empty,
# repopulate it from the RSS feeds
curl -X POST http://localhost:8000/fetch-news
```

3. **Memory Issues**

```bash
# Check memory usage
free -h

# Monitor process memory
ps aux | grep python
```

#### Performance Tuning

- Adjust `RATE_LIMIT_REQUESTS` in main.py for your needs
- Modify the cache TTL in vector_store.py
- Tune `max_articles_per_feed` in config.py

### Security Considerations

#### Production Security

- Use HTTPS in production
- Implement proper API authentication
- Set up firewall rules
- Apply security updates regularly
- Monitor for unusual traffic patterns

#### Environment Variables

Never commit sensitive data such as API keys to version control. Use environment-specific `.env` files and keep them out of the repository, for example by listing them in `.gitignore`:

```gitignore
# Environment-specific .env files
.env
.env.production
.env.staging
.env.development
```