feat: Implement complete RSS news fetching system with multi-source support

Aherobo Ovie Victor
2025-07-07 18:31:38 +01:00
parent c158262a49
commit e188af8b17
22 changed files with 2210 additions and 0 deletions
@@ -0,0 +1,430 @@
# DS Task AI News - API Documentation
## Base URL
```
http://localhost:8000
```
## Authentication
Currently, no authentication is required. In production, consider implementing API keys or OAuth.
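If API keys are adopted, the server-side check could look roughly like the sketch below. The `X-API-Key` header name, the `VALID_KEYS` set, and the `is_authorized` helper are all hypothetical; nothing in the current API defines them.

```python
import hmac

# Hypothetical key store; a real deployment would load keys from
# environment variables or a secret manager, not source code.
VALID_KEYS = {"demo-key-123"}


def is_authorized(headers: dict) -> bool:
    """Return True if the (assumed) X-API-Key header matches a known key."""
    supplied = headers.get("X-API-Key", "")
    # compare_digest compares in constant time, which avoids leaking
    # key contents through response-time differences.
    return any(
        hmac.compare_digest(supplied.encode(), key.encode())
        for key in VALID_KEYS
    )
```

A request without the header, or with an unknown key, would be rejected before reaching any endpoint logic.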
## Response Format
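Successful responses generally follow this structure; individual endpoints add their own fields (for example `recommendations` or `articles`), and the two health endpoints return a plain status object instead: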
```json
{
"success": true,
"message": "Optional message",
"data": {},
"count": 0
}
```
## Error Handling
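Error responses carry a `detail` field describing the failure; the HTTP status code indicates the error class (see Error Codes):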
```json
{
"detail": "Error description",
"status_code": 400
}
```
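On the client side, the `detail` field can be turned into a readable message. This is a minimal sketch; `error_message` is a hypothetical helper, and the fallback text for a missing `detail` is an assumption.

```python
from typing import Optional


def error_message(status_code: int, body: dict) -> Optional[str]:
    """Return a readable error string, or None when the call succeeded."""
    if 200 <= status_code < 300:
        return None
    # "detail" is the field shown in the error examples; fall back if absent.
    detail = body.get("detail", "Unknown error")
    return f"HTTP {status_code}: {detail}"
```

With `requests`, this could be called as `error_message(response.status_code, response.json())`.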
---
## Endpoints
### 1. Health Check
**GET** `/`
Check if the API is running.
**Response:**
```json
{
"message": "DS Task AI News API is running!",
"version": "1.0.0",
"status": "healthy"
}
```
---
### 2. Detailed Health Check
**GET** `/health`
Get detailed system status and statistics.
**Response:**
```json
{
"status": "healthy",
"vector_store": {
"total_articles": 150,
"index_dimension": 384,
"index_exists": true,
"last_updated": "2025-07-07T16:00:00"
},
"settings": {
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"vector_db_type": "faiss",
"rss_feeds_count": 3
}
}
```
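A deployment script or monitor might use this payload as a readiness probe. The sketch below assumes that "ready" means a healthy status plus a populated index; that definition, and the `is_ready` name, are illustrative rather than part of the API.

```python
def is_ready(health: dict) -> bool:
    """Return True if the service reports healthy and has a populated index."""
    store = health.get("vector_store", {})
    return (
        health.get("status") == "healthy"
        and store.get("index_exists") is True
        and store.get("total_articles", 0) > 0
    )
```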
---
### 3. Fetch News
**POST** `/fetch-news`
Fetch news from configured RSS feeds and add to vector store.
**Response:**
```json
{
"success": true,
"message": "News fetched and processed successfully",
"articles_fetched": 45,
"articles_stored": 45,
"total_articles": 195
}
```
**Error Response:**
```json
{
"detail": "Error fetching news: Connection timeout"
}
```
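The response reports fetched and stored counts separately. The documentation does not say when they differ; one plausible reason (an assumption here) is that duplicates are skipped. A caller could track the gap like this, using a hypothetical `skipped_articles` helper:

```python
def skipped_articles(result: dict) -> int:
    """Number of fetched articles that were not added to the vector store."""
    # Clamp at zero in case the counts are ever reported inconsistently.
    return max(result["articles_fetched"] - result["articles_stored"], 0)
```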
---
### 4. Get Recommendations by Article ID
**GET** `/recommend-news`
Get similar articles based on an existing article ID.
**Parameters:**
- `article_id` (required): ID of the reference article
- `top_k` (optional, default=5): Number of recommendations
**Example:**
```
GET /recommend-news?article_id=abc123&top_k=10
```
**Response:**
```json
{
"success": true,
"article_id": "abc123",
"recommendations": [
{
"id": "def456",
"title": "AI Breakthrough in Healthcare",
"content": "Recent developments in artificial intelligence...",
"url": "https://example.com/article",
"source": "TechNews",
"published_date": "2025-07-07T10:00:00",
"similarity_score": 0.89
}
],
"count": 1
}
```
---
### 5. Get Recommendations by Query
**POST** `/recommend-by-query`
Get article recommendations based on a text query.
**Request Body:**
```json
{
"query": "artificial intelligence healthcare",
"top_k": 5
}
```
**Response:**
```json
{
"success": true,
"query": "artificial intelligence healthcare",
"recommendations": [
{
"id": "xyz789",
"title": "AI Transforms Medical Diagnosis",
"content": "Machine learning algorithms are revolutionizing...",
"url": "https://example.com/ai-medical",
"source": "HealthTech",
"published_date": "2025-07-07T14:30:00",
"similarity_score": 0.92
}
],
"count": 1
}
```
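Because every recommendation carries a `similarity_score`, a client can drop weak matches after the call. The helper name and the threshold below are illustrative choices, not values the API prescribes.

```python
def filter_by_score(recommendations: list, min_score: float) -> list:
    """Keep only recommendations whose similarity_score meets the threshold."""
    return [
        rec for rec in recommendations
        # Treat a missing score as 0.0 so such entries are filtered out.
        if rec.get("similarity_score", 0.0) >= min_score
    ]
```

For example, `filter_by_score(response.json()["recommendations"], 0.9)` would keep the `xyz789` entry from the sample response above.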
---
### 6. Get Recommendations by Interests
**POST** `/recommend-by-interests`
Get recommendations based on user interests.
**Request Body:**
```json
{
"interests": ["artificial intelligence", "machine learning", "healthcare"],
"top_k": 10
}
```
**Response:**
```json
{
"success": true,
"interests": ["artificial intelligence", "machine learning", "healthcare"],
"recommendations": [...],
"count": 8
}
```
---
### 7. Get Trending Articles
**GET** `/trending`
Get trending (most recent) articles.
**Parameters:**
- `top_k` (optional, default=10): Number of articles to return
**Example:**
```
GET /trending?top_k=20
```
**Response:**
```json
{
"success": true,
"trending_articles": [
{
"id": "trend1",
"title": "Breaking: New AI Model Released",
"content": "A groundbreaking AI model has been announced...",
"url": "https://example.com/breaking-ai",
"source": "AI Weekly",
"published_date": "2025-07-07T16:00:00"
}
],
"count": 1
}
```
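Since "trending" here means "most recent", the same ordering can be reproduced client-side from `published_date`. This sketch assumes the dates are ISO 8601 strings as in the examples; it is not the server's actual implementation.

```python
from datetime import datetime


def most_recent(articles: list, top_k: int = 10) -> list:
    """Return the top_k articles ordered newest first by published_date."""
    return sorted(
        articles,
        key=lambda article: datetime.fromisoformat(article["published_date"]),
        reverse=True,
    )[:top_k]
```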
---
### 8. Get All Articles
**GET** `/articles`
Get all articles with optional filtering.
**Parameters:**
- `source` (optional): Filter by news source
- `limit` (optional, default=50): Maximum articles to return
**Example:**
```
GET /articles?source=BBC%20News&limit=25
```
**Response:**
```json
{
"success": true,
"articles": [...],
"count": 25,
"source_filter": "BBC News"
}
```
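Source names can contain spaces, so the query string must be URL-encoded. A small sketch of building the request URL with the standard library follows; `articles_url` and `BASE_URL` are names introduced here for illustration.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"


def articles_url(source: Optional[str] = None, limit: int = 50) -> str:
    """Build the /articles URL, omitting the source filter when not given."""
    params = {}
    if source is not None:
        params["source"] = source
    params["limit"] = limit
    # quote_via=quote encodes spaces as %20; the default (quote_plus) uses '+'.
    return f"{BASE_URL}/articles?{urlencode(params, quote_via=quote)}"
```

Calling `articles_url("BBC News", 25)` yields the URL used in the example above.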
---
### 9. Advanced Search
**POST** `/search`
Advanced search with filters.
**Request Body:**
```json
{
"query": "climate change technology",
"source": "BBC News",
"top_k": 15
}
```
**Response:**
```json
{
"success": true,
"query": "climate change technology",
"filters": {
"source": "BBC News"
},
"results": [...],
"count": 12
}
```
---
### 10. Get Statistics
**GET** `/stats`
Get system statistics and information.
**Response:**
```json
{
"success": true,
"statistics": {
"total_articles": 200,
"index_dimension": 384,
"index_exists": true,
"rss_feeds": [
"https://feeds.bbci.co.uk/news/rss.xml",
"https://rss.cnn.com/rss/edition.rss"
],
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2"
}
}
```
---
### 11. Test RSS Feeds
**GET** `/test-rss`
Test RSS feed connectivity and parsing.
**Response:**
```json
{
"results": [
{
"url": "https://feeds.bbci.co.uk/news/rss.xml",
"title": "BBC News",
"entries_count": 32,
"success": true,
"sample_article": {
"title": "Tech Giants Announce AI Partnership",
"published": "Mon, 07 Jul 2025 16:00:00 GMT",
"link": "https://bbc.com/news/tech-partnership"
}
}
],
"timestamp": "2025-07-07T16:15:00"
}
```
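Two things are commonly done with this report: listing feeds that failed, and converting the RSS-style `published` value into the ISO 8601 form used elsewhere in the API. Both helpers below are hypothetical client-side utilities.

```python
from email.utils import parsedate_to_datetime


def failed_feeds(report: dict) -> list:
    """Return the URLs of feeds whose test did not succeed."""
    return [r["url"] for r in report["results"] if not r.get("success")]


def published_iso(sample_article: dict) -> str:
    """Convert an RFC 2822 date such as 'Mon, 07 Jul 2025 16:00:00 GMT' to ISO 8601."""
    return parsedate_to_datetime(sample_article["published"]).isoformat()
```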
---
## Interactive Documentation
FastAPI automatically generates interactive API documentation:
- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc
## Rate Limiting
No rate limiting is currently implemented. For production, consider limits such as:
- Per IP: 100 requests/minute
- Per endpoint: Varies based on computational cost
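
The per-IP limit could be enforced with a sliding-window counter along the lines of the sketch below. It is an in-memory, single-process illustration; the class name and the injectable `clock` are assumptions, and a multi-worker deployment would need shared storage such as Redis.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client."""

    def __init__(self, limit: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # client IP -> request timestamps

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        hits = self._hits[client_ip]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A request handler would call `limiter.allow(ip)` and respond with HTTP 429 when it returns `False`.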
## CORS
CORS is enabled for all origins in development. In production, configure specific allowed origins.
## Error Codes
- **200**: Success
- **400**: Bad Request (invalid parameters)
- **404**: Not Found (article ID not found)
- **500**: Internal Server Error (system error)
## Data Models
### Article Object
```json
{
"id": "string",
"title": "string",
"content": "string",
"url": "string",
"source": "string",
"published_date": "ISO 8601 datetime",
"similarity_score": "float (0-1, only in recommendations)"
}
```
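Clients that prefer typed objects over raw dictionaries could mirror the Article Object with a dataclass. This is one possible client-side representation, not a class shipped with the API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Article:
    id: str
    title: str
    content: str
    url: str
    source: str
    published_date: datetime
    similarity_score: Optional[float] = None  # only present in recommendations

    @classmethod
    def from_dict(cls, raw: dict) -> "Article":
        """Build an Article from one JSON object returned by the API."""
        return cls(
            id=raw["id"],
            title=raw["title"],
            content=raw["content"],
            url=raw["url"],
            source=raw["source"],
            published_date=datetime.fromisoformat(raw["published_date"]),
            similarity_score=raw.get("similarity_score"),
        )
```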
### Query Object
```json
{
"query": "string",
"top_k": "integer (1-100)"
}
```
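Validating the payload before sending it avoids a round trip that would end in a 400 response. The sketch below enforces the 1-100 range for `top_k` stated above; rejecting empty queries is an additional assumption, and `build_query` is a hypothetical helper.

```python
def build_query(query: str, top_k: int = 5) -> dict:
    """Return a request body for the query endpoints, validating inputs first."""
    text = query.strip()
    if not text:
        # Assumption: an empty query is never useful to send.
        raise ValueError("query must not be empty")
    if not 1 <= top_k <= 100:
        raise ValueError("top_k must be between 1 and 100")
    return {"query": text, "top_k": top_k}
```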
## SDK Examples
### Python
```python
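import requests

# Fetch news from the configured RSS feeds
response = requests.post("http://localhost:8000/fetch-news")
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(response.json())

# Get recommendations for a free-text query
response = requests.post(
    "http://localhost:8000/recommend-by-query",
    json={"query": "artificial intelligence", "top_k": 5},
)
response.raise_for_status()
recommendations = response.json()["recommendations"]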
```
### JavaScript
```javascript
// Fetch news
fetch('http://localhost:8000/fetch-news', {method: 'POST'})
  .then(response => response.json())
  .then(data => console.log(data));

// Get recommendations
fetch('http://localhost:8000/recommend-by-query', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({
    query: 'artificial intelligence',
    top_k: 5
  })
})
  .then(response => response.json())
  .then(data => console.log(data.recommendations));
```