Update README and backend functionality for improved news application

- Enhanced README.md with a clearer project overview, features, technologies used, and installation instructions.
- Updated vector dimension in config.py from 4096 to 1024 for Cohere embeddings.
- Modified main.py to serve HTML responses for the home page, news fetching, and recommendations.
- Improved error handling and ensured articles have links in the responses.
- Cleaned up news_fetcher.py by removing unnecessary print statements.
- Updated recommender.py to refine insights generation and summary extraction.
- Added Jinja2 for templating and improved the project structure for better organization.
- Included API documentation for better understanding of endpoints and usage.
boladeE
2025-04-15 11:59:39 +01:00
parent e3d00bb4dc
commit bc485b44b8
14 changed files with 957 additions and 108 deletions
+90 -77
@@ -1,93 +1,106 @@
# DS Task AI News
## Project Overview
An AI-powered news application that fetches, processes, and recommends news articles based on your interests.
DS Task AI News is an AI-powered news retrieval system that gathers news articles from various online sources, stores them in a vector database, and enables users to discover relevant articles based on their interests. The system uses advanced AI techniques to find and recommend related news articles dynamically.
## Overview
DS Task AI News is a web application that uses AI technologies to fetch, analyze, and recommend news articles. The application fetches news from various RSS feeds, processes them using AI, and provides personalized insights and recommendations.
## Features
* **News Aggregation** : Fetches news using RSS feeds from various online portals.
* **Vector Database Storage** : Stores news articles in a vector database for efficient similarity searches.
* **AI-powered Recommendations** : Uses Cohere embeddings and re-ranking to provide relevant news recommendations.
* **LLM-powered Analysis** : Utilizes Groq for AI-driven insights and processing.
- **Latest News**: View the latest news articles fetched from various RSS feeds.
- **News Recommendations**: Get personalized news recommendations based on your interests.
- **AI Insights**: Receive AI-generated insights about news articles.
- **Article Summaries**: Get concise summaries of individual articles.
## Tech Stack
## Technologies Used
* **LLM** : Groq
* **Search** : RSS Feeds for news aggregation
* **Embeddings & Re-Ranking** : Cohere
* **Vector Database** : (e.g., Pinecone, Weaviate, or FAISS)
* **Backend** : FastAPI
- **FastAPI**: Web framework for building APIs.
- **Jinja2**: Template engine for rendering HTML.
- **Tailwind CSS**: Utility-first CSS framework for styling.
- **feedparser**: Library for parsing RSS feeds.
- **BeautifulSoup**: Library for parsing HTML.
- **Cohere**: API for generating embeddings.
- **Pinecone**: Vector database for storing and retrieving embeddings.
- **Groq**: API for generating insights and summaries.
## File Structure
## Getting Started
### Prerequisites
- Python 3.8 or higher
- pip (Python package manager)
- Internet connection
### Installation
1. Clone the repository:
```
git clone https://github.com/yourusername/ds_task_ai_news.git
cd ds_task_ai_news
```
2. Install the required dependencies:
```
pip install -r requirements.txt
```
3. Set up the required environment variables:
- Create a `.env` file in the root directory with the following content:
```
GROQ_API_KEY=your_groq_api_key
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENVIRONMENT=your_pinecone_environment
PINECONE_INDEX=your_pinecone_index
```
4. Run the application:
```
python backend/main.py
```
5. Open your web browser and navigate to `http://localhost:8000`.
## Documentation
- [API Documentation](docs/API_Documentation.md): Detailed documentation of the API endpoints.
- [Technical Documentation](docs/Technical_Documentation.md): Technical details of the application architecture and components.
- [User Guide](docs/User_Guide.md): Guide for using the application.
## Project Structure
```
DS_Task_AI_News/
│-- backend/
│-- main.py # FastAPI backend
│-- news_fetcher.py # Fetches news using RSS feeds
│-- vector_store.py # Handles vector database operations
│-- embeddings.py # Generates embeddings using Cohere
│-- recommender.py # Fetches related news articles
│-- config.py # Configuration settings
│-- requirements.txt # Dependencies
│-- data/
│-- raw_news/ # Stores raw news articles before processing
│-- processed_news/ # Stores cleaned and processed articles
│-- docs/
│-- README.md # Documentation for new developers
│ │-- API_Documentation.md # API details
│-- .env # Environment variables
│-- .gitignore # Git ignore file
│-- LICENSE # License information
ds_task_ai_news/
├── backend/
│   ├── main.py
│   ├── news_fetcher.py
│   ├── embeddings.py
│   ├── vector_store.py
│   ├── recommender.py
│   ├── config.py
│   └── templates/
│       ├── base.html
│       ├── home.html
│       ├── news.html
│       └── recommendations.html
├── data/
│   ├── raw_news/
│   └── processed_news/
├── docs/
│   ├── API_Documentation.md
│   ├── Technical_Documentation.md
│   └── User_Guide.md
└── requirements.txt
```
## Setup & Installation
## License
### 1. Clone the Repository
This project is licensed under the MIT License - see the LICENSE file for details.
```bash
git clone http://23.29.118.76:3000/Test/ds_task_ai_news
cd ds-task-ai-news
```
## Acknowledgments
### 2. Set Up the Backend
```bash
cd backend
pip install -r requirements.txt
python main.py
```
## Fetching News Using RSS Feeds
* News is aggregated from RSS feeds of different news sources.
* The `news_fetcher.py` script pulls data from RSS feeds, extracts relevant information, and stores it in the database.
### **Example RSS Fetching Code (Python)**
```python
import feedparser


def fetch_rss_news(feed_url):
    # Parse the feed and collect one dict per entry
    feed = feedparser.parse(feed_url)
    articles = []
    for entry in feed.entries:
        articles.append({
            "title": entry.title,
            "content": entry.summary,
            "date": entry.published,
            "slug": entry.title.lower().replace(" ", "-"),
            "categories": ["Technology", "AI and Innovation"],
            "tags": ["AI", "Technology", "Innovation"]
        })
    return articles
```
## API Endpoints
* `GET /fetch-news`: Fetches news from RSS feeds.
* `GET /recommend-news?article_id=xyz`: Retrieves similar news based on the selected article.
- [FastAPI](https://fastapi.tiangolo.com/)
- [Tailwind CSS](https://tailwindcss.com/)
- [Cohere](https://cohere.ai/)
- [Pinecone](https://www.pinecone.io/)
- [Groq](https://groq.com/)
+1 -1
@@ -26,7 +26,7 @@ RSS_FEEDS = [
]
# Vector Database Settings
VECTOR_DIMENSION = 4096 # Cohere embedding dimension
VECTOR_DIMENSION = 1024 # Cohere embedding dimension
TOP_K_RESULTS = 5
# Data Directories
+68 -21
@@ -1,5 +1,7 @@
from fastapi import FastAPI, HTTPException
from fastapi import FastAPI, HTTPException, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.templating import Jinja2Templates
from fastapi.responses import HTMLResponse
from typing import List, Dict, Any
import json
import os
@@ -12,6 +14,19 @@ from config import RAW_NEWS_DIR, PROCESSED_NEWS_DIR
app = FastAPI(title="DS Task AI News API")
# Configure templates
templates = Jinja2Templates(directory="backend/templates")
# Add custom filters
def from_json(value):
    """Parse a JSON string into a Python object."""
    try:
        return json.loads(value)
    except (json.JSONDecodeError, TypeError):
        return None
templates.env.filters["from_json"] = from_json
# Add CORS middleware
app.add_middleware(
CORSMiddleware,
@@ -27,34 +42,51 @@ embedding_generator = EmbeddingGenerator()
vector_store = VectorStore()
recommender = NewsRecommender()
@app.get("/")
async def root():
"""Root endpoint returning API information."""
return {
"name": "DS Task AI News API",
"version": "1.0.0",
"description": "AI-powered news retrieval and recommendation system"
}
@app.get("/", response_class=HTMLResponse)
async def root(request: Request):
"""Root endpoint returning the home page with links to other routes."""
return templates.TemplateResponse(
"home.html",
{"request": request}
)
@app.get("/fetch-news")
async def fetch_news():
@app.get("/fetch-news", response_class=HTMLResponse)
async def fetch_news(request: Request):
"""Fetch news from RSS feeds and store in vector database."""
try:
result = news_fetcher.process()
if result["status"] == "error":
raise HTTPException(status_code=404, detail=result["message"])
return result
# Get the latest processed articles
processed_files = sorted(os.listdir(PROCESSED_NEWS_DIR), reverse=True)
if not processed_files:
raise HTTPException(status_code=404, detail="No processed articles found")
latest_file = os.path.join(PROCESSED_NEWS_DIR, processed_files[0])
with open(latest_file, 'r', encoding='utf-8') as f:
articles = json.load(f)
# Ensure each article has a link
for article in articles:
if 'link' not in article or not article['link']:
# If no link is available, use the article ID as a fallback
article['link'] = f"/article/{article.get('id', '')}"
return templates.TemplateResponse(
"news.html",
{"request": request, "articles": articles}
)
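    except HTTPException:
        # Re-raise deliberate HTTP errors (e.g. the 404 above) instead of masking them as 500
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))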
@app.get("/recommend-news")
async def recommend_news(article_id: str = None, query: str = None):
@app.get("/recommend-news", response_class=HTMLResponse)
async def recommend_news(request: Request, article_id: str = None, query: str = None):
"""Get news recommendations based on article ID or search query."""
try:
if article_id:
# Get article from vector store
article = vector_store.search_similar([0] * 4096, top_k=1) # Placeholder vector
article = vector_store.search_similar([0] * 1024, top_k=1) # Placeholder vector with correct dimension
if not article:
raise HTTPException(status_code=404, detail="Article not found")
@@ -76,13 +108,23 @@ async def recommend_news(article_id: str = None, query: str = None):
if not similar_articles:
raise HTTPException(status_code=404, detail="No similar articles found")
# Ensure each article has a link
for article in similar_articles:
if 'link' not in article or not article['link']:
# If no link is available, use the article ID as a fallback
article['link'] = f"/article/{article.get('id', '')}"
# Generate insights for the articles
insights = recommender.analyze_articles(similar_articles)
return {
"articles": similar_articles,
"insights": insights
}
return templates.TemplateResponse(
"recommendations.html",
{
"request": request,
"articles": similar_articles,
"insights": insights
}
)
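    except HTTPException:
        # Re-raise deliberate HTTP errors (e.g. the 404s above) instead of masking them as 500
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))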
@@ -91,12 +133,17 @@ async def get_article(article_id: str):
"""Get a specific article and its summary."""
try:
# Search for the article
articles = vector_store.search_similar([0] * 4096, top_k=1) # Placeholder vector
articles = vector_store.search_similar([0] * 1024, top_k=1) # Placeholder vector with correct dimension
if not articles:
raise HTTPException(status_code=404, detail="Article not found")
article = articles[0]
# Ensure the article has a link
if 'link' not in article or not article['link']:
# If no link is available, use the article ID as a fallback
article['link'] = f"/article/{article.get('id', '')}"
# Generate summary
summary = recommender.generate_summary(article)
@@ -109,4 +156,4 @@ async def get_article(article_id: str):
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)
uvicorn.run(app, host="localhost", port=8000)
-2
@@ -174,5 +174,3 @@ class NewsFetcher:
logger.info("News processing pipeline completed with status: %s", result["status"])
return result
news_fetcher = NewsFetcher()
print(news_fetcher.process())
+28 -6
@@ -1,6 +1,7 @@
from groq import Groq
from typing import List, Dict, Any
from config import GROQ_API_KEY
import json
class NewsRecommender:
def __init__(self):
@@ -11,7 +12,7 @@ class NewsRecommender:
try:
# Prepare the prompt
articles_text = "\n\n".join([
f"Title: {article['title']}\nContent: {article['content']}"
f"Title: {article['title']}"
for article in articles
])
@@ -33,13 +34,34 @@ Format the response as a JSON with these keys: themes, insights, implications, r
{"role": "system", "content": "You are a news analyst providing insights about technology and AI news."},
{"role": "user", "content": prompt}
],
model="mixtral-8x7b-32768",
model="llama3-70b-8192",
temperature=0.7,
max_tokens=1000
max_tokens=500
)
# Parse and return the analysis
return completion.choices[0].message.content
response_text = completion.choices[0].message.content
# Try to extract JSON from the response if it's wrapped in markdown code blocks
if "```json" in response_text:
json_str = response_text.split("```json")[1].split("```")[0].strip()
try:
return json.loads(json_str)
except json.JSONDecodeError:
pass
elif "```" in response_text:
json_str = response_text.split("```")[1].split("```")[0].strip()
try:
return json.loads(json_str)
except json.JSONDecodeError:
pass
# If we couldn't extract JSON, try to parse the entire response
try:
return json.loads(response_text)
except json.JSONDecodeError:
# If all parsing attempts fail, return the raw text
return response_text
except Exception as e:
print(f"Error analyzing articles: {str(e)}")
return {
@@ -64,9 +86,9 @@ Please provide a concise summary focusing on the key points and implications."""
{"role": "system", "content": "You are a news summarizer providing concise summaries of technology and AI news."},
{"role": "user", "content": prompt}
],
model="mixtral-8x7b-32768",
model="llama3-70b-8192",
temperature=0.5,
max_tokens=500
max_tokens=250
)
return completion.choices[0].message.content
+1
@@ -9,3 +9,4 @@ pydantic==2.6.3
python-multipart==0.0.9
httpx==0.27.0
beautifulsoup4==4.12.3
jinja2==3.1.2
+34
@@ -0,0 +1,34 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{% block title %}DS Task AI News{% endblock %}</title>
<link href="https://cdn.jsdelivr.net/npm/tailwindcss@2.2.19/dist/tailwind.min.css" rel="stylesheet">
<style>
.article-card {
transition: transform 0.2s;
}
.article-card:hover {
transform: translateY(-5px);
}
</style>
</head>
<body class="bg-gray-100 min-h-screen">
<nav class="bg-blue-600 text-white p-4">
<div class="container mx-auto">
<h1 class="text-2xl font-bold">DS Task AI News</h1>
</div>
</nav>
<main class="container mx-auto px-4 py-8">
{% block content %}{% endblock %}
</main>
<footer class="bg-gray-800 text-white p-4 mt-8">
<div class="container mx-auto text-center">
<p>&copy; 2024 DS Task AI News. All rights reserved.</p>
</div>
</footer>
</body>
</html>
+54
@@ -0,0 +1,54 @@
{% extends "base.html" %}
{% block title %}Home - DS Task AI News{% endblock %}
{% block content %}
<div class="max-w-4xl mx-auto">
<div class="text-center mb-12">
<h1 class="text-4xl font-bold text-gray-800 mb-4">Welcome to DS Task AI News</h1>
<p class="text-xl text-gray-600">Your AI-powered news retrieval and recommendation system</p>
</div>
<div class="grid grid-cols-1 md:grid-cols-2 gap-8">
<!-- Fetch News Card -->
<div class="bg-white rounded-lg shadow-md overflow-hidden hover:shadow-lg transition-shadow duration-300">
<div class="p-6">
<h2 class="text-2xl font-semibold text-gray-800 mb-4">Latest News</h2>
<p class="text-gray-600 mb-6">View the latest news articles fetched from our RSS feeds.</p>
<a href="/fetch-news" class="inline-block bg-blue-600 text-white px-6 py-3 rounded-md font-medium hover:bg-blue-700 transition-colors duration-300">
View Latest News
</a>
</div>
</div>
<!-- Recommend News Card -->
<div class="bg-white rounded-lg shadow-md overflow-hidden hover:shadow-lg transition-shadow duration-300">
<div class="p-6">
<h2 class="text-2xl font-semibold text-gray-800 mb-4">News Recommendations</h2>
<p class="text-gray-600 mb-6">Get personalized news recommendations based on your interests.</p>
<div class="space-y-4">
<a href="/recommend-news?query=technology" class="block bg-blue-600 text-white px-6 py-3 rounded-md font-medium hover:bg-blue-700 transition-colors duration-300 text-center">
Technology News
</a>
<a href="/recommend-news?query=artificial%20intelligence" class="block bg-blue-600 text-white px-6 py-3 rounded-md font-medium hover:bg-blue-700 transition-colors duration-300 text-center">
AI News
</a>
</div>
</div>
</div>
</div>
<div class="mt-12 bg-white rounded-lg shadow-md p-6">
<h2 class="text-2xl font-semibold text-gray-800 mb-4">About This Application</h2>
<p class="text-gray-600 mb-4">
This application uses AI to fetch, process, and recommend news articles. It leverages:
</p>
<ul class="list-disc list-inside text-gray-600 space-y-2">
<li>RSS feeds for news collection</li>
<li>Cohere embeddings for semantic understanding</li>
<li>Pinecone vector database for efficient retrieval</li>
<li>AI-powered analysis for personalized recommendations</li>
</ul>
</div>
</div>
{% endblock %}
+42
@@ -0,0 +1,42 @@
{% extends "base.html" %}
{% block title %}Latest News - DS Task AI News{% endblock %}
{% block content %}
<div class="space-y-6">
<h2 class="text-3xl font-bold text-gray-800 mb-6">Latest News Articles</h2>
<div class="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-6">
{% for article in articles %}
<article class="article-card bg-white rounded-lg shadow-md overflow-hidden">
<div class="p-6">
<h3 class="text-xl font-semibold text-gray-800 mb-2">
<a href="{{ article.link }}" target="_blank" class="hover:text-blue-600">
{{ article.title }}
</a>
</h3>
<p class="text-gray-600 mb-4">{{ article.content[:200] }}...</p>
<div class="flex justify-between items-center text-sm text-gray-500">
<span>{{ article.source }}</span>
<span>{{ article.published }}</span>
</div>
{% if article.categories %}
<div class="mt-4 flex flex-wrap gap-2">
{% for category in article.categories %}
<span class="px-2 py-1 bg-blue-100 text-blue-800 rounded-full text-xs">
{{ category }}
</span>
{% endfor %}
</div>
{% endif %}
<div class="mt-4">
<a href="{{ article.link }}" target="_blank" class="inline-block bg-blue-600 text-white px-4 py-2 rounded-md font-medium hover:bg-blue-700 transition-colors duration-300">
Read More
</a>
</div>
</div>
</article>
{% endfor %}
</div>
</div>
{% endblock %}
+157
@@ -0,0 +1,157 @@
{% extends "base.html" %}
{% block title %}Recommended News - DS Task AI News{% endblock %}
{% block content %}
<div class="space-y-8">
<div class="bg-white rounded-lg shadow-md p-6 mb-8">
<h2 class="text-2xl font-bold text-gray-800 mb-4">AI Insights</h2>
<div class="prose max-w-none">
{% if insights %}
{% if insights is string %}
{# If insights is a string (JSON or markdown), try to parse it #}
{% set insights_data = insights | from_json %}
{% if insights_data %}
<div class="space-y-6">
{% if insights_data.themes %}
<div>
<h3 class="text-xl font-semibold text-gray-800 mb-2">Themes</h3>
<ul class="list-disc list-inside space-y-1">
{% for theme in insights_data.themes %}
<li class="text-gray-700">{{ theme }}</li>
{% endfor %}
</ul>
</div>
{% endif %}
{% if insights_data.insights %}
<div>
<h3 class="text-xl font-semibold text-gray-800 mb-2">Key Insights</h3>
<ul class="list-disc list-inside space-y-1">
{% for insight in insights_data.insights %}
<li class="text-gray-700">{{ insight }}</li>
{% endfor %}
</ul>
</div>
{% endif %}
{% if insights_data.implications %}
<div>
<h3 class="text-xl font-semibold text-gray-800 mb-2">Implications</h3>
<ul class="list-disc list-inside space-y-1">
{% for implication in insights_data.implications %}
<li class="text-gray-700">{{ implication }}</li>
{% endfor %}
</ul>
</div>
{% endif %}
{% if insights_data.related_areas %}
<div>
<h3 class="text-xl font-semibold text-gray-800 mb-2">Related Areas</h3>
<div class="flex flex-wrap gap-2">
{% for area in insights_data.related_areas %}
<span class="px-3 py-1 bg-blue-100 text-blue-800 rounded-full text-sm">
{{ area }}
</span>
{% endfor %}
</div>
</div>
{% endif %}
</div>
{% else %}
{# If parsing failed, display the raw insights #}
<div class="whitespace-pre-wrap">{{ insights }}</div>
{% endif %}
{% else %}
{# If insights is already a dict/object #}
<div class="space-y-6">
{% if insights.themes %}
<div>
<h3 class="text-xl font-semibold text-gray-800 mb-2">Themes</h3>
<ul class="list-disc list-inside space-y-1">
{% for theme in insights.themes %}
<li class="text-gray-700">{{ theme }}</li>
{% endfor %}
</ul>
</div>
{% endif %}
{% if insights.insights %}
<div>
<h3 class="text-xl font-semibold text-gray-800 mb-2">Key Insights</h3>
<ul class="list-disc list-inside space-y-1">
{% for insight in insights.insights %}
<li class="text-gray-700">{{ insight }}</li>
{% endfor %}
</ul>
</div>
{% endif %}
{% if insights.implications %}
<div>
<h3 class="text-xl font-semibold text-gray-800 mb-2">Implications</h3>
<ul class="list-disc list-inside space-y-1">
{% for implication in insights.implications %}
<li class="text-gray-700">{{ implication }}</li>
{% endfor %}
</ul>
</div>
{% endif %}
{% if insights.related_areas %}
<div>
<h3 class="text-xl font-semibold text-gray-800 mb-2">Related Areas</h3>
<div class="flex flex-wrap gap-2">
{% for area in insights.related_areas %}
<span class="px-3 py-1 bg-blue-100 text-blue-800 rounded-full text-sm">
{{ area }}
</span>
{% endfor %}
</div>
</div>
{% endif %}
</div>
{% endif %}
{% else %}
<p class="text-gray-600">No insights available for these articles.</p>
{% endif %}
</div>
</div>
<h2 class="text-3xl font-bold text-gray-800 mb-6">Recommended Articles</h2>
<div class="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-6">
{% for article in articles %}
<article class="article-card bg-white rounded-lg shadow-md overflow-hidden">
<div class="p-6">
<h3 class="text-xl font-semibold text-gray-800 mb-2">
<a href="{{ article.link }}" target="_blank" class="hover:text-blue-600">
{{ article.title }}
</a>
</h3>
<p class="text-gray-600 mb-4">{{ article.content[:200] }}...</p>
<div class="flex justify-between items-center text-sm text-gray-500">
<span>{{ article.source }}</span>
<span>{{ article.published }}</span>
</div>
{% if article.categories %}
<div class="mt-4 flex flex-wrap gap-2">
{% for category in article.categories %}
<span class="px-2 py-1 bg-blue-100 text-blue-800 rounded-full text-xs">
{{ category }}
</span>
{% endfor %}
</div>
{% endif %}
<div class="mt-4">
<a href="{{ article.link }}" target="_blank" class="inline-block bg-blue-600 text-white px-4 py-2 rounded-md font-medium hover:bg-blue-700 transition-colors duration-300">
Read More
</a>
</div>
</div>
</article>
{% endfor %}
</div>
</div>
{% endblock %}
+4 -1
@@ -2,7 +2,6 @@ from pinecone import Pinecone, ServerlessSpec
from typing import List, Dict, Any
from config import (
PINECONE_API_KEY,
PINECONE_ENVIRONMENT,
PINECONE_INDEX_NAME,
VECTOR_DIMENSION,
TOP_K_RESULTS
@@ -16,13 +15,17 @@ class VectorStore:
    def _ensure_index(self):
        """Ensure the Pinecone index exists, create if it doesn't."""
        # Check if index exists, create if it doesn't
        if self.index_name not in self.pinecone.list_indexes().names():
            # Create a new index with the correct dimension
            self.pinecone.create_index(
                name=self.index_name,
                dimension=VECTOR_DIMENSION,
                metric="cosine",
                spec=ServerlessSpec(cloud="aws", region="us-east-1")
            )
            print(f"Created new index '{self.index_name}' with dimension {VECTOR_DIMENSION}")

        self.index = self.pinecone.Index(self.index_name)

    def upsert_articles(self, articles: List[Dict[str, Any]]) -> bool:
+186
@@ -0,0 +1,186 @@
# DS Task AI News API Documentation
## Overview
The DS Task AI News API is a FastAPI-based application that provides endpoints for fetching, processing, and recommending news articles. The API uses AI-powered analysis to generate insights and recommendations based on news articles from various RSS feeds.
## Base URL
```
http://localhost:8000
```
## Endpoints
### 1. Home Page
**Endpoint:** `/`
**Method:** `GET`
**Description:** Returns the home page with links to other routes.
**Response:** HTML page with navigation links to other endpoints.
**Example:**
```
GET /
```
### 2. Fetch News
**Endpoint:** `/fetch-news`
**Method:** `GET`
**Description:** Fetches news from RSS feeds, processes them, and stores them in the vector database. Returns a page displaying the latest news articles.
**Response:** HTML page displaying the latest news articles.
**Example:**
```
GET /fetch-news
```
### 3. Recommend News
**Endpoint:** `/recommend-news`
**Method:** `GET`
**Description:** Gets news recommendations based on an article ID or search query. Returns a page displaying recommended articles and AI-generated insights.
**Query Parameters:**
- `article_id` (optional): ID of an article to base recommendations on.
- `query` (optional): Search query to base recommendations on.
**Response:** HTML page displaying recommended articles and AI-generated insights.
**Example:**
```
GET /recommend-news?query=artificial%20intelligence
```
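As a minimal sketch of calling this endpoint from Python, the helper below builds the request URL with percent-encoded parameters, so a query containing spaces is sent as `artificial%20intelligence`. The names `build_recommend_url` and `BASE_URL` are illustrative assumptions, not part of the application.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # base URL given in this document


def build_recommend_url(query: Optional[str] = None,
                        article_id: Optional[str] = None,
                        base_url: str = BASE_URL) -> str:
    """Build a /recommend-news URL with percent-encoded query parameters."""
    params = {}
    if article_id:
        params["article_id"] = article_id
    if query:
        params["query"] = query
    # quote_via=quote encodes spaces as %20 rather than '+'
    query_string = urlencode(params, quote_via=quote)
    url = f"{base_url}/recommend-news"
    return f"{url}?{query_string}" if query_string else url
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen`, to retrieve the rendered HTML page.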
### 4. Get Article
**Endpoint:** `/article/{article_id}`
**Method:** `GET`
**Description:** Gets a specific article and its summary.
**Path Parameters:**
- `article_id`: ID of the article to retrieve.
**Response:** JSON object containing the article and its summary.
**Example Response:**
```json
{
"article": {
"title": "Example Article Title",
"content": "Example article content...",
"link": "https://example.com/article",
"published": "2023-01-01T12:00:00",
"source": "Example News",
"categories": ["Technology", "AI"],
"id": "article123"
},
"summary": "This is a summary of the article..."
}
```
**Example:**
```
GET /article/article123
```
## Data Models
### Article
```json
{
"title": "string",
"content": "string",
"link": "string",
"published": "string",
"source": "string",
"categories": ["string"],
"id": "string"
}
```
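The `link` field may be missing or empty in stored articles. The sketch below mirrors the fallback used in `main.py`, which points such articles at their own `/article/{id}` route. The helper name `ensure_link` is an assumption made for illustration.

```python
from typing import Any, Dict


def ensure_link(article: Dict[str, Any]) -> Dict[str, Any]:
    """Give the article a usable 'link', falling back to its /article/{id} route."""
    if not article.get("link"):
        # No link available: use the article ID as a fallback
        article["link"] = f"/article/{article.get('id', '')}"
    return article
```

Note that the dictionary is modified in place and also returned, which keeps the helper convenient inside a loop or a comprehension.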
### Insights
```json
{
"themes": ["string"],
"insights": ["string"],
"implications": ["string"],
"related_areas": ["string"]
}
```
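The language model does not always return bare JSON, so a response may need normalizing before it fits the shape above. The following is a simplified sketch of the approach taken in `recommender.py`: try a fenced JSON block first, then the whole response, and fall back to the raw text. The function name `parse_insights` and the `FENCE` constant are assumptions for this example.

```python
import json
from typing import Any

FENCE = "`" * 3  # Markdown code-fence marker


def parse_insights(response_text: str) -> Any:
    """Return parsed JSON insights, or the raw text if no JSON can be decoded."""
    candidates = []
    if FENCE + "json" in response_text:
        # Content between a "json" fence and the next closing fence
        candidates.append(response_text.split(FENCE + "json", 1)[1].split(FENCE, 1)[0])
    elif FENCE in response_text:
        # Content of the first unlabeled fenced block
        candidates.append(response_text.split(FENCE, 1)[1].split(FENCE, 1)[0])
    candidates.append(response_text)  # last resort: the whole response
    for candidate in candidates:
        try:
            return json.loads(candidate.strip())
        except json.JSONDecodeError:
            continue
    return response_text
```

Because the function can return either a parsed object or a string, callers (such as the `recommendations.html` template) need to handle both cases.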
## Error Handling
The API uses standard HTTP status codes to indicate the success or failure of requests:
- `200 OK`: The request was successful.
- `400 Bad Request`: The request was invalid and could not be served.
- `404 Not Found`: The requested resource was not found.
- `500 Internal Server Error`: An error occurred on the server.
Error responses include a JSON object with a `detail` field containing a description of the error:
```json
{
"detail": "Error message"
}
```
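A client might read the `detail` field as sketched below. The helper name `error_detail` and the generic fallback message are assumptions; the fallback covers bodies that are not JSON, such as an HTML error page.

```python
import json


def error_detail(status_code: int, body: str) -> str:
    """Return the 'detail' message from an error body, or a generic fallback."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    if isinstance(payload, dict) and "detail" in payload:
        return str(payload["detail"])
    # Body was not the expected JSON object
    return f"HTTP {status_code}"
```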
## Authentication
The API does not currently require authentication.
## Rate Limiting
The API does not currently implement rate limiting.
## Dependencies
The API relies on the following external services:
- **Groq API**: For generating article summaries and insights.
- **Pinecone Vector Database**: For storing and retrieving article embeddings.
- **Cohere API**: For generating article embeddings.
## Configuration
The API can be configured by modifying the following environment variables:
- `GROQ_API_KEY`: API key for the Groq service.
- `PINECONE_API_KEY`: API key for the Pinecone vector database.
- `PINECONE_ENVIRONMENT`: Environment for the Pinecone vector database.
- `PINECONE_INDEX`: Index name for the Pinecone vector database.
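One way to fail fast on a missing variable is sketched below. This is an illustration only: the helper `load_settings` and the `REQUIRED_VARS` tuple are assumptions and do not describe how `config.py` actually reads its settings.

```python
import os
from typing import Dict, Mapping

# Variable names taken from the list above
REQUIRED_VARS = (
    "GROQ_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT",
    "PINECONE_INDEX",
)


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, raising if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.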
## Development
To run the API locally:
1. Install the required dependencies:
```
pip install -r requirements.txt
```
2. Set the required environment variables.
3. Run the API:
```
python backend/main.py
```
The API will be available at `http://localhost:8000`.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
+150
@@ -0,0 +1,150 @@
# DS Task AI News - Technical Documentation
## Architecture Overview
The DS Task AI News application is built using a modular architecture with the following components:
1. **FastAPI Backend**: Handles HTTP requests and serves HTML templates.
2. **News Fetcher**: Fetches news articles from RSS feeds.
3. **Embedding Generator**: Generates embeddings for articles using Cohere.
4. **Vector Store**: Stores and retrieves article embeddings using Pinecone.
5. **News Recommender**: Generates insights and recommendations using Groq.
6. **HTML Templates**: Renders the user interface.
## Component Details
### 1. FastAPI Backend (`main.py`)
The FastAPI backend serves as the entry point for the application. It handles HTTP requests and serves HTML templates. The backend includes the following endpoints:
- `/`: Home page with links to other routes.
- `/fetch-news`: Fetches news from RSS feeds and displays the latest articles.
- `/recommend-news`: Gets news recommendations based on an article ID or search query.
- `/article/{article_id}`: Gets a specific article and its summary.
### 2. News Fetcher (`news_fetcher.py`)
The News Fetcher component is responsible for fetching news articles from RSS feeds. It performs the following tasks:
- Fetches articles from configured RSS feeds using the `feedparser` library.
- Cleans HTML content to extract plain text.
- Saves raw articles to JSON files.
- Processes articles with embeddings.
- Saves processed articles to JSON files.
- Stores articles in the vector database.
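The HTML-cleaning step can be sketched with the standard library alone. The project lists BeautifulSoup for HTML parsing; this example deliberately swaps in `html.parser` so it runs without extra dependencies. The names `_TextExtractor` and `clean_html` are assumptions.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def clean_html(raw: str) -> str:
    """Strip tags from an HTML fragment and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    # Join text nodes with spaces, then normalize all whitespace
    return " ".join(" ".join(parser.parts).split())
```

Joining text nodes with a space is a simplification: it keeps words from adjacent elements apart, but would also split a word whose letters are wrapped in separate inline tags.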
### 3. Embedding Generator (`embeddings.py`)
The Embedding Generator component is responsible for generating embeddings for articles. It performs the following tasks:
- Generates embeddings for article content using Cohere.
- Processes articles to include embeddings.
- Generates query embeddings for search queries.
### 4. Vector Store (`vector_store.py`)
The Vector Store component is responsible for storing and retrieving article embeddings. It performs the following tasks:
- Stores article embeddings in the Pinecone vector database.
- Retrieves similar articles based on query embeddings.
- Upserts articles to update the vector database.
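To illustrate what retrieving similar articles means, here is a small in-memory sketch of cosine-similarity ranking, the metric the Pinecone index is created with. It is not how Pinecone is queried; `cosine_similarity` and `top_k_similar` are hypothetical helpers operating on a plain dictionary of vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Undefined for zero vectors; treated as "no similarity" in this sketch
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query: Sequence[float],
                  vectors: Dict[str, Sequence[float]],
                  top_k: int = 5) -> List[Tuple[str, float]]:
    """Return the top_k (id, score) pairs, most similar first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The zero-vector branch is worth noting: an all-zero query carries no direction, so it cannot meaningfully rank articles under a cosine metric.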
### 5. News Recommender (`recommender.py`)
The News Recommender component is responsible for generating insights and recommendations. It performs the following tasks:
- Analyzes articles to generate insights using Groq.
- Generates summaries for individual articles using Groq.
### 6. HTML Templates
The HTML templates are responsible for rendering the user interface. The templates include:
- `base.html`: Base template with common layout elements.
- `home.html`: Home page template.
- `news.html`: Template for displaying news articles.
- `recommendations.html`: Template for displaying recommended articles and insights.
## Data Flow
1. **Fetching News**:
- User requests the `/fetch-news` endpoint.
- The backend calls the News Fetcher to fetch articles from RSS feeds.
- The News Fetcher cleans the articles and saves them to JSON files.
- The News Fetcher calls the Embedding Generator to generate embeddings for the articles.
- The News Fetcher calls the Vector Store to store the articles in the vector database.
- The backend renders the `news.html` template with the fetched articles.
2. **Recommending News**:
- User requests the `/recommend-news` endpoint with a query parameter.
- The backend calls the Embedding Generator to generate a query embedding.
- The backend calls the Vector Store to retrieve similar articles.
- The backend calls the News Recommender to generate insights for the articles.
- The backend renders the `recommendations.html` template with the recommended articles and insights.
3. **Getting an Article**:
- User requests the `/article/{article_id}` endpoint.
- The backend calls the Vector Store to retrieve the article.
- The backend calls the News Recommender to generate a summary for the article.
- The backend returns the article and summary as JSON.
## Configuration
The application is configured using environment variables and configuration files:
- `config.py`: Contains configuration variables for the application.
- Environment variables: API keys and other sensitive information.
## Dependencies
The application relies on the following external services and libraries:
- **FastAPI**: Web framework for building APIs.
- **Jinja2**: Template engine for rendering HTML.
- **feedparser**: Library for parsing RSS feeds.
- **BeautifulSoup**: Library for parsing HTML.
- **Cohere**: API for generating embeddings.
- **Pinecone**: Vector database for storing and retrieving embeddings.
- **Groq**: API for generating insights and summaries.
## File Structure
```
ds_task_ai_news/
├── backend/
│ ├── main.py
│ ├── news_fetcher.py
│ ├── embeddings.py
│ ├── vector_store.py
│ ├── recommender.py
│ ├── config.py
│ └── templates/
│ ├── base.html
│ ├── home.html
│ ├── news.html
│ └── recommendations.html
├── data/
│ ├── raw_news/
│ └── processed_news/
├── docs/
│ ├── API_Documentation.md
│ └── Technical_Documentation.md
└── requirements.txt
```
## Error Handling
The application uses try-except blocks to handle errors gracefully. Errors are logged using the `logging` module and returned as HTTP responses with appropriate status codes.
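The pattern described above might look roughly like the following framework-agnostic sketch. The `handle_request` helper, the logger name, and the mapping of `KeyError` to a 404 status are assumptions made for illustration; in the FastAPI application the status code would be returned through the framework's own response mechanism instead of a tuple.

```python
import logging
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger("ds_task_ai_news")  # hypothetical logger name


def handle_request(action: Callable[[], Dict[str, Any]]) -> Tuple[int, Dict[str, Any]]:
    """Run an endpoint action and map failures to (status_code, body)."""
    try:
        return 200, action()
    except KeyError as exc:
        # Assumption: a missing article surfaces as a KeyError.
        logger.warning("Resource not found: %s", exc)
        return 404, {"detail": "Not found"}
    except Exception:
        # logger.exception records the traceback along with the message.
        logger.exception("Unhandled error while processing request")
        return 500, {"detail": "Internal server error"}
```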
## Future Improvements
Potential improvements for the application include:
1. **Authentication**: Add user authentication to protect sensitive endpoints.
2. **Rate Limiting**: Implement rate limiting to prevent abuse.
3. **Caching**: Add caching to improve performance.
4. **Testing**: Add unit and integration tests.
5. **Deployment**: Deploy the application to a cloud provider.
6. **Monitoring**: Add monitoring and alerting.
7. **User Preferences**: Allow users to customize their news preferences.
8. **Mobile App**: Develop a mobile app for the application.
+142
View File
@@ -0,0 +1,142 @@
# DS Task AI News - User Guide
## Introduction
DS Task AI News is an AI-powered news application that fetches, processes, and recommends news articles based on your interests. It uses Cohere embeddings and a Pinecone vector database to find articles related to your query, and Groq to generate insights and summaries.
## Features
- **Latest News**: View the latest news articles fetched from various RSS feeds.
- **News Recommendations**: Get personalized news recommendations based on your interests.
- **AI Insights**: Receive AI-generated insights about news articles.
- **Article Summaries**: Get concise summaries of individual articles.
## Getting Started
### Prerequisites
- Python 3.8 or higher
- pip (Python package manager)
- Internet connection
### Installation
1. Clone the repository:
```
git clone https://github.com/yourusername/ds_task_ai_news.git
cd ds_task_ai_news
```
2. Install the required dependencies:
```
pip install -r requirements.txt
```
3. Set up the required environment variables:
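- Create a `.env` file in the root directory with the following content. The application also calls the Cohere API for embeddings, so a Cohere API key is likely required as well; its variable name is not listed here, so check `backend/config.py` for it: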
```
GROQ_API_KEY=your_groq_api_key
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENVIRONMENT=your_pinecone_environment
PINECONE_INDEX=your_pinecone_index
```
4. Run the application:
```
python backend/main.py
```
5. Open your web browser and navigate to `http://localhost:8000`.
## Using the Application
### Home Page
The home page provides links to the main features of the application:
- **Latest News**: View the latest news articles.
- **Technology News**: Get recommendations for technology-related news.
- **AI News**: Get recommendations for AI-related news.
### Latest News
To view the latest news articles:
1. Click on the "View Latest News" button on the home page.
2. The application will fetch the latest news articles from the configured RSS feeds.
3. The articles will be displayed in a grid layout with the following information:
- Title
- Content preview
- Source
- Publication date
- Categories
- "Read More" button
### News Recommendations
To get personalized news recommendations:
1. Click on one of the recommendation buttons on the home page (e.g., "Technology News" or "AI News").
2. Alternatively, navigate directly to `/recommend-news?query=your_search_query` to get recommendations for a specific query. URL-encode spaces and special characters in the query.
3. The application will display recommended articles and AI-generated insights.
4. The insights section includes:
- Themes: Main topics and areas of focus in the news articles.
- Key Insights: Key takeaways and observations from the articles.
- Implications: Potential consequences and outcomes of the trends and developments.
- Related Areas: Other areas of interest connected to the themes and insights.
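For a query that contains spaces or special characters, the `/recommend-news` URL from step 2 can be built with the standard library as sketched below. The `recommendation_url` helper is hypothetical, and `BASE_URL` assumes the default local address given in the installation steps.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # local address from the installation steps


def recommendation_url(query: str) -> str:
    """Build a /recommend-news URL with a properly encoded query parameter."""
    return f"{BASE_URL}/recommend-news?{urlencode({'query': query})}"


print(recommendation_url("artificial intelligence"))
# prints "http://localhost:8000/recommend-news?query=artificial+intelligence"
```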
### Article Details
To view the details of a specific article:
1. Click on the "Read More" button for an article.
2. The article will open in a new tab with the full content.
## Customization
### Adding RSS Feeds
To add or modify the RSS feeds:
1. Open the `backend/config.py` file.
2. Locate the `RSS_FEEDS` list.
3. Add or remove RSS feed URLs as needed.
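As an illustration, an edited `RSS_FEEDS` list might look like the sketch below. The feed URLs are placeholders rather than the project's actual feeds, and the `is_valid_feed_url` check is a hypothetical helper for catching typos before the application starts.

```python
from urllib.parse import urlparse

# Placeholder URLs; replace them with the feeds you want to follow.
RSS_FEEDS = [
    "https://example.com/world/rss.xml",
    "https://example.org/technology/feed",
]


def is_valid_feed_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host name."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Collect malformed entries so they can be reported in one go.
invalid_feeds = [url for url in RSS_FEEDS if not is_valid_feed_url(url)]
```

This check only validates the shape of each URL; it does not confirm that the feed is reachable or that it returns valid RSS.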
### Changing the UI
The application uses Tailwind CSS for styling. To modify the UI:
1. Open the HTML templates in the `backend/templates` directory.
2. Modify the HTML and CSS classes as needed.
## Troubleshooting
### Common Issues
1. **Application not starting**:
- Check if all dependencies are installed correctly.
- Verify that the environment variables are set correctly.
- Check the console for error messages.
2. **No news articles displayed**:
- Check your internet connection.
- Verify that the RSS feeds are accessible.
- Check the console for error messages.
3. **AI insights not displaying correctly**:
- Verify that the Groq API key is set correctly.
- Check the console for error messages.
### Getting Help
If you encounter any issues not covered in this guide, please:
1. Check the console for error messages.
2. Refer to the API and Technical documentation.
3. Contact the development team for assistance.
## Conclusion
DS Task AI News helps you stay informed by fetching the latest articles from RSS feeds and using AI to recommend related articles, generate insights, and summarize individual stories.
We hope you find this guide helpful. If you have any questions or feedback, please don't hesitate to contact us.