DS Task AI News
Project Overview
DS Task AI News is an AI-powered news retrieval system that gathers articles from online sources via RSS feeds, stores them in a vector database, and lets users discover relevant articles based on their interests. Related articles are found and recommended using Cohere embeddings and re-ranking, with Groq providing LLM-driven analysis.
Features
- News Aggregation: Fetches news using RSS feeds from various online portals.
- Vector Database Storage: Stores news articles in a vector database for efficient similarity searches.
- AI-powered Recommendations: Uses Cohere embeddings and re-ranking to provide relevant news recommendations.
- LLM-powered Analysis: Uses Groq for AI-driven insights and processing.
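To make the similarity search idea concrete, here is a minimal sketch in plain Python. It is not the project's vector store code: the toy 3-dimensional vectors stand in for real Cohere embeddings, and the names cosine_similarity, most_similar, and the article ids are hypothetical. A real vector database performs the same kind of ranking at scale, usually with an approximate index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)

def most_similar(query_vector, stored, top_k=3):
    """Return the ids of the top_k stored vectors closest to query_vector.

    `stored` maps an article id to its embedding vector.
    """
    scored = [
        (cosine_similarity(query_vector, vector), article_id)
        for article_id, vector in stored.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article_id for _, article_id in scored[:top_k]]

# Toy vectors standing in for real embeddings (illustrative data only).
stored = {
    "ai-chip-launch": [0.9, 0.1, 0.0],
    "election-results": [0.0, 0.2, 0.9],
    "new-llm-release": [0.8, 0.3, 0.1],
}
ranked = most_similar([1.0, 0.2, 0.0], stored, top_k=2)
print(ranked)
```

With these toy vectors, the two technology articles rank ahead of the unrelated one because their vectors point in nearly the same direction as the query.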
Tech Stack
- LLM: Groq
- Search: RSS Feeds for news aggregation
- Embeddings & Re-Ranking: Cohere
- Vector Database: (e.g., Pinecone, Weaviate, or FAISS)
- Backend: FastAPI
File Structure
DS_Task_AI_News/
│-- backend/
│ │-- main.py # FastAPI backend
│ │-- news_fetcher.py # Fetches news using RSS feeds
│ │-- vector_store.py # Handles vector database operations
│ │-- embeddings.py # Generates embeddings using Cohere
│ │-- recommender.py # Fetches related news articles
│ │-- config.py # Configuration settings
│ │-- requirements.txt # Dependencies
│
│-- data/
│ │-- raw_news/ # Stores raw news articles before processing
│ │-- processed_news/ # Stores cleaned and processed articles
│
│-- docs/
│ │-- README.md # Documentation for new developers
│ │-- API_Documentation.md # API details
│
│-- .env # Environment variables
│-- .gitignore # Git ignore file
│-- LICENSE # License information
Setup & Installation
1. Clone the Repository
git clone http://23.29.118.76:3000/Test/ds_task_ai_news
cd ds_task_ai_news
2. Set Up the Backend
cd backend
pip install -r requirements.txt
python main.py
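The backend depends on external services (Cohere, Groq), so it will likely need API keys from the .env file before it can start. The sketch below shows one way config.py could validate them. The variable names COHERE_API_KEY and GROQ_API_KEY and the load_settings helper are assumptions for illustration, not names confirmed by the repository.

```python
import os

# Hypothetical variable names; check config.py for the ones actually used.
REQUIRED_KEYS = ("COHERE_API_KEY", "GROQ_API_KEY")

def load_settings(environ=os.environ):
    """Return the required settings, failing early if any are missing."""
    missing = [key for key in REQUIRED_KEYS if not environ.get(key)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {key: environ[key] for key in REQUIRED_KEYS}

# Passing a plain dict makes the helper easy to try without real keys.
settings = load_settings({"COHERE_API_KEY": "demo", "GROQ_API_KEY": "demo"})
```

Python does not read .env files on its own, so the values must be exported in the shell or loaded with a helper such as python-dotenv before load_settings() is called without arguments.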
Fetching News Using RSS Feeds
- News is aggregated from RSS feeds of different news sources.
- The news_fetcher.py script pulls data from RSS feeds, extracts relevant information, and stores it in the database.
Example RSS Fetching Code (Python)
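import re
import feedparser

def fetch_rss_news(feed_url):
    """Parse an RSS feed and return its entries as a list of article dicts."""
    feed = feedparser.parse(feed_url)
    articles = []
    for entry in feed.entries:
        # Feeds are inconsistent, so read optional fields defensively.
        title = entry.get("title", "")
        articles.append({
            "title": title,
            "content": entry.get("summary", ""),
            "date": entry.get("published", ""),
            # Lowercase, then collapse runs of non-alphanumerics into hyphens.
            "slug": re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-"),
            # Placeholder categories and tags, hard-coded for illustration.
            "categories": ["Technology", "AI and Innovation"],
            "tags": ["AI", "Technology", "Innovation"],
        })
    return articles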
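RSS summaries often contain HTML markup, so the content field in the example above may need cleaning before it is embedded. The following is a minimal sketch using only the standard library. The clean_html helper and _TextExtractor class are hypothetical names, and the project may well use a different approach.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse all whitespace runs.
        return " ".join(" ".join(self._chunks).split())

def clean_html(raw):
    """Return the visible text of an HTML fragment as a single line."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()

cleaned = clean_html("<p>AI &amp; chips: <b>new</b> launch</p>")
print(cleaned)
```

This sketch keeps the text inside script and style tags and puts a space wherever a tag splits a word, so a production cleaner would need extra handling for those cases.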
API Endpoints
- GET /fetch-news: Fetches news from RSS feeds.
- GET /recommend-news?article_id=xyz: Retrieves similar news based on the selected article.
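As a rough illustration of how a client might call the two endpoints above, the sketch below builds the request URLs with the standard library. The base URL http://localhost:8000 is an assumption (check how main.py starts the server), and the helper names are hypothetical. The get_json helper also assumes the API returns JSON, which this README does not confirm.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumed host and port

def fetch_news_url(base_url=BASE_URL):
    """URL for the endpoint that fetches news from the RSS feeds."""
    return urljoin(base_url, "/fetch-news")

def recommend_news_url(article_id, base_url=BASE_URL):
    """URL for the endpoint that returns articles similar to article_id."""
    query = urlencode({"article_id": article_id})  # escapes unsafe characters
    return urljoin(base_url, "/recommend-news") + "?" + query

def get_json(url, timeout=10):
    """Send a GET request and decode the body, assuming a JSON response."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)

# With the backend running, a call might look like:
# recommendations = get_json(recommend_news_url("xyz"))
```

Building the query string with urlencode keeps article ids that contain spaces or slashes from breaking the URL.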