commit c158262a49167181e4486ad5a816a535f3f51703
Author: kowshik
Date:   Tue Mar 18 04:11:27 2025 +0600

    Initial Commit

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..937e04c
--- /dev/null
+++ b/README.md
@@ -0,0 +1,93 @@
+# DS Task AI News
+
+## Project Overview
+
+DS Task AI News is an AI-powered news retrieval system that gathers articles from online sources via RSS feeds, stores them in a vector database, and lets users discover articles that match their interests. It uses Cohere embeddings and re-ranking to find and recommend related articles.
+
+## Features
+
+* **News Aggregation**: Fetches news using RSS feeds from various online portals.
+* **Vector Database Storage**: Stores news articles in a vector database for efficient similarity searches.
+* **AI-powered Recommendations**: Uses Cohere embeddings and re-ranking to provide relevant news recommendations.
+* **LLM-powered Analysis**: Uses Groq for AI-driven insights and processing.
+
+## Tech Stack
+
+* **LLM**: Groq
+* **Search**: RSS Feeds for news aggregation
+* **Embeddings & Re-Ranking**: Cohere
+* **Vector Database**: (e.g., Pinecone, Weaviate, or FAISS)
+* **Backend**: FastAPI
+
+## File Structure
+
+```
+DS_Task_AI_News/
+│-- backend/
+│   │-- main.py               # FastAPI backend
+│   │-- news_fetcher.py       # Fetches news using RSS feeds
+│   │-- vector_store.py       # Handles vector database operations
+│   │-- embeddings.py         # Generates embeddings using Cohere
+│   │-- recommender.py        # Fetches related news articles
+│   │-- config.py             # Configuration settings
+│   │-- requirements.txt      # Dependencies
+│
+│-- data/
+│   │-- raw_news/             # Stores raw news articles before processing
+│   │-- processed_news/       # Stores cleaned and processed articles
+│
+│-- docs/
+│   │-- README.md             # Documentation for new developers
+│   │-- API_Documentation.md  # API details
+│
+│-- .env                      # Environment variables
+│-- .gitignore                # Git ignore file
+│-- LICENSE                   # License information
+```
+
+## Setup & Installation
+
+### 1. Clone the Repository
+
+```bash
+git clone http://23.29.118.76:3000/Test/ds_task_ai_news
+cd ds_task_ai_news
+```
+
+### 2. Set Up the Backend
+
+```bash
+cd backend
+pip install -r requirements.txt
+python main.py
+```
+
+## Fetching News Using RSS Feeds
+
+* News is aggregated from the RSS feeds of different news sources.
+* The `news_fetcher.py` script pulls data from RSS feeds, extracts relevant information, and stores it in the database.
+
+### **Example RSS Fetching Code (Python)**
+
+```python
+import feedparser
+
+def fetch_rss_news(feed_url):
+    feed = feedparser.parse(feed_url)  # accepts a feed URL, a file path, or raw XML
+    articles = []
+    for entry in feed.entries:
+        articles.append({
+            "title": entry.title,
+            "content": entry.get("summary", ""),  # not every feed provides a summary
+            "date": entry.get("published", ""),   # nor a publication date
+            "slug": entry.title.lower().replace(" ", "-"),
+            "categories": ["Technology", "AI and Innovation"],  # hard-coded for every entry
+            "tags": ["AI", "Technology", "Innovation"]          # hard-coded for every entry
+        })
+    return articles
+```
+
+## API Endpoints
+
+* `GET /fetch-news`: Fetches news from RSS feeds.
+* `GET /recommend-news?article_id=xyz`: Retrieves similar news based on the selected article.
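
The README does not show how `/recommend-news` picks similar articles, so the following is only a minimal sketch of one plausible approach: ranking stored article vectors by cosine similarity to the selected article. Everything here is an assumption for illustration, not code from this commit: the `ARTICLE_VECTORS` dictionary, the `cosine_similarity` and `recommend` helpers, and the toy 3-dimensional vectors that stand in for real Cohere embeddings and a real vector database.

```python
import math

# Hypothetical in-memory index: article_id -> embedding vector.
# Toy 3-dimensional vectors stand in for real Cohere embeddings.
ARTICLE_VECTORS = {
    "a1": [1.0, 0.0, 0.0],
    "a2": [0.9, 0.1, 0.0],
    "a3": [0.0, 1.0, 0.0],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def recommend(article_id, vectors, top_k=2):
    """Rank every other article by similarity to the selected one."""
    query = vectors[article_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in vectors.items()
        if other_id != article_id
    ]
    # Highest similarity first; keep only the top_k article IDs.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other_id for other_id, _ in scored[:top_k]]
```

Calling `recommend("a1", ARTICLE_VECTORS)` ranks `a2` ahead of `a3`, because `a2` points in nearly the same direction as `a1`. A dedicated vector database would replace the linear scan with an approximate nearest-neighbour lookup, and a re-ranking step could then reorder the top candidates.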