Initial Commit
# DS Task AI News

## Project Overview

DS Task AI News is an AI-powered news retrieval system. It gathers articles from online sources through RSS feeds, stores them in a vector database, and lets users discover articles that match their interests. Related articles are found and recommended using Cohere embeddings and re-ranking.

## Features

* **News Aggregation**: Fetches news through RSS feeds from various online portals.
* **Vector Database Storage**: Stores news articles in a vector database for efficient similarity search.
* **AI-powered Recommendations**: Uses Cohere embeddings and re-ranking to recommend relevant news.
* **LLM-powered Analysis**: Uses an LLM served through Groq to analyze and process articles.
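
The similarity search behind the recommendation feature can be illustrated without any external service. The following is a minimal sketch, assuming article embeddings are already available as plain lists of floats. The names `cosine_similarity` and `top_k_similar` are hypothetical and not part of the project; a real deployment would obtain vectors from Cohere and query a vector database rather than scan a dictionary.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat zero vectors as dissimilar to everything.
    return dot / (norm_a * norm_b)


def top_k_similar(query_vector, article_vectors, k=3):
    """Rank article ids by similarity to the query vector, best first.

    article_vectors maps an article id to its embedding (assumed layout).
    """
    scored = [
        (article_id, cosine_similarity(query_vector, vector))
        for article_id, vector in article_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for real embeddings.
toy_vectors = {
    "ai-chip-launch": [0.9, 0.1, 0.0],
    "football-final": [0.0, 0.2, 0.9],
    "llm-benchmark": [0.8, 0.3, 0.1],
}
results = top_k_similar([1.0, 0.0, 0.0], toy_vectors, k=2)
print([article_id for article_id, _ in results])
```

A re-ranking step, as mentioned above, would typically take a candidate list like `results` and reorder it with a more expensive relevance model.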

## Tech Stack

* **LLM**: Groq
* **Search**: RSS feeds for news aggregation
* **Embeddings & Re-Ranking**: Cohere
* **Vector Database**: e.g., Pinecone, Weaviate, or FAISS
* **Backend**: FastAPI

## File Structure

```
DS_Task_AI_News/
│-- backend/
│   │-- main.py                 # FastAPI backend
│   │-- news_fetcher.py         # Fetches news using RSS feeds
│   │-- vector_store.py         # Handles vector database operations
│   │-- embeddings.py           # Generates embeddings using Cohere
│   │-- recommender.py          # Fetches related news articles
│   │-- config.py               # Configuration settings
│   │-- requirements.txt        # Dependencies
│
│-- data/
│   │-- raw_news/               # Stores raw news articles before processing
│   │-- processed_news/         # Stores cleaned and processed articles
│
│-- docs/
│   │-- README.md               # Documentation for new developers
│   │-- API_Documentation.md    # API details
│
│-- .env                        # Environment variables
│-- .gitignore                  # Git ignore file
│-- LICENSE                     # License information
```
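
As a rough illustration of how `config.py` and `.env` might fit together, here is a minimal sketch that reads settings from environment variables. The variable names `GROQ_API_KEY`, `COHERE_API_KEY`, and `RSS_FEED_URLS`, as well as the `Settings` class and `load_settings` function, are assumptions made for this example; the repository does not specify them.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values the backend needs."""
    groq_api_key: str
    cohere_api_key: str
    rss_feed_urls: tuple = ()


def load_settings(env=None):
    """Build Settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    required = ("GROQ_API_KEY", "COHERE_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Assumed format: a comma-separated list of feed URLs.
    raw_urls = env.get("RSS_FEED_URLS", "")
    urls = tuple(url.strip() for url in raw_urls.split(",") if url.strip())
    return Settings(env["GROQ_API_KEY"], env["COHERE_API_KEY"], urls)


# Placeholder values only; real keys belong in .env, never in source control.
example = load_settings({
    "GROQ_API_KEY": "groq-placeholder",
    "COHERE_API_KEY": "cohere-placeholder",
    "RSS_FEED_URLS": "https://example.com/a.xml, https://example.com/b.xml",
})
print(example.rss_feed_urls)
```

Failing fast on missing keys keeps configuration errors from surfacing later as opaque API failures.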

## Setup & Installation

### 1. Clone the Repository

```bash
git clone http://23.29.118.76:3000/Test/ds_task_ai_news
cd ds_task_ai_news
```

### 2. Set Up the Backend

```bash
cd backend
pip install -r requirements.txt
python main.py
```

## Fetching News Using RSS Feeds

* News is aggregated from the RSS feeds of multiple news sources.
* The `news_fetcher.py` script pulls entries from those feeds, extracts the relevant fields, and stores them in the database.

### **Example RSS Fetching Code (Python)**

```python
import feedparser


def fetch_rss_news(feed_url):
    """Parse an RSS feed and return its entries as a list of article dicts."""
    feed = feedparser.parse(feed_url)
    articles = []
    for entry in feed.entries:
        # Feeds often omit fields, so fall back to empty strings.
        title = entry.get("title", "")
        articles.append({
            "title": title,
            "content": entry.get("summary", ""),
            "date": entry.get("published", ""),
            # Naive slug: lowercase the title and replace spaces with hyphens.
            "slug": title.lower().replace(" ", "-"),
            # Placeholder values: every article gets the same categories and tags.
            "categories": ["Technology", "AI and Innovation"],
            "tags": ["AI", "Technology", "Innovation"],
        })
    return articles
```
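
Titles often contain punctuation, and the same story can appear in several feeds, so slugs and the article list may need extra handling. As a hedged sketch (the helpers `slugify` and `dedupe_by_slug` are hypothetical and not part of the repository), one way to produce URL-safe slugs and drop duplicates is:

```python
import re
import unicodedata


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated, ASCII-only slug."""
    # Decompose accented characters and drop anything outside ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    return slug.strip("-")


def dedupe_by_slug(articles):
    """Keep the first article for each slug, preserving input order."""
    seen = set()
    unique = []
    for article in articles:
        slug = slugify(article["title"])
        if slug in seen:
            continue
        seen.add(slug)
        # Copy the article and attach the normalized slug.
        unique.append({**article, "slug": slug})
    return unique


print(slugify("OpenAI's New Model: What's Next?"))
print(dedupe_by_slug([{"title": "AI News"}, {"title": "AI  news!"}]))
```

Deduplicating on the slug is a simplification; matching on the entry link or a content hash would be stricter alternatives.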

## API Endpoints

* `GET /fetch-news`: Fetches news from the RSS feeds.
* `GET /recommend-news?article_id=xyz`: Returns news articles similar to the article identified by `article_id`.
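
To illustrate how the two endpoints relate, here is a framework-independent sketch of the logic each handler might run. Everything in it is an assumption for illustration: the in-memory `ARTICLES` store, the `fetch_news` and `recommend_news` functions, and the tag-overlap scoring, which merely stands in for the embedding-based similarity the project describes. In a FastAPI app, functions like these would be registered as route handlers, for example with `@app.get("/fetch-news")`.

```python
# Hypothetical in-memory store keyed by article id (here, the slug);
# the real project would use its vector database instead.
ARTICLES = {}


def fetch_news(new_articles):
    """Store fetched articles by slug and report how many were added."""
    added = 0
    for article in new_articles:
        if article["slug"] not in ARTICLES:
            ARTICLES[article["slug"]] = article
            added += 1
    return {"added": added, "total": len(ARTICLES)}


def recommend_news(article_id, limit=5):
    """Return up to `limit` other articles ranked by number of shared tags."""
    if article_id not in ARTICLES:
        # A web framework would translate this into a 404 response.
        raise KeyError(f"Unknown article_id: {article_id}")
    source_tags = set(ARTICLES[article_id]["tags"])
    scored = []
    for other_id, other in ARTICLES.items():
        if other_id == article_id:
            continue
        overlap = len(source_tags & set(other["tags"]))
        if overlap:
            scored.append((overlap, other_id))
    # Highest overlap first; ties broken alphabetically by id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [ARTICLES[other_id] for _, other_id in scored[:limit]]


summary = fetch_news([
    {"slug": "a", "title": "A", "tags": ["AI", "Chips"]},
    {"slug": "b", "title": "B", "tags": ["AI", "Chips", "Cloud"]},
    {"slug": "c", "title": "C", "tags": ["Sports"]},
    {"slug": "d", "title": "D", "tags": ["AI"]},
])
print(summary)
print([article["slug"] for article in recommend_news("a")])
```

In the actual service, `fetch_news` would presumably call the RSS fetcher itself rather than receive articles as an argument; passing them in keeps this sketch self-contained.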