diff --git a/README.md b/README.md index 937e04c..8decfdc 100644 --- a/README.md +++ b/README.md @@ -1,93 +1,106 @@ # DS Task AI News -## Project Overview +An AI-powered news application that fetches, processes, and recommends news articles based on your interests. -DS Task AI News is an AI-powered news retrieval system that gathers news articles from various online sources, stores them in a vector database, and enables users to discover relevant articles based on their interests. The system uses advanced AI techniques to find and recommend related news articles dynamically. +## Overview + +DS Task AI News is a web application that uses AI technologies to fetch, analyze, and recommend news articles. The application fetches news from various RSS feeds, processes them using AI, and provides personalized insights and recommendations. ## Features -* **News Aggregation** : Fetches news using RSS feeds from various online portals. -* **Vector Database Storage** : Stores news articles in a vector database for efficient similarity searches. -* **AI-powered Recommendations** : Uses Cohere embeddings and re-ranking to provide relevant news recommendations. -* **LLM-powered Analysis** : Utilizes Groq for AI-driven insights and processing. +- **Latest News**: View the latest news articles fetched from various RSS feeds. +- **News Recommendations**: Get personalized news recommendations based on your interests. +- **AI Insights**: Receive AI-generated insights about news articles. +- **Article Summaries**: Get concise summaries of individual articles. -## Tech Stack +## Technologies Used -* **LLM** : Groq -* **Search** : RSS Feeds for news aggregation -* **Embeddings & Re-Ranking** : Cohere -* **Vector Database** : (e.g., Pinecone, Weaviate, or FAISS) -* **Backend** : FastAPI +- **FastAPI**: Web framework for building APIs. +- **Jinja2**: Template engine for rendering HTML. +- **Tailwind CSS**: Utility-first CSS framework for styling. 
+- **feedparser**: Library for parsing RSS feeds. +- **BeautifulSoup**: Library for parsing HTML. +- **Cohere**: API for generating embeddings. +- **Pinecone**: Vector database for storing and retrieving embeddings. +- **Groq**: API for generating insights and summaries. -## File Structure +## Getting Started + +### Prerequisites + +- Python 3.8 or higher +- pip (Python package manager) +- Internet connection + +### Installation + +1. Clone the repository: + ``` + git clone https://github.com/yourusername/ds_task_ai_news.git + cd ds_task_ai_news + ``` + +2. Install the required dependencies: + ``` + pip install -r requirements.txt + ``` + +3. Set up the required environment variables: + - Create a `.env` file in the root directory with the following content: + ``` + GROQ_API_KEY=your_groq_api_key + PINECONE_API_KEY=your_pinecone_api_key + PINECONE_ENVIRONMENT=your_pinecone_environment + PINECONE_INDEX=your_pinecone_index + ``` + +4. Run the application: + ``` + python backend/main.py + ``` + +5. Open your web browser and navigate to `http://localhost:8000`. + +## Documentation + +- [API Documentation](docs/API_Documentation.md): Detailed documentation of the API endpoints. +- [Technical Documentation](docs/Technical_Documentation.md): Technical details of the application architecture and components. +- [User Guide](docs/User_Guide.md): Guide for using the application. 
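The `.env` step in the Installation section can be sanity-checked before starting the server. The following is a minimal sketch, not part of the repository: it assumes the variables have already been loaded into the process environment (for example by the application's own config loading or by exporting them in the shell), and the helper name `missing_env_vars` is hypothetical. The variable names are copied from the `.env` example above. The embeddings step uses Cohere, which presumably needs its own key, but the README does not name that variable, so it is left out here.

```python
import os

# Variable names copied from the `.env` example in the Installation steps.
REQUIRED_ENV_VARS = (
    "GROQ_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT",
    "PINECONE_INDEX",
)


def missing_env_vars(environ=None, required=REQUIRED_ENV_VARS):
    """Return the required variable names that are unset or blank.

    `environ` defaults to the real process environment; passing a plain
    dict makes the check easy to try out without touching os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running the script before `python backend/main.py` gives a quick list of anything still unset, which is usually easier to read than a stack trace from a failed API client.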
+ +## Project Structure ``` -DS_Task_AI_News/ -│-- backend/ -│ │-- main.py # FastAPI backend -│ │-- news_fetcher.py # Fetches news using RSS feeds -│ │-- vector_store.py # Handles vector database operations -│ │-- embeddings.py # Generates embeddings using Cohere -│ │-- recommender.py # Fetches related news articles -│ │-- config.py # Configuration settings -│ │-- requirements.txt # Dependencies -│ -│-- data/ -│ │-- raw_news/ # Stores raw news articles before processing -│ │-- processed_news/ # Stores cleaned and processed articles -│ -│-- docs/ -│ │-- README.md # Documentation for new developers -│ │-- API_Documentation.md # API details -│ -│-- .env # Environment variables -│-- .gitignore # Git ignore file -│-- LICENSE # License information +ds_task_ai_news/ +├── backend/ +│ ├── main.py +│ ├── news_fetcher.py +│ ├── embeddings.py +│ ├── vector_store.py +│ ├── recommender.py +│ ├── config.py +│ └── templates/ +│ ├── base.html +│ ├── home.html +│ ├── news.html +│ └── recommendations.html +├── data/ +│ ├── raw_news/ +│ └── processed_news/ +├── docs/ +│ ├── API_Documentation.md +│ ├── Technical_Documentation.md +│ └── User_Guide.md +└── requirements.txt ``` -## Setup & Installation +## License -### 1. Clone the Repository +This project is licensed under the MIT License - see the LICENSE file for details. -```bash -git clone http://23.29.118.76:3000/Test/ds_task_ai_news -cd ds-task-ai-news -``` +## Acknowledgments -### 2. Set Up the Backend - -```bash -cd backend -pip install -r requirements.txt -python main.py -``` - -## Fetching News Using RSS Feeds - -* News is aggregated from RSS feeds of different news sources. -* The `news_fetcher.py` script pulls data from RSS feeds, extracts relevant information, and stores it in the database. 
- -### **Example RSS Fetching Code (Python)** - -```python -import feedparser - -def fetch_rss_news(feed_url): - feed = feedparser.parse(feed_url) - articles = [] - for entry in feed.entries: - articles.append({ - "title": entry.title, - "content": entry.summary, - "date": entry.published, - "slug": entry.title.lower().replace(" ", "-"), - "categories": ["Technology", "AI and Innovation"], - "tags": ["AI", "Technology", "Innovation"] - }) - return articles -``` - -## API Endpoints - -* `GET /fetch-news`: Fetches news from RSS feeds. -* `GET /recommend-news?article_id=xyz`: Retrieves similar news based on the selected article. +- [FastAPI](https://fastapi.tiangolo.com/) +- [Tailwind CSS](https://tailwindcss.com/) +- [Cohere](https://cohere.ai/) +- [Pinecone](https://www.pinecone.io/) +- [Groq](https://groq.com/) diff --git a/backend/config.py b/backend/config.py index 7b8fb45..ae6ee89 100644 --- a/backend/config.py +++ b/backend/config.py @@ -26,7 +26,7 @@ RSS_FEEDS = [ ] # Vector Database Settings -VECTOR_DIMENSION = 4096 # Cohere embedding dimension +VECTOR_DIMENSION = 1024 # Cohere embedding dimension TOP_K_RESULTS = 5 # Data Directories diff --git a/backend/main.py b/backend/main.py index 4e99187..0c510c8 100644 --- a/backend/main.py +++ b/backend/main.py @@ -1,5 +1,7 @@ -from fastapi import FastAPI, HTTPException +from fastapi import FastAPI, HTTPException, Request from fastapi.middleware.cors import CORSMiddleware +from fastapi.templating import Jinja2Templates +from fastapi.responses import HTMLResponse from typing import List, Dict, Any import json import os @@ -12,6 +14,19 @@ from config import RAW_NEWS_DIR, PROCESSED_NEWS_DIR app = FastAPI(title="DS Task AI News API") +# Configure templates +templates = Jinja2Templates(directory="backend/templates") + +# Add custom filters +def from_json(value): + """Parse a JSON string into a Python object.""" + try: + return json.loads(value) + except (json.JSONDecodeError, TypeError): + return None + 
+templates.env.filters["from_json"] = from_json + # Add CORS middleware app.add_middleware( CORSMiddleware, @@ -27,34 +42,51 @@ embedding_generator = EmbeddingGenerator() vector_store = VectorStore() recommender = NewsRecommender() -@app.get("/") -async def root(): - """Root endpoint returning API information.""" - return { - "name": "DS Task AI News API", - "version": "1.0.0", - "description": "AI-powered news retrieval and recommendation system" - } +@app.get("/", response_class=HTMLResponse) +async def root(request: Request): + """Root endpoint returning the home page with links to other routes.""" + return templates.TemplateResponse( + "home.html", + {"request": request} + ) -@app.get("/fetch-news") -async def fetch_news(): +@app.get("/fetch-news", response_class=HTMLResponse) +async def fetch_news(request: Request): """Fetch news from RSS feeds and store in vector database.""" try: result = news_fetcher.process() if result["status"] == "error": raise HTTPException(status_code=404, detail=result["message"]) - return result + # Get the latest processed articles + processed_files = sorted(os.listdir(PROCESSED_NEWS_DIR), reverse=True) + if not processed_files: + raise HTTPException(status_code=404, detail="No processed articles found") + + latest_file = os.path.join(PROCESSED_NEWS_DIR, processed_files[0]) + with open(latest_file, 'r', encoding='utf-8') as f: + articles = json.load(f) + + # Ensure each article has a link + for article in articles: + if 'link' not in article or not article['link']: + # If no link is available, use the article ID as a fallback + article['link'] = f"/article/{article.get('id', '')}" + + return templates.TemplateResponse( + "news.html", + {"request": request, "articles": articles} + ) except Exception as e: raise HTTPException(status_code=500, detail=str(e)) -@app.get("/recommend-news") -async def recommend_news(article_id: str = None, query: str = None): +@app.get("/recommend-news", response_class=HTMLResponse) +async def 
recommend_news(request: Request, article_id: str = None, query: str = None): """Get news recommendations based on article ID or search query.""" try: if article_id: # Get article from vector store - article = vector_store.search_similar([0] * 4096, top_k=1) # Placeholder vector + article = vector_store.search_similar([0] * 1024, top_k=1) # Placeholder vector with correct dimension if not article: raise HTTPException(status_code=404, detail="Article not found") @@ -76,13 +108,23 @@ async def recommend_news(article_id: str = None, query: str = None): if not similar_articles: raise HTTPException(status_code=404, detail="No similar articles found") + # Ensure each article has a link + for article in similar_articles: + if 'link' not in article or not article['link']: + # If no link is available, use the article ID as a fallback + article['link'] = f"/article/{article.get('id', '')}" + # Generate insights for the articles insights = recommender.analyze_articles(similar_articles) - return { - "articles": similar_articles, - "insights": insights - } + return templates.TemplateResponse( + "recommendations.html", + { + "request": request, + "articles": similar_articles, + "insights": insights + } + ) except Exception as e: raise HTTPException(status_code=500, detail=str(e)) @@ -91,12 +133,17 @@ async def get_article(article_id: str): """Get a specific article and its summary.""" try: # Search for the article - articles = vector_store.search_similar([0] * 4096, top_k=1) # Placeholder vector + articles = vector_store.search_similar([0] * 1024, top_k=1) # Placeholder vector with correct dimension if not articles: raise HTTPException(status_code=404, detail="Article not found") article = articles[0] + # Ensure the article has a link + if 'link' not in article or not article['link']: + # If no link is available, use the article ID as a fallback + article['link'] = f"/article/{article.get('id', '')}" + # Generate summary summary = recommender.generate_summary(article) @@ -109,4 
+156,4 @@ async def get_article(article_id: str): if __name__ == "__main__": import uvicorn - uvicorn.run(app, host="0.0.0.0", port=8000) + uvicorn.run(app, host="localhost", port=8000) diff --git a/backend/news_fetcher.py b/backend/news_fetcher.py index 4dac5af..f9a87db 100644 --- a/backend/news_fetcher.py +++ b/backend/news_fetcher.py @@ -174,5 +174,3 @@ class NewsFetcher: logger.info("News processing pipeline completed with status: %s", result["status"]) return result -news_fetcher = NewsFetcher() -print(news_fetcher.process()) diff --git a/backend/recommender.py b/backend/recommender.py index 03ccf54..370d382 100644 --- a/backend/recommender.py +++ b/backend/recommender.py @@ -1,6 +1,7 @@ from groq import Groq from typing import List, Dict, Any from config import GROQ_API_KEY +import json class NewsRecommender: def __init__(self): @@ -11,7 +12,7 @@ class NewsRecommender: try: # Prepare the prompt articles_text = "\n\n".join([ - f"Title: {article['title']}\nContent: {article['content']}" + f"Title: {article['title']}" for article in articles ]) @@ -33,13 +34,34 @@ Format the response as a JSON with these keys: themes, insights, implications, r {"role": "system", "content": "You are a news analyst providing insights about technology and AI news."}, {"role": "user", "content": prompt} ], - model="mixtral-8x7b-32768", + model="llama3-70b-8192", temperature=0.7, - max_tokens=1000 + max_tokens=500 ) # Parse and return the analysis - return completion.choices[0].message.content + response_text = completion.choices[0].message.content + + # Try to extract JSON from the response if it's wrapped in markdown code blocks + if "```json" in response_text: + json_str = response_text.split("```json")[1].split("```")[0].strip() + try: + return json.loads(json_str) + except json.JSONDecodeError: + pass + elif "```" in response_text: + json_str = response_text.split("```")[1].split("```")[0].strip() + try: + return json.loads(json_str) + except json.JSONDecodeError: + pass + + # 
If we couldn't extract JSON, try to parse the entire response + try: + return json.loads(response_text) + except json.JSONDecodeError: + # If all parsing attempts fail, return the raw text + return response_text except Exception as e: print(f"Error analyzing articles: {str(e)}") return { @@ -64,9 +86,9 @@ Please provide a concise summary focusing on the key points and implications.""" {"role": "system", "content": "You are a news summarizer providing concise summaries of technology and AI news."}, {"role": "user", "content": prompt} ], - model="mixtral-8x7b-32768", + model="llama3-70b-8192", temperature=0.5, - max_tokens=500 + max_tokens=250 ) return completion.choices[0].message.content diff --git a/backend/requirements.txt b/backend/requirements.txt index 2e4f93f..85c3df8 100644 --- a/backend/requirements.txt +++ b/backend/requirements.txt @@ -9,3 +9,4 @@ pydantic==2.6.3 python-multipart==0.0.9 httpx==0.27.0 beautifulsoup4==4.12.3 +jinja2==3.1.2 diff --git a/backend/templates/base.html b/backend/templates/base.html new file mode 100644 index 0000000..67dac17 --- /dev/null +++ b/backend/templates/base.html @@ -0,0 +1,34 @@ + + + + + + {% block title %}DS Task AI News{% endblock %} + + + + + + +
+ {% block content %}{% endblock %} +
+ + + + \ No newline at end of file diff --git a/backend/templates/home.html b/backend/templates/home.html new file mode 100644 index 0000000..b8198c3 --- /dev/null +++ b/backend/templates/home.html @@ -0,0 +1,54 @@ +{% extends "base.html" %} + +{% block title %}Home - DS Task AI News{% endblock %} + +{% block content %} +
+
+

Welcome to DS Task AI News

+

Your AI-powered news retrieval and recommendation system

+
+ +
+ +
+
+

Latest News

+

View the latest news articles fetched from our RSS feeds.

+ + View Latest News + +
+
+ + +
+
+

News Recommendations

+

Get personalized news recommendations based on your interests.

+ +
+
+
+ +
+

About This Application

+

+ This application uses AI to fetch, process, and recommend news articles. It leverages: +

+ +
+
+{% endblock %} \ No newline at end of file diff --git a/backend/templates/news.html b/backend/templates/news.html new file mode 100644 index 0000000..266c708 --- /dev/null +++ b/backend/templates/news.html @@ -0,0 +1,42 @@ +{% extends "base.html" %} + +{% block title %}Latest News - DS Task AI News{% endblock %} + +{% block content %} +
+

Latest News Articles

+ +
+ {% for article in articles %} +
+
+

+ + {{ article.title }} + +

+

{{ article.content[:200] }}...

+
+ {{ article.source }} + {{ article.published }} +
+ {% if article.categories %} +
+ {% for category in article.categories %} + + {{ category }} + + {% endfor %} +
+ {% endif %} + +
+
+ {% endfor %} +
+
+{% endblock %} \ No newline at end of file diff --git a/backend/templates/recommendations.html b/backend/templates/recommendations.html new file mode 100644 index 0000000..8096ae3 --- /dev/null +++ b/backend/templates/recommendations.html @@ -0,0 +1,157 @@ +{% extends "base.html" %} + +{% block title %}Recommended News - DS Task AI News{% endblock %} + +{% block content %} +
+
+

AI Insights

+
+ {% if insights %} + {% if insights is string %} + {# If insights is a string (JSON or markdown), try to parse it #} + {% set insights_data = insights | from_json %} + {% if insights_data %} +
+ {% if insights_data.themes %} +
+

Themes

+
    + {% for theme in insights_data.themes %} +
  • {{ theme }}
  • + {% endfor %} +
+
+ {% endif %} + + {% if insights_data.insights %} +
+

Key Insights

+
    + {% for insight in insights_data.insights %} +
  • {{ insight }}
  • + {% endfor %} +
+
+ {% endif %} + + {% if insights_data.implications %} +
+

Implications

+
    + {% for implication in insights_data.implications %} +
  • {{ implication }}
  • + {% endfor %} +
+
+ {% endif %} + + {% if insights_data.related_areas %} +
+

Related Areas

+
+ {% for area in insights_data.related_areas %} + + {{ area }} + + {% endfor %} +
+
+ {% endif %} +
+ {% else %} + {# If parsing failed, display the raw insights #} +
{{ insights }}
+ {% endif %} + {% else %} + {# If insights is already a dict/object #} +
+ {% if insights.themes %} +
+

Themes

+
    + {% for theme in insights.themes %} +
  • {{ theme }}
  • + {% endfor %} +
+
+ {% endif %} + + {% if insights.insights %} +
+

Key Insights

+
    + {% for insight in insights.insights %} +
  • {{ insight }}
  • + {% endfor %} +
+
+ {% endif %} + + {% if insights.implications %} +
+

Implications

+
    + {% for implication in insights.implications %} +
  • {{ implication }}
  • + {% endfor %} +
+
+ {% endif %} + + {% if insights.related_areas %} +
+

Related Areas

+
+ {% for area in insights.related_areas %} + + {{ area }} + + {% endfor %} +
+
+ {% endif %} +
+ {% endif %} + {% else %} +

No insights available for these articles.

+ {% endif %} +
+
+ +

Recommended Articles

+ +
+ {% for article in articles %} +
+
+

+ + {{ article.title }} + +

+

{{ article.content[:200] }}...

+
+ {{ article.source }} + {{ article.published }} +
+ {% if article.categories %} +
+ {% for category in article.categories %} + + {{ category }} + + {% endfor %} +
+ {% endif %} + +
+
+ {% endfor %} +
+
+{% endblock %} \ No newline at end of file diff --git a/backend/vector_store.py b/backend/vector_store.py index ece2be2..332ee34 100644 --- a/backend/vector_store.py +++ b/backend/vector_store.py @@ -2,7 +2,6 @@ from pinecone import Pinecone, ServerlessSpec from typing import List, Dict, Any from config import ( PINECONE_API_KEY, - PINECONE_ENVIRONMENT, PINECONE_INDEX_NAME, VECTOR_DIMENSION, TOP_K_RESULTS @@ -16,13 +15,17 @@ class VectorStore: def _ensure_index(self): """Ensure the Pinecone index exists, create if it doesn't.""" + # Check if index exists, create if it doesn't if self.index_name not in self.pinecone.list_indexes().names(): + # Create a new index with the correct dimension self.pinecone.create_index( name=self.index_name, dimension=VECTOR_DIMENSION, metric="cosine", spec=ServerlessSpec(cloud="aws", region="us-east-1") ) + print(f"Created new index '{self.index_name}' with dimension {VECTOR_DIMENSION}") + self.index = self.pinecone.Index(self.index_name) def upsert_articles(self, articles: List[Dict[str, Any]]) -> bool: diff --git a/docs/API_Documentation.md b/docs/API_Documentation.md index e69de29..30c12f7 100644 --- a/docs/API_Documentation.md +++ b/docs/API_Documentation.md @@ -0,0 +1,186 @@ +# DS Task AI News API Documentation + +## Overview + +The DS Task AI News API is a FastAPI-based application that provides endpoints for fetching, processing, and recommending news articles. The API uses AI-powered analysis to generate insights and recommendations based on news articles from various RSS feeds. + +## Base URL + +``` +http://localhost:8000 +``` + +## Endpoints + +### 1. Home Page + +**Endpoint:** `/` + +**Method:** `GET` + +**Description:** Returns the home page with links to other routes. + +**Response:** HTML page with navigation links to other endpoints. + +**Example:** +``` +GET / +``` + +### 2. 
Fetch News + +**Endpoint:** `/fetch-news` + +**Method:** `GET` + +**Description:** Fetches news from RSS feeds, processes them, and stores them in the vector database. Returns a page displaying the latest news articles. + +**Response:** HTML page displaying the latest news articles. + +**Example:** +``` +GET /fetch-news +``` + +### 3. Recommend News + +**Endpoint:** `/recommend-news` + +**Method:** `GET` + +**Description:** Gets news recommendations based on an article ID or search query. Returns a page displaying recommended articles and AI-generated insights. + +**Query Parameters:** +- `article_id` (optional): ID of an article to base recommendations on. +- `query` (optional): Search query to base recommendations on. + +**Response:** HTML page displaying recommended articles and AI-generated insights. + +**Example:** +``` +GET /recommend-news?query=artificial%20intelligence +``` + +### 4. Get Article + +**Endpoint:** `/article/{article_id}` + +**Method:** `GET` + +**Description:** Gets a specific article and its summary. + +**Path Parameters:** +- `article_id`: ID of the article to retrieve. + +**Response:** JSON object containing the article and its summary. + +**Example Response:** +```json +{ + "article": { + "title": "Example Article Title", + "content": "Example article content...", + "link": "https://example.com/article", + "published": "2023-01-01T12:00:00", + "source": "Example News", + "categories": ["Technology", "AI"], + "id": "article123" + }, + "summary": "This is a summary of the article..." 
+} +``` + +**Example:** +``` +GET /article/article123 +``` + +## Data Models + +### Article + +```json +{ + "title": "string", + "content": "string", + "link": "string", + "published": "string", + "source": "string", + "categories": ["string"], + "id": "string" +} +``` + +### Insights + +```json +{ + "themes": ["string"], + "insights": ["string"], + "implications": ["string"], + "related_areas": ["string"] +} +``` + +## Error Handling + +The API uses standard HTTP status codes to indicate the success or failure of requests: + +- `200 OK`: The request was successful. +- `400 Bad Request`: The request was invalid or cannot be served. +- `404 Not Found`: The requested resource was not found. +- `500 Internal Server Error`: An error occurred on the server. + +Error responses include a JSON object with a `detail` field containing a description of the error: + +```json +{ + "detail": "Error message" +} +``` + +## Authentication + +The API does not currently require authentication. + +## Rate Limiting + +The API does not currently implement rate limiting. + +## Dependencies + +The API relies on the following external services: + +- **Groq API**: For generating article summaries and insights. +- **Pinecone Vector Database**: For storing and retrieving article embeddings. + +## Configuration + +The API can be configured by modifying the following environment variables: + +- `GROQ_API_KEY`: API key for the Groq service. +- `PINECONE_API_KEY`: API key for the Pinecone vector database. +- `PINECONE_ENVIRONMENT`: Environment for the Pinecone vector database. +- `PINECONE_INDEX`: Index name for the Pinecone vector database. + +## Development + +To run the API locally: + +1. Install the required dependencies: + ``` + pip install -r requirements.txt + ``` + +2. Set the required environment variables. + +3. Run the API: + ``` + python backend/main.py + ``` + +The API will be available at `http://localhost:8000`. 
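Once the server is running, the endpoints documented above can be addressed from a script as well as from a browser. The sketch below only builds the request URLs; the helper names `build_url` and `article_url` are hypothetical and not part of the project. It assumes the base URL and paths listed in this document and encodes spaces as `%20`, matching the `/recommend-news` example.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # base URL stated in this document


def build_url(path, params=None, base_url=BASE_URL):
    """Join the base URL, an endpoint path, and optional query parameters."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    if params:
        # quote_via=quote encodes spaces as %20 instead of the default "+".
        url += "?" + urlencode(params, quote_via=quote)
    return url


def article_url(article_id, base_url=BASE_URL):
    """Build the URL for the /article/{article_id} endpoint."""
    # safe="" also escapes "/" so the ID stays a single path segment.
    return build_url("/article/" + quote(article_id, safe=""), base_url=base_url)


if __name__ == "__main__":
    print(build_url("/fetch-news"))
    print(build_url("/recommend-news", {"query": "artificial intelligence"}))
    print(article_url("article123"))
```

The resulting strings can be passed to any HTTP client, for instance `urllib.request.urlopen`. Note that, per the endpoint descriptions, `/fetch-news` and `/recommend-news` return HTML while `/article/{article_id}` returns JSON.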
+ +## License + +This project is licensed under the MIT License - see the LICENSE file for details. diff --git a/docs/Technical_Documentation.md b/docs/Technical_Documentation.md new file mode 100644 index 0000000..590f87d --- /dev/null +++ b/docs/Technical_Documentation.md @@ -0,0 +1,150 @@ +# DS Task AI News - Technical Documentation + +## Architecture Overview + +The DS Task AI News application is built using a modular architecture with the following components: + +1. **FastAPI Backend**: Handles HTTP requests and serves HTML templates. +2. **News Fetcher**: Fetches news articles from RSS feeds. +3. **Embedding Generator**: Generates embeddings for articles using Cohere. +4. **Vector Store**: Stores and retrieves article embeddings using Pinecone. +5. **News Recommender**: Generates insights and recommendations using Groq. +6. **HTML Templates**: Renders the user interface. + +## Component Details + +### 1. FastAPI Backend (`main.py`) + +The FastAPI backend serves as the entry point for the application. It handles HTTP requests and serves HTML templates. The backend includes the following endpoints: + +- `/`: Home page with links to other routes. +- `/fetch-news`: Fetches news from RSS feeds and displays the latest articles. +- `/recommend-news`: Gets news recommendations based on an article ID or search query. +- `/article/{article_id}`: Gets a specific article and its summary. + +### 2. News Fetcher (`news_fetcher.py`) + +The News Fetcher component is responsible for fetching news articles from RSS feeds. It performs the following tasks: + +- Fetches articles from configured RSS feeds using the `feedparser` library. +- Cleans HTML content to extract plain text. +- Saves raw articles to JSON files. +- Processes articles with embeddings. +- Saves processed articles to JSON files. +- Stores articles in the vector database. + +### 3. Embedding Generator (`embeddings.py`) + +The Embedding Generator component is responsible for generating embeddings for articles. 
It performs the following tasks: + +- Generates embeddings for article content using Cohere. +- Processes articles to include embeddings. +- Generates query embeddings for search queries. + +### 4. Vector Store (`vector_store.py`) + +The Vector Store component is responsible for storing and retrieving article embeddings. It performs the following tasks: + +- Stores article embeddings in the Pinecone vector database. +- Retrieves similar articles based on query embeddings. +- Upserts articles to update the vector database. + +### 5. News Recommender (`recommender.py`) + +The News Recommender component is responsible for generating insights and recommendations. It performs the following tasks: + +- Analyzes articles to generate insights using Groq. +- Generates summaries for individual articles using Groq. + +### 6. HTML Templates + +The HTML templates are responsible for rendering the user interface. The templates include: + +- `base.html`: Base template with common layout elements. +- `home.html`: Home page template. +- `news.html`: Template for displaying news articles. +- `recommendations.html`: Template for displaying recommended articles and insights. + +## Data Flow + +1. **Fetching News**: + - User requests the `/fetch-news` endpoint. + - The backend calls the News Fetcher to fetch articles from RSS feeds. + - The News Fetcher cleans the articles and saves them to JSON files. + - The News Fetcher calls the Embedding Generator to generate embeddings for the articles. + - The News Fetcher calls the Vector Store to store the articles in the vector database. + - The backend renders the `news.html` template with the fetched articles. + +2. **Recommending News**: + - User requests the `/recommend-news` endpoint with a query parameter. + - The backend calls the Embedding Generator to generate a query embedding. + - The backend calls the Vector Store to retrieve similar articles. + - The backend calls the News Recommender to generate insights for the articles. 
+ - The backend renders the `recommendations.html` template with the recommended articles and insights. + +3. **Getting an Article**: + - User requests the `/article/{article_id}` endpoint. + - The backend calls the Vector Store to retrieve the article. + - The backend calls the News Recommender to generate a summary for the article. + - The backend returns the article and summary as JSON. + +## Configuration + +The application is configured using environment variables and configuration files: + +- `config.py`: Contains configuration variables for the application. +- Environment variables: API keys and other sensitive information. + +## Dependencies + +The application relies on the following external services and libraries: + +- **FastAPI**: Web framework for building APIs. +- **Jinja2**: Template engine for rendering HTML. +- **feedparser**: Library for parsing RSS feeds. +- **BeautifulSoup**: Library for parsing HTML. +- **Cohere**: API for generating embeddings. +- **Pinecone**: Vector database for storing and retrieving embeddings. +- **Groq**: API for generating insights and summaries. + +## File Structure + +``` +ds_task_ai_news/ +├── backend/ +│ ├── main.py +│ ├── news_fetcher.py +│ ├── embeddings.py +│ ├── vector_store.py +│ ├── recommender.py +│ ├── config.py +│ └── templates/ +│ ├── base.html +│ ├── home.html +│ ├── news.html +│ └── recommendations.html +├── data/ +│ ├── raw_news/ +│ └── processed_news/ +├── docs/ +│ ├── API_Documentation.md +│ └── Technical_Documentation.md +└── requirements.txt +``` + +## Error Handling + +The application uses try-except blocks to handle errors gracefully. Errors are logged using the `logging` module and returned as HTTP responses with appropriate status codes. + +## Future Improvements + +Potential improvements for the application include: + +1. **Authentication**: Add user authentication to protect sensitive endpoints. +2. **Rate Limiting**: Implement rate limiting to prevent abuse. +3. 
**Caching**: Add caching to improve performance. +4. **Testing**: Add unit and integration tests. +5. **Deployment**: Deploy the application to a cloud provider. +6. **Monitoring**: Add monitoring and alerting. +7. **User Preferences**: Allow users to customize their news preferences. +8. **Mobile App**: Develop a mobile app for the application. \ No newline at end of file diff --git a/docs/User_Guide.md b/docs/User_Guide.md new file mode 100644 index 0000000..b7cebf7 --- /dev/null +++ b/docs/User_Guide.md @@ -0,0 +1,142 @@ +# DS Task AI News - User Guide + +## Introduction + +DS Task AI News is an AI-powered news application that fetches, processes, and recommends news articles based on your interests. The application uses advanced AI technologies to analyze news articles and provide personalized insights and recommendations. + +## Features + +- **Latest News**: View the latest news articles fetched from various RSS feeds. +- **News Recommendations**: Get personalized news recommendations based on your interests. +- **AI Insights**: Receive AI-generated insights about news articles. +- **Article Summaries**: Get concise summaries of individual articles. + +## Getting Started + +### Prerequisites + +- Python 3.8 or higher +- pip (Python package manager) +- Internet connection + +### Installation + +1. Clone the repository: + ``` + git clone https://github.com/yourusername/ds_task_ai_news.git + cd ds_task_ai_news + ``` + +2. Install the required dependencies: + ``` + pip install -r requirements.txt + ``` + +3. Set up the required environment variables: + - Create a `.env` file in the root directory with the following content: + ``` + GROQ_API_KEY=your_groq_api_key + PINECONE_API_KEY=your_pinecone_api_key + PINECONE_ENVIRONMENT=your_pinecone_environment + PINECONE_INDEX=your_pinecone_index + ``` + +4. Run the application: + ``` + python backend/main.py + ``` + +5. Open your web browser and navigate to `http://localhost:8000`. 
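If the application does not start after these steps, a short preflight script can narrow down the cause. This is a sketch under stated assumptions rather than part of the project: the helper names are hypothetical, the minimum version comes from the Prerequisites list, and the module list covers only import names that appear in this change (`bs4` is the import name of the `beautifulsoup4` package). Extend the list to match your own `requirements.txt`.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # from the Prerequisites list

# Import names, not pip package names; "bs4" is installed by beautifulsoup4.
MODULES = ("fastapi", "uvicorn", "jinja2", "feedparser", "bs4", "pinecone", "groq")


def python_ok(current=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if current is None:
        current = sys.version_info
    return tuple(current[:2]) >= minimum


def missing_modules(names=MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.8 or higher is required.")
    missing = missing_modules()
    if missing:
        print("Modules not found: " + ", ".join(missing))
    elif python_ok():
        print("Python version and modules look fine.")
```

`importlib.util.find_spec` only locates a module without importing it, so the check stays fast and has no side effects.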
+ +## Using the Application + +### Home Page + +The home page provides links to the main features of the application: + +- **Latest News**: View the latest news articles. +- **Technology News**: Get recommendations for technology-related news. +- **AI News**: Get recommendations for AI-related news. + +### Latest News + +To view the latest news articles: + +1. Click on the "View Latest News" button on the home page. +2. The application will fetch the latest news articles from the configured RSS feeds. +3. The articles will be displayed in a grid layout with the following information: + - Title + - Content preview + - Source + - Publication date + - Categories + - "Read More" button + +### News Recommendations + +To get personalized news recommendations: + +1. Click on one of the recommendation buttons on the home page (e.g., "Technology News" or "AI News"). +2. Alternatively, you can navigate to `/recommend-news?query=your_search_query` to get recommendations based on a specific query. +3. The application will display recommended articles and AI-generated insights. +4. The insights section includes: + - Themes: Main topics and areas of focus in the news articles. + - Key Insights: Key takeaways and observations from the articles. + - Implications: Potential consequences and outcomes of the trends and developments. + - Related Areas: Other areas of interest connected to the themes and insights. + +### Article Details + +To view the details of a specific article: + +1. Click on the "Read More" button for an article. +2. The article will open in a new tab with the full content. + +## Customization + +### Adding RSS Feeds + +To add or modify the RSS feeds: + +1. Open the `backend/config.py` file. +2. Locate the `RSS_FEEDS` list. +3. Add or remove RSS feed URLs as needed. + +### Changing the UI + +The application uses Tailwind CSS for styling. To modify the UI: + +1. Open the HTML templates in the `backend/templates` directory. +2. 
Modify the HTML and CSS classes as needed. + +## Troubleshooting + +### Common Issues + +1. **Application not starting**: + - Check if all dependencies are installed correctly. + - Verify that the environment variables are set correctly. + - Check the console for error messages. + +2. **No news articles displayed**: + - Check your internet connection. + - Verify that the RSS feeds are accessible. + - Check the console for error messages. + +3. **AI insights not displaying correctly**: + - Verify that the Groq API key is set correctly. + - Check the console for error messages. + +### Getting Help + +If you encounter any issues not covered in this guide, please: + +1. Check the console for error messages. +2. Refer to the API and Technical documentation. +3. Contact the development team for assistance. + +## Conclusion + +DS Task AI News is a powerful tool for staying informed about the latest news and trends. By leveraging AI technologies, it provides personalized insights and recommendations to help you make sense of the news. + +We hope you find this guide helpful. If you have any questions or feedback, please don't hesitate to contact us. \ No newline at end of file