+{% endblock %}
\ No newline at end of file
diff --git a/backend/templates/recommendations.html b/backend/templates/recommendations.html
new file mode 100644
index 0000000..8096ae3
--- /dev/null
+++ b/backend/templates/recommendations.html
@@ -0,0 +1,157 @@
+{% extends "base.html" %}
+
+{% block title %}Recommended News - DS Task AI News{% endblock %}
+
+{% block content %}
+
+
+
AI Insights
+
+ {% if insights %}
+ {% if insights is string %}
+ {# If insights is a string (JSON or markdown), try to parse it #}
+ {% set insights_data = insights | from_json %}
+ {% if insights_data %}
+
+{% endblock %}
\ No newline at end of file
diff --git a/backend/vector_store.py b/backend/vector_store.py
index ece2be2..332ee34 100644
--- a/backend/vector_store.py
+++ b/backend/vector_store.py
@@ -2,7 +2,6 @@ from pinecone import Pinecone, ServerlessSpec
 from typing import List, Dict, Any
 from config import (
     PINECONE_API_KEY,
-    PINECONE_ENVIRONMENT,
     PINECONE_INDEX_NAME,
     VECTOR_DIMENSION,
     TOP_K_RESULTS
@@ -16,13 +15,17 @@ class VectorStore:
     def _ensure_index(self):
         """Ensure the Pinecone index exists, create if it doesn't."""
+        # Check if index exists, create if it doesn't
         if self.index_name not in self.pinecone.list_indexes().names():
+            # Create a new index with the correct dimension
             self.pinecone.create_index(
                 name=self.index_name,
                 dimension=VECTOR_DIMENSION,
                 metric="cosine",
                 spec=ServerlessSpec(cloud="aws", region="us-east-1")
             )
+            print(f"Created new index '{self.index_name}' with dimension {VECTOR_DIMENSION}")
+
         self.index = self.pinecone.Index(self.index_name)
     def upsert_articles(self, articles: List[Dict[str, Any]]) -> bool:
diff --git a/docs/API_Documentation.md b/docs/API_Documentation.md
index e69de29..30c12f7 100644
--- a/docs/API_Documentation.md
+++ b/docs/API_Documentation.md
@@ -0,0 +1,186 @@
+# DS Task AI News API Documentation
+
+## Overview
+
+The DS Task AI News API is a FastAPI application with endpoints for fetching, processing, and recommending news articles from RSS feeds. It uses the Groq API to generate article summaries and insights, and Pinecone to store and retrieve article embeddings.
+
+## Base URL
+
+```
+http://localhost:8000
+```
+
+## Endpoints
+
+### 1. Home Page
+
+**Endpoint:** `/`
+
+**Method:** `GET`
+
+**Description:** Returns the home page with links to other routes.
+
+**Response:** HTML page with navigation links to other endpoints.
+
+**Example:**
+```
+GET /
+```
+
+### 2. Fetch News
+
+**Endpoint:** `/fetch-news`
+
+**Method:** `GET`
+
+**Description:** Fetches news from RSS feeds, processes them, and stores them in the vector database. Returns a page displaying the latest news articles.
+
+**Response:** HTML page displaying the latest news articles.
+
+**Example:**
+```
+GET /fetch-news
+```
+
+### 3. Recommend News
+
+**Endpoint:** `/recommend-news`
+
+**Method:** `GET`
+
+**Description:** Gets news recommendations based on an article ID or search query. Returns a page displaying recommended articles and AI-generated insights.
+
+**Query Parameters:**
+- `article_id` (optional): ID of an article to base recommendations on.
+- `query` (optional): Search query to base recommendations on.
+
+**Response:** HTML page displaying recommended articles and AI-generated insights.
+
+**Example:**
+```
+GET /recommend-news?query=artificial%20intelligence
+```
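
A client can also build this request URL programmatically. The sketch below is a minimal illustration that assumes the base URL given above; `build_recommend_url` is a hypothetical helper and is not part of the project.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # base URL stated in this document


def build_recommend_url(query: Optional[str] = None,
                        article_id: Optional[str] = None,
                        base_url: str = BASE_URL) -> str:
    """Build a /recommend-news URL from the two optional parameters (hypothetical helper)."""
    params = {}
    if article_id is not None:
        params["article_id"] = article_id
    if query is not None:
        params["query"] = query
    # quote_via=quote encodes spaces as %20 instead of '+'
    encoded = urlencode(params, quote_via=quote)
    path = f"{base_url}/recommend-news"
    return f"{path}?{encoded}" if encoded else path
```

Calling `build_recommend_url(query="artificial intelligence")` yields the same URL as the example request above.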
+
+### 4. Get Article
+
+**Endpoint:** `/article/{article_id}`
+
+**Method:** `GET`
+
+**Description:** Gets a specific article and its summary.
+
+**Path Parameters:**
+- `article_id`: ID of the article to retrieve.
+
+**Response:** JSON object containing the article and its summary.
+
+**Example Response:**
+```json
+{
+  "article": {
+    "title": "Example Article Title",
+    "content": "Example article content...",
+    "link": "https://example.com/article",
+    "published": "2023-01-01T12:00:00",
+    "source": "Example News",
+    "categories": ["Technology", "AI"],
+    "id": "article123"
+  },
+  "summary": "This is a summary of the article..."
+}
+```
+
+**Example:**
+```
+GET /article/article123
+```
+
+## Data Models
+
+### Article
+
+```json
+{
+  "title": "string",
+  "content": "string",
+  "link": "string",
+  "published": "string",
+  "source": "string",
+  "categories": ["string"],
+  "id": "string"
+}
+```
+
+### Insights
+
+```json
+{
+  "themes": ["string"],
+  "insights": ["string"],
+  "implications": ["string"],
+  "related_areas": ["string"]
+}
+```
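
Because insights may arrive either as a parsed object or as a raw string (the `recommendations.html` template checks for the string case), a consumer may want to normalize them into the shape above. The following is a sketch under that assumption; `parse_insights` and `INSIGHT_FIELDS` are hypothetical names, not project code.

```python
import json
from typing import Any, Dict, List, Optional, Union

# Field names taken from the Insights model above.
INSIGHT_FIELDS = ("themes", "insights", "implications", "related_areas")


def parse_insights(raw: Union[str, Dict[str, Any]]) -> Optional[Dict[str, List[str]]]:
    """Normalize insights into four lists of strings; return None if not parseable."""
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError:
            return None  # e.g. markdown text rather than JSON
    if not isinstance(raw, dict):
        return None
    result: Dict[str, List[str]] = {}
    for field in INSIGHT_FIELDS:
        value = raw.get(field)
        # Missing or non-list fields fall back to an empty list.
        result[field] = [str(item) for item in value] if isinstance(value, list) else []
    return result
```

Returning `None` for unparseable input lets the caller decide whether to display the raw text instead.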
+
+## Error Handling
+
+The API uses standard HTTP status codes to indicate the success or failure of requests:
+
+- `200 OK`: The request was successful.
+- `400 Bad Request`: The request was invalid or cannot be served.
+- `404 Not Found`: The requested resource was not found.
+- `500 Internal Server Error`: An error occurred on the server.
+
+Error responses include a JSON object with a `detail` field containing a description of the error:
+
+```json
+{
+ "detail": "Error message"
+}
+```
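
On the client side, the `detail` field can be extracted defensively, since the HTML endpoints may not return JSON at all. This is a minimal sketch; `error_detail` is a hypothetical helper and the fallback message is an assumption.

```python
import json


def error_detail(status_code: int, body: str) -> str:
    """Return a readable message for an error response (hypothetical helper)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None  # body was not JSON, e.g. an HTML error page
    if isinstance(payload, dict) and "detail" in payload:
        return f"{status_code}: {payload['detail']}"
    return f"{status_code}: unexpected response body"
```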
+
+## Authentication
+
+The API does not currently require authentication.
+
+## Rate Limiting
+
+The API does not currently implement rate limiting.
+
+## Dependencies
+
+The API relies on the following external services:
+
+- **Groq API**: For generating article summaries and insights.
+- **Cohere API**: For generating article and query embeddings (see the Technical Documentation).
+- **Pinecone Vector Database**: For storing and retrieving article embeddings.
+
+## Configuration
+
+The API can be configured by modifying the following environment variables:
+
+- `GROQ_API_KEY`: API key for the Groq service.
+- `PINECONE_API_KEY`: API key for the Pinecone vector database.
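+- `PINECONE_ENVIRONMENT`: Environment for the Pinecone vector database. Note that `backend/vector_store.py` no longer imports this value; the index is created with a serverless spec (`aws`, `us-east-1`).
+- `PINECONE_INDEX`: Index name for the Pinecone vector database (imported from `config.py` as `PINECONE_INDEX_NAME`).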
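
A startup check can report missing variables before any external service is called. The sketch below is an illustration only; `missing_env_vars` is a hypothetical helper, and `PINECONE_ENVIRONMENT` is left out on the assumption that it is optional, since `vector_store.py` no longer imports it.

```python
import os
from typing import List, Mapping

# Variable names as listed in this document.
REQUIRED_VARS = ("GROQ_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.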
+
+## Development
+
+To run the API locally:
+
+1. Install the required dependencies:
+   ```
+   pip install -r requirements.txt
+   ```
+
+2. Set the required environment variables.
+
+3. Run the API:
+   ```
+   python backend/main.py
+   ```
+
+The API will be available at `http://localhost:8000`.
+
+## License
+
+This project is licensed under the MIT License - see the LICENSE file for details.
diff --git a/docs/Technical_Documentation.md b/docs/Technical_Documentation.md
new file mode 100644
index 0000000..590f87d
--- /dev/null
+++ b/docs/Technical_Documentation.md
@@ -0,0 +1,150 @@
+# DS Task AI News - Technical Documentation
+
+## Architecture Overview
+
+The DS Task AI News application is built using a modular architecture with the following components:
+
+1. **FastAPI Backend**: Handles HTTP requests and serves HTML templates.
+2. **News Fetcher**: Fetches news articles from RSS feeds.
+3. **Embedding Generator**: Generates embeddings for articles using Cohere.
+4. **Vector Store**: Stores and retrieves article embeddings using Pinecone.
+5. **News Recommender**: Generates insights and recommendations using Groq.
+6. **HTML Templates**: Renders the user interface.
+
+## Component Details
+
+### 1. FastAPI Backend (`main.py`)
+
+The FastAPI backend serves as the entry point for the application. It handles HTTP requests and serves HTML templates. The backend includes the following endpoints:
+
+- `/`: Home page with links to other routes.
+- `/fetch-news`: Fetches news from RSS feeds and displays the latest articles.
+- `/recommend-news`: Gets news recommendations based on an article ID or search query.
+- `/article/{article_id}`: Gets a specific article and its summary.
+
+### 2. News Fetcher (`news_fetcher.py`)
+
+The News Fetcher component is responsible for fetching news articles from RSS feeds. It performs the following tasks:
+
+- Fetches articles from configured RSS feeds using the `feedparser` library.
+- Cleans HTML content to extract plain text.
+- Saves raw articles to JSON files.
+- Processes articles with embeddings.
+- Saves processed articles to JSON files.
+- Stores articles in the vector database.
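
The HTML-cleaning step can be illustrated with the standard library alone. The project lists BeautifulSoup as its HTML parser, so the sketch below swaps in `html.parser` purely for a dependency-free example; `clean_html` and `_TextExtractor` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def clean_html(markup: str) -> str:
    """Strip tags and collapse runs of whitespace into single spaces."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return " ".join("".join(parser.chunks).split())
```

This simplified version keeps the contents of `<script>` and `<style>` elements, which a production cleaner would normally drop.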
+
+### 3. Embedding Generator (`embeddings.py`)
+
+The Embedding Generator component is responsible for generating embeddings for articles. It performs the following tasks:
+
+- Generates embeddings for article content using Cohere.
+- Processes articles to include embeddings.
+- Generates query embeddings for search queries.
+
+### 4. Vector Store (`vector_store.py`)
+
+The Vector Store component is responsible for storing and retrieving article embeddings. It performs the following tasks:
+
+- Stores article embeddings in the Pinecone vector database.
+- Retrieves similar articles based on query embeddings.
+- Upserts articles to update the vector database.
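
The index uses the cosine metric, so retrieval amounts to ranking stored vectors by cosine similarity to the query embedding. The in-memory sketch below illustrates that ranking only; it is not how Pinecone implements search, and `cosine_similarity` and `top_k_similar` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query: Sequence[float],
                  articles: Dict[str, Sequence[float]],
                  k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (article_id, score) pairs most similar to the query, best first."""
    scored = [(article_id, cosine_similarity(query, vector))
              for article_id, vector in articles.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In the application, `k` would correspond to the `TOP_K_RESULTS` setting imported from `config.py`.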
+
+### 5. News Recommender (`recommender.py`)
+
+The News Recommender component is responsible for generating insights and recommendations. It performs the following tasks:
+
+- Analyzes articles to generate insights using Groq.
+- Generates summaries for individual articles using Groq.
+
+### 6. HTML Templates
+
+The HTML templates are responsible for rendering the user interface. The templates include:
+
+- `base.html`: Base template with common layout elements.
+- `home.html`: Home page template.
+- `news.html`: Template for displaying news articles.
+- `recommendations.html`: Template for displaying recommended articles and insights.
+
+## Data Flow
+
+1. **Fetching News**:
+   - User requests the `/fetch-news` endpoint.
+   - The backend calls the News Fetcher to fetch articles from RSS feeds.
+   - The News Fetcher cleans the articles and saves them to JSON files.
+   - The News Fetcher calls the Embedding Generator to generate embeddings for the articles.
+   - The News Fetcher calls the Vector Store to store the articles in the vector database.
+   - The backend renders the `news.html` template with the fetched articles.
+
+2. **Recommending News**:
+   - User requests the `/recommend-news` endpoint with a query parameter.
+   - The backend calls the Embedding Generator to generate a query embedding.
+   - The backend calls the Vector Store to retrieve similar articles.
+   - The backend calls the News Recommender to generate insights for the articles.
+   - The backend renders the `recommendations.html` template with the recommended articles and insights.
+
+3. **Getting an Article**:
+   - User requests the `/article/{article_id}` endpoint.
+   - The backend calls the Vector Store to retrieve the article.
+   - The backend calls the News Recommender to generate a summary for the article.
+   - The backend returns the article and summary as JSON.
+
+## Configuration
+
+The application is configured using environment variables and configuration files:
+
+- `config.py`: Contains configuration variables for the application.
+- Environment variables: API keys and other sensitive information.
+
+## Dependencies
+
+The application relies on the following external services and libraries:
+
+- **FastAPI**: Web framework for building APIs.
+- **Jinja2**: Template engine for rendering HTML.
+- **feedparser**: Library for parsing RSS feeds.
+- **BeautifulSoup**: Library for parsing HTML.
+- **Cohere**: API for generating embeddings.
+- **Pinecone**: Vector database for storing and retrieving embeddings.
+- **Groq**: API for generating insights and summaries.
+
+## File Structure
+
+```
+ds_task_ai_news/
+├── backend/
+│   ├── main.py
+│   ├── news_fetcher.py
+│   ├── embeddings.py
+│   ├── vector_store.py
+│   ├── recommender.py
+│   ├── config.py
+│   └── templates/
+│       ├── base.html
+│       ├── home.html
+│       ├── news.html
+│       └── recommendations.html
+├── data/
+│   ├── raw_news/
+│   └── processed_news/
+├── docs/
+│   ├── API_Documentation.md
+│   ├── Technical_Documentation.md
+│   └── User_Guide.md
+└── requirements.txt
+```
+
+## Error Handling
+
+The application uses try-except blocks to handle errors gracefully. Errors are logged using the `logging` module and returned as HTTP responses with appropriate status codes.
+
+## Future Improvements
+
+Potential improvements for the application include:
+
+1. **Authentication**: Add user authentication to protect sensitive endpoints.
+2. **Rate Limiting**: Implement rate limiting to prevent abuse.
+3. **Caching**: Add caching to improve performance.
+4. **Testing**: Add unit and integration tests.
+5. **Deployment**: Deploy the application to a cloud provider.
+6. **Monitoring**: Add monitoring and alerting.
+7. **User Preferences**: Allow users to customize their news preferences.
+8. **Mobile App**: Develop a mobile app for the application.
\ No newline at end of file
diff --git a/docs/User_Guide.md b/docs/User_Guide.md
new file mode 100644
index 0000000..b7cebf7
--- /dev/null
+++ b/docs/User_Guide.md
@@ -0,0 +1,142 @@
+# DS Task AI News - User Guide
+
+## Introduction
+
+DS Task AI News is an AI-powered news application that fetches news articles from RSS feeds, processes them, and recommends related articles for a search query or a reference article. It uses embeddings for similarity search and the Groq API to generate insights and summaries.
+
+## Features
+
+- **Latest News**: View the latest news articles fetched from various RSS feeds.
+- **News Recommendations**: Get news recommendations related to a search query or a specific article.
+- **AI Insights**: Receive AI-generated insights about news articles.
+- **Article Summaries**: Get concise summaries of individual articles.
+
+## Getting Started
+
+### Prerequisites
+
+- Python 3.8 or higher
+- pip (Python package manager)
+- Internet connection
+
+### Installation
+
+1. Clone the repository:
+   ```
+   git clone https://github.com/yourusername/ds_task_ai_news.git
+   cd ds_task_ai_news
+   ```
+
+2. Install the required dependencies:
+   ```
+   pip install -r requirements.txt
+   ```
+
+3. Set up the required environment variables:
+   - Create a `.env` file in the root directory with the following content:
+     ```
+     GROQ_API_KEY=your_groq_api_key
+     PINECONE_API_KEY=your_pinecone_api_key
+     PINECONE_ENVIRONMENT=your_pinecone_environment
+     PINECONE_INDEX=your_pinecone_index
+     ```
+
+4. Run the application:
+   ```
+   python backend/main.py
+   ```
+
+5. Open your web browser and navigate to `http://localhost:8000`.
+
+## Using the Application
+
+### Home Page
+
+The home page provides links to the main features of the application:
+
+- **Latest News**: View the latest news articles.
+- **Technology News**: Get recommendations for technology-related news.
+- **AI News**: Get recommendations for AI-related news.
+
+### Latest News
+
+To view the latest news articles:
+
+1. Click on the "View Latest News" button on the home page.
+2. The application will fetch the latest news articles from the configured RSS feeds.
+3. The articles will be displayed in a grid layout with the following information:
+   - Title
+   - Content preview
+   - Source
+   - Publication date
+   - Categories
+   - "Read More" button
+
+### News Recommendations
+
+To get personalized news recommendations:
+
+1. Click on one of the recommendation buttons on the home page (e.g., "Technology News" or "AI News").
+2. Alternatively, you can navigate to `/recommend-news?query=your_search_query` to get recommendations based on a specific query.
+3. The application will display recommended articles and AI-generated insights.
+4. The insights section includes:
+   - Themes: Main topics and areas of focus in the news articles.
+   - Key Insights: Key takeaways and observations from the articles.
+   - Implications: Potential consequences and outcomes of the trends and developments.
+   - Related Areas: Other areas of interest connected to the themes and insights.
+
+### Article Details
+
+To view the details of a specific article:
+
+1. Click on the "Read More" button for an article.
+2. The article will open in a new tab with the full content.
+
+## Customization
+
+### Adding RSS Feeds
+
+To add or modify the RSS feeds:
+
+1. Open the `backend/config.py` file.
+2. Locate the `RSS_FEEDS` list.
+3. Add or remove RSS feed URLs as needed.
+
+### Changing the UI
+
+The application uses Tailwind CSS for styling. To modify the UI:
+
+1. Open the HTML templates in the `backend/templates` directory.
+2. Modify the HTML and CSS classes as needed.
+
+## Troubleshooting
+
+### Common Issues
+
+1. **Application not starting**:
+   - Check if all dependencies are installed correctly.
+   - Verify that the environment variables are set correctly.
+   - Check the console for error messages.
+
+2. **No news articles displayed**:
+   - Check your internet connection.
+   - Verify that the RSS feeds are accessible.
+   - Check the console for error messages.
+
+3. **AI insights not displaying correctly**:
+   - Verify that the Groq API key is set correctly.
+   - Check the console for error messages.
+
+### Getting Help
+
+If you encounter any issues not covered in this guide, please:
+
+1. Check the console for error messages.
+2. Refer to the API and Technical documentation.
+3. Contact the development team for assistance.
+
+## Conclusion
+
+DS Task AI News is a powerful tool for staying informed about the latest news and trends. By leveraging AI technologies, it provides personalized insights and recommendations to help you make sense of the news.
+
+We hope you find this guide helpful. If you have any questions or feedback, please don't hesitate to contact us.
\ No newline at end of file