Update README and backend functionality for improved news application

- Enhanced README.md with a clearer project overview, features, technologies used, and installation instructions.
- Updated vector dimension in config.py from 4096 to 1024 for Cohere embeddings.
- Modified main.py to serve HTML responses for the home page, news fetching, and recommendations.
- Improved error handling and ensured articles have links in the responses.
- Cleaned up news_fetcher.py by removing unnecessary print statements.
- Updated recommender.py to refine insights generation and summary extraction.
- Added Jinja2 for templating and improved the project structure for better organization.
- Included API documentation for better understanding of endpoints and usage.
boladeE
2025-04-15 11:59:39 +01:00
parent e3d00bb4dc
commit bc485b44b8
14 changed files with 957 additions and 108 deletions
@@ -0,0 +1,186 @@
# DS Task AI News API Documentation
## Overview
The DS Task AI News API is a FastAPI-based application that provides endpoints for fetching, processing, and recommending news articles. The API uses AI-powered analysis to generate insights and recommendations based on news articles from various RSS feeds.
## Base URL
```
http://localhost:8000
```
## Endpoints
### 1. Home Page
**Endpoint:** `/`
**Method:** `GET`
**Description:** Returns the home page with links to other routes.
**Response:** HTML page with navigation links to other endpoints.
**Example:**
```
GET /
```
### 2. Fetch News
**Endpoint:** `/fetch-news`
**Method:** `GET`
**Description:** Fetches news articles from RSS feeds, processes them, and stores them in the vector database. Returns a page displaying the latest news articles.
**Response:** HTML page displaying the latest news articles.
**Example:**
```
GET /fetch-news
```
### 3. Recommend News
**Endpoint:** `/recommend-news`
**Method:** `GET`
**Description:** Gets news recommendations based on an article ID or search query. Returns a page displaying recommended articles and AI-generated insights.
**Query Parameters:**
- `article_id` (optional): ID of an article to base recommendations on.
- `query` (optional): Search query to base recommendations on.
**Response:** HTML page displaying recommended articles and AI-generated insights.
**Example:**
```
GET /recommend-news?query=artificial%20intelligence
```
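As a minimal sketch of how a client might build this request, the helper below URL-encodes the optional parameters with the standard library. The function name `build_recommend_url` is an illustrative assumption and is not part of the API; only the path and the two parameter names come from this document.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # base URL documented above


def build_recommend_url(query=None, article_id=None, base_url=BASE_URL):
    """Build a /recommend-news URL from the optional query parameters."""
    params = {}
    if article_id is not None:
        params["article_id"] = article_id
    if query is not None:
        params["query"] = query
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query_string = urlencode(params, quote_via=quote)
    url = f"{base_url}/recommend-news"
    return f"{url}?{query_string}" if query_string else url
```

Calling `build_recommend_url(query="artificial intelligence")` produces the same URL as the example request above, prefixed with the base URL.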
### 4. Get Article
**Endpoint:** `/article/{article_id}`
**Method:** `GET`
**Description:** Gets a specific article and its summary.
**Path Parameters:**
- `article_id`: ID of the article to retrieve.
**Response:** JSON object containing the article and its summary.
**Example Response:**
```json
{
  "article": {
    "title": "Example Article Title",
    "content": "Example article content...",
    "link": "https://example.com/article",
    "published": "2023-01-01T12:00:00",
    "source": "Example News",
    "categories": ["Technology", "AI"],
    "id": "article123"
  },
  "summary": "This is a summary of the article..."
}
```
**Example:**
```
GET /article/article123
```
## Data Models
### Article
```json
{
  "title": "string",
  "content": "string",
  "link": "string",
  "published": "string",
  "source": "string",
  "categories": ["string"],
  "id": "string"
}
```
### Insights
```json
{
  "themes": ["string"],
  "insights": ["string"],
  "implications": ["string"],
  "related_areas": ["string"]
}
```
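For client code that consumes these models, the two shapes could be mirrored as Python dataclasses. This is only a sketch: the class names and the empty-list defaults are assumptions, and the backend may represent these objects differently (for example as plain dictionaries).

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Article:
    title: str
    content: str
    link: str
    published: str  # timestamp string, e.g. "2023-01-01T12:00:00"
    source: str
    id: str
    categories: List[str] = field(default_factory=list)


@dataclass
class Insights:
    themes: List[str] = field(default_factory=list)
    insights: List[str] = field(default_factory=list)
    implications: List[str] = field(default_factory=list)
    related_areas: List[str] = field(default_factory=list)


# Values taken from the example response in this document.
example = Article(
    title="Example Article Title",
    content="Example article content...",
    link="https://example.com/article",
    published="2023-01-01T12:00:00",
    source="Example News",
    id="article123",
    categories=["Technology", "AI"],
)
```

`asdict(example)` converts the instance back into a dictionary with the same keys as the JSON model.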
## Error Handling
The API uses standard HTTP status codes to indicate the success or failure of requests:
- `200 OK`: The request was successful.
- `400 Bad Request`: The request was invalid or cannot be served.
- `404 Not Found`: The requested resource was not found.
- `500 Internal Server Error`: An error occurred on the server.
Error responses include a JSON object with a `detail` field containing a description of the error:
```json
{
  "detail": "Error message"
}
```
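On the client side, the `detail` field can be extracted defensively, since an error body is not guaranteed to be valid JSON (for instance when a proxy returns its own error page). The helper name `error_detail` and the sample message in the usage note are hypothetical.

```python
import json


def error_detail(status_code, body):
    """Return the error description for a failed response, or None on success."""
    if status_code < 400:
        return None
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return body  # body is not JSON; fall back to the raw text
    if isinstance(payload, dict):
        return payload.get("detail", body)
    return body
```

For example, `error_detail(404, '{"detail": "Article not found"}')` returns `"Article not found"`.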
## Authentication
The API does not currently require authentication.
## Rate Limiting
The API does not currently implement rate limiting.
## Dependencies
The API relies on the following external services:
- **Groq API**: For generating article summaries and insights.
- **Pinecone Vector Database**: For storing and retrieving article embeddings.
- **Cohere API**: For generating article and query embeddings.
## Configuration
The API is configured by setting the following environment variables:
- `GROQ_API_KEY`: API key for the Groq service.
- `PINECONE_API_KEY`: API key for the Pinecone vector database.
- `PINECONE_ENVIRONMENT`: Environment for the Pinecone vector database.
- `PINECONE_INDEX`: Index name for the Pinecone vector database.
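One way to surface a missing variable early is to validate all four at startup. The sketch below assumes a hypothetical `load_settings` helper; the project's actual `config.py` may read these values differently.

```python
import os

REQUIRED_VARS = (
    "GROQ_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT",
    "PINECONE_INDEX",
)


def load_settings(env=None):
    """Return the required settings, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing a plain dictionary as `env` makes the check easy to exercise in tests without touching the real environment.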
## Development
To run the API locally:
1. Install the required dependencies:
```
pip install -r requirements.txt
```
2. Set the required environment variables.
3. Run the API:
```
python backend/main.py
```
The API will be available at `http://localhost:8000`.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
@@ -0,0 +1,150 @@
# DS Task AI News - Technical Documentation
## Architecture Overview
The DS Task AI News application is built using a modular architecture with the following components:
1. **FastAPI Backend**: Handles HTTP requests and serves HTML templates.
2. **News Fetcher**: Fetches news articles from RSS feeds.
3. **Embedding Generator**: Generates embeddings for articles using Cohere.
4. **Vector Store**: Stores and retrieves article embeddings using Pinecone.
5. **News Recommender**: Generates insights and recommendations using Groq.
6. **HTML Templates**: Renders the user interface.
## Component Details
### 1. FastAPI Backend (`main.py`)
The FastAPI backend serves as the entry point for the application. It handles HTTP requests and serves HTML templates. The backend includes the following endpoints:
- `/`: Home page with links to other routes.
- `/fetch-news`: Fetches news from RSS feeds and displays the latest articles.
- `/recommend-news`: Gets news recommendations based on an article ID or search query.
- `/article/{article_id}`: Gets a specific article and its summary.
### 2. News Fetcher (`news_fetcher.py`)
The News Fetcher component is responsible for fetching news articles from RSS feeds. It performs the following tasks:
- Fetches articles from configured RSS feeds using the `feedparser` library.
- Cleans HTML content to extract plain text.
- Saves raw articles to JSON files.
- Processes articles with embeddings.
- Saves processed articles to JSON files.
- Stores articles in the vector database.
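To illustrate the HTML-cleaning step, here is a small standard-library sketch. The project lists BeautifulSoup as its HTML parser, so this `html.parser` version is a stand-in for illustration, and the names `clean_html` and `_TextExtractor` are assumptions.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join the chunks and collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def clean_html(raw_html):
    """Return the plain text of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return parser.text()
```

Because chunks are joined with spaces, text split by an inline tag in the middle of a word would gain a space; a production cleaner would need to handle that case.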
### 3. Embedding Generator (`embeddings.py`)
The Embedding Generator component is responsible for generating embeddings for articles. It performs the following tasks:
- Generates embeddings for article content using Cohere.
- Processes articles to include embeddings.
- Generates query embeddings for search queries.
### 4. Vector Store (`vector_store.py`)
The Vector Store component is responsible for storing and retrieving article embeddings. It performs the following tasks:
- Stores article embeddings in the Pinecone vector database.
- Retrieves similar articles based on query embeddings.
- Upserts articles to update the vector database.
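The retrieval step can be pictured with an in-memory stand-in for the vector database. This sketch ranks stored embeddings by cosine similarity; the metric is an assumption (the index configuration is not stated here), and none of the code uses the Pinecone client.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding, stored, k=3):
    """`stored` maps article ID -> embedding; returns [(id, score)], best first."""
    scored = [
        (article_id, cosine_similarity(query_embedding, embedding))
        for article_id, embedding in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A real vector database performs the same ranking with approximate nearest-neighbor search, which avoids scoring every stored vector.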
### 5. News Recommender (`recommender.py`)
The News Recommender component is responsible for generating insights and recommendations. It performs the following tasks:
- Analyzes articles to generate insights using Groq.
- Generates summaries for individual articles using Groq.
### 6. HTML Templates
The HTML templates are responsible for rendering the user interface. The templates include:
- `base.html`: Base template with common layout elements.
- `home.html`: Home page template.
- `news.html`: Template for displaying news articles.
- `recommendations.html`: Template for displaying recommended articles and insights.
## Data Flow
1. **Fetching News**:
- User requests the `/fetch-news` endpoint.
- The backend calls the News Fetcher to fetch articles from RSS feeds.
- The News Fetcher cleans the articles and saves them to JSON files.
- The News Fetcher calls the Embedding Generator to generate embeddings for the articles.
- The News Fetcher calls the Vector Store to store the articles in the vector database.
- The backend renders the `news.html` template with the fetched articles.
2. **Recommending News**:
- User requests the `/recommend-news` endpoint with a query parameter.
- The backend calls the Embedding Generator to generate a query embedding.
- The backend calls the Vector Store to retrieve similar articles.
- The backend calls the News Recommender to generate insights for the articles.
- The backend renders the `recommendations.html` template with the recommended articles and insights.
3. **Getting an Article**:
- User requests the `/article/{article_id}` endpoint.
- The backend calls the Vector Store to retrieve the article.
- The backend calls the News Recommender to generate a summary for the article.
- The backend returns the article and summary as JSON.
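The recommendation flow above can be sketched as a single function whose three collaborators are passed in as callables. All names here (`recommend`, `embed_query`, `find_similar`, `generate_insights`) are hypothetical and do not reflect the actual function names in `main.py`.

```python
def recommend(query, embed_query, find_similar, generate_insights, top_k=5):
    """Run the recommendation flow: embed the query, search, then analyze."""
    query_embedding = embed_query(query)              # Embedding Generator
    articles = find_similar(query_embedding, top_k)   # Vector Store
    # Skip the insights call when there is nothing to analyze.
    insights = generate_insights(articles) if articles else {}
    return {"articles": articles, "insights": insights}
```

Injecting the collaborators keeps the flow testable with stubs, without calling Cohere, Pinecone, or Groq.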
## Configuration
The application is configured using environment variables and configuration files:
- `config.py`: Contains configuration variables for the application.
- Environment variables: API keys and other sensitive information.
## Dependencies
The application relies on the following external services and libraries:
- **FastAPI**: Web framework for building APIs.
- **Jinja2**: Template engine for rendering HTML.
- **feedparser**: Library for parsing RSS feeds.
- **BeautifulSoup**: Library for parsing HTML.
- **Cohere**: API for generating embeddings.
- **Pinecone**: Vector database for storing and retrieving embeddings.
- **Groq**: API for generating insights and summaries.
## File Structure
```
ds_task_ai_news/
├── backend/
│   ├── main.py
│   ├── news_fetcher.py
│   ├── embeddings.py
│   ├── vector_store.py
│   ├── recommender.py
│   ├── config.py
│   └── templates/
│       ├── base.html
│       ├── home.html
│       ├── news.html
│       └── recommendations.html
├── data/
│   ├── raw_news/
│   └── processed_news/
├── docs/
│   ├── API_Documentation.md
│   └── Technical_Documentation.md
└── requirements.txt
```
## Error Handling
The application wraps fallible operations in `try`/`except` blocks. Errors are logged using the `logging` module and returned as HTTP responses with appropriate status codes.
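A framework-free sketch of this pattern is shown below. In a FastAPI application the same mapping is typically expressed by raising `HTTPException`; here the `handle` helper, the `NotFoundError` class, and the logger name are all illustrative assumptions.

```python
import logging

logger = logging.getLogger("ds_task_ai_news")  # logger name is illustrative


class NotFoundError(Exception):
    """Hypothetical exception raised when an article does not exist."""


def handle(operation):
    """Run `operation` and return (status_code, body)."""
    try:
        return 200, operation()
    except NotFoundError as exc:
        logger.warning("Not found: %s", exc)
        return 404, {"detail": str(exc)}
    except ValueError as exc:
        logger.warning("Bad request: %s", exc)
        return 400, {"detail": str(exc)}
    except Exception:
        # Log the full traceback, but return a generic message to the client.
        logger.exception("Unhandled error")
        return 500, {"detail": "Internal server error"}
```

Returning a generic message for unexpected errors avoids leaking internal details while the traceback stays available in the logs.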
## Future Improvements
Potential improvements for the application include:
1. **Authentication**: Add user authentication to protect sensitive endpoints.
2. **Rate Limiting**: Implement rate limiting to prevent abuse.
3. **Caching**: Add caching to improve performance.
4. **Testing**: Add unit and integration tests.
5. **Deployment**: Deploy the application to a cloud provider.
6. **Monitoring**: Add monitoring and alerting.
7. **User Preferences**: Allow users to customize their news preferences.
8. **Mobile App**: Develop a mobile app for the application.
@@ -0,0 +1,142 @@
# DS Task AI News - User Guide
## Introduction
DS Task AI News is an AI-powered news application that fetches news articles from RSS feeds, processes them, and recommends related articles based on a search query or a reference article. It analyzes the articles with embeddings and a language model to generate insights alongside the recommendations.
## Features
- **Latest News**: View the latest news articles fetched from various RSS feeds.
- **News Recommendations**: Get news recommendations based on a search query or a specific article.
- **AI Insights**: Receive AI-generated insights about news articles.
- **Article Summaries**: Get concise summaries of individual articles.
## Getting Started
### Prerequisites
- Python 3.8 or higher
- pip (Python package manager)
- Internet connection
### Installation
1. Clone the repository:
```
git clone https://github.com/yourusername/ds_task_ai_news.git
cd ds_task_ai_news
```
2. Install the required dependencies:
```
pip install -r requirements.txt
```
3. Set up the required environment variables:
- Create a `.env` file in the root directory with the following content:
```
GROQ_API_KEY=your_groq_api_key
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENVIRONMENT=your_pinecone_environment
PINECONE_INDEX=your_pinecone_index
```
4. Run the application:
```
python backend/main.py
```
5. Open your web browser and navigate to `http://localhost:8000`.
## Using the Application
### Home Page
The home page provides links to the main features of the application:
- **Latest News**: View the latest news articles.
- **Technology News**: Get recommendations for technology-related news.
- **AI News**: Get recommendations for AI-related news.
### Latest News
To view the latest news articles:
1. Click on the "View Latest News" button on the home page.
2. The application will fetch the latest news articles from the configured RSS feeds.
3. The articles will be displayed in a grid layout with the following information:
- Title
- Content preview
- Source
- Publication date
- Categories
- "Read More" button
### News Recommendations
To get personalized news recommendations:
1. Click on one of the recommendation buttons on the home page (e.g., "Technology News" or "AI News").
2. Alternatively, you can navigate to `/recommend-news?query=your_search_query` to get recommendations based on a specific query.
3. The application will display recommended articles and AI-generated insights.
4. The insights section includes:
- Themes: Main topics and areas of focus in the news articles.
- Key Insights: Key takeaways and observations from the articles.
- Implications: Potential consequences and outcomes of the trends and developments.
- Related Areas: Other areas of interest connected to the themes and insights.
### Article Details
To view the details of a specific article:
1. Click on the "Read More" button for an article.
2. The article will open in a new tab with the full content.
## Customization
### Adding RSS Feeds
To add or modify the RSS feeds:
1. Open the `backend/config.py` file.
2. Locate the `RSS_FEEDS` list.
3. Add or remove RSS feed URLs as needed.
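As an illustration, the list might look like the excerpt below. The URLs are placeholders and are not the feeds shipped with the project; only the `RSS_FEEDS` name comes from this guide.

```python
# backend/config.py (excerpt); the URLs are placeholders for illustration only.
RSS_FEEDS = [
    "https://example.com/technology/rss.xml",
    "https://example.com/world/rss.xml",
    "https://example.org/ai/feed.xml",  # a newly added feed
]
```

Restart the application after editing the list so the new configuration is loaded.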
### Changing the UI
The application uses Tailwind CSS for styling. To modify the UI:
1. Open the HTML templates in the `backend/templates` directory.
2. Modify the HTML and CSS classes as needed.
## Troubleshooting
### Common Issues
1. **Application not starting**:
- Check if all dependencies are installed correctly.
- Verify that the environment variables are set correctly.
- Check the console for error messages.
2. **No news articles displayed**:
- Check your internet connection.
- Verify that the RSS feeds are accessible.
- Check the console for error messages.
3. **AI insights not displaying correctly**:
- Verify that the Groq API key is set correctly.
- Check the console for error messages.
### Getting Help
If you encounter any issues not covered in this guide, please:
1. Check the console for error messages.
2. Refer to the API and Technical documentation.
3. Contact the development team for assistance.
## Conclusion
DS Task AI News is a powerful tool for staying informed about the latest news and trends. By leveraging AI technologies, it provides personalized insights and recommendations to help you make sense of the news.
We hope you find this guide helpful. If you have any questions or feedback, please don't hesitate to contact us.