Update README and backend functionality for improved news application
- Enhanced README.md with a clearer project overview, features, technologies used, and installation instructions.
- Updated vector dimension in config.py from 4096 to 1024 for Cohere embeddings.
- Modified main.py to serve HTML responses for the home page, news fetching, and recommendations.
- Improved error handling and ensured articles have links in the responses.
- Cleaned up news_fetcher.py by removing unnecessary print statements.
- Updated recommender.py to refine insights generation and summary extraction.
- Added Jinja2 for templating and improved the project structure for better organization.
- Included API documentation for better understanding of endpoints and usage.
@@ -0,0 +1,186 @@
# DS Task AI News API Documentation

## Overview

The DS Task AI News API is a FastAPI-based application that provides endpoints for fetching, processing, and recommending news articles. The API uses AI-powered analysis to generate insights and recommendations based on news articles from various RSS feeds.

## Base URL

```
http://localhost:8000
```

## Endpoints

### 1. Home Page

**Endpoint:** `/`

**Method:** `GET`

**Description:** Returns the home page with links to other routes.

**Response:** HTML page with navigation links to other endpoints.

**Example:**
```
GET /
```

### 2. Fetch News

**Endpoint:** `/fetch-news`

**Method:** `GET`

**Description:** Fetches news from RSS feeds, processes them, and stores them in the vector database. Returns a page displaying the latest news articles.

**Response:** HTML page displaying the latest news articles.

**Example:**
```
GET /fetch-news
```

### 3. Recommend News

**Endpoint:** `/recommend-news`

**Method:** `GET`

**Description:** Gets news recommendations based on an article ID or search query. Returns a page displaying recommended articles and AI-generated insights.

**Query Parameters:**
- `article_id` (optional): ID of an article to base recommendations on.
- `query` (optional): Search query to base recommendations on.

**Response:** HTML page displaying recommended articles and AI-generated insights.

**Example:**
```
GET /recommend-news?query=artificial%20intelligence
```
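
The query string in the example has to be percent-encoded. A minimal client-side sketch of building that URL with the standard library follows; the helper name `recommend_url` is hypothetical and not part of the project.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # base URL given in this document


def recommend_url(query=None, article_id=None):
    """Build a /recommend-news URL from whichever optional parameter is set."""
    params = {}
    if article_id is not None:
        params["article_id"] = article_id
    if query is not None:
        params["query"] = query
    # quote_via=quote encodes spaces as %20, matching the example request.
    query_string = urlencode(params, quote_via=quote)
    return f"{BASE_URL}/recommend-news" + (f"?{query_string}" if query_string else "")


print(recommend_url(query="artificial intelligence"))
```

Calling the helper with no arguments returns the bare endpoint; how the server responds when both parameters are missing is not specified here.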

### 4. Get Article

**Endpoint:** `/article/{article_id}`

**Method:** `GET`

**Description:** Gets a specific article and its summary.

**Path Parameters:**
- `article_id`: ID of the article to retrieve.

**Response:** JSON object containing the article and its summary.

**Example Response:**
```json
{
  "article": {
    "title": "Example Article Title",
    "content": "Example article content...",
    "link": "https://example.com/article",
    "published": "2023-01-01T12:00:00",
    "source": "Example News",
    "categories": ["Technology", "AI"],
    "id": "article123"
  },
  "summary": "This is a summary of the article..."
}
```

**Example:**
```
GET /article/article123
```

## Data Models

### Article

```json
{
  "title": "string",
  "content": "string",
  "link": "string",
  "published": "string",
  "source": "string",
  "categories": ["string"],
  "id": "string"
}
```
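
For client code, the Article shape above can be mirrored as a small typed container. This is a sketch using a standard-library dataclass; the class and the `article_from_json` helper are illustrative assumptions, and the backend itself may model articles differently (for example as plain dicts).

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:
    """Client-side mirror of the Article fields listed in this document."""
    title: str
    content: str
    link: str
    published: str  # kept as a string, as in the documented model
    source: str
    id: str
    categories: List[str] = field(default_factory=list)


def article_from_json(text):
    """Parse a JSON object holding the Article fields into an Article."""
    return Article(**json.loads(text))


sample = (
    '{"title": "Example Article Title", "content": "Example article content...",'
    ' "link": "https://example.com/article", "published": "2023-01-01T12:00:00",'
    ' "source": "Example News", "categories": ["Technology", "AI"], "id": "article123"}'
)
print(article_from_json(sample).title)
```

Unknown keys raise a `TypeError` in this sketch, which makes schema drift visible early; a more lenient client could filter keys first.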

### Insights

```json
{
  "themes": ["string"],
  "insights": ["string"],
  "implications": ["string"],
  "related_areas": ["string"]
}
```
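
Because insights are generated by a language model, a consumer may want to coerce whatever comes back into the four-list shape above. The following is a defensive sketch only; `normalize_insights` is a hypothetical helper, and the document does not say how the project handles malformed output.

```python
from typing import Any, Dict, List

INSIGHT_KEYS = ("themes", "insights", "implications", "related_areas")


def normalize_insights(raw: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return a dict with all four documented keys, each a list of strings."""
    result = {}
    for key in INSIGHT_KEYS:
        value = raw.get(key) or []  # missing or null becomes an empty list
        if isinstance(value, str):
            value = [value]  # a bare string becomes a one-item list
        result[key] = [str(item) for item in value]
    return result


print(normalize_insights({"themes": "AI", "insights": ["a", 1]}))
```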

## Error Handling

The API uses standard HTTP status codes to indicate the success or failure of requests:

- `200 OK`: The request was successful.
- `400 Bad Request`: The request was invalid or cannot be served.
- `404 Not Found`: The requested resource was not found.
- `500 Internal Server Error`: An error occurred on the server.

Error responses include a JSON object with a `detail` field containing a description of the error:

```json
{
  "detail": "Error message"
}
```
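
A client can use the status code and the `detail` field together. The sketch below shows one way to extract the message from a response body; `error_detail` is a hypothetical helper, and it falls back to the raw body because successful responses from some endpoints are HTML rather than JSON.

```python
import json


def error_detail(status_code, body):
    """Return the error description for a failed response, or None on success."""
    if status_code < 400:
        return None
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return body  # not JSON, so return the raw body unchanged
    if isinstance(payload, dict) and "detail" in payload:
        return str(payload["detail"])
    return body


print(error_detail(404, '{"detail": "Error message"}'))
```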

## Authentication

The API does not currently require authentication.

## Rate Limiting

The API does not currently implement rate limiting.

## Dependencies

The API relies on the following external services:

- **Groq API**: For generating article summaries and insights.
- **Cohere API**: For generating article and query embeddings.
- **Pinecone Vector Database**: For storing and retrieving article embeddings.

## Configuration

The API can be configured by modifying the following environment variables:

- `GROQ_API_KEY`: API key for the Groq service.
- `PINECONE_API_KEY`: API key for the Pinecone vector database.
- `PINECONE_ENVIRONMENT`: Environment for the Pinecone vector database.
- `PINECONE_INDEX`: Index name for the Pinecone vector database.

The Cohere embedding client also needs an API key. Its variable name is not listed here, so check `config.py` for it.
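
A quick way to catch a missing setting before starting the server is to check the four variables listed above. This is a minimal sketch; `missing_settings` is a hypothetical helper, and the project may load its configuration differently.

```python
import os

# The four variable names documented above.
REQUIRED_VARS = (
    "GROQ_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT",
    "PINECONE_INDEX",
)


def missing_settings(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```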

## Development

To run the API locally:

1. Install the required dependencies:
   ```
   pip install -r requirements.txt
   ```

2. Set the required environment variables.

3. Run the API:
   ```
   python backend/main.py
   ```

The API will be available at `http://localhost:8000`.

## License

This project is licensed under the MIT License - see the LICENSE file for details.
@@ -0,0 +1,150 @@
# DS Task AI News - Technical Documentation

## Architecture Overview

The DS Task AI News application is built using a modular architecture with the following components:

1. **FastAPI Backend**: Handles HTTP requests and serves HTML templates.
2. **News Fetcher**: Fetches news articles from RSS feeds.
3. **Embedding Generator**: Generates embeddings for articles using Cohere.
4. **Vector Store**: Stores and retrieves article embeddings using Pinecone.
5. **News Recommender**: Generates insights and recommendations using Groq.
6. **HTML Templates**: Renders the user interface.

## Component Details

### 1. FastAPI Backend (`main.py`)

The FastAPI backend serves as the entry point for the application. It handles HTTP requests and serves HTML templates. The backend includes the following endpoints:

- `/`: Home page with links to other routes.
- `/fetch-news`: Fetches news from RSS feeds and displays the latest articles.
- `/recommend-news`: Gets news recommendations based on an article ID or search query.
- `/article/{article_id}`: Gets a specific article and its summary.

### 2. News Fetcher (`news_fetcher.py`)

The News Fetcher component is responsible for fetching news articles from RSS feeds. It performs the following tasks:

- Fetches articles from configured RSS feeds using the `feedparser` library.
- Cleans HTML content to extract plain text.
- Saves raw articles to JSON files.
- Processes articles with embeddings.
- Saves processed articles to JSON files.
- Stores articles in the vector database.
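
To illustrate the HTML-cleaning step, here is a self-contained sketch built on the standard library's `html.parser`. The project lists BeautifulSoup as its HTML parsing dependency, so this is a stand-in technique rather than the project's code, and `clean_html` is a hypothetical name.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def clean_html(html):
    """Strip tags, decode entities, and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


print(clean_html("<p>Hello <b>world</b> &amp; more</p><script>x()</script>"))
```

Joining text nodes with a space keeps adjacent paragraphs apart, at the cost of splitting a word that has inline markup in the middle of it.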

### 3. Embedding Generator (`embeddings.py`)

The Embedding Generator component is responsible for generating embeddings for articles. It performs the following tasks:

- Generates embeddings for article content using Cohere.
- Processes articles to include embeddings.
- Generates query embeddings for search queries.
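
The commit message notes that the vector dimension in `config.py` changed from 4096 to 1024 for Cohere embeddings. A dimension mismatch between the embedding model and the index is a common failure, so a check before upserting can help. The sketch below assumes that value; the constant name `VECTOR_DIMENSION` and the function `check_embedding` are hypothetical.

```python
import math

# 1024 comes from the commit message; the constant name is an assumption.
VECTOR_DIMENSION = 1024


def check_embedding(vector, expected_dim=VECTOR_DIMENSION):
    """Raise ValueError unless vector has expected_dim finite numbers."""
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(vector)}")
    for value in vector:
        if not isinstance(value, (int, float)) or not math.isfinite(value):
            raise ValueError("embedding contains a non-numeric or non-finite value")
    return list(vector)


check_embedding([0.0] * 1024)  # passes silently
```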

### 4. Vector Store (`vector_store.py`)

The Vector Store component is responsible for storing and retrieving article embeddings. It performs the following tasks:

- Stores article embeddings in the Pinecone vector database.
- Retrieves similar articles based on query embeddings.
- Upserts articles to update the vector database.
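
Similarity retrieval happens inside Pinecone, but the idea can be shown with an in-memory sketch. The code below ranks stored vectors by cosine similarity to a query vector. It is not Pinecone's API, the metric the project's index uses is an assumption, and the names `cosine_similarity` and `top_k` are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, stored, k=3):
    """stored maps article id to embedding; returns [(id, score)], best first."""
    scored = [(aid, cosine_similarity(query_vec, vec)) for aid, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k([1.0, 0.0], store, k=2))
```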

### 5. News Recommender (`recommender.py`)

The News Recommender component is responsible for generating insights and recommendations. It performs the following tasks:

- Analyzes articles to generate insights using Groq.
- Generates summaries for individual articles using Groq.

### 6. HTML Templates

The HTML templates are responsible for rendering the user interface. The templates include:

- `base.html`: Base template with common layout elements.
- `home.html`: Home page template.
- `news.html`: Template for displaying news articles.
- `recommendations.html`: Template for displaying recommended articles and insights.

## Data Flow

1. **Fetching News**:
   - User requests the `/fetch-news` endpoint.
   - The backend calls the News Fetcher to fetch articles from RSS feeds.
   - The News Fetcher cleans the articles and saves them to JSON files.
   - The News Fetcher calls the Embedding Generator to generate embeddings for the articles.
   - The News Fetcher calls the Vector Store to store the articles in the vector database.
   - The backend renders the `news.html` template with the fetched articles.

2. **Recommending News**:
   - User requests the `/recommend-news` endpoint with a `query` or `article_id` parameter.
   - The backend calls the Embedding Generator to generate a query embedding.
   - The backend calls the Vector Store to retrieve similar articles.
   - The backend calls the News Recommender to generate insights for the articles.
   - The backend renders the `recommendations.html` template with the recommended articles and insights.

3. **Getting an Article**:
   - User requests the `/article/{article_id}` endpoint.
   - The backend calls the Vector Store to retrieve the article.
   - The backend calls the News Recommender to generate a summary for the article.
   - The backend returns the article and summary as JSON.
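
The recommendation flow above can be sketched as a function that takes the three components as plain callables. All names here (`recommend`, `embed_query`, `find_similar`, `generate_insights`) are hypothetical stand-ins for the real modules, and skipping the insight step when nothing matches is a choice made for this sketch, not documented behavior.

```python
def recommend(query, embed_query, find_similar, generate_insights, top_k=5):
    """Run the recommendation steps in order and return data for the template."""
    query_vector = embed_query(query)             # Embedding Generator
    articles = find_similar(query_vector, top_k)  # Vector Store
    # Skip the language-model call when there is nothing to analyze.
    insights = generate_insights(articles) if articles else {}
    return {"query": query, "articles": articles, "insights": insights}


# Stub components so the sketch runs without any external service.
calls = []


def fake_embed(text):
    calls.append("embed")
    return [float(len(text))]


def fake_search(vector, k):
    calls.append("search")
    return [{"id": "article123", "title": "Example Article Title"}][:k]


def fake_insights(articles):
    calls.append("insights")
    return {"themes": ["AI"]}


result = recommend("artificial intelligence", fake_embed, fake_search, fake_insights)
print(result["insights"])
```

Passing the components in as arguments keeps the flow testable without network access or API keys.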

## Configuration

The application is configured using environment variables and configuration files:

- `config.py`: Contains configuration variables for the application.
- Environment variables: API keys and other sensitive information.

## Dependencies

The application relies on the following external services and libraries:

- **FastAPI**: Web framework for building APIs.
- **Jinja2**: Template engine for rendering HTML.
- **feedparser**: Library for parsing RSS feeds.
- **BeautifulSoup**: Library for parsing HTML.
- **Cohere**: API for generating embeddings.
- **Pinecone**: Vector database for storing and retrieving embeddings.
- **Groq**: API for generating insights and summaries.

## File Structure

```
ds_task_ai_news/
├── backend/
│   ├── main.py
│   ├── news_fetcher.py
│   ├── embeddings.py
│   ├── vector_store.py
│   ├── recommender.py
│   ├── config.py
│   └── templates/
│       ├── base.html
│       ├── home.html
│       ├── news.html
│       └── recommendations.html
├── data/
│   ├── raw_news/
│   └── processed_news/
├── docs/
│   ├── API_Documentation.md
│   └── Technical_Documentation.md
└── requirements.txt
```

## Error Handling

The application uses try-except blocks to handle errors gracefully. Errors are logged using the `logging` module and returned as HTTP responses with appropriate status codes.
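
The pattern described above can be sketched without any web framework. In the example below, the mapping from exception type to status code, the logger name, and the `run_safely` helper are all assumptions made for illustration; the status codes are the ones listed in the API documentation.

```python
import logging

logger = logging.getLogger("ds_task_ai_news")  # logger name is an assumption


def run_safely(handler, *args):
    """Call handler and map the outcome to (status_code, payload)."""
    try:
        return 200, handler(*args)
    except KeyError as exc:
        logger.warning("resource not found: %s", exc)
        return 404, {"detail": "Not found"}
    except ValueError as exc:
        logger.warning("bad request: %s", exc)
        return 400, {"detail": str(exc)}
    except Exception:
        # logger.exception records the traceback along with the message.
        logger.exception("unhandled error")
        return 500, {"detail": "Internal server error"}


print(run_safely(lambda: {"ok": True}))
```

Returning a generic message for the 500 case avoids leaking internal details to clients while the traceback still reaches the log.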

## Future Improvements

Potential improvements for the application include:

1. **Authentication**: Add user authentication to protect sensitive endpoints.
2. **Rate Limiting**: Implement rate limiting to prevent abuse.
3. **Caching**: Add caching to improve performance.
4. **Testing**: Add unit and integration tests.
5. **Deployment**: Deploy the application to a cloud provider.
6. **Monitoring**: Add monitoring and alerting.
7. **User Preferences**: Allow users to customize their news preferences.
8. **Mobile App**: Develop a mobile app for the application.
@@ -0,0 +1,142 @@
# DS Task AI News - User Guide

## Introduction

DS Task AI News is an AI-powered news application that fetches, processes, and recommends news articles based on your interests. The application uses advanced AI technologies to analyze news articles and provide personalized insights and recommendations.

## Features

- **Latest News**: View the latest news articles fetched from various RSS feeds.
- **News Recommendations**: Get personalized news recommendations based on your interests.
- **AI Insights**: Receive AI-generated insights about news articles.
- **Article Summaries**: Get concise summaries of individual articles.

## Getting Started

### Prerequisites

- Python 3.8 or higher
- pip (Python package manager)
- Internet connection

### Installation

1. Clone the repository:
   ```
   git clone https://github.com/yourusername/ds_task_ai_news.git
   cd ds_task_ai_news
   ```

2. Install the required dependencies:
   ```
   pip install -r requirements.txt
   ```

3. Set up the required environment variables:
   - Create a `.env` file in the root directory with the following content:
     ```
     GROQ_API_KEY=your_groq_api_key
     PINECONE_API_KEY=your_pinecone_api_key
     PINECONE_ENVIRONMENT=your_pinecone_environment
     PINECONE_INDEX=your_pinecone_index
     ```

4. Run the application:
   ```
   python backend/main.py
   ```

5. Open your web browser and navigate to `http://localhost:8000`.

## Using the Application

### Home Page

The home page provides links to the main features of the application:

- **Latest News**: View the latest news articles.
- **Technology News**: Get recommendations for technology-related news.
- **AI News**: Get recommendations for AI-related news.

### Latest News

To view the latest news articles:

1. Click on the "View Latest News" button on the home page.
2. The application will fetch the latest news articles from the configured RSS feeds.
3. The articles will be displayed in a grid layout with the following information:
   - Title
   - Content preview
   - Source
   - Publication date
   - Categories
   - "Read More" button

### News Recommendations

To get personalized news recommendations:

1. Click on one of the recommendation buttons on the home page (e.g., "Technology News" or "AI News").
2. Alternatively, you can navigate to `/recommend-news?query=your_search_query` to get recommendations based on a specific query.
3. The application will display recommended articles and AI-generated insights.
4. The insights section includes:
   - Themes: Main topics and areas of focus in the news articles.
   - Key Insights: Key takeaways and observations from the articles.
   - Implications: Potential consequences and outcomes of the trends and developments.
   - Related Areas: Other areas of interest connected to the themes and insights.

### Article Details

To view the details of a specific article:

1. Click on the "Read More" button for an article.
2. The article will open in a new tab with the full content.

## Customization

### Adding RSS Feeds

To add or modify the RSS feeds:

1. Open the `backend/config.py` file.
2. Locate the `RSS_FEEDS` list.
3. Add or remove RSS feed URLs as needed.
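
As a rough illustration, `RSS_FEEDS` is assumed here to be a plain list of URL strings. The URLs below are placeholders rather than the project's actual feeds, and `invalid_feed_urls` is a hypothetical helper for catching typos after an edit.

```python
from urllib.parse import urlparse

# Placeholder URLs for illustration only; not the project's real feed list.
RSS_FEEDS = [
    "https://example.com/feed.xml",
    "https://example.org/rss",
]


def invalid_feed_urls(feeds):
    """Return entries that are not absolute http(s) URLs."""
    bad = []
    for url in feeds:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            bad.append(url)
    return bad


print(invalid_feed_urls(RSS_FEEDS))
```

This only checks the shape of each URL; it does not confirm that the feed is reachable or returns valid RSS.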

### Changing the UI

The application uses Tailwind CSS for styling. To modify the UI:

1. Open the HTML templates in the `backend/templates` directory.
2. Modify the HTML and CSS classes as needed.

## Troubleshooting

### Common Issues

1. **Application not starting**:
   - Check if all dependencies are installed correctly.
   - Verify that the environment variables are set correctly.
   - Check the console for error messages.

2. **No news articles displayed**:
   - Check your internet connection.
   - Verify that the RSS feeds are accessible.
   - Check the console for error messages.

3. **AI insights not displaying correctly**:
   - Verify that the Groq API key is set correctly.
   - Check the console for error messages.

### Getting Help

If you encounter any issues not covered in this guide, please:

1. Check the console for error messages.
2. Refer to the API and Technical documentation.
3. Contact the development team for assistance.

## Conclusion

DS Task AI News is a powerful tool for staying informed about the latest news and trends. By leveraging AI technologies, it provides personalized insights and recommendations to help you make sense of the news.

We hope you find this guide helpful. If you have any questions or feedback, please don't hesitate to contact us.