# NewsIQ - AI News Intelligence System

## Project Overview

NewsIQ is an AI-powered news intelligence platform that ingests RSS feeds, performs semantic analysis, and delivers intelligent recommendations. The system demonstrates your ability to design clean backend architectures and integrate AI capabilities with modern tooling.

## 🛠️ System Workflow

The platform consists of a 4-stage pipeline:

1. **RSS Ingestion:** Pulls articles from RSS feeds with deduplication and feed tracking
2. **AI Processing:** Performs embedding generation, summarization, sentiment analysis, entity extraction, and category classification
3. **Vector Storage:** Stores embeddings in a vector database for semantic search
4. **Intelligent Retrieval:** Enables semantic search and AI-driven recommendations

## What the System Should Do

### Core Features

- **Smart RSS Ingestion**
  - Parse articles from multiple feeds
  - Track last updated time per feed
  - Avoid reprocessing old or duplicate articles
- **AI-Based Content Analysis**
  - Generate embeddings using Cohere
  - Extract named entities, sentiment, summaries, and categories via Groq
- **Semantic Vector Search**
  - Store and query article embeddings using vector similarity
  - Filter results using metadata (source, sentiment, date)
- **Intelligent Recommendations**
  - Recommend articles based on similarity, recency, and category preferences
- **Category Filtering by User Preference**
  - Let users set categories of interest (e.g., sports, music)
  - Only process articles that match user-defined categories
- **Robust API**
  - Well-structured endpoints for updates, search, recommendations, and analytics
  - OpenAPI documentation for interactive testing
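Recommendations combine three signals: similarity, recency, and category preference. The sketch below shows one way to blend them into a single score. It assumes each article already has an embedding, a timezone-aware publish time, and a category. Several pieces are hypothetical choices, not part of the spec:

- the `Candidate` type;
- the weights;
- the 24-hour half-life;
- the user "profile" vector (for example, the mean embedding of articles the user has read).

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Sequence, Set, Tuple


@dataclass
class Candidate:  # hypothetical shape of a recommendable article
    article_id: str
    embedding: List[float]
    published: datetime  # timezone-aware (UTC)
    category: str


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(
    profile: Sequence[float],
    candidates: List[Candidate],
    preferred: Set[str],
    now: Optional[datetime] = None,
    top_k: int = 5,
    half_life_hours: float = 24.0,
    # hypothetical weights for (similarity, recency, category match)
    weights: Tuple[float, float, float] = (0.6, 0.3, 0.1),
) -> List[str]:
    """Rank candidates by a weighted blend of similarity, recency, and category match."""
    now = now or datetime.now(timezone.utc)
    w_sim, w_rec, w_cat = weights
    scored = []
    for c in candidates:
        age_hours = max((now - c.published).total_seconds() / 3600, 0.0)
        recency = 0.5 ** (age_hours / half_life_hours)  # halves every half_life_hours
        category_match = 1.0 if c.category in preferred else 0.0
        score = w_sim * cosine(profile, c.embedding) + w_rec * recency + w_cat * category_match
        scored.append((score, c.article_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article_id for _, article_id in scored[:top_k]]
```

A half-life decay keeps older stories eligible while letting fresh ones rise. The weights would need tuning against real usage.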

## Tech Stack

| Component | Tool |
|---|---|
| Backend | FastAPI |
| ORM | SQLAlchemy |
| Database | SQLite (switchable to a production-grade database) |
| Embeddings | Cohere API |
| LLM analysis | Groq API |
| Vector DB | Choose one: FAISS / Weaviate / Pinecone |
| Feeds | `feedparser`, `requests` |
| Environment management | `python-dotenv` |
| Migrations | Alembic |
| Testing | `pytest` |
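The database is meant to be switchable, so one option is to read the connection string from the environment and fall back to SQLite. Below is a minimal sketch. The variable names `DATABASE_URL`, `COHERE_API_KEY`, and `GROQ_API_KEY` are hypothetical.

```python
import os
from dataclasses import dataclass

try:
    # python-dotenv (listed in the stack) loads a .env file into os.environ
    from dotenv import load_dotenv
except ImportError:  # keep the sketch runnable without the package installed
    load_dotenv = None


@dataclass(frozen=True)
class Settings:
    database_url: str
    cohere_api_key: str
    groq_api_key: str


def load_settings() -> Settings:
    if load_dotenv is not None:
        load_dotenv()  # by default, does not override variables already set
    return Settings(
        # Hypothetical variable names; SQLite unless DATABASE_URL points elsewhere
        database_url=os.getenv("DATABASE_URL", "sqlite:///./newsiq.db"),
        cohere_api_key=os.getenv("COHERE_API_KEY", ""),
        groq_api_key=os.getenv("GROQ_API_KEY", ""),
    )
```

The resulting URL can be passed to SQLAlchemy's `create_engine`. When SQLite sits behind FastAPI, `connect_args={"check_same_thread": False}` is commonly added.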

## RSS Feed Sources

- https://www.nytimes.com/rss
- https://www.cnbc.com/id/100727362/device/rss/rss.html
- https://www.bbc.co.uk/sport/football/57000000
- https://www.aljazeera.com/xml/rss/all.xml
- https://www.nytimes.com/svc/collections/v1/publish/https://www.nytimes.com/section/world/rss.xml
- https://globalnews.ca/world/feed/
- https://feeds.skynews.com/feeds/rss/world.xml
- https://www.e-ir.info/feed/
- https://www.thecipherbrief.com/feeds/feed.rss
- https://warontherocks.com/feed/


---

## User Settings (Category Preferences)

Let users define what types of news they care about — e.g., *sports*, *technology*, *music*.

### Settings API

```http
# Update user preferences
PUT /api/settings/categories

Request Body:
{
    "user_id": "user_123",
    "preferred_categories": ["sports", "technology", "music"]
}

Response:
{
    "status": "success",
    "message": "Preferences updated"
}

# Get user preferences
GET /api/settings/categories?user_id=user_123

Response:
{
    "user_id": "user_123",
    "preferred_categories": ["sports", "technology", "music"]
}
```
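The handler logic behind these two endpoints is small. The framework-free sketch below mirrors the request and response bodies shown above. In the FastAPI app, these functions would be called from routes declared with `@app.put(...)` and `@app.get(...)`. Two parts are assumptions:

- the in-memory dict;
- the normalization rules (trimming, lower-casing, de-duplicating).

```python
from typing import Dict, List

# Hypothetical in-memory store (user_id -> categories); a real deployment would
# persist preferences in a SQLAlchemy table instead.
_PREFERENCES: Dict[str, List[str]] = {}


def normalize_categories(categories: List[str]) -> List[str]:
    """Strip, lower-case, and de-duplicate category names, keeping their order."""
    result: List[str] = []
    for raw in categories:
        name = raw.strip().lower()
        if name and name not in result:
            result.append(name)
    return result


def put_categories(body: dict) -> dict:
    """Logic behind PUT /api/settings/categories."""
    _PREFERENCES[body["user_id"]] = normalize_categories(body.get("preferred_categories", []))
    return {"status": "success", "message": "Preferences updated"}


def get_categories(user_id: str) -> dict:
    """Logic behind GET /api/settings/categories?user_id=..."""
    # An empty list means "no preferences set", i.e. process all categories.
    return {"user_id": user_id, "preferred_categories": list(_PREFERENCES.get(user_id, []))}
```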

### Filtering Logic

- Only process and store articles that match the user's categories
- Classify each article with the AI model before deciding whether to continue processing it
- If no preferences are set, process articles in all categories
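Here is a sketch of the classify-then-gate order. It assumes the classifier and the downstream processing step are passed in as callables (hypothetical names) and that stored preferences are lower-case.

```python
from typing import Callable, Iterable, List, Optional, Set


def should_process(category: str, preferred: Optional[Set[str]]) -> bool:
    """True if the category passes the user's filter; no preferences means 'all'."""
    if not preferred:
        return True
    return category.lower() in preferred


def filter_then_process(
    articles: Iterable[dict],
    classify: Callable[[dict], str],  # hypothetical: e.g. a Groq classification call
    process: Callable[[dict], None],  # hypothetical: embeddings, analysis, storage
    preferred: Optional[Set[str]] = None,
) -> List[dict]:
    kept = []
    for article in articles:
        # Classify first so skipped articles never reach the later AI calls
        article["category"] = classify(article)
        if not should_process(article["category"], preferred):
            continue
        process(article)
        kept.append(article)
    return kept
```

Classifying first means articles outside the user's categories never reach embedding, summarization, or storage.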

## API Endpoints Overview

### Article Updates

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/updates/fetch-latest` | Fetch and process new articles from RSS feeds |
| `GET` | `/api/updates/status` | Get current ingestion status for all RSS feeds |
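For `POST /api/updates/fetch-latest`, per-feed tracking can be as simple as remembering the newest publish time already processed. This sketch makes two assumptions:

- entries are shaped like `feedparser.parse(url).entries`;
- state lives in a hypothetical `FeedState` mapping.

It does not fetch anything itself.

```python
import calendar
import time
from typing import Dict, Iterable, List

# Hypothetical per-feed state: feed URL -> newest publish time processed so far
# (UTC epoch seconds). A real system would keep this in a feeds table, which
# could also back GET /api/updates/status.
FeedState = Dict[str, float]


def to_epoch(published: time.struct_time) -> int:
    """Convert a UTC struct_time to epoch seconds (timegm, not the local-time mktime)."""
    return calendar.timegm(published)


def new_entries(feed_url: str, entries: Iterable[dict], state: FeedState) -> List[dict]:
    """Return entries published after the feed's last-seen time, then advance it.

    Entries are dict-like items whose optional `published_parsed` is a UTC
    `time.struct_time`, as feedparser provides.
    """
    last_seen = state.get(feed_url, 0.0)
    newest = last_seen
    fresh: List[dict] = []
    for entry in entries:
        published = entry.get("published_parsed")
        if published is None:
            # Undated entry: keep it and let URL/fingerprint dedup decide later
            fresh.append(entry)
            continue
        ts = to_epoch(published)
        if ts > last_seen:
            fresh.append(entry)
            newest = max(newest, ts)
    state[feed_url] = newest
    return fresh
```

Some feeds omit dates or republish items, so the timestamp check is only a first pass. URL and content fingerprint checks are still needed to catch repeats.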

### Article Management

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/articles/{article_id}` | Retrieve full article details |
| `GET` | `/api/articles/` | Paginated, filterable list of articles |
| `GET` | `/api/articles/{id}/analysis` | Get AI-generated metadata (summary, entities, sentiment, etc.) |

### Semantic Search & Discovery

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/search/semantic` | Perform embedding-based semantic search |
| `GET` | `/api/search/similar/{id}` | Find articles similar to the given one |
| `POST` | `/api/recommendations/` | Generate personalized article recommendations |
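Whichever vector DB is chosen, the search endpoint comes down to two steps:

- rank stored embeddings by similarity to a query embedding;
- apply the metadata filters.

The brute-force sketch below shows those semantics over an in-memory list. FAISS, Weaviate, or Pinecone would replace the loop with an indexed query. The `IndexedArticle` fields are assumptions based on the filters listed under Core Features.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Sequence, Tuple


@dataclass
class IndexedArticle:  # hypothetical record: vector plus filterable metadata
    article_id: str
    embedding: List[float]
    source: str
    sentiment: str
    published: date


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(
    query_vec: Sequence[float],
    index: List[IndexedArticle],
    top_k: int = 10,
    source: Optional[str] = None,
    sentiment: Optional[str] = None,
    since: Optional[date] = None,
) -> List[Tuple[str, float]]:
    """Brute-force cosine search; metadata filters are applied before scoring."""
    hits = []
    for item in index:
        if source is not None and item.source != source:
            continue
        if sentiment is not None and item.sentiment != sentiment:
            continue
        if since is not None and item.published < since:
            continue
        hits.append((item.article_id, cosine(query_vec, item.embedding)))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_k]
```

`GET /api/search/similar/{id}` can reuse the same function: query with that article's own embedding and drop the article from the results. Cohere's v3 embedding models embed documents and queries with different `input_type` values (`search_document` and `search_query`).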

### Analytics

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/analytics/trends` | Get trending topics from recent articles |
| `GET` | `/api/analytics/sentiment` | Analyze sentiment distribution over time or source |
| `GET` | `/api/analytics/sources` | View performance/coverage of individual RSS sources |
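Two of the analytics views can be derived from metadata the AI step already stores. The sketch assumes each article is a dict with hypothetical `source`, `sentiment`, and `entities` keys, and that the caller has already limited the input to the desired time window. Counting entities is just one simple proxy for "trending topics".

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def sentiment_by_source(articles: Iterable[dict]) -> Dict[str, Dict[str, float]]:
    """Share of each sentiment label per source, e.g. {"bbc": {"positive": 0.5, ...}}."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for article in articles:
        counts[article["source"]][article["sentiment"]] += 1
    result: Dict[str, Dict[str, float]] = {}
    for source, label_counts in counts.items():
        total = sum(label_counts.values())
        result[source] = {label: n / total for label, n in label_counts.items()}
    return result


def trending_entities(articles: Iterable[dict], top_n: int = 5) -> List[Tuple[str, int]]:
    """Most-mentioned entities, counting each entity at most once per article."""
    counter: Counter = Counter()
    for article in articles:
        counter.update(set(article.get("entities", [])))
    return counter.most_common(top_n)
```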

## Key Technical Challenges

### 1. Incremental Updates

- Avoid reprocessing the same articles
- Track and sync timestamps per feed
- Detect updated articles by comparing URL + content fingerprint
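One way to implement the URL-plus-fingerprint check takes three steps:

1. Canonicalize the URL.
2. Hash the normalized title and body.
3. Compare the hash against the one stored for that URL.

The canonicalization rule here drops query strings and fragments. That is an assumption: it suits tracking parameters but would break sites that put article IDs in the query string.

```python
import hashlib
from typing import Dict
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Lower-case scheme and host; drop query, fragment, and trailing slash (assumption)."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), "", ""))


def content_fingerprint(title: str, body: str) -> str:
    """SHA-256 of the whitespace- and case-normalized title + body."""
    normalized = " ".join(f"{title}\n{body}".lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_change(url: str, title: str, body: str, seen: Dict[str, str]) -> str:
    """Return "new", "updated", or "unchanged" and record the latest fingerprint.

    `seen` is a hypothetical URL -> fingerprint map; a real system would store
    the fingerprint in a column on the articles table.
    """
    key = canonical_url(url)
    fingerprint = content_fingerprint(title, body)
    previous = seen.get(key)
    seen[key] = fingerprint
    if previous is None:
        return "new"
    return "unchanged" if previous == fingerprint else "updated"
```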

### 2. Duplicate Detection

- Combine URL matching, fuzzy title matching, and content similarity
- Handle near-duplicates and updated versions of the same story
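The sketch below combines the three signals using only the standard library:

- exact URL match;
- `difflib` title similarity;
- Jaccard similarity over 3-word shingles of the body.

The thresholds are hypothetical starting points. For the "updated story" case, it requires both a similar title and some body overlap. That avoids merging unrelated articles that merely share a headline.

```python
from difflib import SequenceMatcher
from typing import Set, Tuple


def title_similarity(a: str, b: str) -> float:
    """Fuzzy title match in [0, 1] using difflib's ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Overlapping k-word sequences; texts shorter than k yield one shingle."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_duplicate(
    new: dict,
    existing: dict,
    title_threshold: float = 0.85,  # hypothetical thresholds: tune on real feeds
    content_threshold: float = 0.6,
    updated_content_threshold: float = 0.3,
) -> bool:
    if new["url"] == existing["url"]:
        return True  # exact URL match
    content_sim = jaccard(shingles(new["content"]), shingles(existing["content"]))
    if content_sim >= content_threshold:
        return True  # near-duplicate body, e.g. a syndicated copy
    # Similar headline plus partly overlapping body: likely an updated version
    return (
        title_similarity(new["title"], existing["title"]) >= title_threshold
        and content_sim >= updated_content_threshold
    )
```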

### 3. Category Filtering

- Classify articles during AI processing
- Skip irrelevant categories based on user preferences
- Ensure high accuracy in categorization
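Accuracy depends partly on constraining what the classifier may return. Suppose the Groq prompt asks for a JSON reply such as `{"category": "sports", "confidence": 0.92}` (a hypothetical format). The reply can then be validated against a fixed label set before it drives filtering.

```python
import json
from typing import Tuple

# Hypothetical label set; the README itself only names sports, technology, and music as examples
ALLOWED_CATEGORIES = {"sports", "technology", "music", "politics", "business", "world"}
FALLBACK = ("uncategorized", 0.0)


def parse_category(llm_reply: str, min_confidence: float = 0.5) -> Tuple[str, float]:
    """Validate a reply like '{"category": "sports", "confidence": 0.92}'.

    Returns FALLBACK if the reply is not valid JSON, is not an object with a
    "category" key, names a label outside ALLOWED_CATEGORIES, or reports a
    confidence below `min_confidence`.
    """
    try:
        data = json.loads(llm_reply)
        category = str(data["category"]).strip().lower()
        confidence = float(data.get("confidence", 0.0))
    except (ValueError, KeyError, TypeError):
        return FALLBACK
    if category not in ALLOWED_CATEGORIES or confidence < min_confidence:
        return FALLBACK
    return category, confidence
```

Self-reported LLM confidence is a rough signal at best. Spot-checking labels against a small hand-labelled sample is a more reliable accuracy check. A filter that only admits preferred categories would skip `uncategorized` articles, so the fallback rate is worth monitoring.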

### 4. AI & Vector Sync

- Keep metadata and the vector store in sync at all times
- Handle vector DB failures gracefully
- Clean up orphaned or outdated vectors
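A periodic reconciliation job is one way to keep SQL metadata and the vector store consistent after partial failures. This sketch assumes both sides can list their article IDs. It takes the delete and re-embed operations as hypothetical callables. Outdated vectors could be caught the same way by also comparing a stored content fingerprint.

```python
from typing import Callable, Iterable, Set, Tuple


def reconcile(
    db_ids: Iterable[str],
    vector_ids: Iterable[str],
    delete_vectors: Callable[[Set[str]], None],  # hypothetical vector-DB delete
    reembed: Callable[[Set[str]], None],  # hypothetical: queue articles for embedding
) -> Tuple[Set[str], Set[str]]:
    """Compare article IDs in SQL with IDs in the vector store and repair drift.

    - orphans: vectors whose article row no longer exists -> delete them
    - missing: article rows with no vector (e.g. a failed upsert) -> re-embed them
    """
    db, vectors = set(db_ids), set(vector_ids)
    orphans, missing = vectors - db, db - vectors
    if orphans:
        delete_vectors(orphans)
    if missing:
        reembed(missing)
    return orphans, missing
```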

### 5. Performance

- Index frequently filtered fields
- Use batch processing and async operations
- Optimize semantic search
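Embedding and LLM calls are remote round trips, so a natural first optimization is to batch requests and run a few batches concurrently. The sketch below uses `asyncio`. `handle_batch` stands in for a hypothetical async wrapper around the embedding API. The default batch size of 96 and concurrency of 4 are assumptions; check them against the provider's documented limits.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split items into consecutive chunks of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def process_in_batches(
    items: Sequence[T],
    handle_batch: Callable[[Sequence[T]], Awaitable[List[R]]],
    batch_size: int = 96,  # assumption: check the provider's per-request limit
    max_concurrency: int = 4,  # assumption: keep within rate limits
) -> List[R]:
    """Send batches concurrently with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(batch: Sequence[T]) -> List[R]:
        async with semaphore:  # cap in-flight API calls
            return await handle_batch(batch)

    results = await asyncio.gather(*(run(batch) for batch in batched(items, batch_size)))
    # gather returns results in input order, so the flat list lines up with `items`
    return [result for batch_result in results for result in batch_result]
```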

## Bonus Features (Optional)

- Background job scheduler for automated fetches
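A minimal scheduler needs no extra dependency. An `asyncio` loop can await the fetch job at a fixed interval and survive individual failures. The `job` coroutine (for example, whatever backs `POST /api/updates/fetch-latest`) and the logger name are hypothetical.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("newsiq.scheduler")  # hypothetical logger name


async def run_periodically(
    job: Callable[[], Awaitable[None]],
    interval_seconds: float,
    max_runs: Optional[int] = None,  # None = run until cancelled
) -> int:
    """Await `job` repeatedly, sleeping `interval_seconds` after each run."""
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            await job()
        except Exception:
            # A failed fetch is logged with its traceback; the loop keeps going
            logger.exception("scheduled fetch failed")
        runs += 1
        if max_runs is None or runs < max_runs:
            await asyncio.sleep(interval_seconds)
    return runs
```

The delay is measured from the end of each run, so slow fetches never overlap. In the FastAPI app, such a loop could be started with `asyncio.create_task` from the application's lifespan handler. A library such as APScheduler is an alternative when cron-style schedules are needed.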

## ✅ Success Criteria

### Functional Requirements

- RSS articles are ingested and processed
- Incremental updates work without duplication
- Semantic search returns relevant results
- User preferences are respected in every processing cycle

### Technical Requirements

- Clean, modular, testable codebase
- Proper use of SQLAlchemy, Pydantic, and environment configuration
- Documented README
- Unit test coverage for core logic
- Alembic-based migrations for schema changes
Alembic-based migrations for schema changes

## 📁 Deliverables

- ✅ Working FastAPI backend with REST endpoints
- ✅ SQLAlchemy ORM models with Alembic migrations
- ✅ AI integration with Cohere and Groq
- ✅ Vector similarity search with metadata filters
- ✅ Smart RSS ingestion with category filtering
- ✅ API documentation via OpenAPI
- ✅ Clean README with setup instructions and an architecture overview, including an architecture diagram

## Documentation Expectations

- Describe high-level architecture and design decisions
- Explain how article processing, filtering, and recommendations work
- Document how category filtering is enforced
- Include instructions for deployment, testing, and extending the system
