OwusuBlessing b8ce97caa2 first commit
2025-07-25 20:56:11 +01:00

# NewsIQ - AI News Intelligence System

## Project Overview

NewsIQ is an AI-powered news intelligence platform that ingests RSS feeds, performs semantic analysis, and delivers intelligent recommendations. The system demonstrates your ability to design clean backend architectures and integrate AI capabilities with modern tooling.

## 🛠️ System Workflow

The platform consists of a 4-stage pipeline:

  1. RSS Ingestion: Pulls articles from RSS feeds with deduplication and feed tracking
  2. AI Processing: Performs embedding generation, summarization, sentiment analysis, entity extraction, and category classification
  3. Vector Storage: Stores embeddings in a vector database for semantic search
  4. Intelligent Retrieval: Enables semantic search and AI-driven recommendations
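The first three stages can be sketched as a chain of plain functions. This is a minimal, dependency-free illustration, not the required design: `Article`, `ingest`, `analyze`, `store`, and `run_pipeline` are hypothetical names, and stub logic stands in for the real Cohere, Groq, and vector-DB calls. Stage 4 runs at query time against the populated index, so it sits outside this ingestion loop.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Article:
    url: str
    title: str
    content: str
    # Filled in by the AI processing stage.
    category: str | None = None
    embedding: list[float] = field(default_factory=list)


def ingest(raw_items: list[dict]) -> list[Article]:
    """Stage 1: turn parsed feed entries into Articles, skipping URLs already seen in this batch."""
    seen: set[str] = set()
    articles: list[Article] = []
    for item in raw_items:
        if item["url"] in seen:
            continue
        seen.add(item["url"])
        articles.append(Article(item["url"], item["title"], item["content"]))
    return articles


def analyze(article: Article) -> Article:
    """Stage 2 (stub): the real stage would call Cohere for the embedding and Groq for analysis."""
    article.category = "general"  # placeholder label
    article.embedding = [float(len(article.title)), float(len(article.content))]  # toy vector
    return article


def store(articles: list[Article], index: dict[str, list[float]]) -> None:
    """Stage 3: write each embedding to the vector index, keyed by URL (a dict stands in here)."""
    for article in articles:
        index[article.url] = article.embedding


def run_pipeline(raw_items: list[dict], index: dict[str, list[float]]) -> list[Article]:
    """Run stages 1-3; stage 4 (retrieval) queries `index` later, at request time."""
    articles = [analyze(a) for a in ingest(raw_items)]
    store(articles, index)
    return articles
```

Keeping each stage a separate function makes it easy to unit-test stages in isolation and to swap a stub for the real API client.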

## What the System Should Do

### Core Features

  • Smart RSS Ingestion

    • Parse articles from multiple feeds
    • Track last updated time per feed
    • Avoid reprocessing old or duplicate articles
  • AI-Based Content Analysis

    • Generate embeddings using Cohere
    • Extract named entities, sentiment, summaries, and categories via Groq
  • Semantic Vector Search

    • Store and query article embeddings using vector similarity
    • Filter results using metadata (source, sentiment, date)
  • Intelligent Recommendations

    • Recommend articles based on similarity, recency, and category preferences
  • Category Filtering by User Preference

    • Let users set categories of interest (e.g., sports, music)
    • Only process articles that match user-defined categories
  • Robust API

    • Well-structured endpoints for updates, search, recommendations, and analytics
    • OpenAPI documentation for interactive testing

## Tech Stack

| Component | Tool |
|---|---|
| Backend | FastAPI |
| ORM | SQLAlchemy |
| Database | SQLite (switchable to a production-grade database) |
| Embeddings | Cohere API |
| LLM analysis | Groq API |
| Vector DB | One of FAISS, Weaviate, or Pinecone |
| Feeds | feedparser, requests |
| Environment management | python-dotenv |
| Migrations | Alembic |
| Testing | pytest |
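Environment-based configuration keeps API keys out of the code and lets the SQLite default be swapped for another database. A minimal sketch, assuming hypothetical variable names (`COHERE_API_KEY`, `GROQ_API_KEY`, `DATABASE_URL`) and a hypothetical default database path; in the app, python-dotenv's `load_dotenv()` would be called first so values from a local `.env` file land in `os.environ`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    cohere_api_key: str
    groq_api_key: str
    database_url: str


def load_settings() -> Settings:
    """Build Settings from environment variables, failing fast on missing API keys.

    Assumes load_dotenv() (python-dotenv) has already populated os.environ.
    """
    missing = [k for k in ("COHERE_API_KEY", "GROQ_API_KEY") if not os.getenv(k)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        cohere_api_key=os.environ["COHERE_API_KEY"],
        groq_api_key=os.environ["GROQ_API_KEY"],
        # SQLite by default; point DATABASE_URL at another database to switch engines.
        database_url=os.getenv("DATABASE_URL", "sqlite:///./newsiq.db"),
    )
```

Failing at startup on a missing key is usually easier to debug than a 401 from the first API call.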

## RSS Feed Sources

  • https://www.nytimes.com/rss
  • https://www.cnbc.com/id/100727362/device/rss/rss.html
  • https://www.bbc.co.uk/sport/football/57000000
  • https://www.aljazeera.com/xml/rss/all.xml
  • https://www.nytimes.com/svc/collections/v1/publish/https://www.nytimes.com/section/world/rss.xml
  • https://globalnews.ca/world/feed/
  • https://feeds.skynews.com/feeds/rss/world.xml
  • https://www.e-ir.info/feed/
  • https://www.thecipherbrief.com/feeds/feed.rss
  • https://warontherocks.com/feed/


---

## User Settings (Category Preferences)

Let users define what types of news they care about — e.g., *sports*, *technology*, *music*.

### Settings API

```http
# Update user preferences
PUT /api/settings/categories

Request Body:
{
    "user_id": "user_123",
    "preferred_categories": ["sports", "technology", "music"]
}

Response:
{
    "status": "success",
    "message": "Preferences updated"
}

# Get user preferences
GET /api/settings/categories?user_id=user_123

Response:
{
    "user_id": "user_123",
    "preferred_categories": ["sports", "technology", "music"]
}
```

### Filtering Logic

  • Only process and store articles that match the user's preferred categories
  • The AI stage should classify each article into a category before deciding whether to continue processing it
  • If no preferences are set, articles in all categories are processed
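The "classify first, then decide" rule can be sketched as a gate between classification and the more expensive steps. This assumes a hypothetical `classify` callable standing in for the Groq classification call, and articles represented as plain dicts; `should_process` and `filter_articles` are illustrative names.

```python
from __future__ import annotations

from typing import Callable, Iterable


def should_process(category: str, preferred_categories: Iterable[str] | None) -> bool:
    """Return True if an article in `category` should continue through the pipeline.

    An empty or missing preference list means "process everything";
    comparison is case-insensitive so "Sports" matches "sports".
    """
    prefs = {c.strip().lower() for c in (preferred_categories or [])}
    return not prefs or category.strip().lower() in prefs


def filter_articles(
    articles: list[dict],
    classify: Callable[[dict], str],
    preferred_categories: Iterable[str] | None,
) -> list[dict]:
    """Classify each article, then keep only those the preferences allow.

    Dropped articles never reach embedding or summarization, which saves API calls.
    """
    prefs = list(preferred_categories or [])  # materialize once so iterators aren't exhausted
    kept = []
    for article in articles:
        article["category"] = classify(article)
        if should_process(article["category"], prefs):
            kept.append(article)
    return kept
```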

## API Endpoints Overview

### Article Updates

  • `POST /api/updates/fetch-latest`: Fetch and process new articles from the RSS feeds
  • `GET /api/updates/status`: Get the current ingestion status of every RSS feed

### Article Management

  • `GET /api/articles/{article_id}`: Retrieve full article details
  • `GET /api/articles/`: List articles with pagination and filters
  • `GET /api/articles/{article_id}/analysis`: Get AI-generated metadata (summary, entities, sentiment, etc.)

### Semantic Search & Discovery

  • `POST /api/search/semantic`: Perform embedding-based semantic search
  • `GET /api/search/similar/{article_id}`: Find articles similar to the given one
  • `POST /api/recommendations/`: Generate personalized article recommendations
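The recommendations endpoint is meant to weigh similarity, recency, and category preference. One way to combine them is a weighted sum with an exponential recency decay. A minimal sketch, assuming hypothetical weights (0.6/0.3/0.1), a hypothetical 24-hour half-life, and candidate dicts that already carry a similarity score from the vector search:

```python
from datetime import datetime


def recommendation_score(
    similarity: float,
    published: datetime,
    category: str,
    preferred_categories: set[str],
    now: datetime,
    half_life_hours: float = 24.0,
    weights: tuple[float, float, float] = (0.6, 0.3, 0.1),
) -> float:
    """Blend similarity, recency, and category match into one score.

    Recency decays exponentially: an article half_life_hours old scores 0.5.
    """
    age_hours = max((now - published).total_seconds() / 3600.0, 0.0)
    recency = 0.5 ** (age_hours / half_life_hours)
    category_match = 1.0 if category in preferred_categories else 0.0
    w_sim, w_rec, w_cat = weights
    return w_sim * similarity + w_rec * recency + w_cat * category_match


def recommend(candidates: list[dict], preferred_categories: set[str], now: datetime, top_k: int = 10) -> list[str]:
    """Rank candidates (dicts with id, similarity, published, category) and return the top ids."""
    ranked = sorted(
        candidates,
        key=lambda c: recommendation_score(
            c["similarity"], c["published"], c["category"], preferred_categories, now
        ),
        reverse=True,
    )
    return [c["id"] for c in ranked[:top_k]]
```

A weighted sum keeps each signal's influence explicit and easy to tune; the weights would need to be validated against real usage.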

### Analytics

  • `GET /api/analytics/trends`: Get trending topics from recent articles
  • `GET /api/analytics/sentiment`: Analyze sentiment distribution over time or by source
  • `GET /api/analytics/sources`: View coverage and performance of individual RSS sources
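The per-source sentiment breakdown is a grouped count normalized to fractions. A minimal in-memory sketch, assuming articles are dicts with hypothetical `source` and `sentiment` keys; in the service this would more likely be a SQL `GROUP BY` over the articles table.

```python
from collections import Counter, defaultdict


def sentiment_distribution(articles: list[dict]) -> dict[str, dict[str, float]]:
    """Return {source: {sentiment_label: fraction}}; fractions for each source sum to 1."""
    counts: defaultdict[str, Counter] = defaultdict(Counter)
    for article in articles:
        counts[article["source"]][article["sentiment"]] += 1
    result = {}
    for source, counter in counts.items():
        total = sum(counter.values())
        result[source] = {label: n / total for label, n in counter.items()}
    return result
```

Grouping by a date bucket (for example, the publication day) instead of `source` gives the "over time" view.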

## Key Technical Challenges

### 1. Incremental Updates

  • Avoid reprocessing the same articles
  • Track and sync timestamps per feed
  • Detect updated articles by comparing the URL and a content fingerprint

### 2. Duplicate Detection

  • Combine exact URL matching, fuzzy title matching, and content similarity
  • Handle near-duplicates and updated versions of the same story
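The three signals can be layered from cheapest to most expensive. This sketch uses the standard library's `difflib.SequenceMatcher` for fuzzy titles and word-shingle Jaccard overlap for content; the thresholds are hypothetical, and treating any single signal as sufficient is one possible policy, not the required one. Embedding similarity could replace the shingle check.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lower-case scheme/host, drop query, fragment, and trailing slash.

    Assumes query strings only carry tracking parameters (e.g. utm_*).
    """
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), "", ""))


def title_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of k-word windows; short texts become a single shingle, empty text none."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_duplicate(new: dict, existing: dict, title_threshold: float = 0.9, content_threshold: float = 0.8) -> bool:
    """Check URL, then title, then content; any strong match counts as a duplicate."""
    if normalize_url(new["url"]) == normalize_url(existing["url"]):
        return True
    if title_similarity(new["title"], existing["title"]) >= title_threshold:
        return True
    return jaccard(shingles(new["content"]), shingles(existing["content"])) >= content_threshold
```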

### 3. Category Filtering

  • Classify articles during AI processing
  • Skip irrelevant categories based on user preferences
  • Ensure high accuracy in categorization
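One practical guard for categorization accuracy is to constrain the LLM to a fixed label set and validate its reply. A minimal sketch, assuming the classification prompt asks the model to answer with JSON such as `{"category": "sports"}`; the label set `ALLOWED_CATEGORIES` and the `other` fallback are hypothetical choices.

```python
import json

# Hypothetical label set; in practice this would match the categories users can select.
ALLOWED_CATEGORIES = {"world", "politics", "business", "technology", "sports", "music", "science", "other"}


def parse_category(llm_output: str, allowed: set[str] = ALLOWED_CATEGORIES, fallback: str = "other") -> str:
    """Extract the category from an LLM reply containing a JSON object.

    Unparseable replies, or labels outside `allowed`, map to `fallback`,
    so a malformed answer can never introduce an unexpected category.
    """
    try:
        start, end = llm_output.index("{"), llm_output.rindex("}") + 1
        data = json.loads(llm_output[start:end])
        label = str(data.get("category", "")).strip().lower()
    except (ValueError, AttributeError):  # json.JSONDecodeError subclasses ValueError
        return fallback
    return label if label in allowed else fallback
```

Logging how often the fallback fires is a cheap way to spot prompt or model drift.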

### 4. AI & Vector Sync

  • Ensure metadata and vector store are always in sync
  • Handle vector DB failures gracefully
  • Implement cleanup of orphaned or outdated vectors
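If the SQL database is treated as the source of truth, keeping the vector store in sync reduces to a periodic set comparison. A minimal sketch, assuming a dict stands in for the vector store and `reembed` is a hypothetical callable that recomputes an article's embedding:

```python
from typing import Callable, Iterable


def reconcile(
    db_article_ids: Iterable[str],
    vector_store: dict[str, list[float]],
    reembed: Callable[[str], list[float]],
) -> tuple[set[str], set[str]]:
    """Delete orphaned vectors and re-embed missing ones; returns (removed, added)."""
    db_ids = set(db_article_ids)
    # Orphans: vectors whose article row was deleted or whose DB insert was rolled back.
    orphans = set(vector_store) - db_ids
    for vector_id in orphans:
        del vector_store[vector_id]
    # Missing: articles whose vector upsert failed or never ran.
    missing = db_ids - set(vector_store)
    for article_id in missing:
        vector_store[article_id] = reembed(article_id)
    return orphans, missing
```

Running this after a failed vector-DB call (or on a schedule) lets ingestion continue when the vector store is briefly unavailable, at the cost of temporarily incomplete search results.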

### 5. Performance

  • Index frequently filtered fields
  • Use batch processing and async operations
  • Optimize semantic search
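Batching and bounded concurrency can be combined with `asyncio`. A minimal sketch, assuming `embed_batch` is a hypothetical async callable wrapping the embedding API (one vector per input string); the batch size and concurrency limit are placeholders to tune against the provider's actual limits. Python 3.12+ also offers `itertools.batched`, which yields tuples.

```python
import asyncio
from typing import Iterator, Sequence


def batched(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


async def embed_all(texts: Sequence[str], embed_batch, batch_size: int = 32, max_concurrency: int = 4) -> list:
    """Embed texts in batches, with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(batch: Sequence[str]) -> list:
        async with semaphore:
            return await embed_batch(list(batch))

    results = await asyncio.gather(*(run(b) for b in batched(texts, batch_size)))
    # gather preserves input order, so flattening keeps vectors aligned with `texts`.
    return [vector for batch_vectors in results for vector in batch_vectors]
```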

## Bonus Features (Optional)

  • Background job scheduler for automated fetches
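A simple in-process scheduler is an `asyncio` loop that runs the fetch job and then waits for either the interval or a stop signal. A minimal sketch, assuming `job` is a hypothetical async wrapper around the same logic as `POST /api/updates/fetch-latest`; in a FastAPI app it could be started as a task from the lifespan handler, and a dedicated scheduler library is an alternative.

```python
import asyncio


async def run_periodically(job, interval_seconds: float, stop: asyncio.Event) -> int:
    """Call async `job` every `interval_seconds` until `stop` is set; returns the number of runs."""
    runs = 0
    while not stop.is_set():
        try:
            await job()
        except Exception as exc:  # keep the scheduler alive if one fetch fails
            print(f"scheduled fetch failed: {exc!r}")
        runs += 1
        try:
            # Sleep for the interval, but wake immediately if stop is set.
            await asyncio.wait_for(stop.wait(), timeout=interval_seconds)
        except asyncio.TimeoutError:
            pass
    return runs
```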

## Success Criteria

### Functional Requirements

  • Articles from the configured RSS feeds are fetched and processed
  • Incremental updates work without duplication
  • Semantic search returns relevant results
  • User preferences respected in every processing cycle

### Technical Requirements

  • Clean, modular, testable codebase
  • Proper use of SQLAlchemy, Pydantic, and environment configs
  • Documented README
  • Unit test coverage for core logic
  • Alembic-based migrations for schema changes

## 📁 Deliverables

  • Working FastAPI backend with REST endpoints
  • SQLAlchemy ORM models with Alembic migrations
  • AI integration with Cohere and Groq
  • Vector similarity search with metadata filters
  • Smart RSS ingestion with category filtering
  • API documentation via OpenAPI
  • Clean README with setup instructions and an architecture overview, including an architecture diagram

## Documentation Expectations

  • Describe high-level architecture and design decisions
  • Explain how article processing, filtering, and recommendations work
  • Document how category filtering is enforced
  • Include instructions for deployment, testing, and extending the system