
codevector-2003/iwb25-192-sharks


πŸ—žοΈ InFact Platform - AI-Powered News Desensationalization Engine

Ballerina · Python 3.11+ · FastAPI · MongoDB · License: MIT

Transform sensationalized news into factual, neutral reporting through advanced AI and NLP techniques

InFact is a full-stack news processing and analysis platform: it extracts facts from news articles, clusters similar stories, and presents desensationalized summaries. Tired of clickbait headlines and biased spins? InFact cuts through the noise to deliver just the facts, because who has time for drama in their daily news?

Built with Ballerina for robust API gateway services and Python FastAPI for advanced AI processing pipelines.

🔗 Original InFact Implementation

[Five screenshots of the platform, captured 2025-08-31]

πŸ” Overview

InFact is your ultimate shield against sensationalized news! This platform automatically pulls articles from RSS feeds and external APIs, processes them with cutting-edge NLP to separate facts from opinions, clusters similar stories, and generates neutral summaries using AI. Built by a talented team from Sri Lanka, it's perfect for journalists, researchers, or anyone who wants unbiased information without the hype.

πŸ—οΈ Architecture

This implementation features a dual-service architecture:

  • 🔄 Ballerina Gateway: High-performance API gateway handling news aggregation, routing, and client interactions
  • 🧠 Python Pipeline: Advanced AI/ML processing engine for clustering, fact extraction, and content generation

📁 InFact Platform/
├── 🔄 ballerina-gateway/          # Ballerina API Gateway
│   ├── main.bal                   # Main service endpoints
│   ├── modules/
│   │   ├── config/                # Database & API configuration
│   │   ├── types/                 # Data models & schemas
│   │   └── utils/                 # Business logic utilities
│   └── Config.toml                # Environment configuration
│
├── 🧠 python-pipeline/            # AI Processing Engine
│   ├── main.py                    # FastAPI application entry
│   ├── core/                      # Configuration & database
│   ├── schemas/                   # Pydantic data models
│   ├── services/                  # API endpoints & business logic
│   └── utils/                     # NLP & AI processing tools
│
├── 📊 frontend/                   # React Frontend (Optional)
└── 📓 notebook/                   # Research & Development

✨ Key Features

🧠 AI-Powered Processing

  • Smart Article Clustering - Groups related news stories using semantic similarity
  • Fact vs Opinion Classification - Separates factual information from editorial content
  • Neutral Article Generation - Creates unbiased summaries using Google Gemini AI
  • Sentiment Analysis - Identifies and neutralizes emotional language
  • Duplicate Detection - Automatic duplicate article detection and filtering
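The README does not say how duplicates are detected. As one common approach (not necessarily the one InFact uses), the sketch below compares articles by the Jaccard similarity of their 3-word shingles and flags pairs above a threshold. The function names and the 0.8 threshold are hypothetical choices for illustration.

```python
import re

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a lowercased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single (shorter) shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def find_duplicates(articles: list[str], threshold: float = 0.8, k: int = 3) -> list[tuple[int, int]]:
    """Return index pairs of articles whose shingle sets overlap at or above the threshold."""
    sets = [shingles(text, k) for text in articles]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

Because punctuation is discarded during tokenization, two syndicated copies that differ only in punctuation produce identical shingle sets and score 1.0. The pairwise loop is quadratic, which is fine for a small batch; larger collections usually switch to MinHash or embedding-based search.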

πŸ” Advanced Analytics

  • Trending Topic Detection - Identifies emerging news patterns
  • Source Bias Analysis - Tracks how different outlets cover the same story
  • Real-time Statistics - Comprehensive metrics and insights
  • Similarity Scoring - ML-based content similarity detection
  • Weekly Digests - Automated news summaries
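Trending-topic detection is listed without details. A simple way to illustrate the idea is to compare how often a term appears in a recent window against a baseline window and rank terms by that lift. The sketch below does this with the standard library only; the stopword list, add-one smoothing and `min_count` cutoff are assumptions, and a real pipeline would more likely use spaCy or NLTK tokenization.

```python
import re
from collections import Counter

# Hypothetical stopword list; spaCy or NLTK provide fuller ones.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "is", "of", "on", "the", "to"}

def terms(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def trending_terms(recent: list[str], baseline: list[str], top_n: int = 5, min_count: int = 2) -> list[str]:
    """Rank terms whose relative frequency rose most from the baseline to the recent window."""
    recent_counts = Counter(t for doc in recent for t in terms(doc))
    base_counts = Counter(t for doc in baseline for t in terms(doc))
    recent_total = sum(recent_counts.values()) or 1
    # Add-one smoothing so terms absent from the baseline never divide by zero.
    base_denominator = sum(base_counts.values()) + len(base_counts) + 1
    scores = {}
    for term, count in recent_counts.items():
        if count < min_count:
            continue  # ignore one-off mentions
        recent_rate = count / recent_total
        base_rate = (base_counts[term] + 1) / base_denominator
        scores[term] = recent_rate / base_rate
    # Highest lift first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
```

With a 30-day `days_back` window, as in the trending-topics request shown later, "recent" might be the last few days and "baseline" the rest of the window.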

πŸ—οΈ Production Architecture

  • Ballerina Gateway - Enterprise-grade API gateway with robust routing
  • Async FastAPI Backend - High-performance Python processing with background tasks
  • MongoDB Integration - Scalable document storage with intelligent clustering
  • Modular Design - Clean separation of concerns with comprehensive error handling
  • RSS Feed Automation - Automated news ingestion from configurable sources
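The gateway's RSS extraction endpoint accepts `from_date` and `max_articles`. The actual ingestion is written in Ballerina and is not shown in this README. The Python sketch below only illustrates the same filtering: it parses an RSS 2.0 feed with the standard library, keeps items published on or after `from_date`, and stops at `max_articles`. It assumes every `<item>` carries an RFC 822 `pubDate` with a timezone; `extract_rss_items` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from datetime import date
from email.utils import parsedate_to_datetime

def extract_rss_items(xml_text: str, from_date: date, max_articles: int = 50) -> list[dict]:
    """Return up to max_articles items published on or after from_date."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        if pub is None:
            continue  # skip items without a publication date
        published = parsedate_to_datetime(pub)  # RFC 822 date -> datetime
        if published.date() < from_date:
            continue
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "url": (item.findtext("link") or "").strip(),  # keep the original source link
            "published": published.isoformat(),
        })
        if len(items) >= max_articles:
            break
    return items
```

Keeping the `link` field aligns with the URL-tracking feature listed below. Feeds that use Atom instead of RSS 2.0 would need `entry` elements and namespace handling.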

📊 Rich Data Management

  • URL Tracking - Maintains links to original sources
  • Image Processing - Automatic image selection for clusters
  • Multi-source Aggregation - Combines articles from multiple news outlets
  • Historical Analysis - Tracks news evolution over time
  • Search & Filtering - Advanced query capabilities

🚀 Quick Start

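Prerequisites

  • Ballerina 2201.8.0+
  • Python 3.11+
  • MongoDB (the sample configuration expects mongodb://localhost:27017)
  • A News API key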

1. Clone & Setup

# Clone the repository
git clone <repository-url>
cd infact-ballerina

2. Configure Services

Ballerina Gateway Configuration

cd ballerina-gateway

# Create Config.toml
cat > Config.toml << EOF
[ballerina_gateway.config]
mongoUri = "mongodb://localhost:27017"
databaseName = "newsstore"

[ballerina_gateway.utils]
newsApiKey = "your-news-api-key-here"
EOF

For detailed setup instructions, please refer to the ballerina-gateway/README.md.

API Testing with Postman

For easy API testing and exploration, import the Postman collection: InFact API Collection

Python Pipeline Configuration

For detailed setup instructions, please refer to the python-pipeline/README.md.

3. Launch Services

Start Python Processing Pipeline

cd python-pipeline
python main.py
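# Available at: http://localhost:8091 (the pipeline endpoint examples below use port 8000; use the port your pipeline is actually configured with)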

Start Ballerina Gateway

cd ballerina-gateway
bal run
# Available at: http://localhost:9090

📋 API Documentation

🔄 Ballerina Gateway Endpoints

The Ballerina gateway provides enterprise-grade APIs for news management:

News Management

# Fetch articles from News API
curl -X POST "http://localhost:9090/news/fetchArticles" \
  -H "Content-Type: application/json" \
  -d '{"query": "technology", "pageSize": 20}'

# Get recent articles with pagination
curl "http://localhost:9090/news/articles?limit=20&skip=0"

# Extract from RSS feeds
curl -X POST "http://localhost:9090/news/rss-extract" \
  -H "Content-Type: application/json" \
  -d '{"from_date": "2025-08-22", "max_articles": 50}'

Processing & Clustering

# Process articles with AI clustering
curl -X POST "http://localhost:9090/news/process-with-storage" \
  -H "Content-Type: application/json" \
  -d '{"articles": [...], "n_clusters": 3}'

# Auto-processing pipeline
curl -X POST "http://localhost:9090/news/scrape-process-store?days_back=7"

Analytics & Search

# Get trending topics
curl "http://localhost:9090/news/trending-topics?days_back=30"

# Search clusters
curl -X POST "http://localhost:9090/news/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "climate change", "limit": 10}'

# Weekly digest
curl "http://localhost:9090/news/weekly-digest"

🧠 Python Pipeline Endpoints

Advanced AI processing capabilities:

# Direct processing with storage
curl -X POST "http://localhost:8000/api/v1/process-with-storage" \
  -H "Content-Type: application/json" \
  -d '{"articles": [...], "n_clusters": 3}'

# Get cluster statistics
curl "http://localhost:8000/api/v1/clusters/stats"

# Automated scraping and processing
curl -X POST "http://localhost:8000/api/v1/scrape-process-store?days_back=7"

📖 Full API Documentation:

  • Ballerina Gateway: http://localhost:9090/news (OpenAPI spec available)
  • Python Pipeline: http://localhost:8000/docs (Interactive Swagger UI)


🛠️ Tech Stack

🔄 Ballerina Gateway

  • Framework: Ballerina 2201.8.0+ (Cloud-native programming language)
  • Database: MongoDB with connection pooling
  • External APIs: News API, RSS feeds integration
  • Features: RESTful APIs, async processing, robust error handling

🧠 Python Pipeline

  • Framework: FastAPI (Python 3.11+)
  • NLP & ML: spaCy, sentence-transformers, scikit-learn, gensim
  • AI: Google Generative AI (Gemini 2.0 Flash)
  • Data Processing: NumPy, pandas, PyTorch, NLTK
  • Database: MongoDB (via pymongo)
  • Features: Async processing, background tasks, ML pipelines

📊 Frontend

  • Framework: React 19.1.1 + Vite 7.1.2
  • Styling: Tailwind CSS 4.1.12
  • Features: Responsive design, real-time updates

🔄 Processing Pipeline

graph TD
    A[📰 RSS/API Sources] --> B[🔄 Ballerina Gateway]
    B --> C[📝 Article Extraction]
    C --> D[🧠 Python Pipeline]
    D --> E[🔤 Text Preprocessing]
    E --> F[🧠 Semantic Embeddings]
    F --> G[🎯 Clustering Algorithm]
    G --> H[🔍 Similarity Check]
    H --> I{📊 Similar Cluster?}
    I -->|Yes| J[🔗 Merge Clusters]
    I -->|No| K[✨ Create New Cluster]
    J --> L[📋 Fact Extraction]
    K --> L
    L --> M[🤖 AI Generation]
    M --> N[💾 Store in MongoDB]
    N --> O[📈 Update Analytics]
    O --> P[🔄 Return to Gateway]

Pipeline Steps

  1. 📑 Data Ingestion - Ballerina gateway fetches from RSS feeds and News API
  2. 📝 Text Preprocessing - Tokenization, lemmatization, noise removal
  3. 🧠 Embedding Generation - Semantic vectors using sentence-transformers
  4. 🎯 Smart Clustering - KMeans with TF-IDF enhancement
  5. 🔍 Similarity Analysis - Compare with existing clusters
  6. 🔗 Intelligent Merging - Combine similar clusters or create new ones
  7. 📋 Fact Extraction - NER + sentiment analysis for classification
  8. 🔄 Deduplication - Remove redundant information
  9. 🤖 AI Generation - Create neutral summaries with Gemini
  10. 💾 Persistent Storage - MongoDB with indexing
  11. 🖼️ Media Processing - Image selection and URL tracking
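Step 4 names KMeans as the clustering algorithm. The pipeline lists scikit-learn in its stack and likely relies on its `KMeans`; the dependency-free sketch below only shows what the algorithm does (Lloyd's iterations: assign each vector to its nearest centroid, then move each centroid to the mean of its members). It starts from the first `k` points instead of k-means++ initialization, so results depend on input order. The function names are hypothetical.

```python
def squared_distance(p: list[float], q: list[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points: list[list[float]], k: int, max_iter: int = 50) -> tuple[list[int], list[list[float]]]:
    """Lloyd's algorithm with the first k points as initial centroids."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [[float(x) for x in p] for p in points[:k]]
    labels: list[int] | None = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In the real pipeline the points would be sentence-transformer embeddings (optionally combined with TF-IDF features) and `k` would correspond to the `n_clusters` field in the processing requests shown earlier.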

🧪 Testing

Ballerina Gateway Tests

cd ballerina-gateway
bal test

Python Pipeline Tests

cd python-pipeline
pytest
pytest --cov=. --cov-report=html

Integration Testing

# Test complete pipeline
curl -X POST "http://localhost:9090/news/scrape-process-store?max_articles=5"


Performance Optimization

  • Ballerina: Connection pooling, async processing
  • Python: GPU acceleration, batch processing, caching
  • MongoDB: Proper indexing, sharding for scale


Development Setup

# Fork and clone
git clone <your-fork-url>
cd infact-ballerina

# Set up both services
cd ballerina-gateway && bal build
cd ../python-pipeline && pip install -r requirements-dev.txt

# Run tests (the previous step leaves you in python-pipeline/)
cd ../ballerina-gateway && bal test
cd ../python-pipeline && pytest

Contribution Guidelines

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

👥 Contributors

This project was built by an awesome team from the University of Moratuwa, Sri Lanka:

  • 🚀 Backend Architect: HimathX (Dhanapalage Himath Nimpura Dhanapala) – Ballerina gateway & MongoDB integration
  • 🎨 Frontend Wizard: codevector-2003 (Haren Daishika) – React interface & user experience
  • 🧠 AI/ML Engineer: LazySeaHorse (Raj Pankaja) – NLP pipeline & AI processing

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Built with ❤️ (and a bit of caffeine) by the InFact Team. Stay factual, folks! 🚀

Languages

  • Python 42.5%
  • Ballerina 29.7%
  • JavaScript 23.3%
  • CSS 4.4%
  • HTML 0.1%