WEEK 10 DAY 01 CONCEPT & ANALYSIS

MyNewsAnalyzer

Aggregating global news and performing on-device summarization and classification using Small Language Models (SLMs) on iOS.

1. Executive Summary

This document outlines the Day 1 analysis for the mynewsanalyzer iOS mobile application. The application's core objective is to aggregate the latest news based on user-defined topics, summarize the content using on-device Small Language Models (SLMs), and classify the news into user-defined groups (e.g., "For" and "Against").

Core Pillars of Analysis:

News Aggregation & Crawling: Scalable ingestion strategies for global real-time news.
On-device Summarization: Leveraging Apple Silicon for privacy-first AI summaries.
On-device Classification: Zero-shot dynamic grouping using SLMs.

2. News Aggregation & Crawling

Recommended News APIs (2026)

NewsAPI.org / GNews.io

Developer-friendly JSON REST APIs with granular filtering by keyword, language, and country. MVP Recommendation: chosen for ease of integration and free developer tiers.

NewsData.io / Webz.io

Better suited for deep historical archives (7+ years) or direct sentiment analysis from the API source.

Crawling Fallback (ScrapingBee)

For sources not covered by APIs, ScrapingBee handles headless browsing, proxies, and JS rendering to prevent blocking during targeted scraping.

Data Ingestion Strategy

The backend polls APIs based on user topics, normalizes JSON payloads into a standard SQL schema, and serves them to iOS via a custom REST/GraphQL API.
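The normalization step can be sketched as a small mapping function. This is only an illustration: the provider field names below (url/publishedAt for GNews and NewsAPI, link/pubDate for NewsData.io) are assumptions about the public response shapes and should be checked against each provider's documentation, and normalizeArticle is a hypothetical helper name.

```javascript
// Map provider-specific article objects onto one shared record shape.
// Field names per provider are assumptions, not confirmed by this document.
const FIELD_MAPS = {
  gnews:    { url: 'url',  published: 'publishedAt', content: 'content' },
  newsapi:  { url: 'url',  published: 'publishedAt', content: 'content' },
  newsdata: { url: 'link', published: 'pubDate',     content: 'content' },
};

function normalizeArticle(provider, raw, topicId) {
  const map = FIELD_MAPS[provider];
  if (!map) throw new Error(`Unknown provider: ${provider}`);
  const url = raw[map.url];
  if (!raw.title || !url) return null; // skip records without a title or URL
  const published = new Date(raw[map.published]);
  return {
    external_id: url,
    title: String(raw.title).slice(0, 512),
    url,
    content_raw: raw[map.content] ?? null,
    // Invalid or missing dates become null instead of "Invalid Date".
    published_at: Number.isNaN(published.getTime()) ? null : published.toISOString(),
    topic_id: topicId,
  };
}
```

Returning null for unusable records lets the caller filter them out before writing to the database.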

3. News Summarization via SLMs

Running summarization on-device keeps article analysis on the user's device, reduces recurring cloud inference costs, and lets already-synced articles be analyzed offline.

Framework: Apple MLX

MLX, Apple's open-source array framework for machine learning, is a common choice for running generative models on Apple devices because it exploits the unified memory of Apple Silicon (A/M-series chips).

Optimal Models

Llama 3.2 (3B Instruct): Meta's SLM via MLX Swift. (Llama 3.2 ships text models in 1B and 3B sizes; the 8B size belongs to Llama 3.1.)
Qwen2.5 (7B): Strong multilingual support.
Gemma 2 (9B): Highly optimized for local execution.

Summarization Strategy

Use mlx-swift to load 4-bit quantized models. Raw article text is fed into the local model with the prompt: "Summarize the following news article in 3 bullet points."
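Long articles may not fit the model's context window, so the prompt builder needs some form of truncation. The following is a minimal sketch, written in JavaScript to match the backend code in this document (the on-device version would be Swift). buildSummaryPrompt and the 6000-character default are illustrative assumptions, not tuned values.

```javascript
const SUMMARY_INSTRUCTION = 'Summarize the following news article in 3 bullet points.';

// Cap the article by characters so that prompt plus output fits the context
// window. The default budget is an illustrative assumption.
function buildSummaryPrompt(articleText, maxChars = 6000) {
  const clean = articleText.replace(/\s+/g, ' ').trim();
  let body = clean;
  if (clean.length > maxChars) {
    // Prefer cutting at the last sentence end inside the budget.
    const cut = clean.slice(0, maxChars);
    const lastStop = cut.lastIndexOf('. ');
    body = lastStop > 0 ? cut.slice(0, lastStop + 1) : cut;
  }
  return `${SUMMARY_INSTRUCTION}\n\n${body}`;
}
```

A token-based budget would be more precise than a character count, but it requires the model's tokenizer.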

4. News Classification

Approach 1: Zero-Shot Prompting

Reuses the generative SLM (Llama 3.2 / Qwen2.5) for dynamic categories.

"Given the summary: [Text], classify this article into: [Cat A], [Cat B]. Answer with only the category name."

Approach 2: Core ML Fallback

Train a lightweight MLTextClassifier on user-labeled datasets for battery-efficient, high-speed execution.
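For the zero-shot prompt in Approach 1, the model's free-form answer still has to be mapped back onto one of the user's categories. A minimal sketch, in JavaScript for consistency with the backend code (the on-device version would be Swift); buildClassificationPrompt and matchCategory are hypothetical helper names.

```javascript
function buildClassificationPrompt(summary, categories) {
  return `Given the summary: ${summary}, classify this article into: ` +
    `${categories.join(', ')}. Answer with only the category name.`;
}

// Map free-form model output back to one of the allowed categories.
function matchCategory(modelOutput, categories) {
  // Drop surrounding quotes, whitespace, and trailing punctuation.
  const answer = modelOutput.trim().toLowerCase().replace(/^["'\s]+|["'.\s]+$/g, '');
  const exact = categories.find(c => c.toLowerCase() === answer);
  if (exact) return exact;
  // Fall back to a substring match, but only when exactly one category matches.
  const hits = categories.filter(c => answer.includes(c.toLowerCase()));
  return hits.length === 1 ? hits[0] : null;
}
```

Returning null for ambiguous answers lets the caller retry or leave the article unclassified. The substring fallback is deliberately conservative and can still misfire on short category names that appear inside other words.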

5. System Consistency & Constraints

Strict Cross-Component Impact Analysis:

SQL Changes: Trigger review of Backend functions and iOS models.

API Spec Changes: Trigger updates to iOS networking and Backend controllers.

Any change in one component necessitates a holistic system analysis and continuous compilation checks.

Next Steps: Day 2

Day 2 will focus on Data Design, mapping out the SQL structures and object models needed to support this architecture.

WEEK 10 DAY 02 DATA DESIGN

Data Architecture

Designing a robust schema for real-time news aggregation, offline mobile storage, and SLM-driven analysis.

1. Overview

This document details the data architecture for the mynewsanalyzer application. The design must support real-time news aggregation from various sources, efficient storage for offline access on the iOS device, and structured data formats suitable for on-device Small Language Models (SLMs) to perform summarization and classification.

The architecture assumes a backend service (e.g., Node.js/PostgreSQL) for heavy lifting and a local database on the iOS app (e.g., SQLite/SwiftData) for caching and offline AI processing.

2. Core Entities & Relationships

2.1 User Management

Entity: User

id (UUID, PK)
email (String, Unique)
created_at (Timestamp)

2.2 Preferences

Topic

Tracked subjects (e.g., "AI"). Contains keywords array for API queries.

Source

News outlets. Type: API, RSS, or CRAWL.

2.3 Content & Analysis Storage

NewsArticle

The raw news item fetched from sources.

external_id: "prevent duplicates"
content_raw: "full text or snippet"
published_at: "source date"

ArticleAnalysis

SLM output (separated to allow re-analysis).

summary: "3-bullet point MLX output"
confidence_score: "0.0 to 1.0"
model_version: "e.g., llama-3.2-3b"

3. Database Schema (SQL)

PostgreSQL / Backend Schema

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE topics (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID REFERENCES users(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    keywords TEXT[]
);

CREATE TABLE sources (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    url VARCHAR(255),
    type VARCHAR(50) NOT NULL -- 'API', 'RSS', 'CRAWL'
);

CREATE TABLE categories (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    topic_id UUID REFERENCES topics(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    description TEXT
);

CREATE TABLE news_articles (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    topic_id UUID NOT NULL REFERENCES topics(id) ON DELETE CASCADE,
    source_id UUID REFERENCES sources(id) ON DELETE SET NULL,
    external_id TEXT NOT NULL, -- Article URL or provider ID; URLs can exceed 255 characters
    title VARCHAR(512) NOT NULL,
    url TEXT NOT NULL,
    content_raw TEXT,
    published_at TIMESTAMP WITH TIME ZONE,
    fetched_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    -- Prevent duplicate ingestion per topic. source_id is nullable, and rows with
    -- a NULL source_id would never collide in UNIQUE(source_id, external_id).
    UNIQUE (topic_id, external_id)
);

CREATE TABLE article_analysis (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    article_id UUID REFERENCES news_articles(id) ON DELETE CASCADE,
    summary TEXT,
    category_id UUID REFERENCES categories(id) ON DELETE SET NULL,
    confidence_score REAL,
    analyzed_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    model_version VARCHAR(100)
);
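On the ingestion side, writes to news_articles can lean on the table's unique constraint instead of checking for duplicates first. The sketch below builds a parameterized INSERT in the { text, values } shape used by parameterized-query clients such as node-postgres; buildArticleInsert is a hypothetical helper, and the document itself names Sequelize as the ORM, so treat this as an illustration of the SQL rather than the project's actual data layer.

```javascript
// Build a parameterized INSERT for news_articles that skips rows rejected by
// a unique constraint. ON CONFLICT DO NOTHING without a conflict target
// applies to any unique constraint on the table.
const COLUMNS = ['topic_id', 'source_id', 'external_id', 'title', 'url',
                 'content_raw', 'published_at'];

function buildArticleInsert(article) {
  const placeholders = COLUMNS.map((_, i) => `$${i + 1}`).join(', ');
  return {
    text: `INSERT INTO news_articles (${COLUMNS.join(', ')}) ` +
          `VALUES (${placeholders}) ON CONFLICT DO NOTHING RETURNING id`,
    // Missing fields are sent as SQL NULL.
    values: COLUMNS.map(c => article[c] ?? null),
  };
}
```

When the row is a duplicate, the statement returns no rows, which the caller can use to count newly ingested articles.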

4. Local Data Handling (iOS)

Backend: Master data.
iOS Local: SQLite / SwiftData cache.

1. Sync: Request new NewsArticle records for the user's topics.
2. On-Device AI: MLX loads the model, processes the content, and generates the summary.
3. Save & Sync Back: Results are saved in ArticleAnalysis, with optional cloud sync.

5. System Impact Constraints

Schema Changes

Any ALTER TABLE command MUST be accompanied by updates to Backend models and iOS data classes (e.g., Codable/JSON Serializable).

API Payloads

Changes to column visibility in REST APIs must be immediately reflected in iOS parsing logic to prevent null reference errors.
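One way to limit the blast radius of payload changes is to decode defensively: fail loudly on missing required fields, default the optional ones, and ignore unknown keys. A minimal sketch in JavaScript (the iOS client would do the equivalent in its Codable models); the required and optional field lists are assumptions based on the news_articles schema.

```javascript
// Required and optional fields are assumptions derived from news_articles.
const REQUIRED = ['id', 'title', 'url'];
const OPTIONAL_DEFAULTS = { content_raw: null, published_at: null, source_id: null };

function decodeArticle(payload) {
  const missing = REQUIRED.filter(k => payload[k] === undefined || payload[k] === null);
  if (missing.length > 0) {
    throw new Error(`Article payload missing required field(s): ${missing.join(', ')}`);
  }
  const article = { ...OPTIONAL_DEFAULTS };
  for (const key of [...REQUIRED, ...Object.keys(OPTIONAL_DEFAULTS)]) {
    if (payload[key] !== undefined) article[key] = payload[key];
  }
  // Unknown keys are dropped, so newly added columns do not break old clients.
  return article;
}
```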

Next Steps: Day 3

Day 3 will focus on User Flow & UI Design, translating the architecture and data structures into the iOS user experience.

WEEK 10 DAY 03 USER FLOW & UI

User Flow & UI Design

Translating architectural decisions and data structures into a seamless, modern iOS experience powered by on-device AI.

1. Overview

This document outlines the User Flow and User Interface (UI) design for the mynewsanalyzer iOS application. The primary goal of the UI is to make the complex process of aggregating, summarizing, and classifying news using on-device AI feel seamless and intuitive to the user.

2. Process Flow (Swimlane)

This diagram illustrates the flow of data and actions across the different components of the system, highlighting the offline-first, on-device AI approach.

[Figure: MyNewsAnalyzer user flow swimlane diagram]

3. UI Design & Screen Specifications

The application will follow Apple's Human Interface Guidelines (HIG) or Material Design 3 principles for a native, clean, and modern look.

3.1 Native Authentication (Firebase UI)

Leveraging the native Firebase Auth identity platform.

Implementation: Instead of a custom UI, the app utilizes FirebaseUI-iOS or the native FirebaseAuth SDK.
Standard Providers: Support for Email/Password, Google Sign-In, and Apple Sign-In.
Security: All encryption keys for the local SLM are tied to the uid provided by Firebase after a successful login.
Native Experience: Optimized for iOS keychain and password autofill integration.

[Figure: Firebase native UI, default auth screen]

3.2 Article Detail View

When a user taps an Article Card from the Dashboard.

Top Bar: Back button, Share button, "Read Full Article" action.
Header Image: Fetched from the news API.
Metadata: Title, Source, Date.
Classification Banner: A clear indicator of how the AI classified this article (e.g., "Classified as: Against with 85% confidence").
AI Summary Section: A distinct card highlighting the full 3-bullet point summary generated by the on-device model.
Raw Content Snippet: A brief extract of the original text.

3.3 Topic & Category Management

Where the user defines what to search for and how to classify it.

List View: Shows existing Topics.
Add New Topic:
- Input 1: Topic Name.
- Input 2: Search Keywords (Used by backend API polling).
Define Categories: The core feature. The user defines the perspectives they want the AI to look for (e.g., "Pro-Remote", "Anti-Remote").
(Advanced) Optional description field for each category to provide better context to the SLM prompt.
Save Button: Syncs configuration to the backend.

3.4 Source Management

Allows the user to blacklist certain sources or prioritize specific URLs.

Source Toggles: Enable or disable popular APIs vs Custom RSS/Crawl links.
Blacklist: Add domains to prevent unreliable news from entering the feed.

3.5 Settings & AI Model

Crucial for an on-device AI application.

Account Settings: Email, logout.
Processing Queue: Shows pending local summarization count and status.
AI Model Section: Displays the currently downloaded SLM (e.g., Llama 3.2 3B Quantized) and Storage Usage.
Action: Redownload Model or Update Model.
Battery Management: Visual warning when processing is paused to save battery.

4. UI/UX Considerations for On-Device AI

Battery & Thermal

The app should only run the MLX inference queue when the device has sufficient battery or is plugged in, and when it is not thermally throttled. The UI must reflect this (e.g., "Paused processing to save battery").
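The gating rule can be expressed as a small pure function, which makes it easy to test separately from the UI. This sketch is in JavaScript for consistency with the backend code; on iOS the inputs would come from UIDevice battery monitoring and ProcessInfo (low power mode and thermal state). The 30% battery threshold and the cool-down message are illustrative assumptions.

```javascript
// Thermal states ordered from coolest to hottest, mirroring the iOS levels.
const THERMAL_ORDER = ['nominal', 'fair', 'serious', 'critical'];

function inferenceDecision({ batteryLevel, isCharging, lowPowerMode, thermalState }) {
  // Heat takes priority: pause even when plugged in.
  if (THERMAL_ORDER.indexOf(thermalState) >= THERMAL_ORDER.indexOf('serious')) {
    return { run: false, reason: 'Paused processing to let the device cool down' };
  }
  if (isCharging) return { run: true, reason: null };
  // Threshold of 0.3 (30%) is an assumption, not a value from this document.
  if (lowPowerMode || batteryLevel < 0.3) {
    return { run: false, reason: 'Paused processing to save battery' };
  }
  return { run: true, reason: null };
}
```

The reason string can be shown directly in the Settings screen's battery warning.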

Progress Indicators

Local inference takes time. The Dashboard should show placeholder loading states or "Analyzing..." badges on new articles until the SLM finishes.

Impact Alignment

Changes to UI fields for "Categories" must immediately trigger an impact check to ensure Prompt construction logic and the Local DB schema are updated.

Next Steps: Day 4

Day 4 will focus on System Architecture & Source Analysis, defining the hybrid Cloud-to-Edge model.

WEEK 10 DAY 04 ARCHITECTURE

System & Application Architecture

Defining the hybrid Cloud-to-Edge model for scalable ingestion and localized, privacy-first AI analysis.

1. Overview

This document defines the high-level system architecture for mynewsanalyzer. The architecture relies on a hybrid Cloud-to-Edge model: a robust cloud backend handles the complex, high-volume task of global news ingestion, while the iOS edge device handles privacy-preserving, localized AI summarization and classification.

2. Component Architecture (Three Tiers)

Tier 1: Cloud Backend

Node.js/Python, PostgreSQL, Redis

Ingestion Engine: A scheduled cron service responsible for fetching news based on active user topics.
Database: Stores users, topics, configuration, and the normalized raw news articles.
API Gateway: Provides secure REST/GraphQL endpoints for the iOS app to sync configurations and download new raw articles.
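Because many users may track the same subject, the scheduled ingestion job can group topics by keyword set before calling any news API. A minimal sketch, assuming each topic object carries an id and a keywords array as in the topics table; planQueries is a hypothetical helper name.

```javascript
// Group active topics by their normalized keyword set so each distinct query
// is sent once per polling cycle, however many topics share it.
function planQueries(topics) {
  const plan = new Map();
  for (const topic of topics) {
    const key = [...new Set(topic.keywords.map(k => k.trim().toLowerCase()))]
      .filter(Boolean).sort().join('|');
    if (!key) continue; // a topic without keywords cannot be polled
    if (!plan.has(key)) plan.set(key, { keywords: key.split('|'), topicIds: [] });
    plan.get(key).topicIds.push(topic.id);
  }
  return [...plan.values()];
}
```

Each plan entry lists the topics that should receive the fetched articles, so one API response can be fanned out to several topic_id values. The sketch assumes keywords never contain the '|' character.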

Tier 2: The Edge (iOS)

Swift, SwiftUI, SQLite, Apple MLX

Sync Manager: Background task handler fetching new articles.
Local Data Store: Caches raw articles and final AI-generated outputs.
On-Device AI Engine (MLX): The core intelligence module. Loads quantized SLMs into Apple Silicon unified memory.
UI/UX Layer: Presents categorized data.

Tier 3: The News Web

The external data sources. The Ingestion Engine utilizes a tiered approach to gather news from structured APIs, direct RSS feeds, and targeted web scraping.

3. Data Ingestion Techniques (Tier 3)

To ensure comprehensive global coverage, the backend will implement a multi-layered ingestion strategy.

Method A: Primary Structured APIs (The "Broad Net")

Most reliable technique for gathering volume across thousands of publishers.

1. GNews API

Primary for general keyword topics.

/api/v4/search?q={KEYWORD}&apikey={KEY}

2. NewsAPI.org

Secondary for scale and volume.

/v2/everything?q={KEYWORD}&apiKey={KEY}

3. NewsData.io

Specialized for multi-lingual and historical context.

/api/1/news?apikey={KEY}&q={KEYWORD}
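The three endpoints above differ mainly in host, path, and the spelling of the key parameter, so a single URL builder can cover them. A minimal sketch using the standard URL class; the host names are the providers' public API hosts, and buildSearchUrl is a hypothetical helper name.

```javascript
const ENDPOINTS = {
  gnews:    { base: 'https://gnews.io/api/v4/search',    keyParam: 'apikey' },
  newsapi:  { base: 'https://newsapi.org/v2/everything', keyParam: 'apiKey' },
  newsdata: { base: 'https://newsdata.io/api/1/news',    keyParam: 'apikey' },
};

function buildSearchUrl(provider, keyword, apiKey) {
  const endpoint = ENDPOINTS[provider];
  if (!endpoint) throw new Error(`Unknown provider: ${provider}`);
  const url = new URL(endpoint.base);
  url.searchParams.set('q', keyword); // searchParams URL-encodes the value
  url.searchParams.set(endpoint.keyParam, apiKey);
  return url.toString();
}
```

Building the query string through searchParams avoids the encoding bugs that come from interpolating raw keywords into a URL.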

Method B: RSS Feeds (The "Direct Line")

Used for exact, trusted publishers, bypassing API algorithms via XML parsers.

BBC News: feeds.bbci.co.uk/news/rss.xml
NYT: rss.nytimes.com/services/xml/rss/nyt/HomePage.xml
Al Jazeera: aljazeera.com/xml/rss/all.xml
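To show what the RSS path extracts, here is a deliberately minimal item parser based on regular expressions. It is only a sketch under simplifying assumptions (RSS 2.0 item elements, no entity decoding, no namespaces); a production ingestion engine should use a real XML parser. parseRssItems and textOf are hypothetical helper names.

```javascript
// Return the text content of the first <tag> inside an XML fragment.
function textOf(block, tag) {
  const m = block.match(new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`, 'i'));
  if (!m) return null;
  // Unwrap a CDATA section if present, then trim whitespace.
  return m[1].replace(/^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/, '$1').trim();
}

// Extract title, link, and publication date from each <item>.
function parseRssItems(xml) {
  const items = xml.match(/<item[\s>][\s\S]*?<\/item>/gi) || [];
  return items.map(item => ({
    title: textOf(item, 'title'),
    url: textOf(item, 'link'),
    published_at: textOf(item, 'pubDate'), // raw date string, not yet parsed
  }));
}
```

The channel-level title is ignored because only text inside item elements is inspected.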

Method C: Web Scraping (The "Targeted Extraction")

Fallback when APIs only provide a snippet, requiring full body text for the SLM.

Diffbot (AI Extraction)

/v3/article?token={TOKEN}&url={URL}&mode=llm

Crucial Feature: Appending mode=llm makes Diffbot return the extracted text as clean Markdown, a format well suited to local iOS Llama/Qwen models.

Flow & Security

Ingestion: Cloud Backend polls sources -> Normalizes JSON/XML to SQL.
Sync: iOS App authenticates via JWT -> Downloads payload over HTTPS.
Inference: iOS App -> Loads MLX Model -> Passes raw text -> Saves locally.
Privacy Guarantee: Classifications NEVER leave the iOS device unless explicitly synced.
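The Sync step can be sketched as a request builder that refuses non-HTTPS endpoints and attaches the JWT as a bearer token. The /v1/articles path and the since query parameter are hypothetical, since this document does not define the API surface; the sketch is in JavaScript for consistency with the backend code.

```javascript
// Build the article sync request. Path and query parameter are assumptions.
function buildSyncRequest(baseUrl, jwt, lastFetchedAt) {
  const url = new URL('/v1/articles', baseUrl);
  if (url.protocol !== 'https:') {
    throw new Error('Sync must use HTTPS');
  }
  // Only ask for articles newer than the last successful sync, if known.
  if (lastFetchedAt) url.searchParams.set('since', lastFetchedAt);
  return {
    url: url.toString(),
    headers: { Authorization: `Bearer ${jwt}`, Accept: 'application/json' },
  };
}
```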

Architectural Constraints

Pros: Resilient to UI changes, understands context.
Cons: Relatively slow, potential for AI hallucinations.

WEEK 10 DAY 05 SOURCE STRATEGY

Target Source Matrix

Defining a strategically diverse list of publishers to ensure our AI models can classify perspectives across political, geographic, and domain spectrums.

1. Diversity Strategy

To ensure MyNewsAnalyzer provides a diverse and balanced view on any given topic, we must carefully select scraping targets that span different political spectrums, geographic regions, and editorial focuses.

By feeding articles from these contrasting sources into our on-device SLM (Small Language Model), the application can classify perspectives (e.g., "For," "Against," "Neutral") and highlight how different publishers frame the same event.

2. Source Extraction Matrix

The "Neutral" Wire Services

Focus heavily on raw reporting with minimal editorializing. They serve as the factual baseline.

Reuters reuters.com
Associated Press (AP) apnews.com
Agence France-Presse afp.com

US Mainstream (Left/Center-Left)

Highlight social issues, progressive policies, and internationalist perspectives.

The New York Times nytimes.com
The Washington Post washingtonpost.com
NPR npr.org

US Mainstream (Right/Center-Right)

Focus on free-market economics, conservative policies, and traditionalist perspectives.

The Wall Street Journal wsj.com
Fox News foxnews.com
National Review nationalreview.com

Global & Non-Western

Prevent US/Euro-centric bias by including Global South and Asian perspectives.

Al Jazeera aljazeera.com
South China Morning Post scmp.com
The Hindu thehindu.com

Specialized: Tech, Business & Science

Crucial for deep tracking of specific topics like "AI Regulation" or "Climate Tech".

Bloomberg bloomberg.com
TechCrunch techcrunch.com
Wired wired.com
Nature nature.com/news

3. Implementation Strategy for Targets

1. RSS First (The Easy Wins)

Many targets (BBC, NYT, Al Jazeera) publish public RSS feeds. We will poll their XML endpoints first to cheaply gather article URLs, publication dates, and basic metadata.

2. Diffbot Extraction (The Heavy Lifting)

Because sites like WSJ, NYT, and Bloomberg use paywalls or complex dynamic layouts, standard scraping often fails. We will pass the URLs gathered from RSS into Diffbot's Article API with mode=llm. Diffbot's computer-vision extraction is designed to handle cookie banners and dynamic layouts, and it returns the publicly served article text as clean Markdown.

3. NewsAPI / GNews (The Long Tail)

We will use these managed APIs as a fallback to catch articles from the thousands of smaller regional publishers (e.g., local city newspapers) that aren't on our primary explicit target list.
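The three-step strategy above amounts to a per-target decision, which can be sketched as a small planning function. The target fields (rssUrl, needsFullText, domain) and the method labels are hypothetical names chosen for this illustration.

```javascript
// Decide how to ingest one target, following the order described above:
// RSS first, Diffbot for full text, managed news APIs as the fallback.
function planIngestion(target) {
  const steps = [];
  if (target.rssUrl) {
    steps.push({ method: 'RSS', url: target.rssUrl }); // cheap URLs and metadata
    if (target.needsFullText) {
      steps.push({ method: 'DIFFBOT' }); // full body for each URL found via RSS
    }
  }
  if (steps.length === 0) {
    // No feed available: fall back to keyword or domain search on a news API.
    steps.push({ method: 'NEWS_API', query: target.domain });
  }
  return steps;
}
```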

4. On-Device AI Strategy

Once the diverse articles are extracted and synced to the iOS device's local database, they must be processed to provide value to the user. This is handled by the on-device Small Language Model (SLM).

4.1 Objective

Summarize: Compress lengthy, disparate articles into a concise, 3-bullet point format for quick reading.
Categorize: Analyze the article's perspective and assign it to one of the user-defined categories (e.g., "For," "Against," "Neutral") for a given topic.

4.2 Model Selection & Framework

Framework: Apple's MLX (specifically mlx-swift) will load and run the model natively on the device's Apple Silicon (CPU and GPU via Metal; MLX does not target the Neural Engine), keeping article text on the device and enabling offline use.
Model: We will utilize a 4-bit quantized version of Llama 3.2 3B Instruct or Qwen2.5 7B Instruct. These offer a good balance of local performance, memory footprint, and logical reasoning for zero-shot classification.

4.3 Master Prompt Engineering

To perform both summarization and categorization in a single inference pass, we will employ a structured zero-shot prompt that requires JSON-only output. The iOS application will dynamically inject the Topic, the user-defined Categories, and the Article Text.

Prompt Template

You are an expert, highly objective news analyst. Your task is to analyze the following news article regarding the topic: "{{TOPIC}}".

First, provide a highly concise summary of the article in exactly 3 bullet points.
Second, analyze the author's primary perspective, bias, or framing and classify the article into strictly ONE of the following categories: {{CATEGORIES}}.

You must respond ONLY with a valid JSON object in the following format, with no markdown formatting or extra text:
{
  "summary": [
    "Bullet point 1",
    "Bullet point 2",
    "Bullet point 3"
  ],
  "classification": "Selected Category Name",
  "confidence_reasoning": "A one sentence explanation of why this category was chosen."
}

--- ARTICLE TEXT ---
{{ARTICLE_TEXT}}
--- END ARTICLE TEXT ---
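Filling the template is a plain placeholder substitution. The sketch below is in JavaScript for consistency with the backend code (the on-device version would be Swift); renderPrompt and buildAnalysisPrompt are hypothetical helper names, and quoting each category is an illustrative choice.

```javascript
// Replace {{NAME}} placeholders. Throws when a value is missing so that a
// half-filled prompt never reaches the model.
function renderPrompt(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, (_, name) => {
    if (!(name in values)) throw new Error(`Missing prompt value: ${name}`);
    return values[name];
  });
}

function buildAnalysisPrompt(template, topic, categories, articleText) {
  const END_MARKER = '--- END ARTICLE TEXT ---';
  return renderPrompt(template, {
    TOPIC: topic,
    CATEGORIES: categories.map(c => `"${c}"`).join(', '),
    // Strip the delimiter so article text cannot close the block early.
    ARTICLE_TEXT: articleText.split(END_MARKER).join(''),
  });
}
```

Because the substitution uses a callback, inserted values are not rescanned, so placeholder-like text inside an article is left untouched.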

4.4 Processing Workflow

1. Queue

The iOS app identifies unanalyzed NewsArticle records in the local SQLite DB.

2. Inference

Constructs the prompt and sends it to the MLX model with a low temperature (temperature=0.1) so that the JSON output is as consistent as possible.

3. Parsing

Decodes the JSON response. If decoding succeeds, the app creates an ArticleAnalysis record. If it fails (malformed or off-format output), the article is flagged for retry.

4. UI Update

State management automatically moves the article into its classified tab (e.g., "Against") and displays the 3-bullet summary.
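The Parsing step is where off-format output gets caught. The sketch below validates a model reply against the JSON contract from the prompt template; it is written in JavaScript for consistency with the backend code (the on-device version would be Swift), and parseAnalysis is a hypothetical helper name.

```javascript
// Validate the model's reply against the JSON contract from the prompt.
function parseAnalysis(rawOutput, categories) {
  // Models sometimes wrap JSON in extra text; keep the outermost {...} span.
  const start = rawOutput.indexOf('{');
  const end = rawOutput.lastIndexOf('}');
  if (start === -1 || end <= start) return { ok: false, error: 'no JSON object found' };

  let data;
  try {
    data = JSON.parse(rawOutput.slice(start, end + 1));
  } catch (err) {
    return { ok: false, error: 'invalid JSON' };
  }
  if (!Array.isArray(data.summary) || data.summary.length !== 3) {
    return { ok: false, error: 'summary must have exactly 3 bullets' };
  }
  if (!categories.includes(data.classification)) {
    return { ok: false, error: 'classification not in user categories' };
  }
  return {
    ok: true,
    analysis: {
      summary: data.summary.join('\n'),
      category: data.classification,
      reasoning: data.confidence_reasoning ?? null,
    },
  };
}
```

A result with ok set to false corresponds to the "flagged for retry" path. Note that the prompt template asks for a confidence_reasoning sentence rather than a number, so a numeric confidence_score would need a separate source.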

Next Steps: Day 6

Day 6 will focus on Backend Node.js Implementation, writing the actual ingestion scripts for the APIs and Diffbot.

WEEK 10 DAY 06 BACKEND & INFRA

Backend Implementation

Scaling the ingestion engine using Node.js and resolving infrastructure deployment constraints.

Infrastructure Debugging: GCP Deployment

During the deployment of the Cloud Functions for the ingestion engine, we encountered a 404 Service Account Not Found error. Below is the resolution strategy.

ERROR LOG:

ERROR: (gcloud.functions.deploy) ResponseError: status=[404], code=[], message=[Service account ... was not found.]

Root Cause & Fix:

Service Account Existence: The deployment attempted to use function-sa@mynewsanalyzer-202603142100 which had not been created in the new environment.

Resolution Command:

gcloud iam service-accounts create function-sa \
    --display-name="Cloud Function Service Account" \
    --project=mynewsanalyzer-202603142100

Role Assignment: Assign roles/cloudfunctions.admin and roles/iam.serviceAccountUser to the deployment principal to ensure the resource can be described and deployed.

News Ingestion Engine

The backend implementation uses Express.js with the Sequelize ORM to manage concurrent (asynchronous) ingestion of news from GNews, NewsAPI, and Diffbot.

Core Function: fetchAndNormalize()

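const axios = require('axios');

// Fetch articles for one topic from GNews and map them to news_articles rows.
async function fetchAndNormalize(topic) {
  // topic.keywords is an array (TEXT[] in the schema); join it into one query.
  const query = (topic.keywords || []).join(' OR ');
  const response = await axios.get('https://gnews.io/api/v4/search', {
    params: { q: query, apikey: process.env.GNEWS_KEY }, // axios URL-encodes params
    timeout: 10000
  });

  return (response.data.articles || []).map(article => ({
    external_id: article.url, // Deduplication key (unique constraint)
    title: article.title,
    url: article.url, // news_articles.url is NOT NULL
    content_raw: article.content, // May be a snippet rather than the full text
    published_at: new Date(article.publishedAt),
    topic_id: topic.id
  }));
}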

Next Steps: Day 7

Day 7 will present the Final Prototype & Source Code, covering the repository and the configuration files needed to run the application.

WEEK 10 DAY 07 FINAL PROTOTYPE & SOURCE

Final Prototype & Source Code

The complete codebase and configuration guide for the MyNewsAnalyzer on-device AI platform.

GitLab Repository

Complete Source Code

The MyNewsAnalyzer repository contains the full Flutter application code, the Node.js ingestion engine, and the deployment scripts for the GCP infrastructure.


Firebase Configuration & Required Files

To protect sensitive keys and project identifiers, certain configuration files are excluded from the repository. Follow these steps to set them up:

firebase_options.dart

Why it's needed: Contains the cross-platform configuration for Android, iOS, and Web. It acts as the bridge between your Flutter code and the Firebase project.

How to set up

Run flutterfire configure in the project root to auto-generate this file.

GoogleService-Info.plist

Why it's needed: iOS-specific configuration required for the Firebase SDK to initialize on Apple devices. Without this, the app will crash on startup.

How to download

Download from Firebase Console -> Project Settings -> Your Apps (iOS Section).

firebase.json

Why it's needed: Directs the Firebase CLI on how to deploy functions, hosting, and security rules. It defines the project's cloud behavior.

How to set up

Generated when you run firebase init. Ensure it points to the correct functions source.