content-monitor


Use when building or operating multi-source intelligence pipelines across video, GitHub, Reddit, curated lists, search, and RSS feeds. Covers source adapters, discovery/transcription/summarization/evaluation phases, deduplication, schedules, model-stage choice, and actionable brief generation. Do NOT use for SEO keyword research (use `keywords`) or competitive product analysis (use `user-research`).

By jacob-balslev, updated 6/10/2026

name: content-monitor
description: "Use when building or operating multi-source intelligence pipelines across video, GitHub, Reddit, curated lists, search, and RSS feeds. Covers source adapters, discovery/transcription/summarization/evaluation phases, deduplication, schedules, model-stage choice, and actionable brief generation. Do NOT use for SEO keyword research (use keywords) or competitive product analysis (use user-research)."
license: MIT
compatibility: "Markdown, Git, agent-skill runtimes"
allowed-tools: Read Grep Bash
metadata:
  relations: '{"related":["seo-strategy","keywords","positioning"],"verify_with":["evaluation"]}'
  subject: agent-ops
  scope: "Building and operating multi-source intelligence pipelines across video, GitHub, Reddit, curated lists, search, and RSS: source adapters, the discovery/transcription/summarization/evaluation phases, deduplication, schedules, per-stage model choice, and actionable brief generation. Portable across any content-monitoring pipeline; principle-grounded, not repo-bound. Excludes SEO keyword research (keywords) and competitive product analysis (user-research)."
  public: "true"
  taxonomy_domain: agent/ops
  stability: experimental
  keywords: '["content monitor","github trending","blog monitor","intelligence brief","youtube monitor","awesome list","trend tracking","content pipeline","content-monitor","rss feed"]'
  triggers: '["content-monitor-skill","research-mode"]'
  skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
  skill_graph_project: Skill Graph
  skill_graph_canonical_skill: skills/agent-ops/content-monitor/SKILL.md

Content Monitor

Concept of the skill

Building and operating multi-source intelligence pipelines across video, GitHub, Reddit, curated lists, search, and RSS — source adapters, the discovery/transcription/summarization/evaluation phases, deduplication, schedules, per-stage model choice, and actionable brief generation.

Domain Context

What is this skill? This skill provides expertise for a multi-source intelligence pipeline covering YouTube channels, GitHub trending/topic/search feeds, Reddit subreddits, awesome-lists, Google Search, and RSS/Atom feeds (including Hacker News, changelogs, blogs, arXiv, and podcasts). It covers source adapter patterns, the unified evaluation pipeline, deduplication, scheduling cadence, and actionable intelligence brief generation. Use it when adding new content sources to the pipeline, configuring monitoring schedules, understanding the discover/extract/summarize/evaluate phases (extraction includes video transcription), extending or debugging the discovery pipeline, or deciding which model to use at each pipeline stage. Do NOT use it for SEO keyword research; use the keywords skill. Do NOT use it for competitive product analysis; use user-research-synthesis.

Key Files

| File | Purpose |
|---|---|
| scripts/content-monitor/sources.json | Multi-source configuration for the live monitor pipeline. |
| scripts/content-monitor/channels.json | Legacy YouTube-channel configuration still referenced by the pipeline. |
| .content-monitor/seen-items.json | Persistent deduplication state. |
| .content-monitor/backlog-evaluate-status.json | Tracks historical scoring progress for batch backlog evaluation sweeps. |
| .content-monitor/resume-status.json | Catch-up sweep progress used by resume flows. |

Workflow

Use the ordered phases, checklists, and guardrails in the sections below as the canonical workflow for this skill. When multiple subsections describe steps, follow them in the order presented.

Coverage

This skill covers the multi-source intelligence pipeline that tracks AI coding, e-commerce tooling, and agent infrastructure trends: the live source groups configured in scripts/content-monitor/sources.json and scripts/content-monitor/channels.json, the 4-phase pipeline (Discover, Extract, Summarize, Evaluate), the MiniMax/GPT-5.4 model split, the source adapter contract, deduplication via .content-monitor/seen-items.json, scheduling cadence per source type, evaluation rubric dimensions, and the daily digest output format. Source code lives in scripts/content-monitor/; state files live in .content-monitor/ at the root of the Development repo.

Philosophy of the skill

Staying current on AI tooling, agent patterns, and e-commerce integrations is essential for a system that competes on engineering velocity. Without this pipeline, the team would have to scan dozens of sources by hand and would miss emerging patterns that could inform skill creation, architecture decisions, or competitive positioning. The pipeline automates the cheap work (discovery, extraction, and summarization on free tooling) and reserves scarce, expensive model reasoning for evaluation, following the same constraint-awareness principle that governs the rest of the system.

Location Precision

State persistence is critical for this pipeline. Always refer to:

  • Source code and configuration: scripts/content-monitor/ (discover-*.js, evaluate.js, sources.json, run.js)
  • State and results: .content-monitor/ at the Development root (seen-items.json, digests, logs, reports)

Keeping persistent state outside the executable code tree ensures that deduplication history survives code updates.

1. When This Skill Applies

| Use this skill for... | Use something else for... |
|---|---|
| Adding a new YouTube channel, GitHub topic, Reddit subreddit, or search source | SEO keyword research (use keywords skill) |
| Understanding why an extraction or summary is missing | Competitive product teardowns (use user-research-synthesis) |
| Configuring evaluation context or scoring dimensions | General web scraping patterns (use backend skill) |
| Diagnosing deduplication behaviour across source types | Monitoring application uptime or error rates |
| Scheduling or adjusting cadence for a source type | |

1.5. Important: State vs. Source Code Locations

Confusing these two locations causes deduplication bugs, such as already-processed items being picked up again:

  • Source code (scripts, adapters, runners): scripts/content-monitor/
  • State files (persistent data, results): .content-monitor/ at the Development repo root

The seen-items.json file used for deduplication MUST be read from and written to .content-monitor/seen-items.json, not scripts/content-monitor/. This is the single source of truth for what has already been processed.
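A minimal sketch of how a pipeline script could resolve state paths against the repo root instead of its own directory. It assumes the script lives exactly two levels below the root (in scripts/content-monitor/); the helper names repoRootFrom and statePath are hypothetical, not part of the existing code.

```javascript
const path = require("node:path");

// Assumption: scriptDir is <repo>/scripts/content-monitor, so the repo
// root is two directories up.
function repoRootFrom(scriptDir) {
  return path.resolve(scriptDir, "..", "..");
}

// Hypothetical helper: state files always resolve under <repo>/.content-monitor/,
// never under scripts/content-monitor/.
function statePath(scriptDir, fileName) {
  return path.join(repoRootFrom(scriptDir), ".content-monitor", fileName);
}
```

Inside an adapter, statePath(__dirname, "seen-items.json") points every run at the same deduplication file regardless of the working directory the pipeline was launched from.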


2. Pipeline Architecture

The pipeline has 4 sequential phases. Phases can also be run selectively with flags such as --discover-only and --skip-evaluate.

| Phase | Description | Primary tool |
|---|---|---|
| Discover | Source-specific adapters fetch new items (YouTube RSS, GitHub API, Reddit JSON) and record them in state | Adapter scripts (see the section 3.5 registry) |
| Extract | Extract text content from each discovered item (YouTube: yt-dlp transcripts; GitHub: README extraction; Reddit: post body + comments) | yt-dlp, GitHub REST API, Reddit API |
| Summarize | MiniMax M2.5K produces structured JSON summaries from the extracted text | OpenRouter free tier (MiniMax M2.5K) |
| Evaluate | GPT-5.4 or Gemini-Flash scores summaries against the developer context and generates the actionable digest | GPT-5.4 via Copilot, or Gemini-Flash |
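The selective-run flags can be sketched as a small mapping from command-line arguments to phases. This assumes --discover-only runs Discover alone and --skip-evaluate runs the first three phases; phasesToRun is a hypothetical name, and run.js may parse its flags differently.

```javascript
// The four phases in the order the pipeline runs them.
const PHASES = ["discover", "extract", "summarize", "evaluate"];

// Map the two documented flags onto the phases to run. Assumption: if both
// flags are present, --discover-only wins because it is the narrower run.
function phasesToRun(argv) {
  if (argv.includes("--discover-only")) return ["discover"];
  if (argv.includes("--skip-evaluate")) {
    return PHASES.filter((phase) => phase !== "evaluate");
  }
  return [...PHASES];
}
```

A runner would call phasesToRun(process.argv.slice(2)) and execute the returned phases in order.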

Why this model split matters

MiniMax M2.5K is free (200 req/day via OpenRouter) and handles the volume of extraction-to-summary work at no cost. GPT-5.4 (or Gemini-Flash) is reserved for the evaluation step, where quality is the bottleneck: scoring relevance and generating insights that the developer will act on.

Summary JSON schema (produced by Summarize phase)

Each summary contains:

keyFindings         — array of strings, concrete observations
toolsMentioned      — array of tool names
techniquesMentioned — array of technique names
actionableInsights  — array of strings, recommended actions
topicTags           — array of strings for deduplication grouping
oneLineSummary      — string, max 120 chars
estimatedRelevance  — integer 1-10, self-assessed by model
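Because model output is not guaranteed to match this schema, a check like the one below could run on each summary before it is stored. It is a sketch that encodes only the fields and limits listed above; validateSummary is a hypothetical name, not an existing pipeline function.

```javascript
// Fields the Summarize phase must emit as arrays of strings.
const STRING_ARRAY_FIELDS = [
  "keyFindings",
  "toolsMentioned",
  "techniquesMentioned",
  "actionableInsights",
  "topicTags",
];

// Returns a list of problems; an empty list means the summary is usable.
function validateSummary(summary) {
  if (summary === null || typeof summary !== "object") {
    return ["summary is not an object"];
  }
  const problems = [];
  for (const field of STRING_ARRAY_FIELDS) {
    const value = summary[field];
    if (!Array.isArray(value) || !value.every((v) => typeof v === "string")) {
      problems.push(`${field} must be an array of strings`);
    }
  }
  const line = summary.oneLineSummary;
  if (typeof line !== "string" || line.length > 120) {
    problems.push("oneLineSummary must be a string of at most 120 chars");
  }
  const rel = summary.estimatedRelevance;
  if (!Number.isInteger(rel) || rel < 1 || rel > 10) {
    problems.push("estimatedRelevance must be an integer from 1 to 10");
  }
  return problems;
}
```

Returning every problem at once, rather than failing on the first, makes it easier to see whether a prompt change broke one field or the whole response shape.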

3. Source Types

| Type | Count | Example | Discovery Method | Frequency | Cost |
|---|---|---|---|---|---|
| YouTube Channel | 19 | IndyDevDan, Anthropic, Fireship | RSS feed (XML) | Daily | Free |
| GitHub Trending/Topic | 14 | github.com/trending, topics/ai-agents | HTML scrape + REST API | Daily/Weekly | Free |
| Reddit Subreddit | 12 | r/ClaudeCode, r/AI_Agents, r/vibecoding | JSON API | Daily | Free |
| Awesome List | 17 | awesome-claude-code, awesome-mcp-servers | README diff | Weekly | Free |
| Google Search | 6 | "multi-agent orchestration" | Custom Search API | Daily | Free (100 queries/day) |
| Hacker News | 5 | Claude Code, MCP, AI agents | hnrss.org RSS | Daily | Free |
| Changelog | 11 | Claude Code releases, Vercel, Shopify | GitHub Atom + vendor RSS | On release | Free |
| Blog/Newsletter | 9 | Simon Willison, Latent Space, DEV.to | RSS/Atom | Daily | Free |
| arXiv Papers | 3 | cs.MA, cs.AI, cs.SE | RSS | Daily (weekdays) | Free |
| Podcast | 3 | Latent Space, Practical AI, SE Daily | RSS | Weekly | Free |
| Skills.sh | 1 | Global sitemap | XML sitemap | Weekly | Free |
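One way to turn the Frequency column into a scheduling decision is a per-cadence due check, sketched below. The cadence keys (daily, weekdays, weekly, on-release) are hypothetical config labels; treating "On release" feeds as polled on every run and comparing UTC calendar days are assumptions, not documented pipeline behaviour.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Whole UTC days since the epoch. Comparing day numbers rather than raw
// timestamps keeps a job that fires a few seconds early from skipping a day.
const utcDay = (ms) => Math.floor(ms / DAY_MS);

// Minimum number of UTC days between polls for each cadence label.
// Assumption: "On release" feeds are cheap to poll, so they are checked every run.
const MIN_DAYS = { daily: 1, weekdays: 1, weekly: 7, "on-release": 0 };

// Returns true when a source with this cadence should be polled at nowMs.
// lastRunMs is null for a source that has never been polled.
function isDue(cadence, lastRunMs, nowMs) {
  const minDays = MIN_DAYS[cadence];
  if (minDays === undefined) throw new Error(`unknown cadence: ${cadence}`);
  if (cadence === "weekdays") {
    const weekday = new Date(nowMs).getUTCDay(); // 0 = Sunday, 6 = Saturday
    if (weekday === 0 || weekday === 6) return false;
  }
  if (lastRunMs === null) return true;
  return utcDay(nowMs) - utcDay(lastRunMs) >= minDays;
}
```

GitHub's "Daily/Weekly" entries would each be configured with one of the two labels.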

3.5. Adapter Registry

Maps each source type (from section 3) to the adapter file that handles it. There is no 1-to-1 mapping — discover-rss.js consolidates five source types.

| Source Type | Adapter File | Notes |
|---|---|---|
| YouTube Channel | discover.js | Primary: legacy YouTube-specific adapter using the RSS feed (https://www.youtube.com/feeds/videos.xml?channel_id=...) plus yt-dlp transcript backfill |
| YouTube Channel | discover-rss.js | Also handles YouTube via the same RSS endpoint; prefer discover.js for YouTube-specific features |
| GitHub Trending/Topic | discover-github.js | Scrapes the github.com/trending HTML and calls the GitHub REST API for topic pages |
| Reddit Subreddit | discover-reddit.js | Calls https://www.reddit.com/r/<subreddit>.json (unauthenticated) |
| Awesome List | discover-awesome.js | Fetches README.md, diffs it against the cached version, and extracts new entries |
| Google Search | discover-google.js | Calls the Google Custom Search API (100 req/day free tier) |
| Hacker News | discover-rss.js | Uses hnrss.org RSS feeds |
| Changelog | discover-rss.js | GitHub Atom feeds (/releases.atom) + vendor RSS |
| Blog/Newsletter | discover-rss.js | Standard RSS/Atom feeds |
| arXiv Papers | discover-rss.js | arXiv RSS export by category (e.g. cs.MA, cs.AI) |
| Podcast | discover-rss.js | Standard podcast RSS feeds |
| Skills.sh | discover-skillssh.js | Fetches and diffs the XML sitemap |

Key architecture note

discover-rss.js is the generic RSS handler. It handles Hacker News, changelogs, blogs, arXiv, podcasts, and can also ingest YouTube RSS — but it lacks YouTube-specific features (transcript extraction, channel metadata). When adding a new RSS-based source, route it through discover-rss.js. When adding a YouTube channel, use discover.js.

There is no discover-youtube.js. Do not create one — YouTube RSS is already handled by discover.js with transcript support.
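The routing rules above can be expressed as a lookup, sketched here. The source-type keys are hypothetical config labels; only the adapter file names come from the registry, and the real pipeline may select adapters some other way.

```javascript
// Source types with a dedicated adapter. YouTube deliberately maps to
// discover.js (transcript support), not to the generic RSS handler.
const DEDICATED_ADAPTERS = new Map([
  ["youtube", "discover.js"],
  ["github", "discover-github.js"],
  ["reddit", "discover-reddit.js"],
  ["awesome-list", "discover-awesome.js"],
  ["google-search", "discover-google.js"],
  ["skills-sh", "discover-skillssh.js"],
]);

// Plain RSS/Atom source types share the generic handler.
const RSS_TYPES = new Set(["hacker-news", "changelog", "blog", "arxiv", "podcast"]);

function adapterFor(sourceType) {
  if (DEDICATED_ADAPTERS.has(sourceType)) return DEDICATED_ADAPTERS.get(sourceType);
  if (RSS_TYPES.has(sourceType)) return "discover-rss.js";
  throw new Error(`no adapter registered for source type "${sourceType}"`);
}
```

With this shape, a new RSS-based source needs at most a new entry in RSS_TYPES; a thrown error is the signal that a genuinely new adapter is required.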


4. State Files

All state files live in .content-monitor/ (the Development repo root directory), NOT in scripts/content-monitor/.

| File | Location | Purpose |
|---|---|---|
| seen-items.json | .content-monitor/seen-items.json | Inventory of all processed items by source ID. Single source of truth for deduplication. |
| backlog-evaluate-status.json | .content-monitor/backlog-evaluate-status.json | Tracks historical scoring progress for batch backlog evaluation sweeps. |
| resume-status.json | .content-monitor/resume-status.json | Catch-up sweep progress, used when reconciling artifacts. |
| YYYY-MM-DD.md | .content-monitor/digests/YYYY-MM-DD.md | Daily output file containing all scored items above the threshold. |
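A minimal sketch of the deduplication step against seen-items.json. The file's shape (an object keyed by item id with a firstSeen timestamp) is an assumption, as are the helper names loadSeen, filterNew, and recordSeen. The temp-file-then-rename write is one common way to avoid truncating the state file on a crash, not a documented feature of the pipeline.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Assumption: seen-items.json looks like
// { "reddit:abc123": { "firstSeen": "2026-06-10T08:00:00.000Z" } }.
function loadSeen(file) {
  if (!fs.existsSync(file)) return {};
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

// Keep only items whose id was never recorded, and drop repeats within the
// same batch (two feeds can surface the same item in one run).
function filterNew(items, seen) {
  const batch = new Set();
  return items.filter((item) => {
    const known = Object.prototype.hasOwnProperty.call(seen, item.id);
    if (known || batch.has(item.id)) return false;
    batch.add(item.id);
    return true;
  });
}

// Record processed items, then write to a temp file and rename it over the
// original so a crash mid-write cannot leave a half-written state file.
function recordSeen(file, seen, items, now = new Date()) {
  for (const item of items) seen[item.id] = { firstSeen: now.toISOString() };
  fs.mkdirSync(path.dirname(file), { recursive: true });
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(seen, null, 2));
  fs.renameSync(tmp, file);
}
```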

5. Configuration

Config files: scripts/content-monitor/sources.json (multi-source) and scripts/content-monitor/channels.json (legacy YouTube).

| Setting | Type | Description |
|---|---|---|
| maxAgeHours | number | Maximum age (hours) of items to process in each run |
| summarizeModel | string | Model for the Summarize phase (default: minimax) |
| evaluateModel | string | Model for the Evaluate phase (default: gpt-5.4) |
| maxTranscriptTokens | number | Hard cap on text length (in tokens) sent to the summarizer |
| evaluationContext | string | Free-text description of the developer's focus; shapes all relevance scoring |
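A sketch of how these settings might be applied: defaults filled in for the two model fields, stale items filtered by maxAgeHours, and text capped before summarization. The function names and the publishedAt item field are hypothetical. The 4-characters-per-token ratio is a rough heuristic for English text, not an exact tokenizer count.

```javascript
// Defaults from the configuration table; every other setting must be supplied.
const DEFAULTS = { summarizeModel: "minimax", evaluateModel: "gpt-5.4" };

// Merge raw settings over the defaults and reject values that would make a
// run process nothing or score against an empty context.
function resolveConfig(raw) {
  const config = { ...DEFAULTS, ...raw };
  for (const key of ["maxAgeHours", "maxTranscriptTokens"]) {
    if (!Number.isFinite(config[key]) || config[key] <= 0) {
      throw new Error(`${key} must be a positive number`);
    }
  }
  const context = config.evaluationContext;
  if (typeof context !== "string" || context.trim() === "") {
    throw new Error("evaluationContext must be a non-empty string");
  }
  return config;
}

// Keep items published within the last maxAgeHours. publishedAt is an
// assumed ISO-8601 field; unparseable dates yield NaN and are dropped.
function withinMaxAge(items, maxAgeHours, nowMs) {
  const cutoff = nowMs - maxAgeHours * 60 * 60 * 1000;
  return items.filter((item) => Date.parse(item.publishedAt) >= cutoff);
}

// Approximate token cap using ~4 characters per token.
function capText(text, maxTranscriptTokens) {
  const maxChars = maxTranscriptTokens * 4;
  return text.length > maxChars ? text.slice(0, maxChars) : text;
}
```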

6. Extending the Pipeline

Before creating a new adapter, check section 3.5 (Adapter Registry). If the new source is RSS/Atom-based (blogs, changelogs, podcasts, arXiv, Hacker News), route it through the existing discover-rss.js — no new file needed. Only create a new discover-<type>.js for source types with fundamentally different discovery mechanics (HTML scraping, vendor APIs, sitemaps).

To add a new source type that needs a new adapter:

  1. Create a new adapter file (discover-<type>.js) in scripts/content-monitor/ that implements the standard adapter interface: fetchNew(config) → Array<{ id, url, title, rawText }>
  2. Register the new source in sources.json
  3. Implement text extraction in extract-text.js or transcribe.js
  4. Update section 3.5 of this skill to add the new adapter to the registry
  5. Test with --discover-only
  6. Run a full pass and verify that the new items are recorded in .content-monitor/seen-items.json

Adapter contract: The adapter must return a stable id per item. This is what seen-items.json keys on for deduplication.
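A sketch of the adapter side of this contract, assuming adapters prefer the platform's native identifier (video ID, post ID, release tag) and fall back to a hash of the normalized URL. stableId, makeAdapter, and the fetchEntries callback with its { nativeId, url, title, text } entries are hypothetical; only the fetchNew(config) shape and its item fields come from the interface above.

```javascript
const crypto = require("node:crypto");

// Build an id that stays identical across runs. The source-type prefix keeps
// ids from colliding when two platforms reuse the same native id. Without a
// native id, hash the URL with its fragment and utm_* parameters removed.
function stableId(sourceType, nativeId, url) {
  if (nativeId) return `${sourceType}:${nativeId}`;
  const normalized = new URL(url);
  normalized.hash = "";
  for (const key of [...normalized.searchParams.keys()]) {
    if (key.startsWith("utm_")) normalized.searchParams.delete(key);
  }
  const digest = crypto.createHash("sha256").update(normalized.toString()).digest("hex");
  return `${sourceType}:url:${digest.slice(0, 16)}`;
}

// Wrap a source-specific fetch function in the standard interface:
// fetchNew(config) resolves to Array<{ id, url, title, rawText }>.
function makeAdapter(sourceType, fetchEntries) {
  return {
    async fetchNew(config) {
      const entries = await fetchEntries(config);
      return entries.map((entry) => ({
        id: stableId(sourceType, entry.nativeId, entry.url),
        url: entry.url,
        title: entry.title,
        rawText: entry.text ?? "",
      }));
    },
  };
}
```

Stripping tracking parameters matters because the same post often arrives from different feeds with different utm_* tags, which would otherwise produce different ids and defeat deduplication.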

Verification

After modifying the content monitor pipeline, verify:

  • New sources have a stable id field for deduplication (test with --discover-only)
  • seen-items.json correctly records discovered items without duplicates
  • Frequency matches the source's update cadence (daily for YouTube/Reddit/blogs, weekly for GitHub topics/awesome-lists)
  • Summarize phase produces all required JSON fields (keyFindings, toolsMentioned, actionableInsights, etc.)
  • Evaluate phase generates a scored digest entry in .content-monitor/digests/YYYY-MM-DD.md
  • No API rate limits are exceeded (GitHub: 60 req/hr unauthenticated, Google: 100/day)
  • Cost remains zero: MiniMax for summarization, free APIs for discovery
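To keep discovery inside the rate limits listed above, adapters could consult a per-API request budget before each call. This is a sketch; createBudget and the fixed-window strategy (a window that starts at the first request) are assumptions for illustration and only approximate each provider's actual reset rules.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Limits from the verification checklist.
const LIMITS = {
  github: { max: 60, windowMs: HOUR_MS }, // unauthenticated REST API
  google: { max: 100, windowMs: 24 * HOUR_MS }, // Custom Search free tier
};

function createBudget(limits) {
  const windows = new Map(); // api name -> { start, used }
  return {
    // Counts the request and returns true if it fits in the current window;
    // returns false, without counting, once the window's quota is spent.
    tryConsume(api, nowMs) {
      if (!Object.prototype.hasOwnProperty.call(limits, api)) {
        throw new Error(`no limit configured for ${api}`);
      }
      const { max, windowMs } = limits[api];
      let w = windows.get(api);
      if (!w || nowMs - w.start >= windowMs) {
        w = { start: nowMs, used: 0 };
        windows.set(api, w);
      }
      if (w.used >= max) return false;
      w.used += 1;
      return true;
    },
  };
}
```

This counter only sees requests made by the current process. GitHub also reports the authoritative remaining quota in its X-RateLimit-Remaining response header, which is worth checking alongside a local budget.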

Do NOT Use When

| Instead of this skill | Use | Why |
|---|---|---|
| Doing SEO keyword research or clustering keywords | keywords | Keywords skill owns keyword strategy; content-monitor is about trend discovery |
| Building competitive product teardowns | user-research-synthesis | Competitive analysis requires structured frameworks, not pipeline mechanics |
| Scraping arbitrary web pages for data | backend | General web scraping is a backend concern, not content monitoring |
| Monitoring application uptime or error rates | error-tracking or agent-observability | Those skills own runtime health; content-monitor tracks external intelligence |
| Analyzing content for SEO optimization | seo-strategy | SEO strategy owns page optimization; this skill discovers external trends |

Skill Graph context

Classification

  • Subject: agent-ops
  • Public: true
  • Domain: agent/ops
  • Scope: Building and operating multi-source intelligence pipelines across video, GitHub, Reddit, curated lists, search, and RSS — source adapters, the discovery/transcription/summarization/evaluation phases, deduplication, schedules, per-stage model choice, and actionable brief generation. Portable across any content-monitoring pipeline; principle-grounded, not repo-bound. Excludes SEO keyword research (keywords) and competitive product analysis (user-research).

When to use

  • Triggers: content-monitor-skill, research-mode

Related skills

  • Verify with: evaluation
  • Related: seo-strategy, keywords, positioning

Keywords

  • content monitor, github trending, blog monitor, intelligence brief, youtube monitor, awesome list, trend tracking, content pipeline, content-monitor, rss feed
Install via CLI
npx skills add https://github.com/jacob-balslev/skill-graph --skill content-monitor