name: content-monitor
description: "Use when building or operating multi-source intelligence pipelines across video, GitHub, Reddit, curated lists, search, and RSS feeds. Covers source adapters, discovery/transcription/summarization/evaluation phases, deduplication, schedules, model-stage choice, and actionable brief generation. Do NOT use for SEO keyword research (use keywords) or competitive product analysis (use user-research)."
license: MIT
compatibility: "Markdown, Git, agent-skill runtimes"
allowed-tools: Read Grep Bash
metadata:
relations: '{"related":["seo-strategy","keywords","positioning"],"verify_with":["evaluation"]}'
subject: agent-ops
scope: "Building and operating multi-source intelligence pipelines across video, GitHub, Reddit, curated lists, search, and RSS — source adapters, the discovery/transcription/summarization/evaluation phases, deduplication, schedules, per-stage model choice, and actionable brief generation. Portable across any content-monitoring pipeline; principle-grounded, not repo-bound. Excludes SEO keyword research (keywords) and competitive product analysis (user-research)."
public: "true"
taxonomy_domain: agent/ops
stability: experimental
keywords: '["content monitor","github trending","blog monitor","intelligence brief","youtube monitor","awesome list","trend tracking","content pipeline","content-monitor","rss feed"]'
triggers: '["content-monitor-skill","research-mode"]'
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
skill_graph_project: Skill Graph
skill_graph_canonical_skill: skills/agent-ops/content-monitor/SKILL.md
Content Monitor
Concept of the skill
Building and operating multi-source intelligence pipelines across video, GitHub, Reddit, curated lists, search, and RSS — source adapters, the discovery/transcription/summarization/evaluation phases, deduplication, schedules, per-stage model choice, and actionable brief generation.
Domain Context
What is this skill? This skill provides expertise for a multi-source intelligence pipeline covering YouTube channels, GitHub trending/topic/search feeds, Reddit subreddits, awesome-lists, Google Search, and RSS/Atom feeds (including Hacker News, changelogs, blogs, arXiv, and podcasts). It covers source adapter patterns, the unified evaluation pipeline, deduplication, scheduling cadence, and actionable intelligence brief generation. Use it when adding new content sources to the pipeline, configuring monitoring schedules, understanding the discover/transcribe/summarize/evaluate phases, extending or debugging the discovery pipeline, or deciding which model to use at each pipeline stage. Do NOT use it for SEO keyword research; use the keywords skill. Do NOT use it for competitive product analysis; use user-research-synthesis.
Key Files
| File | Purpose |
|---|---|
| scripts/content-monitor/sources.json | Multi-source configuration for the live monitor pipeline. |
| scripts/content-monitor/channels.json | Legacy YouTube-channel configuration still referenced by the pipeline. |
| .content-monitor/seen-items.json | Persistent deduplication state. |
| .content-monitor/backlog-evaluate-status.json | Tracks historical scoring progress for batch backlog evaluation sweeps. |
| .content-monitor/resume-status.json | Catch-up sweep progress used by resume flows. |
Workflow
Use the ordered phases, checklists, and guardrails in the sections below as the canonical workflow for this skill. When multiple subsections describe steps, follow them in the order presented.
Coverage
This skill covers the multi-source intelligence pipeline that tracks AI coding, e-commerce tooling, and agent infrastructure trends: the live source groups configured in scripts/content-monitor/sources.json and scripts/content-monitor/channels.json, the 4-phase pipeline (Discover, Extract, Summarize, Evaluate), the MiniMax/GPT-5.4 model split, the source adapter contract, deduplication via .content-monitor/seen-items.json, scheduling cadence per source type, evaluation rubric dimensions, and the daily digest output format. Source code lives at scripts/content-monitor/. State files live at .content-monitor/ at the Development repo root.
Philosophy of the skill
Staying current on AI tooling, agent patterns, and e-commerce integrations is essential for a system that competes on engineering velocity. Without this pipeline, the team would manually scan dozens of sources and miss emerging patterns that could inform skill creation, architecture decisions, or competitive positioning. The pipeline automates discovery and extraction (cheap) and reserves expensive model reasoning for evaluation (scarce), following the same constraint-awareness principle that governs the rest of the system.
Location Precision
State persistence is critical for this pipeline. Always refer to:
- Source code and configuration: scripts/content-monitor/ (discover-*.js, evaluate.js, sources.json, run.js)
- State and results: .content-monitor/ at the Development root (seen-items.json, digests, logs, reports)
This separation keeps persistent state out of the code directory, so deduplication survives code updates.
1. When This Skill Applies
| Use this skill for... | Use something else for... |
|---|---|
| Adding a new YouTube channel, GitHub topic, Reddit subreddit, or search source | SEO keyword research (use keywords skill) |
| Understanding why an extraction or summary is missing | Competitive product teardowns (use user-research-synthesis) |
| Configuring evaluation context or scoring dimensions | General web scraping patterns (use backend skill) |
| Diagnosing deduplication behaviour across source types | Monitoring application uptime or error rates |
| Scheduling or adjusting cadence for a source type | |
1.5. Important: State vs. Source Code Locations
This is a critical distinction that prevents bugs:
- Source code (scripts, adapters, runners): scripts/content-monitor/
- State files (persistent data, results): .content-monitor/ at the Development repo root
The seen-items.json file used for deduplication MUST be read from and written to .content-monitor/seen-items.json, not scripts/content-monitor/. This is the single source of truth for what has already been processed.
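The split above can be made explicit in code. The following is a minimal sketch, assuming the caller already knows the Development repo root; `resolvePaths` and `assertStatePath` are hypothetical helper names for illustration, not functions from the pipeline.

```javascript
const path = require("path");

// Assumption: repoRoot is the Development repo root, supplied by the caller.
function resolvePaths(repoRoot) {
  const sourceDir = path.join(repoRoot, "scripts", "content-monitor");
  const stateDir = path.join(repoRoot, ".content-monitor");
  return {
    sourceDir,
    stateDir,
    sourcesConfig: path.join(sourceDir, "sources.json"),
    seenItems: path.join(stateDir, "seen-items.json"),
  };
}

// Hypothetical guard: throws when a state file path falls outside stateDir.
function assertStatePath(paths, candidate) {
  const rel = path.relative(paths.stateDir, candidate);
  const outside =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (outside) {
    throw new Error(`State must live under ${paths.stateDir}: ${candidate}`);
  }
  return candidate;
}
```

Calling a guard like this before every state write would turn an accidental write into scripts/content-monitor/ into an immediate error instead of a silent second copy of seen-items.json.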
2. Pipeline Architecture
The pipeline has 4 sequential phases. Each phase is independently runnable via --discover-only, --skip-evaluate, etc.
| Phase | Description | Primary tool |
|---|---|---|
| Discover | Source-specific adapters fetch new items (YouTube RSS, GitHub API, Reddit JSON) and write them to the state | Adapter scripts — see section 3.5 registry |
| Extract | Extract text content from each discovered item (YouTube: yt-dlp transcripts, GitHub: README extraction, Reddit: post body + comments) | yt-dlp, GitHub REST API, Reddit API |
| Summarize | MiniMax M2.5K produces structured JSON summaries from the extracted text | OpenRouter free tier (MiniMax M2.5K) |
| Evaluate | GPT-5.4 or Gemini-Flash scores summaries against the developer context and generates the actionable digest | GPT-5.4 via Copilot or Gemini-Flash |
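As an illustration of how the flags could narrow a run, here is a minimal sketch. Only --discover-only and --skip-evaluate are named in this skill; `selectPhases` is a hypothetical function, and the way the real run.js parses its arguments may differ.

```javascript
const PHASES = ["discover", "extract", "summarize", "evaluate"];

// Returns the ordered list of phases to run for a given argument list.
// Flags other than the two named in this skill are ignored by this sketch.
function selectPhases(argv) {
  const flags = new Set(argv);
  if (flags.has("--discover-only")) return ["discover"];
  if (flags.has("--skip-evaluate")) {
    return PHASES.filter((phase) => phase !== "evaluate");
  }
  return [...PHASES];
}
```

In a runner this would typically be called as `selectPhases(process.argv.slice(2))`.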
Why this model split matters
MiniMax M2.5K is free (200 req/day via OpenRouter) and handles the volume of extraction-to-summary work without cost. GPT-5.4/Gemini is reserved for the evaluation step where quality is the bottleneck — scoring relevance and generating insights that the developer will act on.
Summary JSON schema (produced by Summarize phase)
Each summary contains:
- keyFindings: array of strings, concrete observations
- toolsMentioned: array of tool names
- techniquesMentioned: array of technique names
- actionableInsights: array of strings, recommended actions
- topicTags: array of strings for deduplication grouping
- oneLineSummary: string, max 120 chars
- estimatedRelevance: integer 1-10, self-assessed by model
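A summary can be checked against the field list above before it reaches the Evaluate phase. This is a minimal sketch; `validateSummary` is a hypothetical helper, and it assumes tool and technique names are plain strings.

```javascript
const ARRAY_FIELDS = [
  "keyFindings",
  "toolsMentioned",
  "techniquesMentioned",
  "actionableInsights",
  "topicTags",
];

// Returns a list of problems; an empty list means the summary matches the schema.
function validateSummary(summary) {
  if (summary === null || typeof summary !== "object") {
    return ["summary is not an object"];
  }
  const errors = [];
  for (const field of ARRAY_FIELDS) {
    const value = summary[field];
    const ok = Array.isArray(value) && value.every((v) => typeof v === "string");
    if (!ok) errors.push(`${field} must be an array of strings`);
  }
  const line = summary.oneLineSummary;
  if (typeof line !== "string" || line.length > 120) {
    errors.push("oneLineSummary must be a string of at most 120 characters");
  }
  const relevance = summary.estimatedRelevance;
  if (!Number.isInteger(relevance) || relevance < 1 || relevance > 10) {
    errors.push("estimatedRelevance must be an integer from 1 to 10");
  }
  return errors;
}
```

Returning every problem at once, instead of throwing on the first, makes it easier to see how far a model's output drifted from the schema.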
3. Source Types
| Type | Count | Example | Discovery Method | Frequency | Cost |
|---|---|---|---|---|---|
| YouTube Channel | 19 | IndyDevDan, Anthropic, Fireship | RSS feed XML | Daily | Free |
| GitHub Trending/Topic | 14 | github.com/trending, topics/ai-agents | HTML scrape + REST API | Daily/Weekly | Free |
| Reddit Subreddit | 12 | r/ClaudeCode, r/AI_Agents, r/vibecoding | JSON API | Daily | Free |
| Awesome List | 17 | awesome-claude-code, awesome-mcp-servers | README diff | Weekly | Free |
| Google Search | 6 | "multi-agent orchestration" | Custom Search API | Daily | Free (100/day) |
| Hacker News | 5 | Claude Code, MCP, AI agents | hnrss.org RSS | Daily | Free |
| Changelog | 11 | Claude Code releases, Vercel, Shopify | GitHub Atom + vendor RSS | On release | Free |
| Blog/Newsletter | 9 | Simon Willison, Latent Space, DEV.to | RSS/Atom | Daily | Free |
| arXiv Papers | 3 | cs.MA, cs.AI, cs.SE | RSS | Daily (weekdays) | Free |
| Podcast | 3 | Latent Space, Practical AI, SE Daily | RSS | Weekly | Free |
| Skills.sh | 1 | Global sitemap | XML sitemap | Weekly | Free |
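The Frequency column can be read as a simple due-check per source. The sketch below is one possible reading, not the pipeline's scheduler: `isDue` and the frequency keys `daily`, `weekly`, and `weekdays` are hypothetical, days are evaluated in UTC, and "On release" sources are left out because they are driven by feed updates.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const INTERVAL_MS = { daily: 24 * HOUR_MS, weekly: 7 * 24 * HOUR_MS };

// lastRun is a Date, or null if the source has never been checked.
function isDue(frequency, lastRun, now) {
  let key = frequency;
  if (frequency === "weekdays") {
    const day = now.getUTCDay(); // 0 = Sunday, 6 = Saturday
    if (day === 0 || day === 6) return false;
    key = "daily";
  }
  const interval = INTERVAL_MS[key];
  if (interval === undefined) throw new Error(`Unknown frequency: ${frequency}`);
  if (lastRun === null) return true;
  return now.getTime() - lastRun.getTime() >= interval;
}
```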
3.5. Adapter Registry
Maps each source type (from section 3) to the adapter file that handles it. There is no 1-to-1 mapping — discover-rss.js consolidates five source types.
| Source Type | Adapter File | Notes |
|---|---|---|
| YouTube Channel | discover.js | Primary. Legacy YouTube-specific adapter using the RSS feed (https://www.youtube.com/feeds/videos.xml?channel_id=...) plus yt-dlp transcript backfill |
| YouTube Channel | discover-rss.js | Also handles YouTube via the same RSS endpoint; prefer discover.js for YouTube-specific features |
| GitHub Trending/Topic | discover-github.js | Scrapes github.com/trending HTML and calls GitHub REST API for topic pages |
| Reddit Subreddit | discover-reddit.js | Calls https://www.reddit.com/r/<subreddit>.json (unauthenticated) |
| Awesome List | discover-awesome.js | Fetches README.md, diffs against cached version, extracts new entries |
| Google Search | discover-google.js | Calls Google Custom Search API (100 req/day free tier) |
| Hacker News | discover-rss.js | Uses hnrss.org RSS feeds |
| Changelog | discover-rss.js | GitHub Atom feeds (/releases.atom) + vendor RSS |
| Blog/Newsletter | discover-rss.js | Standard RSS/Atom feeds |
| arXiv Papers | discover-rss.js | arXiv RSS export by category (e.g. cs.MA, cs.AI) |
| Podcast | discover-rss.js | Standard podcast RSS feeds |
| Skills.sh | discover-skillssh.js | Fetches and diffs the XML sitemap |
Key architecture note
discover-rss.js is the generic RSS handler. It handles Hacker News, changelogs, blogs, arXiv, podcasts, and can also ingest YouTube RSS — but it lacks YouTube-specific features (transcript extraction, channel metadata). When adding a new RSS-based source, route it through discover-rss.js. When adding a YouTube channel, use discover.js.
There is no discover-youtube.js. Do not create one — YouTube RSS is already handled by discover.js with transcript support.
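The registry can also be expressed as a lookup table. This is a minimal sketch that uses the source type labels from the table as keys; the identifiers actually used in sources.json are not documented here, so treat the keys and the helper names `adapterFor` and `sourceTypesFor` as assumptions.

```javascript
// Preferred adapter per source type, following the registry table.
const ADAPTER_BY_SOURCE_TYPE = new Map([
  ["YouTube Channel", "discover.js"],
  ["GitHub Trending/Topic", "discover-github.js"],
  ["Reddit Subreddit", "discover-reddit.js"],
  ["Awesome List", "discover-awesome.js"],
  ["Google Search", "discover-google.js"],
  ["Hacker News", "discover-rss.js"],
  ["Changelog", "discover-rss.js"],
  ["Blog/Newsletter", "discover-rss.js"],
  ["arXiv Papers", "discover-rss.js"],
  ["Podcast", "discover-rss.js"],
  ["Skills.sh", "discover-skillssh.js"],
]);

function adapterFor(sourceType) {
  const file = ADAPTER_BY_SOURCE_TYPE.get(sourceType);
  if (!file) throw new Error(`No adapter registered for: ${sourceType}`);
  return file;
}

// Lists every source type routed to a given adapter file.
function sourceTypesFor(adapterFile) {
  return [...ADAPTER_BY_SOURCE_TYPE]
    .filter(([, file]) => file === adapterFile)
    .map(([type]) => type);
}
```

`sourceTypesFor("discover-rss.js")` lists the five consolidated source types, which is a quick way to confirm that a new RSS-based source does not need its own adapter file.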
4. State Files
All state files live in .content-monitor/ at the Development repo root, NOT in scripts/content-monitor/.
| File | Location | Purpose |
|---|---|---|
| seen-items.json | .content-monitor/seen-items.json | Inventory of all processed items by source ID. Single source of truth for deduplication. |
| backlog-evaluate-status.json | .content-monitor/backlog-evaluate-status.json | Tracks historical scoring progress for batch backlog evaluation sweeps. |
| resume-status.json | .content-monitor/resume-status.json | Catch-up sweep progress, used when reconciling artifacts. |
| YYYY-MM-DD.md | .content-monitor/digests/YYYY-MM-DD.md | Daily output file containing all scored items above the threshold. |
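The digest location follows directly from the date. A minimal sketch, assuming the date is taken in UTC (the time zone the real pipeline uses is not stated here); `digestPath` is a hypothetical helper.

```javascript
const path = require("path");

// Builds <stateDir>/digests/YYYY-MM-DD.md for the given date, using the UTC day.
function digestPath(stateDir, date) {
  const day = date.toISOString().slice(0, 10); // "YYYY-MM-DD"
  return path.join(stateDir, "digests", `${day}.md`);
}
```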
5. Configuration
Config files: scripts/content-monitor/sources.json (multi-source) and scripts/content-monitor/channels.json (YouTube legacy).
| Setting | Type | Description |
|---|---|---|
| maxAgeHours | number | Maximum age (hours) of items to process in each run |
| summarizeModel | string | Model for Summarize phase (default: minimax) |
| evaluateModel | string | Model for Evaluate phase (default: gpt-5.4) |
| maxTranscriptTokens | number | Hard cap on text length sent to summarizer |
| evaluationContext | string | Free-text description of the developer's focus; shapes all relevance scoring |
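To show how two of these settings might be applied, here is a minimal sketch. The defaults for summarizeModel and evaluateModel come from the table; `withDefaults`, `filterByMaxAge`, and the `publishedAt` field on items are assumptions made for illustration.

```javascript
// Defaults stated in the settings table; other settings have no documented default.
const DEFAULTS = { summarizeModel: "minimax", evaluateModel: "gpt-5.4" };

function withDefaults(config) {
  return { ...DEFAULTS, ...config };
}

// Assumption: each item carries a publishedAt ISO 8601 timestamp.
// Keeps only items published within the last maxAgeHours before `now`.
function filterByMaxAge(items, maxAgeHours, now) {
  const cutoff = now.getTime() - maxAgeHours * 60 * 60 * 1000;
  return items.filter((item) => new Date(item.publishedAt).getTime() >= cutoff);
}
```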
6. Extending the Pipeline
Before creating a new adapter, check section 3.5 (Adapter Registry). If the new source is RSS/Atom-based (blogs, changelogs, podcasts, arXiv, Hacker News), route it through the existing discover-rss.js — no new file needed. Only create a new discover-<type>.js for source types with fundamentally different discovery mechanics (HTML scraping, vendor APIs, sitemaps).
To add a new source type that needs a new adapter:
- Create a new adapter file (e.g., discover-reddit.js) in the scripts/content-monitor/ directory implementing the standard adapter interface: fetchNew(config) → Array<{ id, url, title, rawText }>
- Register the new source in sources.json
- Implement text extraction in extract-text.js or transcribe.js
- Update section 3.5 of this skill to add the new adapter to the registry
- Test with --discover-only
- Run a full pass and verify output in seen-items.json
Adapter contract: The adapter must return a stable id per item. This is what seen-items.json keys on for deduplication.
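A minimal sketch of an adapter that satisfies this contract follows. The raw entry shape (`guid`, `link`, `title`, `body`), the `sourceKey` prefix, and the helper names are hypothetical; the sketch also assumes fetchNew may be asynchronous, and it models the seen state as a plain Set because the on-disk format of seen-items.json is not described here.

```javascript
// Maps one raw entry to the contract shape { id, url, title, rawText }.
// The id is built from the source's permanent identifier, never from the
// title or the fetch time, so it stays the same across runs.
function toItem(sourceKey, entry) {
  return {
    id: `${sourceKey}:${entry.guid}`,
    url: entry.link,
    title: entry.title,
    rawText: entry.body ?? "",
  };
}

// Removes items whose id has already been recorded.
function dropSeen(items, seenIds) {
  return items.filter((item) => !seenIds.has(item.id));
}

// fetchEntries is injected so the sketch performs no network I/O itself.
function createAdapter(sourceKey, fetchEntries) {
  return {
    async fetchNew(config) {
      const entries = await fetchEntries(config);
      return entries.map((entry) => toItem(sourceKey, entry));
    },
  };
}
```

Because the id ignores the title, an item whose title is edited upstream still maps to the same key and is not processed twice.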
Verification
After modifying the content monitor pipeline, verify:
- New sources have a stable id field for deduplication (test with --discover-only)
- seen-items.json correctly records discovered items without duplicates
- Frequency matches the source's update cadence (daily for YouTube/Reddit/blogs, weekly for GitHub topics/awesome-lists)
- Summarize phase produces all required JSON fields (keyFindings, toolsMentioned, actionableInsights, etc.)
- Evaluate phase generates a scored digest entry in .content-monitor/digests/YYYY-MM-DD.md
- No API rate limits are exceeded (GitHub: 60 req/hr unauthenticated, Google: 100/day)
- Cost remains zero: MiniMax for summarization, free APIs for discovery
Do NOT Use When
| Instead of this skill | Use | Why |
|---|---|---|
| Doing SEO keyword research or clustering | keywords | Keywords skill owns keyword strategy; content-monitor is about trend discovery |
| Building competitive product teardowns | user-research-synthesis | Competitive analysis requires structured frameworks, not pipeline mechanics |
| Scraping arbitrary web pages for data | backend | General web scraping is a backend concern, not content monitoring |
| Monitoring application uptime or error rates | error-tracking or agent-observability | Those skills own runtime health; content-monitor tracks external intelligence |
| Analyzing content for SEO optimization | seo-strategy | SEO strategy owns page optimization; this skill discovers external trends |
Skill Graph context
Classification
- Subject: agent-ops
- Public: true
- Domain: agent/ops
- Scope: Building and operating multi-source intelligence pipelines across video, GitHub, Reddit, curated lists, search, and RSS: source adapters, the discovery/transcription/summarization/evaluation phases, deduplication, schedules, per-stage model choice, and actionable brief generation. Portable across any content-monitoring pipeline; principle-grounded, not repo-bound. Excludes SEO keyword research (keywords) and competitive product analysis (user-research).
When to use
- Triggers: content-monitor-skill, research-mode
Related skills
- Verify with: evaluation
- Related: seo-strategy, keywords, positioning
Keywords
content monitor,github trending,blog monitor,intelligence brief,youtube monitor,awesome list,trend tracking,content pipeline,content-monitor,rss feed