name: cmd-rss-feed-review description: Review RSS feed generators and their XML output for broken selectors, missing error handling, stale cache logic, feed link conventions, empty/malformed feeds, and duplicate entries. Use when asked to "review feed", "check feed quality", "audit feeds", or after creating/modifying a feed generator. disable-model-invocation: true
RSS Feed Review
Review RSS feed generators and their output XML for correctness, robustness, and adherence to project conventions.
Instructions
- Determine scope — review all feed generators by default, or a specific one if the user specifies.
- Read the target generator(s) and their corresponding
feeds/feed_*.xmloutput files. - Read
feed_generators/utils.pyto understand shared helpers. - Evaluate against the checklists below. For every finding, cite
file_path:line_number. - If everything looks good, say so briefly.
Generator Code Review
Selectors & Parsing
- Are CSS selectors specific enough to survive minor site redesigns?
- Are selectors targeting semantic elements (article, h2) over generated class names?
- Is there fallback logic if a selector returns no results?
Error Handling
- Does
fetch_*usetimeout=on requests? - Are HTTP errors handled (
response.raise_for_status()or status check)? - Are Selenium waits using explicit waits (
WebDriverWait) rather thantime.sleep()? - Is the Selenium driver properly closed in a
finallyblock?
Feed Link Setup
Critical convention (from AGENTS.md):
from utils import setup_feed_links
setup_feed_links(fg, blog_url="https://...", feed_name="...")
- The main
<link>must point to the original blog URL, NOT the feed URL rel="self"must be set first,rel="alternate"must be set last- Generators should use the
setup_feed_links()helper fromutils.py - Flag any generator that sets links manually instead of using the helper
Cache Logic (Pagination & Selenium patterns only)
- Is cache loaded before fetching new articles?
- Are articles deduped by URL before saving?
- Is the cache sorted by date descending?
- Does the
--fullflag correctly bypass incremental logic?
Pattern Compliance
- Simple Static: No cache needed, fetches all posts each run
- Pagination + Caching: URL-based pagination with JSON cache in
cache/ - Selenium + Click: Uses
undetected-chromedriver, clicks load-more buttons, caches results - Is the generator using the right pattern for how the target site loads content?
Feed XML Output Review
Structure
- Does the feed have a
<title>,<link>, and<description>in<channel>? - Does every
<item>have at least<title>,<link>, and<pubDate>? - Is there an
<atom:link rel="self">pointing to the feed URL? - Does the main
<link>point to the blog (not the feed file)?
Content Quality
- Are there 0 items? (EMPTY — likely broken scraper)
- Is the newest item older than 60 days? (STALE — selectors may have broken)
- Are there duplicate
<link>values across items? - Are dates parseable as RFC 2822 (
pubDateformat)? - Are titles non-empty and non-duplicated?
Encoding
- Is the XML declaration present with
encoding="utf-8"? - Are special characters properly escaped in titles and descriptions?
Output Format
For each finding:
[SEVERITY] file_path:line_number — description
Severities: ERROR (broken/will fail), WARN (fragile/convention violation), INFO (suggestion)
End with a summary table:
| Feed | Generator | XML Output | Issues |
|---|---|---|---|
| name | OK/WARN/ERROR | OK/WARN/ERROR | brief note |