name: reddit
description: Scrape and summarize Reddit subreddits via the browser tool; extract posts, comments, and trends using old.reddit.com
Reddit Scraping
Scrape subreddits to extract posts, comments, and trends. Uses the browser skill (must be available and running). Always use old.reddit.com — it's text-oriented and much better for automated extraction.
Critical: Use evaluate, Not snapshot
Never use browser snapshot for Reddit post listings. The accessibility tree is far too verbose — old Reddit's nav, sidebar, and per-post UI chrome (upvote buttons, share links, flair badges) eat the entire token budget before reaching actual content. Even --selector "#siteTable" produces ~40 lines of tree nodes per post.
Always use browser evaluate with JS to extract structured data:
browser evaluate 'JSON.stringify([...document.querySelectorAll("#siteTable .thing")].map(el => ({ title: el.querySelector("a.title")?.textContent, score: el.querySelector(".score.unvoted")?.textContent, comments: el.querySelector(".comments")?.textContent, flair: el.querySelector(".linkflairlabel")?.textContent, url: el.querySelector("a.title")?.href, permalink: el.getAttribute("data-permalink"), time: el.querySelector("time")?.getAttribute("title"), author: el.querySelector(".author")?.textContent })))'
The permalink field (from data-permalink) is a reliable fallback — it always contains the Reddit thread path (e.g., /r/ClaudeAI/comments/abc123/post_title/). Prepend https://reddit.com to it for output links.
This returns clean JSON in ~2-3KB instead of a 20KB+ truncated accessibility tree.
Workflow
1. Start the browser
browser status
browser start
If startup fails with a timeout, kill stale Chrome processes first:
pkill -f "remote-debugging-port"
browser start
2. Open a subreddit
browser open https://old.reddit.com/r/localllama
For different sort orders:
browser open https://old.reddit.com/r/localllama/new
browser open "https://old.reddit.com/r/localllama/top?t=day"
browser open "https://old.reddit.com/r/localllama/top?t=week"
3. Extract post listing
Use the evaluate JS snippet above. Each post returns:
- title: post title
- score: upvote count ("•" means too new for a visible score)
- comments: comment-count text, e.g. "89 comments"
- flair: post flair label, if any
- url: link to the post or external URL
- permalink: Reddit thread path from data-permalink
- time: submission timestamp
- author: poster username
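The score and comments fields come back as display text, not numbers. The sketch below shows one way to normalize them before ranking. parseCount and normalizePost are hypothetical helper names, and the handling of a "k" suffix (e.g. "12.3k") is an assumption about how large scores might be abbreviated, not something the listing snippet guarantees.

```javascript
// Hypothetical helper: pull the first number out of a display string.
// Returns null when no digits are present (e.g. a "•" score).
function parseCount(text) {
  if (typeof text !== "string") return null;
  const match = text.replace(/,/g, "").match(/(-?\d+(?:\.\d+)?)(k)?/i);
  if (!match) return null;
  const value = Number(match[1]);
  // Assumption: a trailing "k" means thousands.
  return Math.round(match[2] ? value * 1000 : value);
}

// Hypothetical helper: keep the raw fields, but make score numeric
// and add a numeric commentCount alongside the original comments text.
function normalizePost(post) {
  return {
    ...post,
    score: parseCount(post.score),
    commentCount: parseCount(post.comments),
  };
}

const sample = { title: "Example post", score: "•", comments: "89 comments" };
console.log(normalizePost(sample)); // score: null, commentCount: 89
```

A null score can then be treated as "too new to rank" instead of being sorted as zero.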
4. Filter posts by keyword (optional)
Don't use Reddit's restrict_sr=on URL param — it's unreliable on old Reddit and returns results from all of Reddit. Filter client-side instead:
browser evaluate 'JSON.stringify([...document.querySelectorAll("#siteTable .thing")].filter(el => el.querySelector("a.title")?.textContent?.toLowerCase().includes("keyword")).map(el => ({ title: el.querySelector("a.title")?.textContent, score: el.querySelector(".score.unvoted")?.textContent, comments: el.querySelector(".comments")?.textContent })))'
5. Pagination
Old Reddit shows 25 posts per page. To get the next page URL:
browser evaluate 'document.querySelector(".next-button a")?.href'
Then browser navigate to that URL and extract again.
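A small loop can drive the navigate-and-extract cycle. The sketch below assumes a hypothetical async fetchPage(url) wrapper that navigates to the URL, runs the listing snippet, reads the next-button href, and resolves to { posts, nextUrl }. How that wrapper calls the browser tool is deliberately left open.

```javascript
// Collect posts from up to maxPages listing pages.
// fetchPage is a hypothetical caller-supplied function:
//   async (url) => ({ posts: [...], nextUrl: string | null })
async function collectPages(firstUrl, fetchPage, maxPages = 3) {
  const posts = [];
  let url = firstUrl;
  for (let page = 0; page < maxPages && url; page++) {
    const result = await fetchPage(url);
    posts.push(...result.posts);
    // A missing next button (null/undefined href) ends the loop.
    url = result.nextUrl || null;
  }
  return posts;
}
```

Capping the page count keeps a long subreddit from consuming the whole session. At 25 posts per page, the default of three pages yields at most 75 posts.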
6. Read individual threads
For reading comments in detail, snapshots are more workable on thread pages than on listings, but still scope them:
browser navigate <thread-url>
browser snapshot --selector ".commentarea" --efficient
Or use JS for structured top-level comments:
browser evaluate 'JSON.stringify([...document.querySelectorAll(".commentarea > .sitetable > .comment")].slice(0, 10).map(el => ({ author: el.querySelector(".author")?.textContent, score: el.querySelector(".score.unvoted")?.textContent, body: el.querySelector(".md")?.textContent?.substring(0, 500) })))'
7. Clean up
browser close # close current tab when done with a subreddit
Daily Digest Workflow
1. browser start (kill stale Chrome first if needed)
2. For each configured subreddit, scrape TWO pages:
   a. browser open https://old.reddit.com/r/<sub>/top?t=day
      - Extract posts: these are the highest-signal threads from the last 24h
   b. browser open https://old.reddit.com/r/<sub>/new
      - Extract posts: these catch long-tail/low-upvote threads that the algorithm buries
      - Important: many relevant niche threads never reach "hot" or "top"
   c. Deduplicate across both pages (by URL)
   d. If topics are configured, filter posts: keep any post whose title contains
      a topic keyword (case-insensitive substring match). If no topics configured,
      keep all posts.
   e. For high-signal posts (high score, relevant keywords), click through and extract top comments
3. Compile digest:
   - Group by subreddit
   - Rank by score/relevance
   - Summarize key threads
   - Flag posts worth commenting on or responding to
4. browser close tabs when done
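Steps 2c and 2d can be sketched as two small functions over the extracted post objects. dedupeByUrl and filterByTopics are hypothetical names, and falling back to permalink when url is empty is an assumption made here so that such posts can still be compared.

```javascript
// Step 2c: drop posts already seen in the other listing.
// Keyed on url, with permalink as an assumed fallback when url is empty.
function dedupeByUrl(posts) {
  const seen = new Set();
  return posts.filter((post) => {
    const key = post.url || post.permalink;
    if (!key) return true; // nothing to compare on, so keep the post
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Step 2d: case-insensitive substring match on the title.
// With no topics configured, every post is kept.
function filterByTopics(posts, topics) {
  if (!topics || topics.length === 0) return posts;
  const needles = topics.map((topic) => topic.toLowerCase());
  return posts.filter((post) => {
    const title = (post.title || "").toLowerCase();
    return needles.some((needle) => title.includes(needle));
  });
}

// Usage: merge the "top" and "new" listings, then filter.
const topPosts = [{ title: "New GGUF quant released", url: "https://old.reddit.com/r/x/comments/a1/" }];
const newPosts = [
  { title: "New GGUF quant released", url: "https://old.reddit.com/r/x/comments/a1/" },
  { title: "Weekend meme thread", url: "https://old.reddit.com/r/x/comments/b2/" },
];
const digestPosts = filterByTopics(dedupeByUrl([...topPosts, ...newPosts]), ["gguf"]);
console.log(digestPosts.length); // 1
```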
Output Links
Every post in output MUST have a clickable link. No exceptions. If the extracted url field is missing or empty, construct a permalink from https://reddit.com + the post's data-permalink attribute.
Always use reddit.com (not old.reddit.com) in any links you include in output (emails, Telegram messages, digests). Old Reddit is unreadable on mobile. Scraping uses old.reddit.com internally, but strip the old. prefix before including URLs in any user-facing output.
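The two rules above can be combined into one helper. This is a minimal sketch; toOutputLink is a hypothetical name, and returning null when neither field is present is an assumption so that the caller can flag the post instead of emitting a broken link.

```javascript
// Build the user-facing link for a post object from the listing snippet.
function toOutputLink(post) {
  // Prefer the extracted url; fall back to the data-permalink path.
  const raw =
    post.url || (post.permalink ? "https://reddit.com" + post.permalink : null);
  if (!raw) return null; // assumption: caller handles the missing-link case
  // Strip the "old." prefix; external URLs pass through unchanged.
  return raw.replace(/^https?:\/\/old\.reddit\.com/, "https://reddit.com");
}

console.log(toOutputLink({ url: "", permalink: "/r/ClaudeAI/comments/abc123/post_title/" }));
// https://reddit.com/r/ClaudeAI/comments/abc123/post_title/
```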
Post Scoring Heuristic
- Score shown as "•" = too new to have a visible score
- 100+ upvotes in < 24h = notable
- High comment:score ratio = engagement/controversy
- Flair helps categorize: "Bug", "News", "Official" are usually higher signal than "Humor", "Praise"
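The heuristic above can be expressed as a function that returns signals, not a single rank. This is only a sketch: assessPost is a hypothetical name, it expects score and commentCount already converted to numbers (null for a "•" score), and ageHours is an assumed input derived from the post timestamp.

```javascript
// Flairs the heuristic treats as higher signal (compared case-insensitively).
const HIGH_SIGNAL_FLAIRS = new Set(["bug", "news", "official"]);

// Returns the individual signals; how to weight them is left to the caller.
function assessPost({ score, commentCount, ageHours, flair }) {
  const tooNew = score === null || score === undefined;
  return {
    tooNew,
    // 100+ upvotes in under 24 hours
    notable: !tooNew && score >= 100 && ageHours < 24,
    // comments per upvote; null when the score is hidden or not positive
    commentRatio: !tooNew && score > 0 ? commentCount / score : null,
    highSignalFlair: HIGH_SIGNAL_FLAIRS.has((flair || "").trim().toLowerCase()),
  };
}

console.log(assessPost({ score: 150, commentCount: 300, ageHours: 5, flair: "News" }));
// tooNew: false, notable: true, commentRatio: 2, highSignalFlair: true
```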