extract-list - SKILL.md Agent Skill

name: extract-list description: Extract a list of items from a news article listicle or Reddit recommendation post and output a styled Excel spreadsheet. Use when the user provides a URL to an article listicle (e.g. "10 best places to eat") or a Reddit post with recommendations (books, music, places, etc.). argument-hint: [-o output.xlsx] allowed-tools: Bash(uv run ), Bash(rm _content.txt), Bash(rm *_analysis.json), Read, WebFetch

Extract List from Article/Reddit Skill

Extract a list of recommended items from a news article listicle or Reddit recommendation post and produce a styled Excel spreadsheet.

Usage

/extract-list https://www.reddit.com/r/newjersey/comments/.../best_pizza_in_nj/
/extract-list https://example.com/10-best-restaurants-in-hoboken -o restaurants.xlsx

Process

Follow these steps exactly:

1. Validate Input

Parse $ARGUMENTS to get the URL and optional -o output.xlsx. If -o is not provided, the output filename will be auto-derived from the page title.

2. Fetch the Page Content

Run the fetch script:

uv run .claude/skills/listicle-reddit-extractor/fetch_page.py "<url>"

This creates a <slug>_content.txt file with the extracted text.

If the fetch script fails (e.g. 403/blocking), fall back to using the WebFetch tool directly on the URL, then save the text output manually.

3. Read and Analyze

Read the content file using the Read tool. Analyze the text to determine:

Source type: Is this a Reddit post or a news/blog article?
List type: What kind of items are being recommended? (restaurants, books, albums, movies, places, products, etc.)
Columns: What metadata fields are appropriate? Common examples:
- Restaurants/Food → Name, Cuisine/Type, Location, Notes
- Books → Title, Author, Notes
- Albums/Music → Title, Artist, Notes
- Movies/TV → Title, Director/Creator, Year, Notes
- Places/Travel → Name, Location, Notes
- Products → Name, Brand, Price, Notes
Items: Extract every distinct recommendation. For Reddit posts, focus on items mentioned in the post body and top-level comments (and highly-upvoted replies). De-duplicate items that appear multiple times. If an item is mentioned by multiple commenters, note that in a "Mentions" or "Notes" column. Include items in rough order of popularity/prominence.

Key extraction rules:

For articles: Extract each numbered/listed item with its metadata.
For Reddit: Extract recommendations from comments. Prioritize items from higher-scored comments. Combine duplicate mentions. Capture any context the commenter provides (e.g. "their margherita is incredible" → Notes).
If metadata is unclear, give your best guess and append "(uncertain)".
Aim for completeness — extract ALL items mentioned, not just the top few.

4. Build Spreadsheet

Write the analysis as a JSON file named <slug>_analysis.json with this structure:

{
  "list_type": "restaurants",
  "columns": ["Name", "Cuisine", "Location", "Notes"],
  "items": [
    {"Name": "DeLorenzo's", "Cuisine": "Pizza", "Location": "Robbinsville, NJ", "Notes": "Multiple commenters recommend; tomato pie style"},
    {"Name": "Razza", "Cuisine": "Pizza", "Location": "Jersey City, NJ", "Notes": "Wood-fired, highly rated"}
  ],
  "notes": "Best pizza in NJ - extracted from Reddit r/newjersey thread",
  "source_url": "https://www.reddit.com/r/...",
  "source_type": "reddit"
}

Then run the spreadsheet builder:

uv run .claude/skills/listicle-reddit-extractor/build_spreadsheet.py <slug>_analysis.json -o <output.xlsx>

5. Clean Up

Remove the temporary files:

rm <slug>_content.txt
rm <slug>_analysis.json

6. Report

Tell the user: the output file path, the source type (Reddit/article), the list type detected, and how many items were extracted.