name: blogwatcher-db description: Query the blogwatcher-cli SQLite database for RSS scan results. Use pre-verified column names and query templates to avoid errors. category: research verified: 2026-06-02
Blogwatcher Database Query Skill
Pre-verified query templates for the blogwatcher-cli SQLite database. DO NOT guess column names — use only the schema documented below.
Database Location
- Primary path:
/opt/data/.blogwatcher/blogwatcher.db - Legacy/previous path:
/home/exedev/.blogwatcher-cli/blogwatcher-cli.db - Type: SQLite3
- ⚠️ DB name is
blogwatcher.db, NOTblogwatcher-cli.db— usefind / -name 'blogwatcher.db'for discovery, not the CLI binary name.
DB Discovery Fallback
If the expected path doesn't exist, discover the actual location:
find / -path '*.blogwatcher*' -name '*.db' 2>/dev/null
The database may be stored under /opt/data/.blogwatcher/ rather than the user home directory, especially when blogwatcher-cli runs from /opt/data/bin/.
Verified Schema
Table: blogs
| Column | Type | Constraints | Description |
|---|---|---|---|
id |
INTEGER | PRIMARY KEY | Blog ID |
name |
TEXT | NOT NULL | Blog name (e.g., "simonwillison.net", "r/LocalLLaMA") |
url |
TEXT | NOT NULL, UNIQUE | Blog homepage URL |
feed_url |
TEXT | nullable | RSS/Atom feed URL |
scrape_selector |
TEXT | nullable | CSS selector for HTML scraping fallback |
last_scanned |
TIMESTAMP | nullable | Last scan timestamp (ISO 8601) |
Table: articles
| Column | Type | Constraints | Description |
|---|---|---|---|
id |
INTEGER | PRIMARY KEY | Article ID |
blog_id |
INTEGER | NOT NULL, FK→blogs.id | Source blog ID |
title |
TEXT | NOT NULL | Article title |
url |
TEXT | NOT NULL, UNIQUE | Article URL |
published_date |
TIMESTAMP | nullable | When article was published (ISO 8601 from RSS) |
discovered_date |
TIMESTAMP | DEFAULT CURRENT_TIMESTAMP | When blogwatcher discovered it |
is_read |
BOOLEAN | DEFAULT FALSE (0/1) | Read/unread status |
categories column does NOT exist in the articles table as of the current DB schema (verified 2026-05-04 via PRAGMA table_info). Any query that references a.categories will fail with no such column: a.categories. If categories were present in an older schema version, the current DB has dropped them. Always verify the actual columns with PRAGMA table_info(articles) before building queries.
Table: schema_migrations
| Column | Type | Description |
|---|---|---|
version |
uint64 | Migration version |
dirty |
bool | Migration state |
⚠️ Columns That Do NOT Exist
These are commonly guessed but do not exist:
published_at→ usepublished_dateordiscovered_datesource→ useblog_idJOINblogs.nameauthor→ not trackedcontent/body/text→ not stored (URLs only)categories/tags→ not stored in this DB version. Removed from schema (confirmed 2026-05-04). Do NOT reference in queries.created_at→ usediscovered_dateupdated_at→ not tracked
Verified Query Templates
⚠️ sqlite3 CLI may not be available in the agent environment (no system sqlite3 binary). Use Python's built-in sqlite3 module instead for reliable queries (see Python Usage Pattern below). The SQL templates below are still correct — use them as SQL strings in Python's conn.execute().
Daily scan — articles discovered on a specific date
SELECT b.name, a.title, a.url, a.published_date, a.discovered_date
FROM articles a JOIN blogs b ON a.blog_id = b.id
WHERE DATE(a.discovered_date) = 'YYYY-MM-DD'
ORDER BY b.name, a.discovered_date DESC;
NOTE: a.categories does not exist in the current DB schema. Omit it from SELECT unless you've verified via PRAGMA table_info that it exists.
Unread articles (triage queue)
SELECT b.name, a.title, a.url, a.discovered_date
FROM articles a JOIN blogs b ON a.blog_id = b.id
WHERE a.is_read = 0
ORDER BY a.discovered_date DESC;
Articles by specific blog
SELECT a.title, a.url, a.published_date, a.discovered_date, a.is_read
FROM articles a JOIN blogs b ON a.blog_id = b.id
WHERE b.name = 'simonwillison.net'
ORDER BY a.discovered_date DESC;
Articles by blog + date range
SELECT a.title, a.url, a.published_date, a.discovered_date
FROM articles a JOIN blogs b ON a.blog_id = b.id
WHERE b.name = 'r/LocalLLaMA'
AND DATE(a.discovered_date) BETWEEN 'YYYY-MM-DD' AND 'YYYY-MM-DD'
ORDER BY a.discovered_date DESC;
Count articles per blog (all time)
SELECT b.name, COUNT(*) as cnt
FROM articles a JOIN blogs b ON a.blog_id = b.id
GROUP BY b.name
ORDER BY cnt DESC;
Count articles per blog on specific date
SELECT b.name, COUNT(*) as cnt
FROM articles a JOIN blogs b ON a.blog_id = b.id
WHERE DATE(a.discovered_date) = 'YYYY-MM-DD'
GROUP BY b.name
ORDER BY cnt DESC;
Filter by keyword in article title
Since categories column does not exist in the current DB, filter articles by title keywords instead:
SELECT b.name, a.title, a.url, a.discovered_date
FROM articles a JOIN blogs b ON a.blog_id = b.id
WHERE a.title LIKE '%AI%'
OR a.title LIKE '%llm%'
OR a.title LIKE '%agent%'
ORDER BY a.discovered_date DESC;
Note: This is a title-only match. For content-based filtering, scrape articles via web_extract after getting URLs from DB, then filter in Python.
Recent articles across all blogs (last N days)
SELECT b.name, a.title, a.url, a.published_date, a.discovered_date
FROM articles a JOIN blogs b ON a.blog_id = b.id
WHERE a.discovered_date > datetime('now', '-N days')
ORDER BY a.discovered_date DESC;
Python Usage Pattern
import sqlite3
import json
import os
DB_PATH = "/opt/data/.blogwatcher/blogwatcher.db"
# Fallback: discover DB if not at primary path
if not os.path.exists(DB_PATH):
import subprocess
result = subprocess.run(
["find", "/", "-path", "*.blogwatcher*", "-name", "*.db"],
capture_output=True, text=True, timeout=15
)
candidates = [p.strip() for p in result.stdout.strip().split("\n") if p.strip()]
if candidates:
DB_PATH = candidates[0]
print(f"Discovered DB at: {DB_PATH}", file=__import__("sys").stderr)
def query_daily_scan(date_str):
"""Get all articles discovered on a specific date, grouped by blog."""
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
rows = conn.execute("""
SELECT b.name, a.title, a.url, a.published_date, a.discovered_date
FROM articles a JOIN blogs b ON a.blog_id = b.id
WHERE DATE(a.discovered_date) = ?
ORDER BY b.name, a.discovered_date DESC;
""", (date_str,)).fetchall()
# Group by blog
blogs = {}
for row in rows:
if row['name'] not in blogs:
blogs[row['name']] = []
blogs[row['name']].append({
'title': row['title'],
'url': row['url'],
'published_date': row['published_date'],
'discovered_date': row['discovered_date'],
})
conn.close()
return blogs
def generate_daily_report(date_str, blogs):
"""Generate markdown report for inbox/rss-scans/."""
total = sum(len(arts) for arts in blogs.values())
md = f"""# Daily RSS Scan Report — {date_str}
> Source: blogwatcher-cli RSS scan
> Total articles: {total}
"""
# Sort: individual blogs first, then Reddit subs
order = sorted(blogs.keys(), key=lambda x: (x.startswith('r/'), x.lower()))
for blog_name in order:
md += f"## {blog_name}\n\n"
for art in blogs[blog_name]:
md += f"- [{art['title']}]({art['url']})\n"
md += "\n"
return md
Report Output Path
- Directory:
~/ai-topics/inbox/rss-scans/ - Filename:
daily-scan-YYYY-MM-DD.md - After creation, commit:
cd ~/ai-topics && git add inbox/rss-scans/ && git commit -m "wiki: daily RSS scan YYYY-MM-DD" && git push
Execution Tips
⚠️ Cron Mode Restrictions (CRITICAL)
In cron mode (hermes cron run), execute_code is blocked (the security scanner denies subprocess calls without a user present). The query_daily_scan() Python function above uses execute_code and WILL FAIL in cron mode. Instead, use one of these patterns:
Pattern A — Simple queries: Use inline python3 -c inside terminal(). This works for single SELECT statements, counts, and aggregations:
terminal("python3 -c \"import sqlite3, json; conn=sqlite3.connect('/opt/data/.blogwatcher/blogwatcher.db'); conn.row_factory=sqlite3.Row; rows=conn.execute('SELECT COUNT(*) FROM articles').fetchall(); print(json.dumps([dict(r) for r in rows]))\"")
For date-filtered queries, escape single quotes inside the double-quoted shell string with \\':
terminal("python3 -c \"...WHERE DATE(a.discovered_date)=\\'2026-06-03\\'...\"")
Pattern B — Complex queries (preferred): Write a Python script to /tmp/ via write_file, then run it with terminal("python3 /tmp/script.py"). This avoids all quote-escaping:
# Step 1: write_file → /tmp/query_blogwatcher.py (full script with imports, sqlite3 queries, JSON output)
# Step 2: terminal → python3 /tmp/query_blogwatcher.py
Quick one-liner for counting articles
For a simple count (no complex SQL), this pattern works in both interactive and cron mode:
python3 -c "import sqlite3; print(sqlite3.connect(DB_PATH).execute('SELECT COUNT(*) FROM articles').fetchone()[0])"
Replace DB_PATH with the actual path.
Standalone script for complex queries (cron-mode safe)
For queries with multiple joins, grouping, or JSON output, write to /tmp/ as a .py file via write_file then run it with terminal. This is the safest approach in cron mode:
# Write script first (via write_file tool, not echo)
python3 /tmp/query_blogwatcher.py
Pitfalls
published_atdoes not exist → usepublished_dateordiscovered_datesourcedoes not exist → JOINblogs.nameviablog_idcategoriesis JSON, not comma-separated → useLIKE '%"tag"%'or parse withjson.loads()is_readis 0/1 → useWHERE is_read = 0notWHERE is_read = FALSEDATE()function →DATE(a.discovered_date) = 'YYYY-MM-DD'for date filtering- No article content stored → only URLs; scrape with
web_extractif needed published_datecan be NULL → articles without RSS pub date have this set to NULLlast_scannedon blogs → ISO 8601 format, use to check freshness- sqlite3 CLI may not be installed — use Python's sqlite3 module instead
- DB may not be at the expected path — run DB Discovery Fallback (see above) before assuming location
- DB filename is
blogwatcher.db, notblogwatcher-cli.db— adjustfindpatterns accordingly - Cron HOME mismatch — In the Hermes cron environment,
HOMEis set to/opt/data/.hermes/home(not/opt/data). Theblogwatcher-clibinary resolves~/.blogwatcher/blogwatcher.dbusing$HOME, so it looks at the wrong path. Thedaily_inbox_collect.pyscript (canonical:~/.hermes/scripts/daily_inbox_collect.py) must usePROFILE_ROOTinstead ofPath.home()for_BW_HOME, andrun_blogwatcher_scan()must setHOME=str(PROFILE_ROOT)in the subprocess env. Seereferences/cron-home-fix.mdfor the exact patch.