name: dreaming
description: Knowledge consolidation cycle that analyzes collected articles and updates the wiki with significant findings while enforcing duplicate checks.
category: research
version: 2.0.0
author: Hermes Agent
Dreaming — Knowledge Consolidation Cycle
Automated consolidation process that analyzes recently collected articles and folds significant findings into the wiki knowledge base. Includes a mandatory duplicate check against adjacent scheduled jobs.
Based on the "context substrate" philosophy (Camp 2, per @witcheer 2026-04-16): memory isn't fact storage, it's structured context that compounds over time. Camp 2 tools solve compounding via read-context→work→write-back loops. This dreaming cycle follows Camp 2: wiki files are the source of truth, not hidden vector state.
Architecture
Pipeline
Phase 1: Data Collection (pre-run script)
- Script: `~/ai-topics/scripts/dreaming.py`
- Collects RSS scan articles, newsletter articles, and existing wiki pages
- Outputs structured JSON to stdout
- Injected into cron prompt as context
Phase 2: Knowledge Consolidation (LLM processing)
- Runs on the configured cron schedule
- Receives collected data via prompt injection
- Analyzes, creates/updates wiki pages
- Commits changes to git
Step 0: Duplicate Check (MANDATORY — runs first every time)
Before processing, review what adjacent scheduled jobs have already completed:
- Daily Inbox Update (23:00 JST) — RSS scan + Newsletter triage + Wiki ingest
- Daily Wiki Update Report (20:00 JST) — Wiki update summary
- Daily Active Knowledge Crawl (00:00 JST) — Hot topic concept discovery
- Skill Inventory Check (01:00 JST) — New skill assessment
Duplicate-check rules:
- If an article was already processed by Daily Inbox Update, do NOT re-process it
- If a concept page was already created by Daily Active Knowledge Crawl, do NOT duplicate — only enrich if you have significant new insights
- If a skill was already assessed by Skill Inventory Check, reference that assessment rather than re-doing it
- If a wiki page was already updated and reported by Daily Wiki Update Report, skip redundant updates
- Only proceed with wiki consolidation for items NOT already handled by adjacent scheduled jobs
- If you find gaps or missed items from other jobs, fill them and note it in the report
Dream Cycle Phases
1. Light Sleep — Screening & Grouping
- Review articles NOT already processed by adjacent scheduled jobs
- Group by semantic themes (shared entities, related concepts, events)
- Identify recurring patterns across multiple sources
- Flag articles that appear in multiple sources (higher significance)
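A minimal Python sketch of the multi-source flag described above. The `url` and `source` field names are assumptions for illustration; the JSON emitted by dreaming.py may use different keys.

```python
from collections import defaultdict


def flag_multi_source(articles):
    """Map each URL seen in more than one source to its sorted source names.

    `articles` is assumed to be a list of dicts with hypothetical
    "url" and "source" keys.
    """
    sources_by_url = defaultdict(set)
    for article in articles:
        sources_by_url[article["url"]].add(article["source"])
    # Keep only URLs reported by at least two distinct sources.
    return {
        url: sorted(sources)
        for url, sources in sources_by_url.items()
        if len(sources) > 1
    }


collected = [
    {"url": "https://example.com/a", "source": "rss"},
    {"url": "https://example.com/a", "source": "newsletter"},
    {"url": "https://example.com/b", "source": "rss"},
]
flagged = flag_multi_source(collected)
```

Here `flagged` contains only `https://example.com/a`, since it is the only URL reported by two distinct sources.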
2. REM — Flat Synthesis (Writer Phase A: unbiased consolidation)
Score each theme/group using weighted signals WITHOUT newsjacking bias:
- relevance (0.30): Direct AI/LLM/agent relevance
- frequency (0.25): Number of mentions across sources
- query_diversity (0.15): Different sources discussing similar concepts
- recency (0.15): How recent the discussion is
- consolidation (0.10): How well it fits existing knowledge
- conceptual_richness (0.05): Depth and novelty of insights
Promotion thresholds:
- Score ≥ 0.65: Create or update wiki page
- Score ≥ 0.45 and < 0.65: Add to existing page or log for review
- Score < 0.45: Skip (minor mention)
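The weighted scoring and promotion thresholds above can be sketched in Python as follows. This assumes each signal has already been normalized to the range 0 to 1, which the skill does not state explicitly; the function names and action labels are illustrative, not part of dreaming.py.

```python
# Weights copied from the signal list above; they sum to 1.0.
WEIGHTS = {
    "relevance": 0.30,
    "frequency": 0.25,
    "query_diversity": 0.15,
    "recency": 0.15,
    "consolidation": 0.10,
    "conceptual_richness": 0.05,
}


def theme_score(signals: dict[str, float]) -> float:
    """Weighted sum of the six signals (assumes each value is in [0, 1])."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    return sum(weight * signals[name] for name, weight in WEIGHTS.items())


def promotion_action(score: float) -> str:
    """Map a score to a promotion tier; the labels are hypothetical."""
    if score >= 0.65:
        return "create_or_update"
    if score >= 0.45:
        return "add_or_log"
    return "skip"


example = {
    "relevance": 0.9,
    "frequency": 0.6,
    "query_diversity": 0.5,
    "recency": 0.8,
    "consolidation": 0.4,
    "conceptual_richness": 0.7,
}
score = theme_score(example)
action = promotion_action(score)
```

For the example signals the score works out to 0.27 + 0.15 + 0.075 + 0.12 + 0.04 + 0.035 = 0.69, which clears the 0.65 threshold.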
CRITICAL: This phase is for knowledge consolidation, not distribution selection. Do NOT apply newsjacking filtering here — capture everything that meets the threshold.
3. NJ Delivery Filter (Writer Phase B: distribution selection)
After flat synthesis, apply the Newsjacking (NJ) lens to select what to deliver/report:
Newsjacking Signal Scoring (0-5):
- 5/5: Trending topic + contrarian take + high debate potential (e.g., "X is dead" backed by data)
- 4/5: Riding viral wave + in-group resonance (e.g., Claude Code patterns, local LLM breakthroughs)
- 3/5: Pattern interrupt + novelty (unusual topic from trusted source)
- 2/5: Standard insight, well-executed but not debate-generating
- 1/5: Incremental update, low engagement potential
- 0/5: Noise, link dump, or already saturated topic
Delivery prioritization:
- NJ Score ≥ 4: Lead story — featured prominently in report, detailed analysis
- NJ Score 3: Secondary — included with context and cross-references
- NJ Score 2: Brief mention — one-liner in summary section
- NJ Score ≤ 1: Omit from delivery — wiki updated but not highlighted in report
This two-stage approach ensures:
- Wiki receives comprehensive, unbiased knowledge consolidation (Phase A)
- Reports are curated for maximum engagement and signal (Phase B)
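The delivery prioritization tiers can be sketched as a small Python mapping. The tier labels and the `title` / `nj_score` keys are assumptions for illustration. Themes in the omit tier are left out of the report only; their wiki updates from Phase A are unaffected.

```python
def delivery_tier(nj_score: int) -> str:
    """Map a 0-5 NJ score to a report tier (labels are hypothetical)."""
    if not 0 <= nj_score <= 5:
        raise ValueError(f"NJ score out of range: {nj_score}")
    if nj_score >= 4:
        return "lead"
    if nj_score == 3:
        return "secondary"
    if nj_score == 2:
        return "brief"
    return "omit"


def build_report_sections(themes):
    """Group theme titles by tier, dropping omitted themes from the report."""
    sections = {"lead": [], "secondary": [], "brief": []}
    for theme in themes:
        tier = delivery_tier(theme["nj_score"])
        if tier != "omit":
            sections[tier].append(theme["title"])
    return sections


sections = build_report_sections([
    {"title": "Theme A", "nj_score": 5},
    {"title": "Theme B", "nj_score": 3},
    {"title": "Theme C", "nj_score": 1},
])
```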
4. Deep Sleep — Replay-Safe Integration
For each promoted theme:
- Check existing pages: Search wiki for related content
- Create new pages if significant enough:
  - Follow wiki format: frontmatter + content + cross-references
  - Link to at least 2 existing pages
  - Add to appropriate category (entities, concepts, comparisons, queries)
- Update existing pages with new information:
  - Append new findings with dates
  - Note contradictions if present
  - Bump `updated` date in frontmatter
- Update navigation:
  - Add new pages to `wiki/index.md`
  - Update total page count
  - Append to `wiki/log.md`
- Git commit & push:
cd ~/ai-topics && git add wiki/ && git commit -m "dreaming: consolidation YYYY-MM-DD" && git push
Workflow
Cron Job Configuration
name: Dreaming
schedule: "<configured cron schedule>"
script: dreaming.py
skill: dreaming
deliver: "discord:1233771389367095377:1491801814222504169"
Output Format
After processing, deliver a summary like:
# Dreaming Report — YYYY-MM-DD
## Duplicate Check Summary
- Items skipped (already processed by other jobs): N
- Gaps filled: N
- Overlapping areas identified: [list]
## Consolidation Summary
- Articles processed: N
- Themes identified: N
- Pages created: N
- Pages updated: N
## New Wiki Pages
- [[concepts/new-page]]: Brief description
## Updated Pages
- [[entities/existing]]: What changed
Sub-Patterns
Pattern A (CORE): Existing Coverage Depth Check
After Phase A identifies themes meeting the promotion threshold (score ≥ 0.65), do NOT automatically create/update wiki pages. First perform a depth check:
- Read the existing page(s) in full; do not merely check that they exist
- Assess coverage depth: Does the existing page already cover the theme's core insight? Compare against the dreaming theme's summary
- Search for raw articles on disk (`~/wiki/raw/articles/`) matching the theme's URLs or titles to see if content was already ingested
- Only update if there are genuine gaps; don't update "because the threshold says so"
- Document in the report why each high-scoring theme was skipped or only lightly updated (e.g., "Already covered comprehensively" vs "Minor detail gap filled")
Rationale: The dreaming checkpoint aggregates articles across 2+ day windows. By the time the dreaming cycle runs, the daily RSS pipeline or newsletter-ingest may have already processed many of these articles. The wiki's existing pages may already reflect the key insights.
Pattern B (NEWSLETTER): Newsletter Noise Filtering
When processing articles from newsletters (Substack, Beehiiv, etc.), apply these filters BEFORE scoring:
| Signal Type | Pattern | Action |
|---|---|---|
| Substack UI | `play_audio=`, `post-comment`, `submitLike=`, `share=`, `redirect/app-store` | Skip |
| Substack UI | `utm_campaign=email-read-in-app`, `@username` mentions | Skip |
| Redirect chains | `substack.com/redirect/UUID` | Try `web_extract` or skip if no body |
| Beehiiv tracking | `link.mail.beehiiv.com/v1/c/...` | Extract destination via `web_extract` |
| Duplicate URLs | Same `raw_article_path` appearing multiple times | Deduplicate (process once) |
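A rough Python sketch of how the filter table might be applied to link URLs before scoring. The return labels are hypothetical, substring matching is a simplification, and `@username` mentions are not handled because they are text rather than URLs. The `web_extract` step itself is an agent tool and is not modeled here.

```python
from urllib.parse import urlparse

# Substack UI markers from the table; any match means the link is noise.
SKIP_MARKERS = (
    "play_audio=",
    "post-comment",
    "submitLike=",
    "share=",
    "redirect/app-store",
    "utm_campaign=email-read-in-app",
)


def classify_link(url: str) -> str:
    """Return "skip", "resolve_or_skip", "resolve", or "keep" (hypothetical labels)."""
    if any(marker in url for marker in SKIP_MARKERS):
        return "skip"
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    is_substack = host == "substack.com" or host.endswith(".substack.com")
    if is_substack and parsed.path.startswith("/redirect/"):
        return "resolve_or_skip"  # try web_extract; skip if no body comes back
    if host == "link.mail.beehiiv.com" and parsed.path.startswith("/v1/c/"):
        return "resolve"  # extract the destination via web_extract
    return "keep"


def dedupe_by_path(articles):
    """Keep the first article for each raw_article_path, preserving order."""
    seen = set()
    unique = []
    for article in articles:
        path = article["raw_article_path"]
        if path not in seen:
            seen.add(path)
            unique.append(article)
    return unique
```

The skip markers are checked first so that `redirect/app-store` links are dropped rather than treated as resolvable redirect chains.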
Pattern C (ENTITY): Batch Entity Discovery
When articles reference recurring people/companies without dedicated entity pages:
- Search existing entities first: `search_files "name" path=~/wiki/entities target=files`
- Check `index.md` slug_lookup: Verify whether the entity is already catalogued under a different name
- Create entity page if missing: Use standard frontmatter (title, created, updated, tags, related, sources)
- Minimum entity page content: Role/affiliation, key contributions, timeline highlights, 2+ cross-references, sources list
- Batch commit: Create all new entity pages first, then update index.md/log.md in a single commit
Pattern D (DEDUP): Duplicate Detection Matrix
Before creating or updating any page, check ALL sources:
| Check | Method | What it catches |
|---|---|---|
| Filename | `search_files "name" path=~/wiki/entities` | Exact matches |
| Index entry | Read `wiki/index.md` slug_lookup | Catalogued under different name |
| Content grep | `search_files "name" target=content path=~/wiki` | Mentioned inside other pages |
| Recent sessions | `session_search "name"` | Processed in previous dreaming cycles |
Pitfalls
Duplicate detection is MANDATORY — never re-process what other jobs handled
Don't create pages for everything — follow the scoring thresholds
Always check existing pages first — avoid duplicates
Cross-references are mandatory — isolated pages are useless
Commit in the same session — don't leave changes uncommitted
Report what changed — user needs to know what happened
Handle contradictions explicitly — don't silently overwrite
Keep pages scannable — split if over 200 lines
Update index.md and log.md — navigation backbone
Database Schema Constraints: In `blogwatcher-cli.db`, the `articles` table uses `published_date` or `discovered_date`, `is_read` is integer 0/1, and `categories` is a JSON array.
Script Execution: `dreaming.py` outputs JSON to stdout. If it times out, cron still fires but with missing context.
Pre-run Script JSON Parse Failure: When the pre-run script outputs `{"ok": false, "error": "failed to parse JSON response from dreaming-group output"}`, the grouped themes are still available as a fallback at `/opt/data/.hermes/cron/data/dreaming/grouped_themes_latest.json`. Read this file directly to extract themes, articles, and run metadata. The checkpoint data is valid even when `ok` is false; it contains the same structure as a successful run. Do NOT stop processing; extract themes from the fallback file and proceed.
Stale Dreaming Themes: The dreaming checkpoint may be 2-3 days old by the time the cycle runs. Themes identified in the checkpoint may have been processed by the daily RSS pipeline, newsletter-ingest, or other adjacent jobs in the interim. Cross-reference raw article file dates (`~/wiki/raw/articles/`) and recent `log.md` entries to avoid re-processing. If a theme's key article was already fetched and corresponds to an existing wiki page with adequate coverage, skip the update.
Log.md Corruption via Patch: When using `patch` on `log.md`, the `read_file` output format (`LINE|content`) can cause accidental `|` prefix insertion. This happens because the patch `old_string` may include a pipe character from the read_file separator. Fix: Always verify log.md formatting after patching by re-reading the file. If `||-` appears instead of `-` at the start of a bullet line, run a corrective patch. To prevent: when reading log.md, mentally subtract the `LINE|` prefix before constructing old_string values.
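The post-patch verification of log.md could be automated along these lines. This is a sketch under the assumption that log.md bullets start with `- ` at the beginning of a line; the function names are hypothetical and not part of any existing tool. The pattern requires a space after the dash so that markdown table separator rows such as `|---|---|` are left alone.

```python
import re

# One or more stray pipes directly before a "- " bullet at line start.
_STRAY_PIPE_BULLET = re.compile(r"^\|+(?=- )", re.MULTILINE)


def find_corrupted_bullets(text: str) -> list[int]:
    """Return 1-based line numbers of bullets that have a stray pipe prefix."""
    return [
        text.count("\n", 0, match.start()) + 1
        for match in _STRAY_PIPE_BULLET.finditer(text)
    ]


def repair_bullets(text: str) -> str:
    """Strip stray pipe prefixes from bullet lines; all other text is unchanged."""
    return _STRAY_PIPE_BULLET.sub("", text)


sample = "- ok\n||- bad\n|- also bad\n|---|---|\n| a | b |\n"
corrupted = find_corrupted_bullets(sample)
repaired = repair_bullets(sample)
```

In the sample, lines 2 and 3 are reported and repaired, while the two table rows pass through untouched.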