---
name: knowledge-extraction
description: Extract, synthesize, and validate information from multiple sources with cross-verification, deduplication, and proper source attribution. Use when researching APIs, libraries, frameworks, verifying facts, or consolidating information from PDFs, web pages, and code files.
---
Knowledge Seeker
Overview
Systematic information extraction, synthesis, and validation skill that transforms raw data from multiple sources into verified, citable knowledge. Implements triangulation-based verification with confidence scoring and proper source attribution.
Key distinction:
- `rd2:knowledge-extraction` = Information extraction, verification, and synthesis (knowledge work)
- Other tools = Information access (WebSearch, WebFetch, grep, ref, etc.)
Core Principles
1. Verification Before Synthesis [CRITICAL]
Never present information without verification.
EXTRACT → VERIFY → CONSOLIDATE → CITE
- Extract information from primary source
- Cross-verify with 2+ independent sources
- Flag conflicts for manual resolution
- Always attribute sources
2. Triangulation Methodology
Use multiple sources to enhance credibility:
| Confidence Level | Requirement | Evidence |
|---|---|---|
| HIGH (>90%) | Direct quotes from official docs (2024+) | Official docs, API references |
| MEDIUM (70-90%) | Synthesized from 2+ sources | Cross-verified claims |
| LOW (<70%) | Single source or outdated | Flag for manual review |
| UNVERIFIED | No sources found | Do not present |
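The table can be read as a small scoring function over the sources behind a single claim. This is a minimal sketch, assuming a hypothetical `Source` record with `official`, `year`, and `direct_quote` fields. The 2024 cutoff comes from the HIGH row. Applying that same cutoff to the MEDIUM tier is an assumption, and the sketch does not check whether sources are truly independent.

```python
from dataclasses import dataclass


@dataclass
class Source:
    # Hypothetical record describing one source that supports a claim
    title: str
    official: bool       # official docs or API reference
    year: int            # publication or last-updated year
    direct_quote: bool   # claim is quoted verbatim from this source


def confidence_level(sources: list[Source], min_year: int = 2024) -> str:
    """Map the sources behind one claim to HIGH / MEDIUM / LOW / UNVERIFIED."""
    if not sources:
        return "UNVERIFIED"  # no sources found: do not present
    current = [s for s in sources if s.year >= min_year]
    if any(s.official and s.direct_quote for s in current):
        return "HIGH"        # direct quote from current official docs
    if len(current) >= 2:
        return "MEDIUM"      # synthesized from 2+ current sources (assumed cutoff)
    return "LOW"             # single source or outdated: flag for manual review


docs = Source("react.dev reference", official=True, year=2025, direct_quote=True)
blog = Source("Engineering blog", official=False, year=2024, direct_quote=False)
old = Source("Forum answer", official=False, year=2019, direct_quote=False)
print(confidence_level([docs]))       # HIGH
print(confidence_level([blog, old]))  # LOW: only one source is current
```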
3. Progressive Disclosure
- Keep SKILL.md concise (overview + workflows)
- Move detailed methodologies to `references/`
- Link to specifics when needed
4. MCP Tools Priority
- API/Library Docs → `ref` (MCP)
- GitHub Code → `mcp__grep__searchGitHub`
- Recent Facts (<6mo) → `WebSearch`
- Local Codebase → `Read`/`Grep`
- Specific URL → `ref_read_url` (MCP)
See the Tool Selection Summary below for the complete table with fallbacks.
Quick Start
Basic extraction workflow:
```
# Extract information about a specific aspect
Source: file.pdf or https://example.com
Aspect: "authentication mechanisms"

Step 1: EXTRACT - Identify relevant information
  - Scan source for aspect-related content
  - Extract key facts, patterns, relationships

Step 2: VERIFY - Cross-check with other sources
  - Use ref for official documentation
  - Use WebSearch for recent validation
  - Use mcp__grep__searchGitHub for code examples

Step 3: CONSOLIDATE - Merge and deduplicate
  - Combine verified information
  - Remove duplicates
  - Resolve conflicts

Step 4: CITE - Add source attribution
  - Format: Source: Title | Verified: YYYY-MM-DD
  - Include confidence level and reasoning
```
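The four steps can be sketched end to end on in-memory text. This is a toy illustration and makes three assumptions. Each source is a hypothetical `(title, verified_date, text)` tuple. Extraction is a naive keyword match over sentences split on periods, so it would break on version numbers like `1.5`. Conflict resolution is left out. Real extraction would go through the tools listed above.

```python
from datetime import date


def extract(text: str, aspect_terms: list[str]) -> list[str]:
    """Step 1 (toy): keep sentences that mention any aspect term, case-insensitively."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [s for s in sentences if any(t.lower() in s.lower() for t in aspect_terms)]


def run_pipeline(sources: list[tuple[str, date, str]], aspect_terms: list[str],
                 min_sources: int = 2) -> list[str]:
    """sources are hypothetical (title, verified_date, text) tuples."""
    # Steps 1 and 3: extract claims and merge identical ones across sources
    support: dict[str, list[tuple[str, date]]] = {}
    for title, verified, text in sources:
        for claim in dict.fromkeys(extract(text, aspect_terms)):  # count each source once per claim
            support.setdefault(claim, []).append((title, verified))

    report = []
    for claim, cites in support.items():
        if len(cites) < min_sources:  # Step 2: drop claims that were not cross-verified
            continue
        # Step 4: attribute every supporting source with its verification date
        cite_str = "; ".join(f"Source: {t} | Verified: {d.isoformat()}" for t, d in cites)
        report.append(f"{claim} ({cite_str})")
    return report


sources = [
    ("Official docs", date(2025, 1, 15), "Tokens expire after 1 hour. Use PKCE for public clients."),
    ("Engineering blog", date(2025, 1, 16), "Use PKCE for public clients. Rate limits apply."),
]
for line in run_pipeline(sources, ["PKCE", "token"]):
    print(line)
```

Only the PKCE sentence appears in two sources, so it is the only claim printed. The token-expiry claim is dropped until a second source confirms it.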
Extraction Workflows
Workflow 1: Single Source Extraction
Use for: Simple lookups, quick facts
1. Load source (file or URL)
2. Extract aspect-related information
3. Verify with 1-2 additional sources
4. Output with single citation
Workflow 2: Multi-Source Synthesis
Use for: Complex topics, conflicting information
1. Extract from primary source
2. Extract from 2+ secondary sources
3. Cross-verify claims
4. Identify conflicts
5. Consolidate with conflict notes
6. Output with multiple citations
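Steps 3 to 5 (cross-verify, identify conflicts, consolidate with notes) can be sketched by keying claims on a topic. This is a minimal sketch, assuming each source has already been reduced to a hypothetical `{topic: value}` mapping. Topics with one agreed value are consolidated with all supporting sources. Topics with differing values are returned as conflicts, with each value still attributed to its sources.

```python
from collections import defaultdict


def synthesize(extracted: dict[str, dict[str, str]]):
    """extracted maps a source title to its {topic: value} claims (hypothetical shape)."""
    # Group sources by topic, then by the value they assert
    by_topic: defaultdict = defaultdict(lambda: defaultdict(list))
    for source, facts in extracted.items():
        for topic, value in facts.items():
            by_topic[topic][value].append(source)

    consolidated, conflicts = {}, {}
    for topic, values in by_topic.items():
        if len(values) == 1:
            value, supporters = next(iter(values.items()))
            consolidated[topic] = (value, supporters)  # sources agree: cite them all
        else:
            conflicts[topic] = dict(values)            # sources disagree: keep attributions
    return consolidated, conflicts


extracted = {
    "Official docs": {"latest_version": "3.2", "license": "MIT"},
    "Tutorial blog": {"latest_version": "3.1", "license": "MIT"},
    "GitHub README": {"latest_version": "3.2"},
}
consolidated, conflicts = synthesize(extracted)
print(consolidated)  # {'license': ('MIT', ['Official docs', 'Tutorial blog'])}
print(conflicts)     # {'latest_version': {'3.2': ['Official docs', 'GitHub README'], '3.1': ['Tutorial blog']}}
```

A topic asserted by only one source also lands in `consolidated`, so its confidence still needs to be scored separately.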
Workflow 3: Aspect-Based Extraction
Use for: Targeted information retrieval
1. Define aspect clearly (e.g., "OAuth2 flows")
2. Scan sources for aspect matches
3. Extract relevant sections only
4. Filter by relevance score
5. Verify extracted content
6. Consolidate by aspect
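Steps 2 to 4 (scan, extract relevant sections, filter by relevance score) can be approximated with a simple term-overlap score. This is a minimal sketch under three assumptions. Sections are plain strings. Aspect terms are single words. Relevance is the fraction of terms a section contains. The 0.5 threshold is an arbitrary illustrative value, and a real pipeline might use BM25 or embeddings instead.

```python
import re


def relevance(section: str, aspect_terms: list[str]) -> float:
    """Fraction of (single-word) aspect terms that appear as words in the section."""
    if not aspect_terms:
        return 0.0
    words = set(re.findall(r"[a-z0-9]+", section.lower()))
    hits = sum(1 for term in aspect_terms if term.lower() in words)
    return hits / len(aspect_terms)


def filter_sections(sections: list[str], aspect_terms: list[str],
                    threshold: float = 0.5) -> list[str]:
    scored = [(relevance(s, aspect_terms), s) for s in sections]
    # Keep sections at or above the threshold, most relevant first
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [s for score, s in ranked if score >= threshold]


sections = [
    "The authorization code flow exchanges a code for an access token.",
    "Install the package with pip.",
    "Refresh tokens let clients obtain a new access token.",
]
print(filter_sections(sections, ["oauth2", "token", "flow", "code"]))
# ['The authorization code flow exchanges a code for an access token.']
```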
Workflow 4: Consolidation From Multiple Sources
Use for: Merging related information
1. Extract from all sources
2. Identify unique information per source
3. Find overlapping information
4. Merge duplicates (keep best version)
5. Resolve conflicts (prioritize credibility)
6. Create unified knowledge structure
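Steps 2 and 3 (unique information per source, overlapping information) map directly onto set operations. This is a minimal sketch, assuming each source's facts have already been normalized into comparable strings. Near-duplicate matching and conflict resolution are outside this example.

```python
def partition_facts(facts_by_source: dict[str, set[str]]):
    """Split normalized facts into per-source unique facts and facts shared by 2+ sources."""
    unique: dict[str, set[str]] = {}
    for source, facts in facts_by_source.items():
        others = set().union(*(f for s, f in facts_by_source.items() if s != source))
        unique[source] = facts - others  # only this source states these facts

    all_facts = set().union(*facts_by_source.values())
    # Facts stated by 2+ sources: merge into one entry and cite every source
    overlapping = {
        fact: sorted(s for s, f in facts_by_source.items() if fact in f)
        for fact in all_facts
        if sum(fact in f for f in facts_by_source.values()) >= 2
    }
    return unique, overlapping


facts = {
    "docs": {"supports oauth2", "rate limit 100/min"},
    "blog": {"supports oauth2", "sdk for go"},
}
unique, overlapping = partition_facts(facts)
print(unique)       # {'docs': {'rate limit 100/min'}, 'blog': {'sdk for go'}}
print(overlapping)  # {'supports oauth2': ['blog', 'docs']}
```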
Tool Selection Summary
Priority Order
- MCP Tools (ref, searchGitHub, brave-search) - Fast, credible, specialized
- wt:magent-browser - JS-rendered content, screenshots, forms
- WebFetch - Static content, specific URLs
- WebSearch - Recent information, general queries
- Grep/Read - Local codebase analysis
Quick Reference
| Information Type | Primary Tool | Fallback |
|---|---|---|
| API/Library Docs | ref (MCP) | WebFetch → WebSearch |
| GitHub Code | mcp__grep__searchGitHub | WebFetch → WebSearch |
| Recent Facts (<6mo) | WebSearch | ref → WebFetch |
| Local Codebase | Read / Grep | - |
| Specific URL | ref_read_url (MCP) | WebFetch |
For detailed tool selection guidance: references/tool-selection.md
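The quick-reference table can be encoded as an ordered fallback chain. This is a minimal sketch with two assumptions. `call_tool` is a hypothetical callback standing in for whatever actually invokes each tool and returns `None` when nothing is found. The category keys are illustrative labels for the table's rows.

```python
from typing import Callable, Optional

# Primary tool first, then fallbacks, in the order given by the table
TOOL_CHAIN: dict[str, list[str]] = {
    "api_docs": ["ref", "WebFetch", "WebSearch"],
    "github_code": ["mcp__grep__searchGitHub", "WebFetch", "WebSearch"],
    "recent_facts": ["WebSearch", "ref", "WebFetch"],
    "local_codebase": ["Read/Grep"],
    "specific_url": ["ref_read_url", "WebFetch"],
}


def lookup(info_type: str, query: str,
           call_tool: Callable[[str, str], Optional[str]]) -> tuple[str, str]:
    """Try each tool in order and return (tool, result) from the first that succeeds."""
    for tool in TOOL_CHAIN[info_type]:
        result = call_tool(tool, query)
        if result:  # None or empty result: fall through to the next tool
            return tool, result
    raise LookupError(f"No tool returned results for {info_type!r}: {query!r}")


def fake_tool(tool: str, query: str) -> Optional[str]:
    # Stand-in for real tool calls: only WebFetch "finds" anything here
    return f"{tool} result for {query}" if tool == "WebFetch" else None


print(lookup("api_docs", "fetch() options", fake_tool))
# ('WebFetch', 'WebFetch result for fetch() options')
```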
Validation Summary
Triangulation Methodology
See Core Principles for confidence scoring framework (HIGH/MEDIUM/LOW/UNVERIFIED).
Credibility Hierarchy
- Official documentation (python.org, react.dev, etc.) - HIGHEST
- Official engineering blogs (Google, Meta, OpenAI) - HIGH
- Well-maintained GitHub repos - MEDIUM-HIGH
- Peer-reviewed papers - MEDIUM
- Technical blogs - MEDIUM
- StackOverflow, forums - LOW-MEDIUM
- Unclear sources - LOW
For detailed validation methods: references/validation-methods.md
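The hierarchy can be turned into a sort key so that the most credible source is consulted first. This is a sketch, assuming hypothetical category labels and numeric ranks (higher is more credible). The numbers only preserve the order of the list above, including placing peer-reviewed papers ahead of technical blogs, and carry no other meaning.

```python
CREDIBILITY_RANK = {
    "official_docs": 7,           # HIGHEST
    "official_eng_blog": 6,       # HIGH
    "maintained_github_repo": 5,  # MEDIUM-HIGH
    "peer_reviewed_paper": 4,     # MEDIUM
    "technical_blog": 3,          # MEDIUM
    "forum": 2,                   # LOW-MEDIUM (StackOverflow, forums)
    "unclear": 1,                 # LOW
}


def rank_sources(sources: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (title, category) pairs from most to least credible; unknown categories rank last."""
    return sorted(sources, key=lambda s: CREDIBILITY_RANK.get(s[1], 0), reverse=True)


sources = [
    ("SO answer", "forum"),
    ("python.org docs", "official_docs"),
    ("Personal blog", "technical_blog"),
]
print([title for title, _ in rank_sources(sources)])
# ['python.org docs', 'Personal blog', 'SO answer']
```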
Output Format
Standard Output Format
```markdown
## [Aspect] Information Summary

### Extracted Information
[Consolidated information from verified sources]

### Sources
- [Source 1 Title](URL1) | **Verified**: YYYY-MM-DD
- [Source 2 Title](URL2) | **Verified**: YYYY-MM-DD
- [Source 3 Title](URL3) | **Verified**: YYYY-MM-DD

### Confidence
**Level**: HIGH/MEDIUM/LOW
**Reasoning**: [Brief justification for confidence level]

### Conflicts
[List any conflicting information with source attributions]

### Recommendations
[Actionable insights or next steps]
```
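Filling the template programmatically helps ensure that no section, especially Confidence, is dropped. This is a minimal sketch. The function name and argument shapes are illustrative assumptions, and empty Conflicts or Recommendations sections fall back to "None".

```python
from datetime import date
from typing import Optional

LEVELS = {"HIGH", "MEDIUM", "LOW"}


def render_summary(aspect: str, body: str, sources: list[tuple[str, str, date]],
                   level: str, reasoning: str,
                   conflicts: Optional[list[str]] = None,
                   recommendations: Optional[list[str]] = None) -> str:
    """Fill the standard template; each source is a (title, url, verified_date) tuple."""
    if level not in LEVELS:
        raise ValueError(f"Confidence level must be one of {sorted(LEVELS)}")
    lines = [f"## {aspect} Information Summary", "### Extracted Information", body, "### Sources"]
    lines += [f"- [{title}]({url}) | **Verified**: {d.isoformat()}" for title, url, d in sources]
    lines += ["### Confidence", f"**Level**: {level}", f"**Reasoning**: {reasoning}"]
    lines += ["### Conflicts"] + ([f"- {c}" for c in conflicts] if conflicts else ["None"])
    lines += ["### Recommendations"] + ([f"- {r}" for r in recommendations] if recommendations else ["None"])
    return "\n".join(lines)


report = render_summary(
    "Authentication",
    "The API authenticates requests with OAuth 2.0 bearer tokens.",
    [("API Reference", "https://example.com/docs/auth", date(2025, 1, 15)),
     ("SDK README", "https://example.com/sdk", date(2025, 1, 16))],
    "MEDIUM",
    "Two independent sources agree; the summary is synthesized, not a direct quote.",
)
print(report)
```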
Citation Format
Inline citations:
```markdown
**Source**: [React Server Components](https://react.dev/reference/react/18/server-components) | **Verified**: 2025-01-15
```
Multiple sources:
```markdown
**Sources**:
- [Source 1](URL1) | **Verified**: 2025-01-15
- [Source 2](URL2) | **Verified**: 2025-01-16
```
Conflict Resolution Summary
Disagreement Types
| Type | Description | Resolution |
|---|---|---|
| Factual | Different dates, versions, numbers | Prioritize most recent, check official |
| Interpretive | Different approaches, recommendations | Present multiple viewpoints |
| Temporal | Information changed over time | Note temporal context, provide latest |
| Scope | Different contexts/environments | Clarify applicability context |
Resolution Protocol: Identify → Assess → Check → Present → Flag
For detailed conflict resolution: references/conflict-resolution.md
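For factual disagreements, the table's rule ("prioritize most recent, check official") can be sketched as a ranking. This sketch makes two assumptions. Each claim is a hypothetical `Claim` tuple. Official status outranks recency, which is one reading of that rule. The values that lose are returned rather than discarded, so they can still be presented and flagged for manual review.

```python
from datetime import date
from typing import NamedTuple


class Claim(NamedTuple):
    value: str
    source: str
    official: bool
    published: date


def resolve_factual(claims: list[Claim]) -> tuple[Claim, list[Claim]]:
    """Pick a winning claim and return the disagreeing ones so they can be flagged."""
    if not claims:
        raise ValueError("No claims to resolve")
    # Assumed ordering: official sources first, then most recent publication date
    ranked = sorted(claims, key=lambda c: (c.official, c.published), reverse=True)
    winner = ranked[0]
    flagged = [c for c in ranked[1:] if c.value != winner.value]
    return winner, flagged


claims = [
    Claim("v2.1", "Tutorial blog", False, date(2025, 3, 1)),
    Claim("v2.0", "Official changelog", True, date(2024, 11, 5)),
    Claim("v2.0", "GitHub release", False, date(2024, 11, 5)),
]
winner, flagged = resolve_factual(claims)
print(winner.value, "from", winner.source)     # v2.0 from Official changelog
print([(c.value, c.source) for c in flagged])  # [('v2.1', 'Tutorial blog')]
```

The newer blog claim is flagged rather than dropped, because it may describe a temporal change that the official source has not caught up with yet.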
Deduplication Summary
Content Matching
| Type | Handling |
|---|---|
| Exact duplicates | Remove entirely, keep single instance |
| Near duplicates | Merge overlapping, keep most comprehensive |
| Semantic duplicates | Consolidate, note source variations |
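Exact and near duplicates can be separated with text normalization plus a similarity ratio. This is a minimal sketch using the standard library's `difflib.SequenceMatcher`, with three assumptions. The 0.85 threshold is arbitrary. "Most comprehensive" is approximated as "longest". Semantic duplicates (same meaning, different wording) are not handled and would need embeddings or manual review.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def deduplicate(items: list[str], near_threshold: float = 0.85) -> list[str]:
    kept: list[str] = []
    for item in items:
        norm = normalize(item)
        for i, existing in enumerate(kept):
            # Ratio 1.0 is an exact duplicate after normalization; >= threshold is a near duplicate
            if SequenceMatcher(None, norm, normalize(existing)).ratio() >= near_threshold:
                if len(item) > len(existing):
                    kept[i] = item  # keep the longer (more comprehensive) version
                break
        else:
            kept.append(item)  # no match: new information
    return kept


items = [
    "Tokens expire after 60 minutes.",
    "tokens expire after 60 minutes",
    "Tokens expire after 60 minutes if unused.",
    "Use PKCE for public clients.",
]
print(deduplicate(items))
# ['Tokens expire after 60 minutes if unused.', 'Use PKCE for public clients.']
```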
Information Merging
- Sources agree: Present single fact, cite all sources
- Sources disagree: Present conflicts, attribute to sources
- Sources complement: Synthesize comprehensive view
For detailed deduplication strategies: references/deduplication.md
Anti-Patterns to Avoid
Anti-Pattern 1: Single Source Verification
Problem: Presenting information from only one source
Gate function:
```python
def check_source_count(claim_importance: str, sources_count: int) -> tuple[bool, str]:
    # Illustrative gate name. HIGH importance requires 2+ sources; CRITICAL claims require 3+
    if claim_importance == "HIGH" and sources_count < 2:
        return False, "Need 2+ sources for HIGH importance"
    if claim_importance == "CRITICAL" and sources_count < 3:
        return False, "Need 3+ sources for CRITICAL claims"
    return True, "OK"
```
Anti-Pattern 2: Ignoring Conflicts
Problem: Presenting consensus without noting conflicts
Gate function:
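```python
def check_conflicts(conflicts: list, resolution_status: str) -> tuple[bool, str]:
    # Illustrative gate name. Conflicts must be resolved or explicitly noted
    if conflicts and resolution_status == "IGNORED":
        return False, "Conflicts must be resolved or flagged"
    return True, "OK"
```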
Anti-Pattern 3: Outdated Sources
Problem: Using outdated information without verification
Gate function:
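```python
from datetime import date

def check_freshness(source_date: date, today: date, threshold_months: int) -> tuple[bool, str]:
    # Illustrative gate name. Sources older than threshold_months (calendar months) need verification
    age_months = (today.year - source_date.year) * 12 + (today.month - source_date.month)
    if age_months > threshold_months:
        return False, f"Source is {age_months} months old (limit: {threshold_months})"
    return True, "OK"
```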
Anti-Pattern 4: Circular Attribution
Problem: Citing sources that quote each other as independent
Gate function:
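```python
# Illustrative gate name. Trace quotations to original sources;
# find_originals(source_chain) is assumed to return the set of root sources.
def check_independent_origins(source_chain) -> tuple[bool, str]:
    primary_sources = find_originals(source_chain)
    if len(primary_sources) < 2:
        return False, "Circular attribution - single original source"
    return True, "OK"
```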
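The `find_originals` helper used in the gate is not defined in this skill. One possible sketch assumes that `source_chain` is a hypothetical mapping from each cited source to the source it quotes, with `None` for original reporting. The sketch follows each chain to its root. Sources that only quote each other in a loop contribute no original at all.

```python
from typing import Optional


def find_originals(source_chain: dict[str, Optional[str]]) -> set[str]:
    """Return the root sources that the cited sources ultimately quote."""
    originals: set[str] = set()
    for source in source_chain:
        seen: set[str] = set()
        current = source
        # Follow "quotes" links until reaching a source that quotes nothing
        while source_chain.get(current) is not None:
            if current in seen:  # cycle: these sources only quote each other
                break
            seen.add(current)
            current = source_chain[current]
        else:
            originals.add(current)  # reached an original source
    return originals


chain = {
    "Blog A": "Press release",
    "Blog B": "Blog A",
    "News site": "Press release",
}
print(find_originals(chain))  # {'Press release'}: three citations, one original
```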
Anti-Pattern 5: Missing Confidence Levels
Problem: Presenting information without confidence assessment
Gate function:
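```python
def check_confidence_present(output: str) -> tuple[bool, str]:
    # Illustrative gate name. All outputs must include a confidence level
    if "confidence" not in output.lower():
        return False, "Output must include confidence level"
    return True, "OK"
```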
Integration with rd2 Ecosystem
Related Skills
- `rd2:tdd-workflow` - Test-driven development for implementation
- `rd2:task-decomposition` - Task planning and breakdown
- `rd2:super-coder` - Code implementation
Usage Pattern
1. Use `rd2:knowledge-extraction` to research and validate information
2. Use findings to inform task decomposition
3. Use `rd2:super-coder` to implement based on verified knowledge
4. Use `rd2:tdd-workflow` to test implementation
Best Practices
DO
- Always verify information with 2+ sources when possible
- Use official documentation as primary source
- Include publication dates in citations
- Assign confidence levels based on verification
- Present conflicting information with attributions
- Use MCP tools when available (ref, searchGitHub)
- Follow tool priority (MCP > built-in)
- Cross-reference local codebase with grep/Read
DON'T
- Present information without verification
- Use outdated sources without checking
- Ignore conflicting information
- Skip confidence scoring
- Rely on a single source for important claims
- Present synthesis as direct quotes without attribution
- Assume information is current without date verification
Verification Checklist
Before presenting extracted information:
- Primary source identified and loaded
- Cross-verified with 2+ sources
- Publication dates checked
- Conflicts identified and either resolved or flagged
- Credibility assessed
- Confidence level assigned
- Sources properly attributed
- Output format follows standard template
Related Skills
- `rd2:tasks` - Task file management for research tasks
- `rd2:tdd-workflow` - Test-driven development for implementing based on research
- `rd2:task-decomposition` - Breaking down research tasks into subtasks
Common Use Cases
API Research
- Extract API endpoints from documentation
- Verify with code examples
- Cross-check multiple API versions
Framework Comparison
- Extract features from each framework
- Synthesize comparison table
- Verify with official sources
Best Practices Research
- Extract patterns from multiple guides
- Identify consensus approaches
- Flag outliers for manual review
Code Pattern Research
- Find implementation patterns via GitHub
- Verify with official documentation
- Synthesize best practice recommendations
For More Details
Detailed references for advanced usage:
- `references/tool-selection.md` - MCP tool usage, decision trees, source type handling
- `references/validation-methods.md` - Triangulation, credibility assessment, confidence scoring
- `references/conflict-resolution.md` - Handling disagreements across sources (factual, interpretive, temporal, scope)
- `references/deduplication.md` - Content merging and duplicate elimination strategies
Remember: Verification before synthesis. Never present information as fact without cross-checking. Multiple sources > single source. Always cite your sources with dates.