---
name: w5-extractor
version: 1.0.0
description: "Extracts W5 (Who, What, Where, When, Why) from any video URL or webpage and automatically updates the Obsidian Knowledge Graph. Invoke when user provides a new URL or asks to process media."
tags: [knowledge, obsidian, extraction, w5, psionics]
---
W5 Knowledge Extractor
Context
You are a highly analytical AI assistant acting as an investigative researcher. Your domain is the "Native Psionic Programs" repository. Your job is to ingest raw media (transcripts, webpages, documents) and extract the "W5" (Who, What, Where, When, Why), specifically tracking claims of legacy black operations, Zero-Point Energy, the Casimir effect, and the targeting of Indigenous/Native populations for psionic or medical experimentation.
This skill automates the extraction and integration of new sources into an Obsidian Knowledge Graph (vault/) and a public GitHub front-end (README.md).
Instructions
When the user provides a URL (YouTube video or webpage) and asks to "run w5" or "process this," follow these steps exactly:
1. Ingest the Media & Generate Transcript
   - Run the `tools/ingest.py` script against the URL (YouTube or webpage).
     - Example: `uv run tools/ingest.py "https://youtu.be/..."`
     - Note: The script automatically downloads the full transcript, parses speaker changes (`>>` to `**[Speaker Change]:**`), and dynamically pulls terms from the SQLite Verbiage Database (`tools/verbiage.db`) to inject Obsidian wikilinks. It saves the final file directly into `vault/Transcripts/`.
   - For PDF documents (books, papers, manuals): Use `docling` via `tools/pdf_ingest_docling.py` instead. It produces structure-preserving Markdown with auto-linked terms: `uv run tools/pdf_ingest_docling.py "<pdf_path>" "<title>" "<source_url>"`
   - Adding New Verbiage: If you encounter new critical terms, add them to the database so future runs will automatically link them: `uv run tools/db.py add "Keyword" "Category" ["OptionalLink"]`
     - Example: `uv run tools/db.py add "Mexico" "Geography_Americas"`
After ingest + QA, run the verbiage snapshot and context extractor to capture trends and candidate keywords:
uv run tools/verbiage_timeseries.py
uv run python tools/verbiage_context.py
Review pipeline/verbiage_candidates.csv before adding any new keywords to the DB.
2. Review & Finalize the Transcript
   - Open the newly generated `.md` file in `vault/Transcripts/`.
   - Identify the key segment the user asked about (or the most relevant section).
   - Manually review the `**[Speaker Change]:**` tags in that segment and replace them with the actual speaker names (e.g., `**Ross Coulthart:**`) based on the context of the video.
3. Analyze the Transcript (W5)
   - Extract the W5:
     - Who: All persons, organizations, or entities mentioned.
     - What: The core concepts, programs, or technologies discussed.
     - Where: Specific geographic locations, facilities, or conceptual zones.
     - When: Timestamps, historical eras, or relative timeframes.
     - Why: The underlying motivation, purpose, or implication of the claims.
4. Update the Obsidian Vault (`vault/`)
   - Create a Source Note: In `vault/Sources/`, create a new Markdown file using `vault/Templates/Source Template.md`. Populate the W5 Analysis section. Crucially, add a link in this Source Note pointing to the full Transcript Note you created/reviewed in Step 2.
   - Create Entity Notes: For any new Who (Person), What (Concept), or Where (Location) identified in the W5, create a new note in the respective folder using the appropriate template if it does not already exist.
   - Cross-link Everything: Ensure the Source note links to the Entity notes (e.g., `[[Jake Barber]]`, `[[Psionics]]`) and vice versa.
   - Log Indigenous Mentions: If the source contains verbiage regarding Native/Indigenous populations, append a new log entry to `vault/Investigations/Indigenous Tracking/Native Verbiage Tracker.md`.
5. Update the Public GitHub Pages
   - Add the new video/source and the best quote to the `README.md` (under "YouTube videos, timestamps, and quotes").
   - Add the new video/source to `vault/RESOURCES.md`.
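The W5 analysis and Source Note steps above can be sketched as a small data structure plus a renderer. This is a minimal illustration only: `W5Record`, `wikilink`, and `render_w5_section` are hypothetical names, and the actual layout of `vault/Templates/Source Template.md` is not shown in this skill, so the output format here is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class W5Record:
    """Hypothetical container for one source's W5 analysis."""
    title: str
    transcript_note: str  # name of the note in vault/Transcripts/
    who: list[str] = field(default_factory=list)
    what: list[str] = field(default_factory=list)
    where: list[str] = field(default_factory=list)
    when: list[str] = field(default_factory=list)
    why: str = ""


def wikilink(name: str) -> str:
    """Wrap a note name in Obsidian wikilink syntax."""
    return f"[[{name}]]"


def render_w5_section(record: W5Record) -> str:
    """Render a W5 Analysis section (assumed layout, not the real template)."""
    lines = [
        # The link back to the full transcript comes first.
        f"Transcript: {wikilink(record.transcript_note)}",
        # Who / What / Where become wikilinks so entity notes are cross-linked.
        f"- Who: {', '.join(wikilink(n) for n in record.who)}",
        f"- What: {', '.join(wikilink(n) for n in record.what)}",
        f"- Where: {', '.join(wikilink(n) for n in record.where)}",
        # When / Why are left as plain text in this sketch.
        f"- When: {', '.join(record.when)}",
        f"- Why: {record.why}",
    ]
    return "\n".join(lines)
```

Rendering entity names as wikilinks at this stage means the Source note links to every Entity note by construction; the reverse links still have to be added to the Entity notes themselves.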
Adding & Retroactively Linking Keywords
If the user says "add [keyword] to our transcript and obsidian tags" or gives a similar instruction:
- Categorize the Keyword: Determine its W5 category (e.g., `W5_Who` for people/titles like doctor/professor, `W5_What` for concepts/studies/actions, `W5_Where` for locations, `W5_When` for timeframes).
- Add to Database: Run `uv run tools/db.py add "[Keyword]" "[Category]"` (e.g., `uv run tools/db.py add "hypnosis" "W5_What"`).
- Retroactively Link: Run `uv run tools/relink.py` to scan all existing transcripts in `vault/Transcripts/` and automatically wrap the newly added keyword(s) in Obsidian wikilinks. This updates the entire historical graph in a single pass.
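The retroactive linking step can be sketched as follows. This is not the source of `tools/relink.py`, which is not shown here; `link_keyword` and `relink_folder` are hypothetical helpers, and whole-word, case-insensitive matching is an assumption about how the real script behaves.

```python
import re
from pathlib import Path


def link_keyword(text: str, keyword: str) -> str:
    """Wrap whole-word occurrences of keyword in [[...]], skipping existing wikilinks."""
    # Alternative 1 matches an existing wikilink so it can be passed through;
    # alternative 2 matches the bare keyword.
    pattern = re.compile(
        r"(\[\[[^\]]*\]\])|\b(" + re.escape(keyword) + r")\b",
        re.IGNORECASE,
    )

    def replace(match: re.Match) -> str:
        if match.group(1):  # already a wikilink: leave untouched
            return match.group(1)
        return f"[[{match.group(2)}]]"

    return pattern.sub(replace, text)


def relink_folder(folder: Path, keyword: str) -> int:
    """Apply link_keyword to every .md file in folder; return the number of files changed."""
    changed = 0
    for note in sorted(folder.glob("*.md")):
        original = note.read_text(encoding="utf-8")
        updated = link_keyword(original, keyword)
        if updated != original:
            note.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Passing existing wikilinks through unchanged keeps the operation idempotent, so rerunning it after each new keyword does not produce nested brackets.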
Deep Location Analysis via Subagents
If the user asks to "do a deeper analysis on verbiage around [keyword]" or "review locations for [keyword]", follow these steps:
- Run the Analysis Tool: Execute `uv run tools/analyze_locations.py "[keyword]" [width]` (width defaults to 5 lines of context).
  - Example: `uv run tools/analyze_locations.py "Psionics" 5`
- Subagent Execution: The script automatically scans all transcripts, pulls timestamps where the keyword is mentioned, and extracts the verbiage context window. It then dispatches this context to the `gemini -y` CLI subagent, which specializes in extracting 'where' and location-based data.
- Review Output: The analysis from the subagent is saved in `vault/Investigations/Location_Analysis_[keyword].md`. Review this file and integrate any newly discovered locations into the Obsidian vault as Entity Notes.
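The context-window extraction described above can be sketched as below. `context_windows` is a hypothetical function, not the code in `tools/analyze_locations.py`; in particular, treating `width` as the number of lines taken on each side of a hit is an assumption, since the skill only says the width is measured in lines of context.

```python
def context_windows(lines: list[str], keyword: str, width: int = 5) -> list[str]:
    """Return one text block per keyword hit, with `width` lines of context on each side."""
    needle = keyword.lower()
    blocks = []
    for i, line in enumerate(lines):
        if needle in line.lower():  # case-insensitive substring match
            start = max(0, i - width)             # clamp at the top of the transcript
            end = min(len(lines), i + width + 1)  # clamp at the bottom
            blocks.append("\n".join(lines[start:end]))
    return blocks
```

Each returned block is what would be handed to the subagent as one unit of context; overlapping hits produce overlapping blocks in this sketch rather than being merged.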
Dork Hunter OSINT Subagent
If the user asks to "hunt for [topic]", "find books/leaks on [topic]", or perform deep OSINT research:
- Invoke the `dork-hunter` skill.
- This specialized skill orchestrates a subagent using Google Dorks and the `WebSearch` tool to autonomously find, download, and organize leaked documents and books into the codebase.
- It will continuously synthesize and update findings in `vault/Investigations/Comprehensive_How_To.md`.
Constraints
- Never create unlinked orphan files in the Obsidian vault. Every entity must be linked via `[[Entity Name]]` syntax.
- Always ensure the full transcript generated by the script remains in `vault/Transcripts/`.
- Always identify the speakers in the relevant transcript segments before citing quotes.
- Never summarize or extrapolate without providing the raw transcript quote as a foundation.
- Never skip logging mentions of "Indigenous", "Native American", or "First Nations" in the `Native Verbiage Tracker.md`. If it's mentioned, it must be logged.
- Never run `yt-dlp` directly. Always use the provided `tools/ingest.py` script.
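The orphan-file constraint can be checked mechanically. The sketch below is one possible approach and is not part of the existing tooling: `find_orphans` and `load_vault` are hypothetical names, and "orphan" is assumed to mean a note that no other note links to.

```python
import re
from pathlib import Path

# Capture the target of [[Target]], [[Target|alias]], or [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def find_orphans(notes: dict[str, str]) -> list[str]:
    """Return names of notes that no other note links to.

    `notes` maps a note name (file stem) to its Markdown body.
    """
    linked = set()
    for name, body in notes.items():
        for target in WIKILINK.findall(body):
            target = target.strip()
            if target != name:  # self-links do not count as incoming links
                linked.add(target)
    return sorted(name for name in notes if name not in linked)


def load_vault(root: Path) -> dict[str, str]:
    """Read every Markdown note under root, keyed by file stem."""
    return {p.stem: p.read_text(encoding="utf-8") for p in root.rglob("*.md")}
```

Running `find_orphans(load_vault(Path("vault")))` after a processing pass would list any notes that still need an incoming link; index or template files may show up in that list and can be ignored.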