name: literature_research
description: General-purpose research skill for establishing state of the art, defining concepts, and collecting references for thesis chapters. Produces structured references.md files with data tables and bib-key lookups.
Thesis Literature & Context Research
A unified research skill for gathering state-of-the-art context, theoretical foundations, and specific references for thesis chapters. This skill merges the capabilities of the former literature_review and theory_context skills.
Target Notebook
- Name: thesis references
- ID: 1b7df790-7858-4fc8-879c-39f41238c4ae
- Strict Rule: Exclusively use this notebook. Do not query other notebooks.
When to Use This Skill
Use this skill when:
- Establishing the state of the art (e.g., "What is the evidence for Dark Matter?")
- Defining a concept or standard physics (e.g., "What is WIMP annihilation?", "How is the J-factor defined?")
- Reviewing the current landscape (e.g., "What are the current limits on WIMP annihilation?")
- Collecting references for a chapter section (e.g., "Find reviews and key papers for Chapter 1")
- Citing standard results (e.g., "Who first calculated the Tremaine-Gunn bound?")
Do NOT use this skill for extracting details from the author's own papers (001)). Use paper_analysis instead.
Do NOT use this skill for deep extraction from review articles (002)). Use review_analysis instead. This skill identifies relevant reviews; review_analysis extracts from them.
Prerequisites
- Run source_registry first to obtain the Review Articles (002)) source IDs.
- This skill uses a two-phase query strategy (see below).
Two-Phase Query Strategy
Research follows two phases: start narrow with curated reviews, then widen to all sources for gap-filling.
Phase 1: Reviews Only (Big Picture)
Query only Review Articles (002) source IDs from source_registry) to establish the big picture, consensus definitions, and structural context.
mcp_notebooklm_notebook_query(
    notebook_id="1b7df790-7858-4fc8-879c-39f41238c4ae",
    query="<your question>",
    source_ids=<review_002_ids>  # only the ~8 review source IDs
)
Why reviews first? Reviews are curated, authoritative, and provide the "big picture." Starting here ensures the foundation is built on consensus before drilling into specifics.
Phase 2: All Sources (Gap-Filling & Specifics)
For follow-up questions, clarifications, or specific papers that the reviews mentioned but did not detail, query all sources by omitting source_ids (the query then defaults to the entire notebook).
mcp_notebooklm_notebook_query(
    notebook_id="1b7df790-7858-4fc8-879c-39f41238c4ae",
    query="<your follow-up or specific question>",
    conversation_id=<previous_conversation_id>  # maintains context from Phase 1
)
Why omit source_ids? With 170+ general references, passing them individually is impractical. Omitting the parameter lets NotebookLM search the entire corpus, which naturally includes reviews + all specific papers.
When to Use Each Phase
| Phase | Use when... | source_ids |
|---|---|---|
| Phase 1 (Reviews) | Establishing concepts, definitions, state of the art, structural guidance | review_002_ids only |
| Phase 2 (All) | Finding specific papers, getting arXiv numbers, clarifying details, filling gaps | Omit entirely |
Conversation Continuity
Use conversation_id (returned by each query) for follow-up questions. This is especially useful for the Phase 1 → Phase 2 transition:
- Phase 1: "What are the main approaches to indirect DM detection?" → get conversation_id
- Phase 2: "For each approach you mentioned, list specific papers with arXiv numbers" → pass conversation_id, omit source_ids
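The two phases differ only in which arguments accompany the query. The sketch below captures that rule in Python. build_query_args is a hypothetical helper, not part of the NotebookLM MCP tool, and the returned dictionary simply mirrors the parameter names used in the calls above.

```python
from typing import Optional, Sequence

NOTEBOOK_ID = "1b7df790-7858-4fc8-879c-39f41238c4ae"


def build_query_args(
    query: str,
    phase: int,
    review_ids: Optional[Sequence[str]] = None,
    conversation_id: Optional[str] = None,
) -> dict:
    """Assemble keyword arguments for one notebook query (hypothetical helper)."""
    if phase not in (1, 2):
        raise ValueError("phase must be 1 or 2")
    args = {"notebook_id": NOTEBOOK_ID, "query": query}
    if phase == 1:
        # Phase 1: restrict the search to the review source IDs.
        if not review_ids:
            raise ValueError("Phase 1 needs the review source IDs from source_registry")
        args["source_ids"] = list(review_ids)
    # Phase 2: source_ids is deliberately left out so the whole notebook is searched.
    if conversation_id is not None:
        args["conversation_id"] = conversation_id
    return args
```

Whether the tool accepts these values as keyword arguments is an assumption based on the call templates above. The point of the sketch is only that Phase 2 never carries a source_ids entry.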
Source Provenance Classification
Every reference identified during research falls into one of two categories:
| Type | Symbol | Meaning | Implication |
|---|---|---|---|
| Direct Source | ✅ | Paper is an individual source in the NotebookLM notebook | Content is directly queryable; we can extract details, equations, and context |
| Referenced Source | ❌ | Paper is only cited within a review or other notebook source | Content is NOT directly queryable; we must cite via the review that discusses it |
How to Determine Provenance
- Run source_registry: this gives the full list of notebook sources with titles (usually containing arXiv IDs)
- For each reference found during queries, check whether its arXiv ID appears as a separate notebook source filename
- If it does → ✅ Direct Source. If not → ❌ Referenced Source
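The check described in these steps can be sketched in Python. This is a minimal illustration that assumes source titles are available as plain strings and that references carry new-style arXiv IDs (for example 1234.56789). Old-style identifiers such as astro-ph/9508025 are not handled, and classify_provenance is a hypothetical name.

```python
import re
from typing import Dict, Iterable

# Matches new-style arXiv IDs, optionally followed by a version suffix like "v2".
ARXIV_ID = re.compile(r"\b(\d{4}\.\d{4,5})(?:v\d+)?\b")


def classify_provenance(
    reference_ids: Iterable[str], source_titles: Iterable[str]
) -> Dict[str, str]:
    """Map each arXiv ID to the Direct (✅) or Referenced (❌) symbol."""
    in_notebook = set()
    for title in source_titles:
        # Collect every arXiv ID that appears in a notebook source title.
        in_notebook.update(ARXIV_ID.findall(title))
    return {ref: ("✅" if ref in in_notebook else "❌") for ref in reference_ids}
```

Titles without an arXiv ID are silently skipped here, so references to books or non-arXiv papers would still need a manual check.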
Why This Matters
When writing thesis text, we can only paraphrase and build arguments from sources we can actually read. For ❌ Referenced Sources, we rely on the review's discussion of that paper. The "Cited In" column in the data table tells us which review(s) to query for context about that paper.
External Lookup via paper_lookup Skill
For ❌ Referenced Sources, use the paper_lookup skill to retrieve additional information. Common uses:
- Quick metadata/abstract: paper_lookup Recipe 1 (InspireHEP get_paper_details)
- Full paper text: paper_lookup Recipe 2 (arXiv download_paper → read_paper)
- BibTeX entry: paper_lookup Recipe 4 (InspireHEP get_bibtex)
- Figures: paper_lookup Recipe 3 (InspireHEP get_paper_figures + download)
⚠️ Guardrails: External lookups are a triage tool, not a content source. Use them to decide whether to cite a paper, not to write based on an abstract alone. For building arguments, always rely on the review(s) that discuss the paper (listed in "Cited In"), or add the paper to the NotebookLM notebook as a full source.
Phase 3: Figure Scouting
After Phases 1–2 identify the key references, query NotebookLM for figures from the literature that could illustrate thesis content:
mcp_notebooklm_notebook_query(
    notebook_id="1b7df790-7858-4fc8-879c-39f41238c4ae",
    query="Which figures from the papers we discussed are considered
           canonical or frequently-referenced illustrations of [topic]?
           For each, state the paper (arXiv ID), figure number,
           and what it shows.",
    conversation_id=<previous_conversation_id>
)
Record figure candidates in references.md (see Figure Candidates Table in the Output section). These feed into section_drafting Step 4b and paper_lookup Recipe 3 for download.
Citation Preference Hierarchy
When citing a claim or result, follow this priority order:
- Prefer review papers (✅ Direct Sources, 002) prefix): they provide authoritative, synthesized context that is directly queryable
- Complement with the specific original paper (often ❌ Referenced): adds scholarly precision and credit to the original work
- Use notebook general references (✅ Direct Sources without 002)) when available: these give us direct content access
- Last resort: cite a paper only mentioned in passing in a review, without detailed discussion
Non-Peer-Reviewed Source Policy
Rule: PhD theses and unpublished preprints must NEVER be the sole citation for a specific claim. Always pair with the original peer-reviewed paper.
| Source Type | Reliability | Usage |
|---|---|---|
| Published books (Hooper, Dodelson) | ✅ Fully reliable | Can be cited as sole reference for standard results and derivations |
| PhD theses (e.g., Pinetti 2021) | ⚠️ Not peer-reviewed | Cite for structural guidance, but always pair with original peer-reviewed paper |
| Large preprints (e.g., Cirelli 2024) | ⚠️ Widely cited but unpublished | Acceptable as review reference; complement with original papers for specific claims |
Example citation pattern (LaTeX):
The NFW profile~\cite{Navarro:1996gj} is the standard parametrization
for CDM halos (see~\cite{Cirelli:2024ssz} for a review).
Here Cirelli:2024ssz is ✅ (we can query it for details), while Navarro:1996gj is ❌ (cited within Cirelli). Both are cited, but the review provides processable context.
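The sole-citation rule lends itself to a mechanical check before drafting. The sketch below is one possible way to do it, assuming each bib key has been labeled by hand with a source type. The labels "thesis" and "preprint" and the function name sole_citation_violations are illustrative assumptions, not part of any existing skill.

```python
from typing import Dict, Iterable, List

# Source types that may not stand alone, following the policy table above.
NEEDS_PAIRING = {"thesis", "preprint"}


def sole_citation_violations(
    claims: Dict[str, Iterable[str]], source_type: Dict[str, str]
) -> List[str]:
    """Return the claims whose citations are all theses or unpublished preprints."""
    flagged = []
    for claim, keys in claims.items():
        # Keys with no recorded type are treated as "unknown" and never flagged.
        types = [source_type.get(key, "unknown") for key in keys]
        if types and all(t in NEEDS_PAIRING for t in types):
            flagged.append(claim)
    return flagged
```

A flagged claim needs an additional peer-reviewed or book citation. Claims with no citations at all are not reported by this check and would have to be caught separately.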
Query Best Practices
Granular Querying: Never ask for an entire chapter at once. Break requests by sub-section (e.g., "1.1 Cosmological Context", "1.2 Particle Nature").
Specific Prompt Engineering:
- Dual-Reference Standard: Always request at least 2 distinct references for each topic — complementary perspectives (e.g., Theory vs. Observation, or two contrasting reviews).
- Corpus Priority: Prioritize sources already in the NotebookLM corpus (review articles, books) over external textbooks from general knowledge. Only cite external sources if the notebook lacks coverage.
- arXiv Numbers: Explicitly ask for "Specific Papers with arXiv numbers" to get primary sources for citation.
- Relevance Justification: Ask "Why is it relevant?" to ensure each source fits the narrative.
- Additional Sources: Always request a list of 5–8 Additional Sources per subsection, including authors, year, arXiv number, and a 5–10 word summary.
Follow-up Queries: Use conversation_id across both phases. For example:
- Phase 1: "List relevant reviews on Indirect Detection" (with source_ids=<review_ids>)
- Phase 2: "For the reviews you just listed, which specific papers do they cite for gamma-ray line searches?" (omit source_ids)
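The prompt-engineering rules above can be folded into a reusable template so that no requirement is forgotten in a given query. The following is a minimal sketch. The function name and the exact wording of each request are assumptions, and the wording can be adapted freely.

```python
def build_subsection_query(subsection: str, topic: str) -> str:
    """Compose one granular query for a single thesis subsection."""
    parts = [
        f"For thesis subsection {subsection} ({topic}):",
        "1. Give at least 2 distinct references offering complementary perspectives.",
        "2. Prioritize sources already in this notebook over external textbooks.",
        "3. List Specific Papers with arXiv numbers.",
        "4. For each source, explain: Why is it relevant?",
        "5. List 5-8 Additional Sources with authors, year, arXiv number, "
        "and a 5-10 word summary.",
    ]
    return "\n".join(parts)
```

Calling build_subsection_query("1.1", "Cosmological Context") yields a single prompt covering one subsection, in line with the granular-querying rule.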
Output
1. references.md
Save to chapter_XX/references.md. Follow the structure in resources/references_structure.md:
- Reviews & Textbooks — general consensus
- Key Specific Papers — primary sources
- References Breakdown by Section — detailed mapping
- Reference Data Table — structured table with provenance tracking
2. Reference Data Table
Include as Section 4 of references.md. Format:
| Paper Name | Bib Key | In NB | Cited In |
|---|---|---|---|
| Section Header | | | |
| Planck 2018 VI | Aghanim:2018eyx | ✅ | - |
| NFW Profile | Navarro:1996gj | ❌ | Cirelli, Hooper, Pinetti |
| Some missing paper | N/A | ❌ | Cirelli |
Column definitions:
- Bib Key: looked up from bibliography.bib (or use InspireHEP MCP). N/A = needs adding.
- In NB: ✅ = separate source in NotebookLM, ❌ = only cited within a review
- Cited In: for ❌ papers, which review(s) discuss this paper (short names: Cirelli, Hooper, Dodelson, Pinetti, Bullock, etc.)
Bib key lookup procedure:
- Extract the arXiv number from the NotebookLM response
- Search bibliography.bib for the arXiv number using: grep_search(query="<arxiv_number>", SearchPath="bibliography.bib", Includes=["*.bib"])
- If found, extract the bib key (the string after @article{ or @book{ etc.)
- If NOT found, use mcp_inspirehep_get_bibtex to fetch the entry and append to bibliography.bib
- If InspireHEP also fails, write N/A, which signals that the user needs to add the entry manually
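For illustration, the search-and-extract part of this procedure can be sketched in plain Python, assuming the contents of bibliography.bib have already been read into a string. The helper find_bib_key is hypothetical and uses a simple substring match rather than a full BibTeX parser.

```python
import re
from typing import Optional

# Start of a BibTeX entry: "@type{key," at the beginning of a line.
ENTRY_START = re.compile(r"^@(\w+)\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)


def find_bib_key(bib_text: str, arxiv_number: str) -> Optional[str]:
    """Return the key of the first entry mentioning arxiv_number, else None."""
    starts = list(ENTRY_START.finditer(bib_text))
    for i, match in enumerate(starts):
        # An entry runs from its own start to the start of the next entry.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bib_text)
        if arxiv_number in bib_text[match.start():end]:
            return match.group(2)
    return None
```

A None result corresponds to the "NOT found" branch, at which point the InspireHEP fallback applies.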
Provenance lookup procedure:
- Get notebook sources from source_registry output
- For each reference, check if its arXiv ID appears in any source title
- Mark ✅ if found, ❌ if not
- For ❌ entries, note which review(s) cited the paper based on the query response
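Once the bib key and provenance of a reference are known, its table row follows directly from the column definitions. A small sketch of that formatting step is given below. reference_row is a hypothetical helper, and the "-" placeholder for an empty Cited In cell is an assumption of this sketch.

```python
from typing import Optional, Sequence


def reference_row(
    paper: str,
    bib_key: Optional[str],
    in_notebook: bool,
    cited_in: Sequence[str] = (),
) -> str:
    """Format one row of the Reference Data Table."""
    key = bib_key if bib_key else "N/A"  # N/A = entry still needs adding
    mark = "✅" if in_notebook else "❌"
    # "Cited In" only applies to papers that are not separate notebook sources.
    reviews = "-" if in_notebook or not cited_in else ", ".join(cited_in)
    return f"| {paper} | {key} | {mark} | {reviews} |"
```

For example, reference_row("NFW Profile", "Navarro:1996gj", False, ["Cirelli", "Hooper", "Pinetti"]) reproduces the NFW row of the example table.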
3. Figure Candidates Table
If Phase 3 identified relevant figures, include as Section 5 of references.md:
| Figure | Paper | Bib Key | Description | Section |
|---|---|---|---|---|
| Fig. 1 | arXiv:XXXX.XXXXX | Author:2020abc | Rotation curve of NGC 6503 | 1.1.1 |
| Fig. 3 | arXiv:YYYY.YYYYY | Author:2022def | DM density profiles comparison | 1.2.2 |
Column definitions:
- Figure: Figure number in the original paper
- Paper: arXiv ID (or DOI for non-arXiv papers)
- Bib Key: from bibliography.bib lookup
- Description: What the figure shows (1 line)
- Section: Which thesis section would use this figure
These candidates are consumed by section_drafting Step 4b, which attempts download via paper_lookup Recipe 3.
4. Knowledge Insights
REQUIRED: After producing references.md, use the knowledge skill (save mode) to persist key insights to .agent/knowledge/. The knowledge skill defines the standard file format (YAML frontmatter + body) and handles deduplication.
Usage Examples
Scenario: Finding General Reviews
- User: "Find reviews on Indirect Detection."
- Query: "List the most relevant review articles and books on Indirect Detection of Dark Matter. For each, explain why it is relevant."
Scenario: Finding Specific Citations
- User: "Who established the limits on neutrino masses?"
- Query: "Provide a list of specific papers establishing limits on neutrino masses (e.g., Tremaine-Gunn), including arXiv numbers and a summary of the finding."
Scenario: Defining Standard Physics
- User: "Write the equations for the NFW profile."
- Query: "Provide the mathematical definition of the NFW density profile and explain its parameters using standard references."
Scenario: Full Chapter Research
- User: "Research sources for Chapter 1."
- Action:
  - Read outline.md to identify sub-topics for Chapter 1.
  - Iterate through each sub-topic (1.1, 1.2, 1.3...).
  - For each, run a query combining the strategies above.
  - Compile into chapter_01/references.md with data table.
  - Save key insights to .agent/knowledge/.
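The first two steps of this scenario amount to pulling the sub-topic list out of the outline. The sketch below shows one way to do that. The format of outline.md is not specified here, so the assumed line shape (optional Markdown markers followed by a number such as 1.2 and a title) and the function name chapter_subtopics are both hypothetical.

```python
import re
from typing import List, Tuple


def chapter_subtopics(outline_text: str, chapter: int) -> List[Tuple[str, str]]:
    """Return (number, title) pairs for the second-level sections of one chapter."""
    # Assumed line shape: optional "#", "-" or "*" markers, then "<chapter>.<n> <title>".
    pattern = re.compile(
        rf"^[#\-\* \t]*({chapter}\.\d+)[ \t]+(.+?)[ \t]*$", re.MULTILINE
    )
    return [(m.group(1), m.group(2)) for m in pattern.finditer(outline_text)]
```

Deeper levels such as 1.2.1 are intentionally skipped, so each returned pair maps to one granular query per sub-section.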