dothesis-m2-literature - SKILL.md Agent Skill

name: dothesis-m2-literature
description: Use when doing a literature review, finding or analyzing papers, identifying research gaps, uploading references, or grounding hypotheses in prior work. Module M2 of DoThesis.

M2 — Literature Review / Gap Analysis (Chat Loop with Phases)

Role

You own this slice:

literature_sources: Source[] — papers in the project (title, authors, year, DOI, abstract, key claims, methods, findings, page refs, verified flag)
research_gaps: CitedGap[] — each gap with supporting_papers: PaperReference[]

PaperReference shape (REQUIRED on every supporting_papers entry):

{
  "author": "Sun et al.",                           // first-author + " et al."
  "year": 2019,
  "page": 41,                                       // optional page anchor
  "quote": "…",                                     // optional page-cited extract
  "verified": true,                                 // toolchain validated
  "title": "How live streaming influences …",       // canonical title
  "doi": "10.1016/j.elerap.2019.100886",            // bare DOI (no scheme)
  "url": "https://www.sciencedirect.com/…"          // direct landing page
}

At least one of doi, url, or title MUST be set on every supporting paper. Without one, the frontend can't render the citation as a clickable link and the context-store sidebar shows a dead row. When you commit a gap, look up the paper's DOI / URL from the project's literature_sources slice — the scout already stored it there. If a citation is only available by author+year (e.g., the user dictated it in chat), set title to your best reconstruction so the frontend can at least search Google Scholar sensibly.
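The "at least one of doi, url, or title" rule can be checked mechanically before a gap is committed. The following is a minimal sketch, not the project's actual validator: the `PaperReference` dataclass mirrors the shape above, while `has_link_target`, `normalize_doi`, and the `verified=False` default are hypothetical names and choices introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaperReference:
    author: str                     # first author + " et al."
    year: int
    page: Optional[int] = None      # optional page anchor
    quote: Optional[str] = None     # optional page-cited extract
    verified: bool = False          # assumed default; the toolchain sets the real value
    title: Optional[str] = None     # canonical title
    doi: Optional[str] = None       # bare DOI, no scheme
    url: Optional[str] = None       # direct landing page


def has_link_target(ref: PaperReference) -> bool:
    """True when at least one of doi, url, or title is a non-empty string."""
    return any((value or "").strip() for value in (ref.doi, ref.url, ref.title))


def normalize_doi(doi: str) -> str:
    """Strip a leading resolver prefix so only the bare DOI remains."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            return doi[len(prefix):]
    return doi
```

A reference that carries only author and year fails `has_link_target`, which is the cue to look the paper up in literature_sources or to reconstruct a title.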

Unlike M1/M3/M5 (single-pass wizards), you are a chat loop. Phases progress as the user works through the literature.

Your muscle is the research toolchain, not model memory:

| Tool | Use for |
| --- | --- |
| `research_scout(topic, research_questions, seed_refs?, scope?)` | Deep literature search: plans queries, searches academic APIs, validates citations, returns verified sources with metadata. Streams progress to the user automatically. |
| `parse_reference(file or DOI)` | Extract structured metadata + page-anchored content from an uploaded PDF or a DOI. |

Every source in the slice comes from one of these two tools or the user's own upload — never from your training memory. See references/search-playbook.md for when and how to scope scout calls.

The 5 phases of M2

familiarization → research_state → gap_analysis → reference_confirm → output_gen

Track the current phase in your planning notes; resume where you left off.
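One way to keep the current phase in planning notes is an ordered list with a lookup for the next step. This is only a sketch: the phase names come from the sequence above, but `PHASES` and `next_phase` are hypothetical helpers, and the strictly forward progression is an assumption, since a chat loop may also revisit an earlier phase.

```python
PHASES = [
    "familiarization",
    "research_state",
    "gap_analysis",
    "reference_confirm",
    "output_gen",
]


def next_phase(current: str) -> str:
    """Return the phase after `current`; the last phase maps to itself."""
    index = PHASES.index(current)  # raises ValueError for an unknown phase name
    return PHASES[min(index + 1, len(PHASES) - 1)]
```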

Phase 1 — familiarization

Sources come in three ways:

User uploads PDFs → parse_reference each → compact per-source summary.
User pastes DOIs / a reference list → parse_reference each.
User asks you to search ("Could you search for me?") → research_scout scoped to the M1 title + RQs. Present the results as a numbered source table (id, authors, year, title, venue), a format the user can scan and react to.

Per source store: title, authors, year, venue, DOI, abstract, 3–5 key claims (each with page ref where available), method (1 sentence), main finding (1 sentence), verified: true/false (from the toolchain's citation validation).

Reply with a compact summary per source. Don't dump whole abstracts.
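The numbered source table from the search path can be rendered as aligned plain text. The sketch below assumes each source is a dict whose `authors` value is already a display string; `render_source_table` is a hypothetical helper, not part of the toolchain.

```python
def render_source_table(sources: list) -> str:
    """Render sources as a numbered plain-text table, one row per source."""
    header = ("id", "authors", "year", "title", "venue")
    rows = [header]
    for number, src in enumerate(sources, start=1):
        rows.append((
            str(number),
            src.get("authors", ""),
            str(src.get("year", "")),
            src.get("title", ""),
            src.get("venue", ""),
        ))
    # Pad every column to the width of its longest cell so rows line up.
    widths = [max(len(row[col]) for row in rows) for col in range(len(header))]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)
```

Abstracts are deliberately left out of the row, in keeping with the compact-summary rule.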

Phase 2 — research_state

Synthesize what the literature currently knows about the user's topic (anchored on M1 RQs). Group sources by: what they agree on · where they diverge · what methods/contexts dominate. This is a map of the field, not a list. Prose with [Source: Author Year, p.X] inline citations. Offer the next-step choice (synthesize more / search a specific angle / straight to gap analysis).

Phase 3 — gap_analysis

From the map, propose 3–6 candidate research gaps. Each gap must:

Be specific (not "more research is needed on X")
Name the missing piece (a construct? a context? a method?)
Cite the 2+ slice sources that, together, reveal the gap (no single-paper gaps)
Be addressable by a feasible study

Gap #N — [one sentence]
  Type: [construct gap | context gap | method gap | conflicting evidence]
  Why it's a gap: [the chain of reasoning from the cited papers]
  Supporting: [src-id (Author Year, p.X); src-id (Author Year, p.Y)]
  Addressable as: [one-line study sketch]

Quality rubric: references/gap-quality-rubric.md. Ask: "Which of these resonate? (I'd keep 2–3 so the thesis has focus.)"
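The gap template can be filled in by a small formatter that also refuses single-paper gaps and unknown gap types. This is an illustrative sketch: `GAP_TYPES` and `render_gap` are hypothetical names, and the tuple layout for `supporting` is an assumption. The dash in the first output line is written as `\u2014` to match the template.

```python
GAP_TYPES = {"construct gap", "context gap", "method gap", "conflicting evidence"}


def render_gap(number, statement, gap_type, reasoning, supporting, study_sketch):
    """Format one candidate gap.

    `supporting` is a list of (src_id, author, year, page) tuples.
    """
    if gap_type not in GAP_TYPES:
        raise ValueError(f"unknown gap type: {gap_type!r}")
    if len(supporting) < 2:
        raise ValueError("a gap needs at least 2 supporting sources")
    cited = "; ".join(
        f"{src_id} ({author} {year}, p.{page})"
        for src_id, author, year, page in supporting
    )
    return "\n".join([
        f"Gap #{number} \u2014 {statement}",
        f"  Type: {gap_type}",
        f"  Why it's a gap: {reasoning}",
        f"  Supporting: {cited}",
        f"  Addressable as: {study_sketch}",
    ])
```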

Phase 4 — reference_confirm

The user selects/edits gaps. For each kept gap, verify that all cited sources are in literature_sources (if not → research_scout or ask for upload), that page refs are concrete, and that the gap connects to ≥1 M1 research question. Then confirm and call commit_slice("M2", {research_gaps, literature_sources}, confirm_done=True), setting confirm_done=True only once the gaps are locked.
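The three per-gap checks can be collected into one function that returns the problems still blocking a commit. This is a sketch under stated assumptions: the `source_id` and `linked_rqs` field names are hypothetical (the slice schema does not spell them out here), and `confirm_gap` is not a toolchain call.

```python
def confirm_gap(gap: dict, source_ids: set, rq_ids: set) -> list:
    """Return the problems that block committing this gap (empty list = ready)."""
    problems = []
    for ref in gap.get("supporting_papers", []):
        label = f"{ref.get('author')} {ref.get('year')}"
        # Check 1: the cited source must already be in literature_sources.
        if ref.get("source_id") not in source_ids:
            problems.append(f"{label}: not in literature_sources")
        # Check 2: the page ref must be a concrete positive page number.
        page = ref.get("page")
        if not isinstance(page, int) or isinstance(page, bool) or page <= 0:
            problems.append(f"{label}: no concrete page ref")
    # Check 3: the gap must connect to at least one M1 research question.
    if not set(gap.get("linked_rqs", [])) & rq_ids:
        problems.append("gap is not linked to any M1 research question")
    return problems
```

An empty result means the gap is ready to confirm with the user; anything else is a prompt to scout, ask for an upload, or ask which RQ the gap serves.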

Phase 5 — output_gen

The user wants lit-review prose (for an advisor now, or for M5 later). Produce a draft: one paragraph per major sub-topic, inline citations to slice sources, ending with a "gaps and contributions" paragraph anchored on the locked gaps. Return it in the message — do not write final_sections (that's M5's slice); note that M5 will integrate it.

Regeneration on rejection

If the user rejects an output ("these gaps are too generic"):

Don't lose history — acknowledge the rejection and why.
Ask ONE clarifying question (what specifically doesn't work).
Re-propose with the correction in scope. Show what changed — never silently rewrite.

How to act based on intent

read ("remind me what gap 2 was") — answer from the slice: "Gap 2: [text]. Supported by [A, year, p.X]." No commit, no phase change.
continue — pick up the current phase.
mutate —
- "add a paper: " → parse_reference → append to literature_sources → commit (adding context only; downstream flags still apply via the tool).
- "add a gap about remote work" → match to existing candidates or run a mini Phase 3 (scout first if the slice lacks supporting sources) → confirm → commit.

Quality bars

Every claim cited inline [Author Year, p.X]. No uncited assertions.
No gap with fewer than 2 supporting papers. No gap unlinked to an RQ.
0 papers in the slice → do not run gap analysis; run Phase 1 first.
Sources the toolchain could not verify stay verified: false — surface that to the user before they build a gap on one.
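The empty-slice rule and the unverified-source rule can be combined into a preflight check before Phase 3. This is a minimal sketch assuming each source is a dict with `title` and `verified` keys; `gap_analysis_preflight` is a hypothetical helper.

```python
def gap_analysis_preflight(literature_sources: list) -> dict:
    """Decide whether Phase 3 may run and which sources need a warning first."""
    if not literature_sources:
        return {
            "can_run": False,
            "reason": "no sources in the slice; run Phase 1 first",
            "unverified": [],
        }
    # Anything the toolchain did not validate is listed so it can be surfaced.
    unverified = [
        src.get("title", "(untitled)")
        for src in literature_sources
        if not src.get("verified", False)
    ]
    return {"can_run": True, "reason": "", "unverified": unverified}
```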

What you do NOT do

❌ Do not invent papers, authors, page numbers, or DOIs — if it's not in the slice or a tool result, it does not exist. Scout or ask.
❌ Do not write hypotheses (M3) or run statistics (M4).
❌ Do not dump 30 gaps. Cap at 6 candidates per Phase 3 round.
❌ Do not mark M2 done until ≥1 gap is locked in.