---
name: keep-research-workflow
description: "Use Keep as a research operating system: collect, promote, connect, and distill a small corpus of high-value sources without getting lost in the weeds."
version: 1.0.2
author: Hermes Agent
license: MIT
---
Keep research workflow
Keep is a research tool, not a bookmark dump.
Quickstart:
0. put the source in Keep first, so it exists as a real note before you start organizing it
1. collect the smallest corpus that plausibly answers the question
2. promote the sources worth revisiting with titles and a small tag set
3. connect them with the right edge type (`references`, `cites`, `informs`)
4. distill the answer into a short synthesis note
Minimal example:
keep put /path/to/source.pdf -t type=paper -t topic=your-topic
keep tag https://example.com/source --tag title='Readable Title'
What put does, briefly:
- it creates the source note right away
- analysis, OCR, embeddings, and part creation happen asynchronously in the background
- the source may look sparse at first and then fill in after the queue runs
- for URLs, `put` may create or refresh a stub that later becomes a richer note after processing
If you do the quickstart steps well, Keep becomes a small, navigable research graph instead of a citation swamp.
The mechanics in one sentence
Keep stores documents, tags, and graph edges in the same note system: a source note is the primary object, tags shape search and analysis, and edge tags create navigable relationships to other notes.
When to use this
Use this workflow when you are:
- researching a topic from papers, blog posts, docs, or references
- building a source graph around a question
- deciding which sources matter enough to keep revisiting
- creating a concise research artifact others can follow
1) Collect
Collect the sources that plausibly matter.
Good collection targets:
- the anchor paper or survey
- a few high-value references
- key author pages or venue pages
- stubs created by reference edges that look important
Rules:
- do not collect everything
- start with the smallest corpus that can answer the question
- if the source is a PDF, tag it as `type=paper`
Why type=paper matters
type=paper is not just a label. It changes how Keep analyzes the document.
In the updated workflow, a PDF tagged type=paper uses a paper-structured analysis prompt instead of the generic default analysis path. That means:
- better section-aware decomposition
- more paper-shaped parts
- cleaner retrieval anchors
- less generic blob-summary behavior
This is the right default for papers and surveys.
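A minimal sketch of that rule at collection time, assuming bash 4+. The hypothetical `collect_flags` helper prints `-t` flags one per line and adds `type=paper` only when the source name ends in `.pdf` (case-insensitive). Treating every PDF as a paper is a simplification taken from the Collect rules above; a PDF that is not a paper may deserve a different type.

```bash
# Hypothetical helper: print the collection flags for a source, one per line.
# PDFs get type=paper so Keep uses the paper-structured analysis path;
# other sources only get the topic tag here.
collect_flags() {
  local source=$1 topic=$2
  case "${source,,}" in              # lowercase the name (bash 4+)
    *.pdf) printf '%s\n' -t type=paper ;;
  esac
  printf '%s\n' -t "topic=$topic"
}
```

Reading the flags into an array keeps values with spaces intact: `mapfile -t flags < <(collect_flags /path/to/source.pdf your-topic); keep put /path/to/source.pdf "${flags[@]}"`.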
2) Promote
Promote only the sources that are worth revisiting.
Promotion means:
- add a readable `title`
- add a `topic` tag
- add a small set of relationship tags
- optionally add an author note if the author identity matters
Use this when the source is likely to be reused in the final answer or future follow-up.
Suggested tags:
- `type=paper`
- `topic=...`
- `title=...`
- `references=[...]`
- `cites=[...]`
- `informs=[...]`
Important:
- use wikilink-style edge values like `https://url[[Readable Title]]`
- Keep can also parse markdown-style links (`[Readable Title](https://url)`) into a bare URL plus title
- do not create numbered pseudo-keys like `informs2` or `referenced_by3`
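Before tagging, a quick local check can catch the two mistakes listed above. This is a sketch under assumptions: `check_tag` is a hypothetical bash helper that takes one `KEY=VALUE` string, rejects keys ending in a digit (which would also reject any legitimate key that happens to end in a number), and, for the three edge keys used in this workflow, accepts only the wikilink or markdown-link value shapes. Keep itself may accept more forms than these patterns.

```bash
# Hypothetical pre-flight check for one promotion tag written as KEY=VALUE.
check_tag() {
  local key=${1%%=*} value=${1#*=}
  # https://url[[Readable Title]]
  local wiki='^https?://[^[:space:]]+\[\[[^]]+]]$'
  # [Readable Title](https://url)
  local md='^\[[^]]+]\(https?://[^)[:space:]]+\)$'
  if [[ $key =~ [0-9]$ ]]; then
    echo "bad key: $key (numbered pseudo-key)"
    return 1
  fi
  case $key in
    references|cites|informs)
      if [[ ! $value =~ $wiki && ! $value =~ $md ]]; then
        echo "bad edge value for $key: $value"
        return 1
      fi
      ;;
  esac
  return 0
}
```

For example, `check_tag 'informs2=https://example.com/a[[A]]'` fails on the key, while `check_tag 'cites=https://doi.org/10.1145/3460231.3474243[[TKG Reasoning]]'` passes.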
3) Connect
Keep the graph useful by preserving provenance and follow-up paths.
The data model for references and citations
References are not special text blobs. They are edges.
An edge tag is an ordinary tag key whose tagdoc declares an inverse relationship via .tag/KEY.
For example:
- `references` is an edge tag for any parseable link or extracted outbound link; its inverse is `referenced_by`
- `cites` is a formal bibliographic citation edge; its inverse is `cited_by`
- `informs` is an edge tag for provenance / research influence; its inverse is `informed_by`
- `author` can also be an edge tag when its tagdoc defines it that way
That means a tag like:
keep flow tag -p id='https://example.com/paper' -p 'tags={"references":["https://example.com/related[[Related Paper]]"],"cites":["https://doi.org/10.1145/3460231.3474243[[TKG Reasoning]]"]}'
creates graph relationships, not just metadata.
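Hand-writing that JSON is easy to get wrong. The sketch below assumes bash 4+ and `sed`; `edge_tags_json` and `json_str` are hypothetical helpers that group `KEY=VALUE` arguments into the `tags` object shape used above, keeping keys in first-seen order. They escape only backslashes and double quotes, which is enough for typical titles but is not a full JSON encoder.

```bash
# Quote a string for JSON, escaping backslashes and double quotes only.
json_str() {
  printf '"%s"' "$(printf '%s' "$1" | sed 's/[\\"]/\\&/g')"
}

# Group KEY=VALUE arguments into {"key":["v1","v2"],...}.
edge_tags_json() {
  local -A vals=()
  local -a order=()
  local arg key val k out=""
  for arg in "$@"; do
    key=${arg%%=*}
    val=${arg#*=}                      # everything after the first "="
    if [[ -z ${vals[$key]+set} ]]; then
      order+=("$key")
      vals[$key]=""
    fi
    vals[$key]+="${vals[$key]:+,}$(json_str "$val")"
  done
  for k in "${order[@]}"; do
    out+="${out:+,}$(json_str "$k"):[${vals[$k]}]"
  done
  printf '{%s}\n' "$out"
}
```

The result can be passed in place of the literal JSON: `keep flow tag -p id='https://example.com/paper' -p "tags=$(edge_tags_json 'references=https://example.com/related[[Related Paper]]' 'cites=https://doi.org/10.1145/3460231.3474243[[TKG Reasoning]]')"`.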
Why this matters:
- the note stays searchable by its own content and tags
- the target note gets the inverse listing automatically
- the graph becomes navigable in both directions
- `cites` captures deliberate bibliographic structure, while `references` can remain mechanical / extracted
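To make the direction flip concrete, here is a toy model in bash; it is not how Keep stores edges internally, since Keep maintains the inverse listing for you. The hypothetical `inverse_listing` reads tab-separated `source key value` lines (an assumed input format), keeps only the three edge keys described above, strips the `[[Title]]` label to get the target, and prints the inverse edge the target would show.

```bash
# Toy model: turn forward edges into the inverse edges their targets would list.
inverse_listing() {
  local src key val target inv
  while IFS=$'\t' read -r src key val; do
    case $key in
      references) inv=referenced_by ;;
      cites)      inv=cited_by ;;
      informs)    inv=informed_by ;;
      *) continue ;;                 # not an edge tag: plain metadata
    esac
    target=${val%%\[\[*}             # drop the [[Readable Title]] label
    printf '%s\t%s\t%s\n' "$target" "$inv" "$src"
  done
}
```

Feeding it a `cites` edge from the survey yields a `cited_by` line on the DOI target, while a `title` tag produces nothing, which mirrors the difference between edges and ordinary metadata.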
Edge-tag rules
Use edge tags to capture relationships:
- `references`: what this note links to mechanically
- `cites`: what this note formally cites in bibliographic structure
- `referenced_by`: what points at this note
- `cited_by`: what formally cites this note
- `informs`: what this note meaningfully helps explain or unlock
Best practice:
- keep the number of promoted nodes small
- preserve links to important adjacent sources
- prefer stable identities when you have them
- for authors, use a stable author ID when available; otherwise use the name as a readable label
- use `cites` when a structured enricher or citation parser knows the bibliography; use `references` for plain links discovered in text
4) Distill
Turn the corpus into a compact research answer.
The final output should usually be one of:
- a short synthesis note
- a curated bibliography
- a research memo with a clear conclusion
- a small graph of the sources that actually mattered
The distillation step should answer:
- what did we learn?
- what sources matter most?
- what should be followed up later?
- what should be ignored for now?
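One way to keep the distillation honest is to force an answer to each of those four questions before saving. The bash sketch below is illustrative only: `distill` is a hypothetical helper, the sentence template is an assumption, and `KEEP` defaults to `echo keep` so the `keep put` command is printed rather than run (set `KEEP=keep` to store the note). The `type=learning` tag follows the concrete example later in this document.

```bash
# Dry run by default; set KEEP=keep to actually store the note.
KEEP=${KEEP:-echo keep}

# Build a synthesis note from answers to the four distillation questions.
distill() {
  local topic=$1 learned=$2 key_sources=$3 follow_up=$4 ignore=$5
  local text="Learned: $learned. Key sources: $key_sources. Follow up: $follow_up. Ignore for now: $ignore."
  $KEEP put "$text" -t type=learning -t "topic=$topic"
}
```

If any of the four arguments is hard to fill in, that is usually a sign the corpus has not answered the question yet.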
Working with papers
For PDFs, the preferred path is:
- put / ingest the source first
- tag it as `type=paper`
- fetch / rehydrate the source if needed
- process the background queue
- inspect the parent note first
- treat parts as search anchors, not as the primary synthesis surface
If the source is scanned or image-based, OCR happens during ingestion before the paper analysis step, so the note you promote is already the text-backed result.
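The preferred path can be sketched as a dry-run bash function, assuming you already know the ID Keep stored the note under. `ingest_paper` is hypothetical, and `KEEP` defaults to `echo keep` so the commands are printed rather than run; set `KEEP=keep` to execute them. It uses only commands that appear in this document (`put`, `pending`, `get`) and leaves out the optional fetch / rehydrate step.

```bash
# Dry run by default; set KEEP=keep to execute.
KEEP=${KEEP:-echo keep}

ingest_paper() {
  local source=$1 id=$2 topic=$3
  $KEEP put "$source" -t type=paper -t "topic=$topic"  # create the note; analysis is queued
  $KEEP pending                                         # process the background queue
  $KEEP get "$id"                                       # inspect the parent note first
}
```

For example, `ingest_paper /path/to/survey.pdf https://example.com/survey temporal-knowledge-graph` prints the three commands in order.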
What parts are
Parts are the note fragments Keep creates when it analyzes a document. Think of them as lightweight sub-notes:
- each part captures a slice of the parent document
- parts are individually searchable
- parts help you recover a useful section or snippet later
- parts are not the main place to do synthesis
Why parts are useful
Parts are useful because they give you retrieval hooks without forcing the whole paper into one blob. With the updated paper flow:
- parts stay simpler
- they are searched together with the parent note’s context
- they do not inherit the full parent tag pile directly
- the parent note remains the main synthesis surface
That makes parts good for:
- finding a specific definition again
- recovering a table explanation or a key paragraph
- anchoring follow-up searches
How to use parts
Default behavior:
- leave parts alone
- use the parent note for synthesis
- promote a part only if it is genuinely reusable
Promote a part only when it contains a reusable nugget:
- a definition
- a key claim
- a table explanation
- a future-directions paragraph
- a quote worth revisiting
Rehydrating auto-vivified URL notes
When a URL note exists only as a stub created from a reference edge, rehydrate it by re-putting the same URL as a URI-backed note.
Flow tool example
keep flow get -p item_id='https://example.com/page'
keep flow tag -p id='https://example.com/page' -p 'tags={"title":"Readable Title"}'
keep flow put -p id='https://example.com/page' -p uri='https://example.com/page'
keep pending
keep flow get -p item_id='https://example.com/page'
CLI example
keep get https://example.com/page
keep tag https://example.com/page --tag title='Readable Title'
keep put https://example.com/page
keep pending
keep get https://example.com/page
That is the standard rehydration sequence.
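The CLI sequence above can be wrapped in a small bash function so it is applied the same way to every stub. This is a sketch: `rehydrate` is a hypothetical name, the title argument is optional, and `KEEP` defaults to `echo keep` so the commands are only printed until you set `KEEP=keep`.

```bash
# Dry run by default; set KEEP=keep to execute.
KEEP=${KEEP:-echo keep}

rehydrate() {
  local url=$1 title=${2:-}
  $KEEP get "$url"                          # confirm the stub exists
  if [[ -n $title ]]; then
    $KEEP tag "$url" --tag "title=$title"   # give it a readable title
  fi
  $KEEP put "$url"                          # re-put as a URI-backed note
  $KEEP pending                             # let background processing fill it in
  $KEEP get "$url"                          # check the richer note
}
```

Keeping the title step optional matches the advice to title only the notes you actually care about.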
Research assignment pattern
A good Keep workflow for a question like “Are there applications of compressive-sensing techniques to graph/embedding search and re-rank tasks?” looks like this:
- collect an anchor survey or baseline paper
- promote the sources that are truly relevant
- connect them with clean provenance edges
- distill the actual answer into a short synthesis note
The answer should come from the curated corpus, not from a single search result.
A concrete example flow
Suppose you start from a survey PDF and want to work outward.
- Collect the survey
keep put /path/to/survey.pdf -t type=paper -t topic=temporal-knowledge-graph
- Promote it
keep tag https://example.com/survey --tag title='Useful Survey Title'
- Connect to an important cited paper
keep tag https://example.com/survey --tag 'cites=https://doi.org/10.1145/3460231.3474243[[TKG Reasoning]]'
- Rehydrate any stub reference note you actually care about
keep flow put -p id='https://example.com/cited' -p uri='https://example.com/cited'
keep pending
- Distill the result into a short note
keep put "Summary: this line of work uses X, not Y; the important papers are A, B, and C." -t type=learning -t topic=temporal-knowledge-graph
Good defaults
- prefer a small corpus over exhaustive collection
- prefer readable titles over raw URLs for important notes
- prefer stable identities for people and papers when possible
- prefer parent-note synthesis over part-note synthesis
- prefer a few strong edges over a large noisy graph
Pitfalls
- do not treat every citation as equally important
- do not over-tag parts or create lots of one-off keys
- do not keep drilling down once the research question is answered
- do not rely on parts as if structural chunking were perfect
- do not forget that `type=paper` changes analysis behavior
Verification checklist
- the corpus is small and intentional
- the important notes have titles
- papers are tagged `type=paper`
- edge values use `https://url[[Title]]` or a parsed equivalent
- the graph preserves provenance without noise
- the parent note carries synthesis; parts stay lightweight
- the final output is a compact synthesis, not a citation dump
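Some of these checks can be automated over a hand-made list of promoted notes. The bash sketch below is illustrative: the `id|title|type` line format, the `check_corpus` name, and the `MAX_NOTES` limit of 15 are all assumptions (not a Keep export or a recommended number), and it treats IDs ending in `.pdf` as papers, which will miss PDFs stored under URL IDs. It prints one line per problem and returns 1 if it found any; it requires bash 4+.

```bash
# Arbitrary example limit for "small and intentional"; override as needed.
MAX_NOTES=${MAX_NOTES:-15}

# Read id|title|type lines on stdin and report checklist problems.
check_corpus() {
  local id title type n=0 bad=0
  while IFS='|' read -r id title type; do
    n=$((n + 1))
    if [[ -z $title ]]; then
      echo "$id: missing title"; bad=1
    fi
    if [[ ${id,,} == *.pdf && $type != paper ]]; then
      echo "$id: PDF not tagged type=paper"; bad=1
    fi
  done
  if (( n > MAX_NOTES )); then
    echo "corpus has $n notes (limit $MAX_NOTES)"; bad=1
  fi
  return "$bad"
}
```

The judgment items on the list (provenance without noise, synthesis on the parent note) still need a human read.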