name: ma-search-bibliography
description: Conduct literature searches for meta-analysis using Python with uv, query PubMed and other databases, deduplicate results, and store round-based bibliographies with notes. Use when building or updating the evidence corpus.
Ma Search Bibliography
Overview
Run reproducible database searches, capture the search strategy, and produce versioned .bib files.
Inputs
- 01_protocol/search-plan.md
- 01_protocol/pico.yaml
Outputs
- 02_search/round-01/queries.txt
- 02_search/round-01/results.bib
- 02_search/round-01/dedupe.bib
- 02_search/round-01/log.md
Workflow
- Translate PICO terms into database-specific queries and save them in queries.txt.
  - Read from 01_protocol/pico.yaml (L1-8: population, intervention, comparison fields)
  - Write to 02_search/round-01/queries.txt
- Initialize Python tooling with uv in tooling/python/ using uv init, then add dependencies with uv add.
- Run search scripts with uv run from tooling/python/ (avoid direct python3 calls).
  - Use scripts/pubmed_fetch.py (L105-155: fetch_pubmed_records function)
  - Use scripts/scopus_fetch.py for the Scopus API
- Use uv tool for any one-off CLI utilities (do not install them globally).
- Always search PubMed + Scopus as the mandatory minimum (systematic-review guidance recommends at least two databases, and PRISMA requires reporting every source searched). Optionally extend to Embase and Cochrane if defined in the protocol.
- Export results to .bib and record the run date, database, and query hash in log.md.
  - Use scripts/pubmed_fetch.py (L54-103: build_bib_entry function)
  - Write to 02_search/round-01/results.bib
  - Write metadata to 02_search/round-01/log.md
- Deduplicate by DOI, PMID, and title, then save dedupe.bib.
  - Use scripts/dedupe_bib.py
  - Write to 02_search/round-01/dedupe.bib
- If running updates, increment the round folder name and record deltas.
- Generate the PRISMA flowchart after search completion using generate_prisma_flowchart.R (see PRISMA Flowchart Generation below).
  - Use scripts/generate_prisma_flowchart.R
  - Output: 02_search/prisma-flow.png (300 DPI minimum)
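The query-translation and logging steps of the workflow can be sketched as follows. This is a minimal illustration, not the logic of scripts/build_queries.py or scripts/search_audit.py: the function names, the assumed pico.yaml shape (one list of terms per field), the 12-character hash length, and the log line layout are all assumptions, and the terms are placeholders.

```python
import hashlib
from datetime import date


def build_pubmed_query(pico: dict[str, list[str]]) -> str:
    """OR the synonyms inside each PICO field, then AND the fields together."""
    groups = []
    for field in ("population", "intervention", "comparison"):
        terms = pico.get(field) or []
        if terms:  # skip fields the protocol leaves empty
            joined = " OR ".join(f'"{term}"[Title/Abstract]' for term in terms)
            groups.append(f"({joined})")
    return " AND ".join(groups)


def query_hash(query: str) -> str:
    """Short SHA-256 fingerprint of the whitespace-normalized query (length is an assumption)."""
    normalized = " ".join(query.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


def log_line(database: str, query: str, run_date: date) -> str:
    """One log.md bullet with run date, database, and query hash (layout is an assumption)."""
    return f"- {run_date.isoformat()} | {database} | query_hash={query_hash(query)}"


# Placeholder terms standing in for the contents of 01_protocol/pico.yaml.
pico = {
    "population": ["type 2 diabetes", "T2DM"],
    "intervention": ["metformin"],
    "comparison": [],
}
query = build_pubmed_query(pico)
print(query)
print(log_line("PubMed", query, date(2025, 1, 15)))
```

Hashing the normalized query rather than the raw text means that reformatting queries.txt does not change the recorded hash, while any change to the terms does.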
PubMed Implementation Notes
- Use scripts/pubmed_fetch.py for the default PubMed pipeline with uv run.
- Set an email and API key, respect rate limits (NCBI allows 3 requests per second without an API key and 10 with one), and use the History server (usehistory=y, then WebEnv and query_key) for batch retrieval.
- See references/pubmed-eutils.md for a compact tutorial and API notes.
- Read API keys from .env in the project root.
Resources
Python Scripts (Search & Deduplication)
- scripts/pubmed_fetch.py fetches PubMed records and writes BibTeX.
- scripts/dedupe_bib.py removes duplicate records based on DOI, PMID, or title.
- scripts/build_queries.py builds multi-DB queries from pico.yaml.
- scripts/mesh_expand.py expands terms via the MeSH RDF lookup service.
- scripts/expand_terms.py expands PICO terms using MeSH and optional Emtree synonyms.
- scripts/run_multi_db_search.py runs multi-DB search, merge, and counts.
- scripts/multi_db_dedupe.py merges and deduplicates multiple BibTeX files.
- scripts/db_counts.py summarizes per-database counts for PRISMA.
- scripts/search_report.py generates a per-database query + count report.
- scripts/search_audit.py generates a JSON audit with query hashes and parameters.
- scripts/scopus_fetch.py fetches Scopus Search API results.
- scripts/embase_fetch.py fetches Embase Search API results.
- scripts/cochrane_fetch.py fetches Cochrane ReviewDB API results.
- scripts/bib_subset_by_ids.py extracts a BibTeX subset from CSV record IDs.
- scripts/zotero_fetch.py fetches records from a Zotero collection.
- scripts/zotero_sync.py syncs a .bib file back to a Zotero collection.
- scripts/env_utils.py loads .env credentials.
R Scripts (PRISMA Flowchart)
- scripts/generate_prisma_flowchart.R generates PRISMA 2020 compliant flow diagrams in PNG/PDF/SVG/HTML formats.
Reference Documentation
- references/pubmed-eutils.md summarizes the E-utilities workflow.
- references/database-auth.md summarizes authentication per database.
- references/emtree-synonyms-template.csv provides a template for Emtree synonyms.
- references/prisma-flowchart-guide.md provides a complete guide to PRISMA 2020 flowchart generation.
Notes
- Keep all rounds. Do not overwrite prior .bib files.
- Add a short note in each .bib entry for the round (example: note = {round-01}).
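Both notes can be automated with a small helper, sketched here under assumptions: the function names are hypothetical, the round folders are assumed to follow the round-NN pattern, and the tagging works on the raw text of one well-formed BibTeX entry rather than on a parsed bibliography.

```python
import re


def next_round_name(existing: list[str]) -> str:
    """Return the next round folder name given the names already in 02_search/."""
    numbers = [
        int(match.group(1))
        for name in existing
        if (match := re.fullmatch(r"round-(\d+)", name))
    ]
    return f"round-{max(numbers, default=0) + 1:02d}"


def tag_entry(entry: str, round_name: str) -> str:
    """Append note = {round-NN} to one BibTeX entry unless it already has a note field."""
    if re.search(r"\bnote\s*=", entry):
        return entry  # leave existing notes untouched
    body = entry.rstrip()
    if not body.endswith("}"):
        raise ValueError("entry does not end with a closing brace")
    inner = body[:-1].rstrip()
    if not inner.endswith(","):
        inner += ","
    return f"{inner}\n  note = {{{round_name}}}\n}}"


# Hypothetical entry used only for illustration.
entry = "@article{example2020,\n  title = {Example},\n  year = {2020}\n}\n"
round_name = next_round_name(["round-01", "notes"])
print(round_name)
print(tag_entry(entry, round_name))
```

In practice the list of existing names would come from the subdirectories of 02_search/, and each new round writes into its own folder so earlier .bib files stay untouched.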
PRISMA Flowchart Generation
Purpose: Generate publication-quality PRISMA 2020 flow diagrams automatically.
When: After completing database searches and deduplication.
Command:
cd ma-search-bibliography/scripts
Rscript generate_prisma_flowchart.R \
DB_RECORDS \
SCREENED \
EXCLUDED \
FULLTEXT \
INCLUDED \
[PARTICIPANTS] \
[OUTPUT_DIR]
Example (after search completion, before screening):
# Count database records (one per BibTeX entry, i.e. lines starting with "@", not raw lines)
DB_RECORDS=$(grep -c '^@' ../../projects/<project-name>/02_search/round-01/dedupe.bib)
# Generate initial flowchart (screening numbers TBD)
Rscript generate_prisma_flowchart.R "$DB_RECORDS" 0 0 0 0 NA ../../projects/<project-name>/figures/
Outputs:
- prisma_flowchart.png (300 DPI, for manuscript)
- prisma_flowchart.pdf (vector, for publication)
- prisma_flowchart.svg (scalable, for presentations)
- prisma_flowchart_interactive.html (interactive, for supplementary materials)
Time: 30 seconds to 2 minutes
See: references/prisma-flowchart-guide.md for complete documentation
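The DB_RECORDS argument is a record count, and a BibTeX file spans several lines per entry, so counting entries is safer than counting lines. A minimal Python sketch of such a count follows; the function name is hypothetical and this is not the logic of scripts/db_counts.py. It skips @comment, @string, and @preamble blocks, which are not bibliographic records.

```python
import re

# An entry starts with "@type{" or "@type(" at the beginning of a line.
ENTRY_START = re.compile(r"^[ \t]*@(\w+)\s*[{(]", re.MULTILINE)
NON_RECORD_TYPES = {"comment", "string", "preamble"}


def count_bib_entries(text: str) -> int:
    """Count bibliographic entries in BibTeX text, ignoring non-record blocks."""
    return sum(
        1
        for match in ENTRY_START.finditer(text)
        if match.group(1).lower() not in NON_RECORD_TYPES
    )


# Hypothetical file content used only for illustration.
sample = """@comment{exported for illustration}
@article{example2020,
  title = {Example},
  year = {2020}
}

@inproceedings{example2021,
  title = {Another example}
}
"""
print(count_bib_entries(sample))
```

The result can be passed to the R script as the first argument, for instance after reading 02_search/round-01/dedupe.bib with pathlib.Path.read_text.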
Validation
- Confirm query coverage matches the protocol scope.
- Verify dedupe retains the best metadata per record.
- Generate PRISMA flowchart to visualize search results and verify counts.
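One way to check the second validation point is to compare the output of the dedupe step against a reference merge such as the sketch below. It assumes entries are already parsed into dictionaries of string fields and treats "best metadata" as the record with the most non-empty fields, with gaps filled from the duplicate. Both choices are assumptions, not the documented behavior of scripts/dedupe_bib.py.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation and whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def record_keys(entry: dict[str, str]) -> list[tuple[str, str]]:
    """Matching keys for one entry: DOI, PMID, and normalized title, when present."""
    keys = []
    doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", entry.get("doi", "").strip().lower())
    if doi:
        keys.append(("doi", doi))
    pmid = entry.get("pmid", "").strip()
    if pmid:
        keys.append(("pmid", pmid))
    title = normalize_title(entry.get("title", ""))
    if title:
        keys.append(("title", title))
    return keys


def richness(entry: dict[str, str]) -> int:
    """Number of non-empty fields, used to pick the better record."""
    return sum(1 for value in entry.values() if value and value.strip())


def dedupe(entries: list[dict[str, str]]) -> list[dict[str, str]]:
    """Merge entries sharing a DOI, PMID, or title; keep the richer one and fill its gaps."""
    kept: list[dict[str, str]] = []          # merged records in first-seen order
    index: dict[tuple[str, str], int] = {}   # matching key -> position in kept
    for entry in entries:
        position = next((index[k] for k in record_keys(entry) if k in index), None)
        if position is None:
            kept.append(dict(entry))
            position = len(kept) - 1
        else:
            current = kept[position]
            best, other = (current, entry) if richness(current) >= richness(entry) else (entry, current)
            merged = dict(best)
            for field, value in other.items():
                if value and not merged.get(field):
                    merged[field] = value
            kept[position] = merged
        for key in record_keys(kept[position]):
            index[key] = position
    return kept


# Hypothetical records used only for illustration.
records = [
    {"title": "Effect of X on Y", "doi": "10.1234/EXAMPLE"},
    {"title": "Effect of X on Y.", "doi": "https://doi.org/10.1234/example", "pmid": "123", "abstract": "text"},
    {"title": "Another study", "pmid": "999"},
]
print(len(dedupe(records)))
```

A known limitation of this sketch: an entry that matches two records already kept separately is merged into the first match only, so the two earlier records are not bridged.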
Pipeline Navigation
| Step | Skill | Stage |
|---|---|---|
| Prev | /ma-topic-intake | 01 Protocol & PICO |
| Next | /ma-screening-quality | 03 Screening & Quality |
| All | /ma-end-to-end | Full pipeline orchestration |