name: citation-management description: Comprehensive citation management for academic research; use when you need to discover papers (Google Scholar/PubMed), extract/verify metadata (DOI/PMID/arXiv/URL), and produce validated, clean BibTeX for manuscripts. license: MIT author: AIPOCH
When to Use
- You need to find relevant or highly cited papers on a topic using Google Scholar or PubMed.
- You have identifiers (e.g., DOI, PMID, arXiv ID, URL) and must convert them into correct BibTeX.
- You want to verify citation accuracy (DOI resolution, required fields, consistency with CrossRef/PubMed).
- You need to clean, deduplicate, sort, and standardize an existing
.bibfile before submission. - You are preparing a thesis/manuscript and need a reproducible workflow from search → extraction → formatting → validation.
Key Features
- Paper discovery
- Google Scholar search with year filtering, pagination, and citation-count sorting.
- PubMed search with MeSH terms, field tags, publication-type filters, and date ranges.
- Metadata extraction
- Resolve DOI/PMID/arXiv/URL to structured metadata via CrossRef, PubMed E-utilities, and arXiv APIs.
- Batch processing from files containing mixed identifiers.
- BibTeX generation & cleanup
- Generate BibTeX entries with appropriate entry types and required fields.
- Format, sort (key/year/author), and deduplicate BibTeX libraries.
- Citation validation
- DOI resolution checks and metadata cross-checking.
- Required-field checks by entry type, syntax validation, duplicate detection, and optional auto-fix.
- Workflow integration
- Produces submission-ready
.bibfiles for LaTeX/Overleaf workflows and complements literature review pipelines.
- Produces submission-ready
Dependencies
- Python: 3.10+ (recommended)
- Python packages:
requests>=2.31.0scholarly>=1.7.11(optional; required only for Google Scholar automation)
Example Usage
A complete, end-to-end workflow that searches, extracts metadata, formats, deduplicates, and validates a bibliography:
# 1) Search PubMed (biomedical focus)
python scripts/search_pubmed.py \
--query '"CRISPR-Cas Systems"[MeSH] AND "Gene Editing"[MeSH]' \
--date-start 2020-01-01 \
--date-end 2024-12-31 \
--limit 200 \
--output crispr_pubmed.json
# 2) Search Google Scholar (broad coverage)
python scripts/search_google_scholar.py "CRISPR gene editing therapeutics" \
--year-start 2020 \
--year-end 2024 \
--limit 100 \
--output crispr_scholar.json
# 3) Extract metadata from search outputs (or mixed identifiers)
cat crispr_pubmed.json crispr_scholar.json > combined_results.json
python scripts/extract_metadata.py \
--input combined_results.json \
--output combined.bib
# 4) Add known papers by DOI (append)
python scripts/doi_to_bibtex.py 10.1038/s41586-021-03819-2 >> combined.bib
python scripts/doi_to_bibtex.py 10.1126/science.aam9317 >> combined.bib
# 5) Format + deduplicate + sort (newest first)
python scripts/format_bibtex.py combined.bib \
--deduplicate \
--sort year \
--descending \
--output formatted.bib
# 6) Validate + auto-fix common issues + emit report
python scripts/validate_citations.py formatted.bib \
--auto-fix \
--report validation.json \
--output final_references.bib
# 7) Inspect validation results
cat validation.json
Implementation Details
1) Search (Discovery)
Google Scholar (
scripts/search_google_scholar.py)- Supports query operators such as exact phrases (
"deep learning"), author filters (author:LeCun), title-only (intitle:"neural networks"), exclusions (-survey), and year ranges. - Typical parameters:
--year-start,--year-end: constrain publication years--limit: cap results--sort-by citations: prioritize highly cited papers (when supported by the script)
- Supports query operators such as exact phrases (
PubMed (
scripts/search_pubmed.py)- Uses NCBI E-utilities (e.g., ESearch/EFetch/ESummary) to retrieve PMIDs and metadata.
- Typical parameters:
--query: supports MeSH terms, field tags, and Boolean logic--date-start,--date-end: publication date filtering--publication-types: e.g.,Clinical Trial,Review--format: JSON or BibTeX output (if supported)
(See: references/google_scholar_search.md, references/pubmed_search.md)
2) Metadata Extraction (Normalization)
- Identifier inputs: DOI, PMID, arXiv ID, URL, or mixed lists/files.
- Primary sources:
- CrossRef API for DOI-centric journal metadata
- PubMed E-utilities for biomedical records (PMID/PMCID, MeSH, abstracts)
- arXiv API for preprints and versioned records
- DataCite API for datasets/software DOIs (if implemented/used)
- Field mapping goals:
- Required:
author,title,year - Articles:
journal,volume,number,pages,doi - Conferences:
booktitle,pages - Preprints: repository + identifier (e.g.,
eprint,archivePrefix)
- Required:
(See: references/metadata_extraction.md)
3) BibTeX Formatting (Quality & Consistency)
- Entry types commonly produced:
@article,@inproceedings,@book,@misc. - Formatting rules enforced/encouraged:
- Page ranges use
--(e.g.,123--145) - Protect capitalization in titles with braces (e.g.,
{CRISPR}) - Consistent author formatting (
Last, First and Last, First) - Stable citation keys (project convention; often
FirstAuthorYearKeyword)
- Page ranges use
(See: references/bibtex_formatting.md)
4) Validation (Correctness)
Validation typically checks:
- DOI validity: resolves via
doi.organd matches CrossRef metadata. - Required fields: present per entry type; no empty critical fields.
- Consistency: year format, numeric volume/issue, page-range syntax, URL accessibility.
- Duplicates: same DOI, near-identical titles, or same author/year/title combinations.
- BibTeX syntax: braces/quotes, commas, unique keys, special character handling.
Outputs may include a machine-readable report (e.g., JSON) with errors and warnings.
(See: references/citation_validation.md)
When Not to Use
- Do not proceed when required input files, identifiers, parameters, or context are missing — ask the user to provide them first.
- Do not assume capabilities beyond this skill's declared scope when the user requests external operations or inferences.
- Do not proceed without user confirmation when overwriting existing results, executing high-cost batch operations, or expanding task scope.
Required Inputs
| Field | Required | Format/Source | Example | If Missing |
|---|---|---|---|---|
| User task description | Yes | Text | Research question, writing goal, analysis objective | Stop and ask user to provide |
| Primary input material | Depends on task | Text, file path, ID, table, or literature | PMID, PDF, CSV, DOCX, keywords, etc. | Specify which material type is missing |
| Output preference | No | Text | Language, format, target journal, template | Use skill default format |
Output Contract
- Primary output: Structured result or target file aligned with this skill's objective.
- Optional output: Intermediate check notes, issue list, supplementary suggestions, or generated file paths.
- Format requirement: Unless the user specifies otherwise, prefer stable, reviewable Markdown or JSON; if the skill's bundled script requires a fixed format, use that format.
- If partially complete: Must explicitly mark as PARTIAL and state which steps are completed and which remain.
Failure Handling
- Missing critical input: Explicitly state which fields, files, or identifiers are missing and pause.
- Script, template, or resource execution failure: Report the failing step, likely cause, and recovery suggestions — do not silently degrade.
- Partial completion only: Return the verified portion first, then list remaining blockers and suggested next steps.
User Checkpoints
- Before executing batch processing, overwriting files, long-running searches, or multi-stage generation, confirm scope and output format with the user.
- Before proceeding when a key judgment is ambiguous, evidence is insufficient, or the workflow is entering the next stage, confirm with the user.
Input Validation
This skill accepts requests that match the documented purpose of citation-management and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
citation-managementonly handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
Quick Validation
- Check that key scripts, templates, or reference file paths this skill depends on exist.
- Check that the final output contains the core fields, sections, or files specified for this task.
- Check that results clearly mark assumptions, limitations, and incomplete items.