name: computational-pathology-research description: > Researches computational pathology and health AI literature using HuggingFace Papers MCP and Exa web search. Use when user says "find papers on", "research", "what's the latest on", "literature review", "survey", or asks about methods like MIL, SSL, WSI analysis, attention-based pooling, patch embedding, weakly supervised learning in pathology, or asks about specific tasks like tumor classification, survival prediction, biomarker detection, or lymphoma subtyping. Also activate for: "what has been done on X in comp path", "how do people approach Y in WSI analysis", "find related work for Z". Do NOT use for: general coding help, non-research queries, clinical deployment questions without a research component, or drug discovery / genomics topics outside pathology. context: fork
Computational Pathology Research Skill
Overview
Researches computational pathology and health AI literature. Runs parallel searches across HuggingFace Papers (academic, arXiv same-day) and Exa (web, practitioner content, implementation notes). Triages by relevance and recency, fetches full methodology and results sections, and synthesizes findings into actionable research context. Teaches the model how to think about the literature, not just retrieve it.
Requires: hf-papers MCP + exa MCP both configured. See mcp-tools.md Research Stack section.
Workflow
Step 1: Decompose the query
Before searching, decompose the user's topic into 4-5 keyword variants covering:
- The task (e.g. "tumor classification", "survival prediction")
- The method family (e.g. "multiple instance learning", "self-supervised pretraining")
- The data modality (e.g. "whole slide image", "WSI", "histopathology")
- Any model architecture angle (e.g. "vision transformer", "attention pooling", "CONCH", "UNI")
- Any recent trend angle (e.g. "foundation model pathology", "weakly supervised pathology 2024")
Step 2: Run parallel searches
Search HF Papers MCP and Exa simultaneously using the keyword variants.
HF Papers MCP (hf-papers) - use for:
- Academic papers, arXiv preprints, peer-reviewed work
- Same-day coverage of new preprints
- Finding papers by method name, task, or dataset
Exa MCP (exa, web_search_exa + deep_researcher) - use for:
- Practitioner blog posts, implementation notes
- GitHub repos with working code
- Broader web coverage for applied/industry content
- Cross-reference when HF Papers coverage is sparse
For each result, note: title, authors, date, abstract snippet, venue/preprint.
Priority venues to weight higher:
- MICCAI, CVPR, NeurIPS, ICLR, ECCV (top ML/vision)
- Nature Methods, Nature Medicine, Cancer Cell (high-impact clinical)
- Medical Image Analysis, IEEE TMI (domain journals)
- arXiv cs.CV, eess.IV (preprints - check date, weight recent)
Deprioritize: workshop papers without follow-up, preprints older than 18 months with low citation signal.
Step 3: Triage by relevance + recency
For each paper, score on three axes (quick pass, not deep read):
- Task match - does it directly address the user's problem?
- Method relevance - does it use or compare against relevant methods?
- Recency - 2023-2025 preferred; older papers only if foundational
Keep top 5-8 papers. Discard the rest.
Step 4: Fetch and read methodology + results
For the top papers, fetch full content via HF Papers or Exa crawl. Read specifically:
- Methodology: what exactly they did, what datasets, what backbone/architecture
- Results: key numbers (AUC, F1, C-index, etc.) and what they compared against
- Limitations: what they explicitly acknowledge doesn't work
- Code availability: note if repo exists
Do NOT summarize abstracts. Read the actual methods and results sections.
Step 5: Synthesize
Organize findings into:
- State of the art - what's the current best approach for this task and what numbers does it achieve?
- Method landscape - what families of approaches exist and what are their tradeoffs?
- Key datasets - what benchmarks are used (TCGA, CAMELYON, PANDA, internal cohorts)?
- Open problems - what do papers identify as unsolved or underexplored?
- Directly relevant papers - ranked list with one-line summaries and links
Step 6: Research framing (the thinking layer)
After synthesis, answer: given this landscape, how should the user think about their specific problem?
- What's the right method family to start from?
- What baselines are essential to compare against?
- What failure modes should they anticipate?
- What dataset would best validate their approach?
Domain Knowledge (Hard-coded Priors)
Method Families to Know
- Attention-based MIL (ABMIL, TransMIL, DSMIL) - standard weakly supervised WSI classification baseline
- SSL pretraining on patches (SimCLR, DINO, MAE applied to histology) - patch encoder quality drives downstream performance
- Foundation models (UNI, CONCH, Prov-GigaPath, CHIEF) - pretrained on large pathology corpora; often best starting point in 2024+
- Graph-based WSI (PatchGCN, Hierarchical Image Pyramid Transformer) - captures spatial relationships between patches
- Survival prediction (SurvPath, MCAT, PORPOISE) - multimodal (WSI + genomics) is the current frontier
Key Benchmarks to Reference
- TCGA - multi-cancer, multi-task gold standard; subtype classification + survival
- CAMELYON16/17 - lymph node metastasis detection
- PANDA - prostate cancer grading
- BRACS - breast cancer subtyping
- NLST - lung cancer screening (CT, not WSI, but relevant for clinical context)
- In-house cohorts - note when papers use private data; limits reproducibility
Common Failure Modes in This Domain
- Label noise in TCGA - pathologist labels aren't always clean; SSL results can be inflated
- Train/test leakage across slides from same patient - always check if splits are patient-level
- Overfitting on small cohorts - <500 slides is high-risk territory
- Stain normalization sensitivity - models often fail cross-site without normalization
- Patch resolution matters - 20x vs 40x changes what features are visible
Aravind's Prior Work Context
- NYU Langone: SSL pretraining + attention-based MIL on lymphoma WSI datasets
- Familiar with: ABMIL, patch-level DINO pretraining, TCGA lymphoma splits
- Current interest: bridging SSL pretraining quality -> downstream MIL performance
- Relevant comparisons: always benchmark against vanilla ABMIL + ImageNet ViT baseline
Output Format
- Lead with state of the art (1 paragraph)
- Method landscape as a comparison table where possible
- Key papers as a ranked list:
[Title](link) - one-line summary. Key result. - Open problems as bullet points
- Research framing at the end, clearly labeled
- Total length: 400-800 words unless deep dive requested
- Do NOT include: paper abstracts verbatim, citation counts, venue prestige signaling
Edge Cases
- Too broad a query (e.g. "research deep learning in pathology"): narrow to one task or method family before searching. Ask the user if unclear.
- Very recent topic with few papers (e.g. a method from last month): note the gap, fall back to closest related work, flag that the field is moving fast.
- User asks for a specific paper: fetch it directly via HF Papers MCP, don't run the full pipeline.
- Conflicting results across papers: surface the conflict explicitly. Don't pick a winner - describe what conditions favor each approach.
- HF Papers MCP unavailable: fall back to Exa
deep_researcherfor academic content. Note the fallback in output.