name: help description: "Use when the user asks general questions about DerivaML, Deriva, the deriva MCP server, or what they can do with these tools — including 'what is DerivaML', 'how do I use Deriva', 'what can you help me with', 'how does this work', or 'where do I start'. Also use for broad orientation questions about catalogs, datasets, experiments, hydra-zen configuration, ML workflows, or the MCP server when the user is asking 'how do I approach this' rather than requesting a specific action." disable-model-invocation: true
DerivaML Capabilities Guide
When the user asks what's possible or needs orientation, present the following guide. Tailor your response to their context — if they mention a specific area, focus on that section. If they're brand new, give the full overview.
What I Can Help You With
Set Up Your Environment
Set up a new DerivaML project from a template
Install Jupyter kernels and configure notebook dependencies
Authenticate with Deriva/Globus
Check if your DerivaML ecosystem is up to date — versioning content lives in two troubleshooting skills:
/deriva:troubleshoot-deriva-errors(deriva-skills) — "Versioning and updates" section covers the foundation: deriva-py + deriva-mcp-core + the deriva plugin/deriva-ml:troubleshoot-execution(this plugin) — "Versioning and updates" section covers the DerivaML layer: deriva-ml + deriva-ml-mcp + deriva-ml-skills (this plugin)
Check the foundation first; the DerivaML stack depends on it. Or just ask "check versions" / "am I up to date?".
Configure linting, docstrings, and coding standards
Just ask: "help me set up my environment", "am I up to date?", or "check deriva versions"
Define Your Catalog Structure
- Create tables for your domain data (images, subjects, samples, etc.)
- Create asset tables for storing files (images, model weights, CSVs)
- Add columns, foreign keys, and constraints
- Set up controlled vocabularies with terms and synonyms
- Customize how tables appear in the Chaise web UI
Just ask: "create a table for patient images" or "set up a vocabulary for diagnosis types"
Explore Your Catalog
- Discover what's in your catalog using natural language search — tables, features, vocabularies, datasets, and experiments are all indexed and searchable via
rag_search - Query and filter catalog tables
- Look up records by RID
- Count records, sample data, browse vocabularies
| Question | How to find out |
|---|---|
| "What tables exist?" | rag_search("tables and their purpose", doc_type="catalog-schema") |
| "What features are defined?" | rag_search("feature definitions", doc_type="catalog-schema") |
| "What datasets are available?" | rag_search("datasets", doc_type="catalog-data") |
| "What vocabulary terms can I use?" | rag_search("vocabulary terms", doc_type="catalog-schema") |
| "How do I create a dataset?" | rag_search("how to create a dataset", include_schema=False, include_data=False) |
Just ask: "what's in this catalog?", "show me the first 20 images where Diagnosis is Normal", or "what features exist on Image?"
Organize Data for ML
- Create datasets and add members from catalog tables
- Split datasets into training/testing/validation partitions
- Create features for labeling and annotation (classification, ground truth, confidence scores)
- Manage dataset versions for reproducibility
- Download and prepare data for ML frameworks (denormalize, BDBag, restructure for PyTorch)
- Track asset provenance — find which execution created a file
Just ask: "create a labeled dataset and split it 80/20" or "denormalize my dataset into a DataFrame"
Run Experiments
- Run ML experiments with full provenance tracking
- Configure experiment presets and hyperparameter sweeps
- Do dry runs to test configuration before committing
- Run Jupyter notebooks with execution tracking
- Create new model functions and wire them into the project
- Write and validate Hydra-Zen configuration files
Just ask: "run the cifar10_quick experiment" or "create a new model for image classification"
Troubleshoot Problems
- Debug execution failures (authentication, timeouts, missing files)
- Fix stuck executions
- Diagnose missing data in dataset exports
- Resolve version mismatches
Just ask: "my execution is stuck in Running" or "my dataset bag is missing images"
Write Scripts for Catalog Operations
- Generate Python scripts for batch data loading, ETL, and feature population
- Scripts include provenance tracking and dry-run support
- Committed scripts ensure reproducibility
Just ask: "write a script to load annotations from a CSV"
Tips
- This is the DerivaML front door. For ML work (datasets, workflows, executions, features, experiments) you're in the right place. If your task is pure generic-catalog onboarding with no ML layer — first connection, schema exploration, a safe first mutation, loading rows — the foundation's
/deriva:getting-started(deriva-skills) is the more focused walkthrough. When both plugins are loaded, start here; this guide routes you out to the/deriva:skills for the generic steps. - Start with
rag_searchfor any "what is" or "what exists" question — it searches schema, data, and docs in one call - You don't need to know command names — just describe what you want in plain language
- I'll guide you through the steps — each capability includes best practices and common pitfalls
- Tools are stateless — every MCP tool takes
hostname=andcatalog_id=arguments explicitly. There's no "connect" step; just tell me which catalog you want to work with (e.g., "work with the cifar10 catalog on dev.derivacloud.org") - Use dry runs when experimenting — add "dry run" to any request to preview without making changes