name: dicom2fmriprep
description: Generate scripts for the full fMRI preprocessing pipeline from raw DICOM files through BIDS conversion (heudiconv) to fMRIPrep, including HPC/SLURM execution via BABS. Use this skill whenever someone needs to preprocess fMRI data, convert DICOMs to BIDS, write heudiconv heuristics, run fMRIPrep on a cluster, set up BABS projects, fix BIDS validation errors, or generate any scripts related to the DICOM-to-fMRIPrep pipeline.
fMRI DICOM-to-fMRIPrep Pipeline
This skill helps users generate scripts for the complete fMRI preprocessing pipeline:
Raw DICOMs → heudiconv (BIDS conversion) → BIDS validation → fMRIPrep → preprocessed data
                                                                ↑
                                                  (optionally via BABS on HPC)
When to Use This Skill
- Converting unsorted DICOM files to BIDS format
- Writing heudiconv heuristic files from scratch
- Fixing BIDS validation errors
- Running fMRIPrep (locally or on HPC with Singularity)
- Setting up BABS projects for large-scale fMRIPrep on SLURM clusters
- Generating SLURM submission scripts for neuroimaging pipelines
Pipeline Overview
Step 1: DICOM to BIDS with heudiconv
heudiconv wraps dcm2niix and handles the full conversion from DICOMs to a BIDS-compliant dataset. The process has two passes:
Pass 1 — Reconnaissance (discover what's in the DICOMs):
heudiconv \
-d '/path/to/dicoms/{subject}/*/*/*.dcm' \
-o /path/to/bids_output \
-f convertall \
-s SUBJECT_ID \
-c none
This produces .heudiconv/{subject}/info/dicominfo.tsv — a table of every DICOM series with metadata (dimensions, TR, protocol name, etc.). The user needs this to write their heuristic.
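Before writing matching rules, it can help to condense dicominfo.tsv into one line per distinct series type. The following is a minimal sketch, not part of heudiconv: it assumes the table is tab-separated and has columns named protocol_name, dim3, dim4 and TR (check the header of your own file), and the function names are illustrative.

```python
import csv
from collections import Counter


def summarize_dicominfo(tsv_path):
    """Count series per (protocol_name, dim3, dim4, TR) combination."""
    counts = Counter()
    with open(tsv_path, newline="") as handle:
        # Column names are assumptions; adjust them to your dicominfo.tsv header.
        for row in csv.DictReader(handle, delimiter="\t"):
            key = (row["protocol_name"], row["dim3"], row["dim4"], row["TR"])
            counts[key] += 1
    return counts


def print_summary(counts):
    """Print one aligned line per distinct series type."""
    for (protocol, dim3, dim4, tr), n in sorted(counts.items()):
        print(f"{protocol:<30} dim3={dim3:<5} dim4={dim4:<5} TR={tr:<8} n={n}")
```

Calling print_summary(summarize_dicominfo(path_to_tsv)) gives a compact list of protocol names and dimensions to base the heuristic's conditions on.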
Pass 2 — Convert (apply the heuristic):
heudiconv \
-d '/path/to/dicoms/{subject}/*/*/*.dcm' \
-o /path/to/bids_output \
-f /path/to/heuristic.py \
-s SUBJECT_ID \
-ss SESSION_LABEL \
-c dcm2niix -b --minmeta --overwrite
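For more than a handful of subjects, the Pass 2 command is usually generated in a loop. The sketch below only builds and prints the commands; it executes nothing. It uses heudiconv's -d option (the DICOM directory template, in which heudiconv itself substitutes {subject}) together with the flags shown above. The function name and the example paths are hypothetical.

```python
import shlex


def heudiconv_command(dicom_root, bids_dir, heuristic, subject, session=None):
    """Build the Pass 2 heudiconv argument list for one subject."""
    cmd = [
        "heudiconv",
        # {subject} stays literal here; heudiconv fills it in from -s.
        "-d", dicom_root + "/{subject}/*/*/*.dcm",
        "-o", bids_dir,
        "-f", heuristic,
        "-s", subject,
    ]
    if session is not None:
        cmd += ["-ss", session]
    cmd += ["-c", "dcm2niix", "-b", "--minmeta", "--overwrite"]
    return cmd


# Print one shell-quoted command per subject (subject labels are placeholders).
for sub in ["01", "02"]:
    argv = heudiconv_command(
        "/path/to/dicoms", "/path/to/bids_output", "/path/to/heuristic.py", sub, session="01"
    )
    print(shlex.join(argv))
```

Returning a list rather than a string keeps the template safe from shell expansion if the command is later passed to subprocess.run.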
Writing a Heuristic File
The heuristic maps DICOM series to BIDS filenames. Always ask the user to share their dicominfo.tsv first — it tells you exactly what series exist and how to match them.
def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    if not template:
        raise ValueError("Template must be a valid format string")
    return (template, outtype, annotation_classes)


def infotodict(seqinfo):
    """Map DICOM series to BIDS filenames based on series metadata."""
    # Define BIDS keys; adjust paths based on what the dataset contains
    t1w = create_key('sub-{subject}/{session}/anat/sub-{subject}_{session}_T1w')
    func_rest = create_key(
        'sub-{subject}/{session}/func/sub-{subject}_{session}_task-rest_run-{item:02d}_bold'
    )
    func_task = create_key(
        'sub-{subject}/{session}/func/sub-{subject}_{session}_task-TASKNAME_run-{item:02d}_bold'
    )
    fmap_ap = create_key('sub-{subject}/{session}/fmap/sub-{subject}_{session}_dir-AP_epi')
    fmap_pa = create_key('sub-{subject}/{session}/fmap/sub-{subject}_{session}_dir-PA_epi')
    dwi = create_key('sub-{subject}/{session}/dwi/sub-{subject}_{session}_dir-AP_dwi')

    info = {t1w: [], func_rest: [], func_task: [], fmap_ap: [], fmap_pa: [], dwi: []}

    for s in seqinfo:
        # Filter out motion-corrected and derived series
        if s.is_motion_corrected or s.is_derived:
            continue

        # Match by protocol_name, dimensions, TR; adapt to your scanner's naming
        name = s.protocol_name.lower()
        if 'mprage' in name and s.dim3 >= 160:
            # Single-occurrence series: keep only the most recent match
            info[t1w] = [s.series_id]
        elif 'rest' in name and s.dim4 > 50:
            info[func_rest].append(s.series_id)
        elif 'task' in name and s.dim4 > 50:
            info[func_task].append(s.series_id)
        elif 'distortion' in name and 'AP' in s.protocol_name:
            info[fmap_ap] = [s.series_id]
        elif 'distortion' in name and 'PA' in s.protocol_name:
            info[fmap_pa] = [s.series_id]
        elif 'dti' in name or 'dwi' in name:
            info[dwi].append(s.series_id)

    return info
Key matching fields from seqinfo: protocol_name, series_description, sequence_name, dim1-dim4, TR, TE, image_type, is_motion_corrected, is_derived.
Important heuristic rules:
- Use .append() for series that may have multiple runs (BOLD, DWI)
- Use = [s.series_id] for single-occurrence series (T1w, fieldmaps)
- Always filter is_motion_corrected and is_derived for functional data
- Use .lower() on protocol names, because scanner naming is inconsistent
- For single-session studies, omit {session} from templates entirely
For the complete SeqInfo field reference and advanced patterns (multi-echo, IntendedFor population, ReproIn), see references/heudiconv-details.md.
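Matching rules are easy to get subtly wrong, so it can be worth dry-running them on a few made-up series before touching real data. The sketch below mirrors the condition order of the heuristic above using a stand-in namedtuple; Series, classify and the mock values are hypothetical and only carry the fields the rules read, not heudiconv's full SeqInfo.

```python
from collections import namedtuple

# Hypothetical stand-in with only the seqinfo fields used by the rules below.
Series = namedtuple(
    "Series",
    "series_id protocol_name dim3 dim4 is_motion_corrected is_derived",
)


def classify(s):
    """Return a label for one series, or None if no rule matches."""
    if s.is_motion_corrected or s.is_derived:
        return None
    name = s.protocol_name.lower()
    if "mprage" in name and s.dim3 >= 160:
        return "t1w"
    if "rest" in name and s.dim4 > 50:
        return "func_rest"
    if "task" in name and s.dim4 > 50:
        return "func_task"
    if "distortion" in name and "AP" in s.protocol_name:
        return "fmap_ap"
    if "distortion" in name and "PA" in s.protocol_name:
        return "fmap_pa"
    if "dti" in name or "dwi" in name:
        return "dwi"
    return None


# Made-up series: a T1w, a resting-state run, its MoCo duplicate, an SBRef, a fieldmap.
mock_series = [
    Series("2-t1_mprage", "T1_MPRAGE", 176, 1, False, False),
    Series("5-rest", "fMRI_REST", 60, 300, False, False),
    Series("6-rest_moco", "fMRI_REST", 60, 300, True, False),
    Series("7-rest_sbref", "fMRI_REST_SBRef", 60, 1, False, False),
    Series("9-dist_ap", "Distortion_AP", 60, 3, False, False),
]
for s in mock_series:
    print(s.series_id, "->", classify(s))
```

The MoCo duplicate and the single-volume SBRef fall through to None here, which is the behaviour the dim4 > 50 and is_motion_corrected checks are meant to produce.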
Step 2: BIDS Validation
After conversion, validate the dataset. Present the user with options:
Option A — CLI validator (recommended for scripted workflows):
# Install
npm install -g bids-validator
# Note: pip install bids-validator provides only a Python filename checker, not the full command-line validator
# Run
bids-validator /path/to/bids_dataset
Option B — Web validator (good for quick checks, no install): Direct users to https://bids-standard.github.io/bids-validator/
Common BIDS fixes the skill should help generate scripts for:
- Missing dataset_description.json → generate it
- Misnamed files → rename script
- Missing sidecar JSON fields (e.g., TaskName for func, IntendedFor for fieldmaps) → patch script
- Extra files that aren't BIDS-compliant → add to .bidsignore
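Two of the fixes above can be sketched in a few lines of Python. This is an illustrative starting point rather than a complete fixer: the function names are hypothetical, the BIDSVersion value is a placeholder to replace with the version the dataset targets, and TaskName is simply copied from the task-<label> part of each filename.

```python
import json
from pathlib import Path


def write_dataset_description(bids_dir, name, bids_version="1.8.0"):
    """Create dataset_description.json with the two required fields if it is missing."""
    path = Path(bids_dir) / "dataset_description.json"
    if not path.exists():
        content = {"Name": name, "BIDSVersion": bids_version}  # version is a placeholder
        path.write_text(json.dumps(content, indent=2) + "\n")
    return path


def patch_task_names(bids_dir):
    """Add TaskName to every *_bold.json sidecar that lacks it; return patched paths."""
    patched = []
    for sidecar in sorted(Path(bids_dir).rglob("*_bold.json")):
        meta = json.loads(sidecar.read_text())
        if "TaskName" in meta:
            continue
        # Take the label from the task-<label> entity in the filename.
        labels = [p[len("task-"):] for p in sidecar.name.split("_") if p.startswith("task-")]
        if not labels:
            continue
        meta["TaskName"] = labels[0]
        sidecar.write_text(json.dumps(meta, indent=2) + "\n")
        patched.append(sidecar)
    return patched
```

Both functions leave existing values untouched, so they can be re-run safely after each validation pass.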
Step 3: fMRIPrep
Running Locally with Singularity
# Build the image (do this once)
singularity build /path/to/fmriprep-<VERSION>.sif docker://nipreps/fmriprep:<VERSION>
# Pre-fetch TemplateFlow templates (required for offline HPC nodes)
export TEMPLATEFLOW_HOME=/path/to/templateflow
python -c "
from templateflow.api import get
get(['MNI152NLin2009cAsym', 'MNI152NLin6Asym', 'OASIS30ANTs', 'fsaverage', 'fsaverage5', 'fsaverage6', 'fsLR'])
"
# Run fMRIPrep
export SINGULARITYENV_FS_LICENSE=$HOME/.freesurfer.txt
export SINGULARITYENV_TEMPLATEFLOW_HOME="/templateflow"
singularity run --cleanenv \
-B ${BIDS_DIR}:/data:ro \
-B ${OUTPUT_DIR}:/out \
-B ${WORK_DIR}:/work \
-B ${TEMPLATEFLOW_HOME}:/templateflow \
-B $HOME/.freesurfer.txt:/opt/freesurfer/license.txt:ro \
/path/to/fmriprep-<VERSION>.sif \
/data /out participant \
--participant-label ${SUBJECT} \
-w /work \
--output-spaces MNI152NLin2009cAsym:res-2 \
--fs-license-file /opt/freesurfer/license.txt \
--nthreads ${SLURM_CPUS_PER_TASK:-8} \
--omp-nthreads 8 \
--mem_mb 30000 \
--skip-bids-validation \
--notrack
Commonly Forgotten Flags
Always ask the user about these and help them decide:
| Flag | What it does | When to use |
|---|---|---|
| --output-spaces | Where to resample results | Always specify explicitly. Common: MNI152NLin2009cAsym:res-2. Add MNI152NLin6Asym:res-2 if ICA-AROMA needed later |
| --fs-license-file | FreeSurfer license path | Always required. Free from https://surfer.nmr.mgh.harvard.edu/registration.html |
| --fs-no-reconall | Skip FreeSurfer surfaces | Saves hours; use when surfaces aren't needed |
| --cifti-output 91k | Output CIFTI dense timeseries | For HCP-style surface+volume analyses |
| --dummy-scans N | Discard initial volumes | When auto-detection isn't appropriate |
| --fd-spike-threshold | Motion outlier threshold | Default 0.5mm; stricter = 0.2mm |
| --use-syn-sdc warn | Fieldmap-less distortion correction | When no fieldmaps available |
| --ignore fieldmaps | Skip fieldmap correction | When fieldmaps are bad/unusable |
| --anat-only | Only anatomical processing | For running FreeSurfer first, then func later |
| --low-mem | Reduce memory at cost of disk I/O | When memory-constrained |
Resource Guidelines
| Scenario | CPUs | Memory | Walltime | Disk (work) |
|---|---|---|---|---|
| With FreeSurfer | 4-8 | 30 GB | 12-24h | 15-30 GB/sub |
| Without FreeSurfer | 4-8 | 16 GB | 1-4h | 5-15 GB/sub |
| Anat-only | 2-4 | 12 GB | 6-16h | 10-20 GB/sub |
For detailed fMRIPrep flags, troubleshooting, and output structure, see references/fmriprep-details.md.
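The resource table can be turned into a SLURM header programmatically. The sketch below is one possible encoding: it takes the upper end of each CPU and walltime range from the table, and the scenario keys, function name and partition argument are hypothetical. Treat the numbers as starting points to tune per cluster and dataset.

```python
# Upper-bound values taken from the resource table; adjust for your cluster.
RESOURCES = {
    "with_freesurfer": {"cpus": 8, "mem_gb": 30, "walltime": "24:00:00"},
    "without_freesurfer": {"cpus": 8, "mem_gb": 16, "walltime": "04:00:00"},
    "anat_only": {"cpus": 4, "mem_gb": 12, "walltime": "16:00:00"},
}


def sbatch_header(scenario, partition):
    """Return the #SBATCH header lines for one fMRIPrep job as a single string."""
    r = RESOURCES[scenario]
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={r['cpus']}",
        f"#SBATCH --mem={r['mem_gb']}G",
        f"#SBATCH --time={r['walltime']}",
    ])


print(sbatch_header("with_freesurfer", "YOUR_PARTITION"))
```

Setting --cpus-per-task here is what makes ${SLURM_CPUS_PER_TASK} available to the --nthreads flag in the run script.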
Step 4 (Optional): Large-Scale Processing with BABS
BABS (BIDS App Bootstrap) automates large-scale fMRIPrep runs on SLURM clusters with DataLad-based provenance tracking.
BABS Workflow
1. Prepare inputs (BIDS as DataLad dataset + Singularity container as DataLad dataset)
2. Write configuration YAML
3. babs init → creates project
4. babs check-setup --job-test → verify everything works
5. babs submit --count N → submit jobs
6. babs status → monitor
7. babs merge → collect results
Preparing Inputs
# 1. Make BIDS dataset a DataLad dataset (if not already)
cd /path/to/bids_dataset
datalad create -f -D "My BIDS dataset" .
datalad save -m "Add BIDS data"
# 2. Create container DataLad dataset
singularity build fmriprep-24.1.1.sif docker://nipreps/fmriprep:24.1.1
datalad create -D "fMRIPrep container" fmriprep-container
cd fmriprep-container
datalad containers-add --url /full/path/to/fmriprep-24.1.1.sif fmriprep-24-1-1
Configuration YAML
Help the user fill in this template by asking about their cluster setup:
input_datasets:
    BIDS:
        required_files:
            - "func/*_bold.nii*"
            - "anat/*_T1w.nii*"
        is_zipped: false
        origin_url: "/path/to/bids_datalad_dataset"
        path_in_babs: inputs/data/BIDS

cluster_resources:
    interpreting_shell: "/bin/bash"
    hard_memory_limit: 32G
    temporary_disk_space: 200G
    number_of_cpus: "6"
    hard_runtime_limit: "24:00:00"
    customized_text: |
        #SBATCH -p YOUR_PARTITION
        #SBATCH --nodes=1
        #SBATCH --ntasks=1

script_preamble: |
    source "${CONDA_PREFIX}"/bin/activate babs
    module load singularity

job_compute_space: "${TMPDIR}"

singularity_args:
    - --cleanenv

bids_app_args:
    $SUBJECT_SELECTION_FLAG: "--participant-label"
    -w: "$BABS_TMPDIR"
    --fs-license-file: "/path/to/license.txt"
    --output-spaces: "MNI152NLin2009cAsym:res-2"
    --force-bbr: ""
    --n_cpus: "6"
    --mem-mb: "30000"
    --skip-bids-validation: ""
    --notrack: ""

zip_foldernames:
    fmriprep: "24-1-1"
    freesurfer: "24-1-1"

alert_log_messages:
    stdout:
        - "fMRIPrep failed"
        - "Cannot allocate memory"
        - "Excessive topologic defect encountered"
When asking the user about cluster config, help them decide:
- Partition: ask what's available on their cluster (sinfo -s)
- Memory: 32G is a safe default; 16G if --fs-no-reconall
- CPUs: 4-8 is typical; diminishing returns beyond 16
- Walltime: 24h with FreeSurfer, 6h without
- Temp disk: 200G is generous; 100G is usually enough
- Modules: what module system they use (module avail singularity)
Running BABS
# Initialize project
babs init \
--container_ds /path/to/fmriprep-container \
--container_name fmriprep-24-1-1 \
--container_config /path/to/config.yaml \
--processing_level subject \
--queue slurm \
/path/to/my_babs_project
# Verify setup (always do this first!)
babs check-setup /path/to/my_babs_project --job-test
# Submit jobs (start small to verify)
babs submit /path/to/my_babs_project --count 2
# Check status
babs status /path/to/my_babs_project
# Once all jobs succeed, merge results
babs merge /path/to/my_babs_project
# Clone output for downstream use
datalad clone \
ria+file:///path/to/my_babs_project/output_ria#~data \
my_fmriprep_outputs
For the full BABS YAML reference, advanced configurations (anat-only + ingressed FreeSurfer workflow, multi-session handling), see references/babs-details.md.
Guiding New Users
Many users will be new to this pipeline. Don't assume they know the tools — walk them through it step by step. Start every interaction by understanding where they are:
- Assess their starting point: Ask what they have (raw DICOMs? already in BIDS? already ran fMRIPrep but need HPC scaling?). Don't dump the whole pipeline on someone who only needs one step.
- Gather their data details before generating anything:
  - How are the DICOMs organized? (flat directory, by subject, by session?)
  - What scanner and modalities? (Siemens/GE/Philips, T1w, BOLD, fieldmaps, DWI?)
  - How many subjects and sessions?
  - Where will they run fMRIPrep: local machine or HPC cluster?
  - If HPC: what scheduler (SLURM), what partitions, what's their scratch space?
- Explain each step as you go: Briefly tell them why each step matters (e.g., "heudiconv's first pass doesn't convert anything — it just catalogs your DICOM series so we can write the mapping rules"). Users who understand the reasoning can troubleshoot on their own later.
- Generate one stage at a time: Don't produce all scripts at once. Have them run the reconnaissance pass and share the dicominfo.tsv, draft the heudiconv heuristic from it, then refine the heuristic together. Move to BIDS validation only after conversion works.
- Offer to explain unfamiliar concepts: Terms like "BIDS", "heuristic file", "DataLad dataset", "RIA store" may be new. Define them naturally when they first come up.
Generating Scripts
When generating scripts, follow these principles:
- Generate modular scripts: separate scripts for each pipeline stage so users can run/debug independently
- Include error handling: check for missing files, validate outputs
- Add comments: explain what each section does, especially heudiconv matching logic
- Make paths configurable: use variables at the top of scripts, not hardcoded paths
- Support both bash and Python: generate whichever the user prefers
Recommended script structure:
scripts/
├── 01_dicom_to_bids.sh # heudiconv conversion
├── heuristic.py # heudiconv heuristic file
├── 02_validate_bids.sh # BIDS validation + fixes
├── 03_run_fmriprep.sh # fMRIPrep (local) or BABS setup
└── 04_check_outputs.sh # verify fMRIPrep outputs
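As an illustration of the last stage, the sketch below lists subjects whose fMRIPrep outputs look incomplete. It is written in Python rather than bash and checks only three things per subject: the sub-<label>.html report, at least one *_desc-preproc_bold.nii.gz file, and at least one *_desc-confounds_timeseries.tsv file. The function names are hypothetical, and the checks should be extended for anat-only or surface outputs.

```python
from pathlib import Path


def _has_match(directory, pattern):
    """True if the directory exists and contains a file matching the pattern."""
    return directory.is_dir() and any(directory.rglob(pattern))


def check_fmriprep_outputs(bids_dir, output_dir):
    """Return {subject: [missing items]} for every sub-* folder in the BIDS dataset."""
    bids_dir, output_dir = Path(bids_dir), Path(output_dir)
    problems = {}
    for sub_dir in sorted(p for p in bids_dir.glob("sub-*") if p.is_dir()):
        sub = sub_dir.name
        missing = []
        if not (output_dir / f"{sub}.html").is_file():
            missing.append("HTML report")
        if not _has_match(output_dir / sub, "*_desc-preproc_bold.nii.gz"):
            missing.append("preprocessed BOLD")
        if not _has_match(output_dir / sub, "*_desc-confounds_timeseries.tsv"):
            missing.append("confounds table")
        if missing:
            problems[sub] = missing
    return problems
```

An empty dictionary means every subject passed these checks; anything else is a list of subjects to rerun or inspect.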
Common Pitfalls
- Not filtering MoCo series: Siemens scanners can save a motion-corrected (MoCo) duplicate of each BOLD series. Always check is_motion_corrected.
- Missing --minmeta in heudiconv: Without it, JSON sidecars balloon with dcmstack metadata.
- TemplateFlow on offline nodes: Pre-fetch templates on the login node before submitting jobs.
- FreeSurfer license: Forgetting to set it up is the #1 fMRIPrep failure. Always verify the path.
- Mixing fMRIPrep versions: Process the entire dataset with one version. Don't mix.
- Not running babs check-setup --job-test: Always test before bulk submission.
- Killing babs submit: Never interrupt it mid-run, because job IDs won't be captured.
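Two of these pitfalls (the FreeSurfer license and offline TemplateFlow) can be caught before any job is submitted. The sketch below is a hypothetical preflight helper, assuming TemplateFlow's on-disk layout of one tpl-<name> folder per template under TEMPLATEFLOW_HOME; the default template list is only an example and should match the --output-spaces you plan to use.

```python
from pathlib import Path


def preflight(fs_license, templateflow_home, templates=("MNI152NLin2009cAsym",)):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    lic = Path(fs_license)
    if not lic.is_file() or lic.stat().st_size == 0:
        problems.append(f"FreeSurfer license missing or empty: {lic}")
    tf_home = Path(templateflow_home)
    for name in templates:
        tpl = tf_home / f"tpl-{name}"
        # An absent or empty template folder means the pre-fetch step was skipped.
        if not tpl.is_dir() or not any(tpl.iterdir()):
            problems.append(f"TemplateFlow template not pre-fetched: {tpl}")
    return problems
```

Running this on the login node and refusing to submit when the list is non-empty avoids burning queue time on jobs that would fail within the first minute.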
References
- references/heudiconv-details.md: Full SeqInfo fields, advanced heuristic patterns, multi-echo, IntendedFor
- references/fmriprep-details.md: Complete flag reference, output structure, confounds, troubleshooting
- references/babs-details.md: Full YAML schema, advanced workflows (anat-only + ingressed-fs), consuming results