name: medical-data-tools
description: Medical data workflow guide for DICOM metadata inspection and basic de-identification, physiological signal analysis with NeuroKit2, and cohort-table profiling for clinical research datasets. Use when the user asks to inspect imaging metadata, summarize ECG/PPG/EDA/RSP/EMG signals, or profile tabular medical datasets without making patient-specific diagnoses or treatment decisions.
Medical Data Tools
Use this skill when the user asks to inspect medical imaging files, biosignal recordings, or cohort tables for research, QA, or data-engineering work.
Typical triggers:
- inspect DICOM files, modality mix, study or series structure, metadata completeness
- write basic de-identified DICOM copies for downstream research workflows
- analyze ECG, PPG, EDA, RSP, or EMG tables with NeuroKit2
- summarize clinical cohort tables exported from EHR, OMOP, FHIR, registry, or claims workflows
- profile labels, codes, visits, time ranges, or subgroup balance before modeling
Environment Check
which python3 || true
python3 - <<'PY'
# Report which analysis modules can be imported in this environment.
mods = ["pandas", "numpy", "pydicom", "neurokit2"]
for name in mods:
    try:
        __import__(name)
        print(f"{name}: ok")
    except Exception as exc:
        print(f"{name}: missing ({exc})")
PY
If pydicom or neurokit2 is missing, say so immediately instead of pretending the analysis ran.
Bundled Assets
- templates/dicom_inspect.py
- templates/neuro_signal_analyze.py
- templates/clinical_cohort_profile.py
Preferred Workflow
- Identify the input surface first: DICOM file tree, CSV or TSV signal table, or cohort table.
- Run the smallest deterministic template before any broader interpretation.
- Save both the tabular output (--output) and the summary JSON (--summary).
- Report missing columns, unreadable files, or module gaps explicitly.
- Treat all outputs as research or operations artifacts, not clinical decisions.
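The first workflow step, identifying the input surface, could look roughly like the sketch below. It is not part of the bundled templates: the function names `looks_like_dicom` and `classify_input` and the returned labels are hypothetical. The check relies on the DICOM Part 10 file layout, in which a 128-byte preamble is followed by the four bytes `DICM`. Files written without that preamble are not detected, so a negative result means "unknown", not "not DICOM".

```python
from pathlib import Path

# Extensions treated as delimited tables (signal or cohort tables).
TABLE_SUFFIXES = {".csv", ".tsv"}


def looks_like_dicom(path):
    """Return True if the file has the DICOM Part 10 magic bytes at offset 128."""
    with open(path, "rb") as handle:
        handle.seek(128)
        return handle.read(4) == b"DICM"


def classify_input(path):
    """Label a path as 'dicom_tree', 'dicom_file', 'table', or 'unknown'."""
    path = Path(path)
    if path.is_dir():
        # A directory counts as a DICOM tree if any file inside it matches.
        files = (p for p in path.rglob("*") if p.is_file())
        return "dicom_tree" if any(looks_like_dicom(p) for p in files) else "unknown"
    if not path.is_file():
        return "unknown"
    if path.suffix.lower() in TABLE_SUFFIXES:
        return "table"
    return "dicom_file" if looks_like_dicom(path) else "unknown"
```

The extension alone cannot tell a signal table from a cohort table. That distinction still depends on the columns present, for example a single sampled signal column versus patient and visit identifiers.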
DICOM
Use templates/dicom_inspect.py for:
- modality or study inventory
- metadata QA
- study/series counts
- basic research-oriented de-identification copies
Quick start:
python3 templates/dicom_inspect.py imaging/ct_series \
  --recursive \
  --output medical/dicom_inventory.csv \
  --summary medical/dicom_inventory.json
Basic de-identification example:
python3 templates/dicom_inspect.py imaging/mri_case \
  --recursive \
  --deidentify-dir medical/dicom_deidentified \
  --output medical/dicom_case.csv \
  --summary medical/dicom_case.json
Deliverables:
- per-file metadata CSV
- summary JSON with modality counts and study/series counts
- optional de-identified DICOM copies
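As a rough illustration of how the summary counts relate to the per-file metadata CSV, the sketch below tallies modalities and distinct studies and series using only the standard library. The column names `Modality`, `StudyInstanceUID`, and `SeriesInstanceUID` are assumptions based on the standard DICOM keywords, and the function names are hypothetical. The columns actually written by templates/dicom_inspect.py may differ.

```python
import csv
from collections import Counter


def summarize_inventory(rows):
    """Count files, modalities, and distinct studies/series from metadata rows."""
    rows = list(rows)
    # Rows with an empty or missing modality are grouped under "UNKNOWN".
    modality_counts = Counter(row.get("Modality") or "UNKNOWN" for row in rows)
    studies = {row["StudyInstanceUID"] for row in rows if row.get("StudyInstanceUID")}
    series = {row["SeriesInstanceUID"] for row in rows if row.get("SeriesInstanceUID")}
    return {
        "file_count": len(rows),
        "modality_counts": dict(modality_counts),
        "study_count": len(studies),
        "series_count": len(series),
    }


def summarize_inventory_csv(csv_path):
    """Apply summarize_inventory to a per-file metadata CSV on disk."""
    with open(csv_path, newline="") as handle:
        return summarize_inventory(csv.DictReader(handle))
```

Reporting the `UNKNOWN` bucket separately keeps missing modality metadata visible as a data-quality finding instead of silently dropping those files.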
State clearly that the built-in de-identification is basic tag scrubbing, not a validated anonymization pipeline.
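To make "basic tag scrubbing" concrete, the sketch below blanks a short list of directly identifying attributes and drops private tags. It illustrates the idea under assumptions and is not the logic of templates/dicom_inspect.py: the keyword list and both function names are hypothetical, and the list is deliberately incomplete (it ignores dates other than birth date, UIDs, nested sequences, and annotations burned into pixel data). The pydicom calls used are `dcmread`, `Dataset.remove_private_tags`, and `Dataset.save_as`.

```python
# Hypothetical, deliberately incomplete keyword list for illustration only.
IDENTIFYING_KEYWORDS = (
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "AccessionNumber",
    "InstitutionName",
    "ReferringPhysicianName",
)


def scrub_dataset(ds, keywords=IDENTIFYING_KEYWORDS):
    """Blank the listed attributes in place; return the keywords that were present."""
    scrubbed = []
    for keyword in keywords:
        if hasattr(ds, keyword):
            setattr(ds, keyword, "")
            scrubbed.append(keyword)
    # Private tags often carry vendor-specific identifiers, so drop them too.
    ds.remove_private_tags()
    return scrubbed


def deidentify_file(src_path, dst_path):
    """Read one DICOM file, scrub it, and write the copy to dst_path."""
    import pydicom  # imported lazily so scrub_dataset stays usable without it

    ds = pydicom.dcmread(src_path)
    scrubbed = scrub_dataset(ds)
    ds.save_as(dst_path)
    return scrubbed
```

Returning the list of scrubbed keywords makes it easy to record, per file, what was actually changed, which helps when reporting de-identification limits.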
Biosignals
Use templates/neuro_signal_analyze.py for:
- ECG feature extraction and quality-aware summaries
- PPG, EDA, RSP, or EMG preprocessing and interval features
- quick physiology feature generation for downstream research models
Example:
python3 templates/neuro_signal_analyze.py \
  --input signals/ecg.csv \
  --signal-column ecg \
  --signal-type ecg \
  --sampling-rate 250 \
  --signals-output medical/ecg_processed.csv \
  --output medical/ecg_features.csv \
  --summary medical/ecg_features.json
Deliverables:
- one-row feature CSV
- optional processed signal CSV
- summary JSON with duration and processing metadata
Do not overstate these outputs. They are research features, signal summaries, and QA indicators, not diagnoses.
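For orientation, a minimal sketch of the kind of processing involved is shown below. It is an assumption about the general approach, not the contents of templates/neuro_signal_analyze.py, and the three function names are hypothetical. It uses NeuroKit2's `ecg_process` and `ecg_intervalrelated`, imported lazily so the loading and metadata helpers still work when NeuroKit2 is missing.

```python
import csv


def load_signal_column(csv_path, column):
    """Read one numeric column from a CSV file; fail loudly if it is absent."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found; available: {reader.fieldnames}")
        # Non-numeric or empty cells raise ValueError instead of being skipped.
        return [float(row[column]) for row in reader]


def recording_metadata(values, sampling_rate):
    """Basic duration metadata of the kind a summary JSON might carry."""
    n_samples = len(values)
    return {
        "n_samples": n_samples,
        "sampling_rate_hz": sampling_rate,
        "duration_s": n_samples / sampling_rate,
    }


def ecg_interval_features(values, sampling_rate):
    """Clean the ECG, detect peaks, and return interval-related features."""
    import numpy as np
    import neurokit2 as nk  # lazy import: report a missing module, do not hide it

    signals, _info = nk.ecg_process(np.asarray(values, dtype=float), sampling_rate=sampling_rate)
    return nk.ecg_intervalrelated(signals, sampling_rate=sampling_rate)
```

With the quick-start inputs above, usage would be along the lines of `values = load_signal_column("signals/ecg.csv", "ecg")` followed by `ecg_interval_features(values, 250)`. Very short recordings yield unstable interval features, so the duration is worth reporting alongside them.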
Cohort Tables
Use templates/clinical_cohort_profile.py for:
- patient and visit counts
- label balance checks
- code distribution checks
- subgroup balance before survival, prediction, or trial-emulation work
Example:
python3 templates/clinical_cohort_profile.py \
  --input cohorts/nsclc_registry.csv \
  --patient-id-column patient_id \
  --visit-id-column visit_id \
  --time-column encounter_time \
  --label-column response \
  --code-column regimen \
  --group-column sex \
  --group-column stage \
  --output medical/nsclc_profile.csv \
  --summary medical/nsclc_profile.json
Deliverables:
- normalized metric table CSV
- summary JSON with patient, visit, label, code, and group distributions
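The counts in such a summary can be sketched with the standard library as below. This is an illustrative sketch, not the logic of templates/clinical_cohort_profile.py: the function name `profile_cohort` and the output keys are hypothetical, and the choice to count labels per row but groups per distinct patient is an assumption that should be stated in any report.

```python
from collections import Counter


def profile_cohort(rows, patient_col, visit_col, label_col, group_cols=()):
    """Count patients, visits, label values, and per-patient group membership."""
    rows = list(rows)
    profile = {
        "row_count": len(rows),
        "patient_count": len({row[patient_col] for row in rows}),
        "visit_count": len({row[visit_col] for row in rows}),
        # Labels are counted per row, so repeat visits each contribute once.
        "label_counts": dict(Counter(row[label_col] for row in rows)),
        "group_counts": {},
    }
    for col in group_cols:
        # One value per patient (the last one seen) so repeat visits do not
        # inflate subgroup sizes.
        per_patient = {row[patient_col]: row[col] for row in rows}
        profile["group_counts"][col] = dict(Counter(per_patient.values()))
    return profile
```

With the column names from the example command, a call might look like `profile_cohort(rows, "patient_id", "visit_id", "response", ["sex", "stage"])`, where `rows` comes from `csv.DictReader`.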
Output Expectations
Good answers should mention:
- the exact input paths and column names used
- which template was run
- what files were written
- the key data-quality findings
- any missing metadata, missing modules, or de-identification limits
Related Skills
For public drug, clinical-trial, regulatory, or literature APIs, activate pharma-db-tools.
For study design, evidence synthesis, and reporting-guideline work, activate clinical-research-tools.
For biology databases, activate bio-db-tools.
For hypothesis tests, regression, or survival analysis on cohort tables, activate stat-modeling-tools or survival-analysis-tools.
For review matrices, citation cleanup, or hypothesis and reproducibility planning, activate literature-review-tools or scientific-workflow-tools.