name: finding-data-metacat
description: Find and manage Mu2e data using modern metacat, Rucio, and mdh tools. Use when querying with metacat MQL, uploading files, checking locations, prestaging from tape, or working with the future data handling stack.
compatibility: Requires mu2einit, muse setup ops, metacat, mdh, Rucio
metadata:
  version: "1.0.0"
  last-updated: "2026-02-13"
Mu2e Data Handling - Metacat, Rucio, and mdh
Modern data handling tools (replacing SAM as of 2025-2026)
See also: datahandling-overview.md for architecture and datahandling-sam.md for legacy SAM tools (still ~90% in use during transition).
Quick Reference
Setup:
mu2einit
muse setup ops
Common commands:
# Find datasets
metacat dataset list mu2e:* # All production datasets
metacat dataset list $USER:* # Your datasets
# Query files
metacat query files from <dataset> # List files in dataset
metacat file show -m <file> # Show file metadata
# Check locations and status
mdh print-url -s path -l tape <file> # Show dCache path
mdh verify-dataset <dataset> # Check if files on disk/tape
mdh query-dcache -o -v <dataset> # Check ONLINE/NEARLINE status
# Prestage from tape
mdh prestage-files <dataset> # Copy from tape to disk
# Upload personal files
mdh create-metadata <file> > <file>.json # Create metadata
mdh declare-file - # Declare to metacat (from stdin)
mdh copy-file -s -l scratch - # Copy to dCache scratch
Introduction
Modern Mu2e data handling stack:
- metacat: File catalog database and query tool
- Rucio: File location catalog and data movement service
- mdh: Mu2e convenience scripts wrapping metacat and Rucio
- dCache: Distributed storage (tape/persistent/scratch)
Authentication:
- Generally automatic if you have a Kerberos ticket
- Run mu2einit to get credentials
Dataset Monitor:
- View all datasets and status: https://mu2e.fnal.gov/atwork/computing/ops/datasetMon.html
Files and Datasets
Registered vs Unregistered Files
Registered files:
- Cataloged in metacat database
- Must follow naming conventions
- Located in standard dCache directories
- Subject to collaboration policies
- Can be owned by the collaboration (mu2e) or a user (your username)
Unregistered files:
- For local/temporary use
- Not in catalog
- No naming restrictions
- Your scratch area:
/pnfs/mu2e/scratch/users/$USER
File Naming Convention
All registered files must follow the six-field pattern:
tier.owner.description.config.sequencer.format
Example:
sim.MyUsername.dh_tutorial.2026-02-12.001000_000001.art
Fields:
- tier: Data processing level (sim, dig, mcs, nts, etc.)
- owner: mu2e (collaboration) or your username
- description: Brief descriptor for this file type
- config: Configuration identifier or date
- sequencer: Run/subrun number (unique per file)
- format: File format (art, root, tar, tbz, fcl)
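A quick pre-check of the six-field pattern can be sketched in shell. The helper name `is_mu2e_filename` is hypothetical, and the check is deliberately loose: it only counts dot-separated fields and compares the last field against the five formats listed above. The File Naming Convention wiki page remains the authoritative rule set.

```shell
# Hypothetical helper (not part of mdh): returns 0 if the name has exactly
# six non-empty dot-separated fields and ends in one of the formats listed above.
is_mu2e_filename() {
    name=$1
    # Count the fields; report 0 if any field is empty.
    nfields=$(printf '%s\n' "$name" |
        awk -F. '{ n = NF; for (i = 1; i <= NF; i++) if ($i == "") n = 0; print n }')
    [ "$nfields" -eq 6 ] || return 1
    case "${name##*.}" in
        art|root|tar|tbz|fcl) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: warn about a badly named file before trying to declare it.
f=sim.MyUsername.dh_tutorial.2026-02-12.001000_000001.art
is_mu2e_filename "$f" || echo "not a six-field name: $f" >&2
```

A dataset name (five fields) fails this check, which is the intended behavior for file-level validation.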
Dataset Names
Dataset = file name without the sequencer field:
tier.owner.description.config.format
Example:
sim.MyUsername.dh_tutorial.2026-02-12.art
All files in a dataset are considered "more of the same."
Namespaces and Data Identifiers (DIDs)
Namespace = owner prefix:
owner:tier.owner.description.config.sequencer.format # File DID
owner:tier.owner.description.config.format # Dataset DID
Example:
MyUsername:sim.MyUsername.dh_tutorial.2026-02-12.001000_000001.art # File
MyUsername:sim.MyUsername.dh_tutorial.2026-02-12.art # Dataset
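The relationship between file names, dataset names, and DIDs is purely textual, so it can be sketched with two small shell functions. The names `dataset_of` and `did_of` are hypothetical helpers, and both assume a well-formed name with the owner in the second field.

```shell
# Hypothetical helpers (not part of mdh or metacat).

# Drop the fifth (sequencer) field of a six-field file name to get the dataset name.
dataset_of() {
    printf '%s\n' "$1" | awk -F. -v OFS=. '{ print $1, $2, $3, $4, $6 }'
}

# Prefix a file or dataset name with its namespace, taken from the owner field.
did_of() {
    owner=$(printf '%s\n' "$1" | cut -d. -f2)
    printf '%s:%s\n' "$owner" "$1"
}

f=sim.MyUsername.dh_tutorial.2026-02-12.001000_000001.art
did_of "$f"                    # file DID
did_of "$(dataset_of "$f")"    # dataset DID
```

Run on the example name, the two calls print the file DID and dataset DID shown above.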
Your namespace:
- Created once (see Rucio wiki)
- Check if it exists:
metacat namespace list $USER
Finding Files
List Datasets
All production datasets:
metacat dataset list mu2e:*
Your datasets:
metacat dataset list $USER:*
View on monitor page: https://mu2e.fnal.gov/atwork/computing/ops/datasetMon.html
Query Files in a Dataset
List all files:
DS=mu2e:mcs.mu2e.dh_test.000.art
metacat query files from $DS
Query with conditions:
metacat query files from $DS where rs.first_subrun=2
Count files:
metacat query "files from $DS" | wc -l
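One practical detail when adding conditions: an unquoted `>` or `<` is treated by the shell as a redirection, so any query using those comparisons has to be quoted. A minimal sketch, assuming a hypothetical helper name `mql_subrun_above` and the rs.first_subrun field used above, builds the whole query as one string:

```shell
# Hypothetical helper: build an MQL query string for files whose first
# subrun is above a threshold. Arguments: dataset DID, integer threshold.
mql_subrun_above() {
    printf 'files from %s where rs.first_subrun > %d\n' "$1" "$2"
}

mql_subrun_above mu2e:mcs.mu2e.dh_test.000.art 2
# prints: files from mu2e:mcs.mu2e.dh_test.000.art where rs.first_subrun > 2

# On a configured node the string is passed as a single quoted argument:
#   metacat query "$(mql_subrun_above "$DS" 2)"
```

Without the quotes, the shell would write the command output to a file named after the threshold instead of sending the condition to metacat.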
Show File Metadata
Display file metadata:
FILE=mu2e:mcs.mu2e.dh_test.000.001200_000002.art
metacat file show -m $FILE
Metadata fields include:
- File size, checksum (CRC), event counts
- Run/subrun information
- Parent files, data tier
- File format, art version
File Locations and Protocols
dCache Storage Areas
Three types of dCache storage:
| Nickname | Description | Path Prefix | Retention |
|---|---|---|---|
| tape | Tape-backed persistent | /pnfs/mu2e/tape/ | Permanent, migrates to/from tape |
| disk | Persistent disk | /pnfs/mu2e/persistent/ | Permanent, disk-only |
| scratch | Temporary | /pnfs/mu2e/scratch/ | Auto-purged in 1-2 weeks |
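Going the other way, from a path to its storage nickname, is a simple prefix match on the table above. The function name `area_of_path` is a hypothetical helper for illustration; it only recognizes the three prefixes listed here.

```shell
# Hypothetical helper: map a /pnfs path to the storage nickname used by mdh -l.
area_of_path() {
    case "$1" in
        /pnfs/mu2e/tape/*)       echo tape ;;
        /pnfs/mu2e/persistent/*) echo disk ;;
        /pnfs/mu2e/scratch/*)    echo scratch ;;
        *) echo "not a known dCache area: $1" >&2; return 1 ;;
    esac
}

area_of_path /pnfs/mu2e/scratch/users/someone/test.art   # prints: scratch
```

This can serve as a reminder of the retention policy that applies to a file before relying on it being there later.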
Find File Locations
Print dCache path for a file:
FILE=mu2e:mcs.mu2e.dh_test.000.001200_000002.art
# Show path in tape area
mdh print-url -s path -l tape $FILE
# Show path in scratch area
mdh print-url -s path -l scratch $FILE
From stdin (pipe files from dataset query):
echo $FILE | mdh print-url -s path -l tape -
List with ls:
ls -l $(mdh print-url -s path -l tape $FILE)
Generate File URLs
Root URLs (for art jobs):
DS=mu2e:mcs.mu2e.dh_test.000.art
# Root URLs for tape-backed files
metacat query files from $DS | mdh print-url -l tape -s root -
Output formats:
- -s path: Filesystem path (/pnfs/mu2e/...)
- -s root: Root protocol URL (root://...)
- -s xrootd: XRootD URL
- -s uri: Generic URI
Storage locations:
- -l tape: Tape-backed area
- -l disk: Persistent disk
- -l scratch: Scratch area
Verify Dataset Locations
Check which files are on disk vs tape:
DS=mu2e:mcs.mu2e.dh_test.000.art
mdh verify-dataset $DS
Query dCache status:
mdh query-dcache -o -v $DS
Status meanings:
- NEARLINE: Only on tape (needs prestaging)
- ONLINE: Only on disk (ready to read)
- ONLINE_AND_NEARLINE: On both disk and tape
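For a large dataset, a per-status count is often easier to read than the full listing. The sketch below assumes that each line of `mdh query-dcache -o -v` output has the status in the first column and the file name in the second, which is the layout implied by the `awk '{print $2}'` pipeline used elsewhere in this guide; check the real output before relying on it. The function name `count_status` is hypothetical.

```shell
# Hypothetical helper: count files per dCache status.
# Assumption: input lines look like "<STATUS> <file name>".
count_status() {
    awk '$1 == "ONLINE" || $1 == "NEARLINE" || $1 == "ONLINE_AND_NEARLINE" { n[$1]++ }
         END { for (s in n) print s, n[s] }' | LC_ALL=C sort
}

# On a configured node:
#   mdh query-dcache -o -v "$DS" | count_status
```

Lines whose first column is not one of the three statuses (headers, summaries) are ignored rather than counted.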
Prestaging Files
Files in tape-backed dCache must be prestaged (copied from tape to disk) before efficient reading.
Check if Prestaging Needed
Query dCache status:
DS=mu2e:mcs.mu2e.dh_test.000.art
mdh query-dcache -o -v $DS
If results show NEARLINE, files are tape-only and need prestaging.
Request Prestaging
Prestage entire dataset:
mdh prestage-files $DS
This command:
- Issues prestage requests to tape system
- Monitors progress periodically
- Blocks until all files are staged
- Can be interrupted and restarted
Monitor-only mode (skip new requests):
mdh prestage-files -m $DS
Verbose progress updates:
mdh prestage-files -v $DS
Prestage time varies:
- Small dataset, drives available: Minutes
- Large dataset, busy drives: Hours to days
Work with Partially Prestaged Datasets
List only files already on disk:
mdh query-dcache -o -v $DS \
| grep ONLINE | awk '{print $2}' \
| mdh print-url -s root -
This generates root URLs only for files ready to read.
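The `grep ONLINE` filter matches both ONLINE and ONLINE_AND_NEARLINE, which is the intent, but it would also match any line whose file name happens to contain the string ONLINE. A slightly stricter variant, sketched here under the same assumption that the status is in column 1 and the file name in column 2, tests the status column explicitly. The function name `online_files` is hypothetical.

```shell
# Hypothetical helper: print the names of files that have a disk copy.
# Assumption: input lines look like "<STATUS> <file name>".
online_files() {
    awk '$1 == "ONLINE" || $1 == "ONLINE_AND_NEARLINE" { print $2 }'
}

# On a configured node:
#   mdh query-dcache -o -v "$DS" | online_files | mdh print-url -s root -
```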
Uploading Personal Files
Important: Registered files must be uploaded with mdh commands, which implement the required protocols and policies.
Prerequisites
Check your namespace exists:
metacat namespace list $USER
If missing, create it once by following the namespace procedure on the Rucio wiki: https://mu2ewiki.fnal.gov/wiki/Rucio
Step 1: Create Files with Proper Names
Example: Create test files
SFILE1=sim.${USER}.dh_tutorial.$(date +%F).001000_000001.art
SFILE2=sim.${USER}.dh_tutorial.$(date +%F).001000_000002.art
DS=${USER}:sim.${USER}.dh_tutorial.$(date +%F).art
# Copy from example sources (or create your own)
SDIR=/cvmfs/mu2e.opensciencegrid.org/DataFiles/Validation
cp $SDIR/sim.OWNER.dh_tutorial.CONFIG.001000_000001.art ./$SFILE1
cp $SDIR/sim.OWNER.dh_tutorial.CONFIG.001000_000002.art ./$SFILE2
# Verify names
ls -l sim.*.art
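When many files need names, assembling them from their parts avoids typos in the sequencer. This is a sketch only: the helper name `mu2e_filename` is hypothetical, and it assumes the sequencer is a run and subrun each zero-padded to six digits, as in the example names in this guide.

```shell
# Hypothetical helper: assemble a six-field file name.
# Arguments: tier owner description config run subrun format
# Pass run and subrun WITHOUT leading zeros; printf would read a
# leading zero as an octal prefix.
mu2e_filename() {
    printf '%s.%s.%s.%s.%06d_%06d.%s\n' "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

mu2e_filename sim MyUsername dh_tutorial 2026-02-12 1000 1 art
# prints: sim.MyUsername.dh_tutorial.2026-02-12.001000_000001.art
```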
Step 2: Create Metadata
Generate metadata JSON for each file:
mdh create-metadata $SFILE1 > ${SFILE1}.json
mdh create-metadata $SFILE2 > ${SFILE2}.json
Or batch process:
ls -1 sim.*.art | while read FF; do
mdh create-metadata $FF > ${FF}.json
done
# Verify metadata files created
ls -l sim*.art.json
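The batch loop above leaves an empty or partial .json file behind if `mdh create-metadata` fails, because the redirection creates the file before the command runs. A more defensive variant is sketched below; the wrapper name `make_metadata` and the temporary-file suffix are illustrative choices, not part of mdh.

```shell
# Hypothetical wrapper: create <file>.json for each argument, skipping files
# that already have a non-empty JSON and stopping at the first failure.
make_metadata() {
    for ff in "$@"; do
        [ -f "$ff" ] || { echo "missing: $ff" >&2; return 1; }
        [ -s "$ff.json" ] && continue
        # Write to a temporary name so a failed run never leaves a bad JSON.
        if ! mdh create-metadata "$ff" > "$ff.json.tmp"; then
            rm -f "$ff.json.tmp"
            echo "create-metadata failed for $ff" >&2
            return 1
        fi
        mv "$ff.json.tmp" "$ff.json"
    done
}

# On a configured node:
#   make_metadata sim.*.art
```

Because existing JSON files are skipped, the wrapper can be rerun after fixing a failure without redoing the files that already succeeded.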
Metadata includes:
- File size, checksums
- Run/subrun, event counts
- Art version, data tier
- Parent files (if applicable)
Step 3: Declare Files to Metacat
Declare all metadata files:
ls -1 sim.*.art.json | mdh declare-file -
The - reads file list from stdin.
Useful switches:
- -v: Verbose output
- -n: Dry-run (show what would be done)
- Run mdh declare-file -h for all options
Step 4: Copy Files to dCache
Copy to scratch dCache:
ls -1 sim.*.art | mdh copy-file -s -l scratch -
Options:
- -s: Read file list from stdin
- -l scratch: Target scratch area
- -l disk: Target persistent disk (if approved)
Verify file in dCache:
ls -l $(mdh print-url -l scratch -s path $SFILE1)
Step 5: Verify in Metacat
Query your dataset:
metacat query files from $DS
Show file metadata:
metacat file show -m ${USER}:${SFILE1}
Optional: Declare Locations to Rucio
For permanent datasets, or files that will be used as grid job input, declare the file locations to Rucio:
# WARNING: Rucio records are permanent!
# Only do this for finalized datasets
mdh locate-dataset -l scratch $DS
Note: The tutorial skips this step for practice datasets because Rucio records cannot be easily deleted.
Cleanup: Delete Test Files
Delete catalog records and physical files:
metacat query files from $DS | mdh delete-files -v -d -l scratch -c -
Options:
- -d: Delete physical files
- -l scratch: From scratch location
- -c: Delete catalog records
- -v: Verbose output
- -: Read file list from stdin
Remove local files:
rm *dh_tutorial*
Metacat Command Reference
Authentication
metacat auth whoami # Show current user
metacat auth login # Log in
metacat auth mydn # Show DN
metacat auth list # List auth tokens
metacat auth export # Export token
metacat auth import # Import token
Dataset Commands
metacat dataset create <dataset> # Create new dataset
metacat dataset show <dataset> # Show dataset info
metacat dataset files <dataset> # List files in dataset
metacat dataset list <pattern> # List datasets matching pattern
metacat dataset add-files <dataset> ... # Add files to dataset
metacat dataset remove-files ... # Remove files from dataset
metacat dataset update <dataset> # Update dataset metadata
metacat dataset remove <dataset> # Remove dataset
File Commands
metacat file show <file> # Show file info
metacat file show -m <file> # Show with metadata
metacat file declare <file> <json> # Declare file with metadata
metacat file declare-many <jsonfile> # Declare multiple files
metacat file datasets <file> # Show datasets containing file
metacat file update <file> # Update file record
metacat file update-meta <file> # Update metadata
metacat file retire <file> # Mark file as retired
metacat file name <fid> # Get file name from ID
metacat file fid <name> # Get file ID from name
Query Commands
Query syntax uses MQL (Metacat Query Language):
metacat query "files from <dataset>"
metacat query "files from <dataset> where <condition>"
metacat query -q <query-file>
Common query patterns:
# Files in dataset with conditions
metacat query "files from mu2e:dataset.art where rs.first_subrun > 100"
# Files by metadata
metacat query "files where dh.tier=sim and file_size > 1000000000"
# Multiple datasets
metacat query "files from mu2e:dataset1.art, mu2e:dataset2.art"
Namespace Commands
metacat namespace list <pattern> # List namespaces
metacat namespace show <namespace> # Show namespace info
metacat namespace create <namespace> # Create namespace
Other Commands
metacat category list # List metadata categories
metacat category show <cat> # Show category details
metacat named_query create ... # Create named query
metacat named_query list # List named queries
metacat named_query show <name> # Show named query
metacat version # Show versions
metacat validate <jsonfile> # Validate metadata JSON
mdh Command Reference
File Operations
Compute CRC checksum:
mdh compute-crc <file>
Print file URLs/paths:
mdh print-url [options] <file>
-s path|root|xrootd|uri # Output format
-l tape|disk|scratch # Storage location
- # Read files from stdin
Query dCache status:
mdh query-dcache [options] <dataset>
-o # Show online/nearline status
-v # Verbose output
Create metadata:
mdh create-metadata <file> > <file>.json
Catalog Operations
Declare files to metacat:
mdh declare-file [options]
- # Read file list from stdin
-v # Verbose
-n # Dry-run
Locate dataset in Rucio:
mdh locate-dataset [options] <dataset>
-l tape|disk|scratch # Storage location to register
Data Movement
Copy files to/from/within dCache:
mdh copy-file [options]
-s # Read file list from stdin
-l tape|disk|scratch # Target location
-v # Verbose
Prestage files from tape:
mdh prestage-files [options] <dataset>
-m # Monitor only (skip new requests)
-v # Verbose progress updates
Dataset Management
Verify dataset:
mdh verify-dataset <dataset>
Checks:
- File existence in dCache
- Location status (tape/disk)
- File counts match catalog
Delete files:
mdh delete-files [options]
-c # Delete catalog records
-d # Delete physical files
-l tape|disk|scratch # Location to delete from
-v # Verbose
- # Read file list from stdin
Grid Operations
Upload grid job outputs:
mdh upload-grid [options]
See mdh upload-grid -h for details on grid upload workflows.
Common Workflows
1. Find and Query a Dataset
# Search for datasets
metacat dataset list mu2e:*beam-g4s1*
# Pick a dataset
DS=mu2e:sim.mu2e.example-beam-g4s1.1812a.art
# List files
metacat query files from $DS
# Count files
metacat query files from $DS | wc -l
# Show first file metadata (query output already includes the namespace)
FILE=$(metacat query files from $DS | head -1)
metacat file show -m $FILE
2. Check Location and Prestage
DS=mu2e:sim.mu2e.example-beam-g4s1.1812a.art
# Check if files are on disk
mdh verify-dataset $DS
# Query detailed status
mdh query-dcache -o -v $DS
# If NEARLINE (tape-only), prestage
mdh prestage-files -v $DS
# Generate root URLs for art job
metacat query files from $DS | mdh print-url -l tape -s root -
3. Upload Personal Simulation
# Create files with proper names
MYFILE=sim.${USER}.myanalysis.$(date +%F).001000_000001.art
MYDS=${USER}:sim.${USER}.myanalysis.$(date +%F).art
# (Generate or copy your file to $MYFILE)
# Create and declare metadata
mdh create-metadata $MYFILE > ${MYFILE}.json
echo ${MYFILE}.json | mdh declare-file -
# Copy to dCache scratch
echo $MYFILE | mdh copy-file -s -l scratch -
# Verify
metacat query files from $MYDS
ls -l $(mdh print-url -l scratch -s path ${USER}:${MYFILE})
4. Work with Partially Staged Dataset
DS=mu2e:dig.mu2e.CeEndpointMix1BBTriggered.MDC2020ar_best_v1_3.art
# Start prestaging in background
mdh prestage-files -v $DS &
# While waiting, work with files already on disk
mdh query-dcache -o -v $DS \
| grep ONLINE | awk '{print $2}' \
| mdh print-url -s root - > online_files.txt
# Use online_files.txt as input to art job
Tips and Best Practices
File naming:
- Always follow six-field convention for registered files
- Use meaningful description field
- Sequencer must be unique across all your files
Storage choices:
- Scratch: Testing, temporary results (auto-purged)
- Persistent disk: Important user files (must request quota)
- Tape-backed: Large datasets, permanent storage
Prestaging:
- Request prestaging early (can take days for large datasets)
- Use the -v flag to monitor progress
- Consider prestaging overnight for large datasets
Metadata:
- mdh create-metadata extracts most fields automatically from art files
- Verify JSON before declaring to metacat
- Cannot easily change metadata after declaration
Cleanup:
- Delete test files from scratch when done
- Use mdh delete-files -d -l scratch -c - to remove both files and records
Namespaces:
- Create your namespace once before first upload
- Namespace = your username
- Cannot be deleted once created
Transition from SAM
If you're familiar with SAM, here are key differences:
| SAM | Metacat/mdh | Notes |
|---|---|---|
| samweb | metacat or mdh | mdh wraps many common operations |
| samweb list-files | metacat query | MQL is more powerful |
| samweb get-metadata | metacat file show -m | Similar output |
| samweb locate-file | mdh print-url | Multiple output formats |
| samweb prestage-dataset | mdh prestage-files | Similar functionality |
| Dataset dimensions | MQL queries | Different syntax, more flexible |
| dh.dataset=... | files from ... | Dataset specification |
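The last row of the table can be illustrated with a small translation function. This is a sketch under two assumptions: the SAM dimension has the plain form dh.dataset=NAME, and the metacat namespace equals the owner field of the dataset name, as described in the DID section. The function name `sam_dataset_to_mql` is hypothetical.

```shell
# Hypothetical helper: turn a SAM "dh.dataset=NAME" dimension into an MQL query.
# The namespace is taken from the owner (second) field of the dataset name.
sam_dataset_to_mql() {
    ds=${1#dh.dataset=}
    owner=$(printf '%s\n' "$ds" | cut -d. -f2)
    printf 'files from %s:%s\n' "$owner" "$ds"
}

sam_dataset_to_mql dh.dataset=sim.mu2e.example-beam-g4s1.1812a.art
# prints: files from mu2e:sim.mu2e.example-beam-g4s1.1812a.art
```

More complex SAM dimension strings (multiple conditions, run ranges) do not translate this mechanically and need to be rewritten by hand in MQL.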
SAM files still work during transition:
- Most production datasets are in both SAM and metacat
- Use whichever tool is most convenient for your workflow
- New files should use metacat/mdh
Documentation and References
Primary resources:
- Data Handling Tutorial: https://mu2ewiki.fnal.gov/wiki/DataHandlingTutorial
- Metacat Documentation: https://fermitools.github.io/metacat
- Rucio Wiki: https://mu2ewiki.fnal.gov/wiki/Rucio
- Dataset Monitor: https://mu2e.fnal.gov/atwork/computing/ops/datasetMon.html
- File Naming Convention: https://mu2ewiki.fnal.gov/wiki/FileNames
- dCache Information: https://mu2ewiki.fnal.gov/wiki/Dcache
Getting help:
- Slack: #help-data channel
- Email: mu2e-data-handling@fnal.gov
- Computing help page: https://mu2ewiki.fnal.gov/wiki/ComputingHelp
Last updated: February 2026 based on tutorial (January 2026) and tool help