name: rdetoolkit
description: >
Guide development of RDE (Research Data Express) structured programs using
rdetoolkit — a Python framework by NIMS for research data registration workflows.
Covers project scaffolding, dataset function implementation, processing mode
selection (Invoice / ExcelInvoice / MultiDataTile / RDEFormat), template editing,
schema & metadata validation via CLI, encoding-safe file I/O with rdetoolkit.fileops,
and CSV-to-graph generation with rdetoolkit.graph.
MUST be used whenever code imports rdetoolkit, calls workflows.run(), reads/writes
JSON in research-data contexts, processes CSV for graphing, edits invoice.schema.json
or metadata-def.json, or runs rdetoolkit validate or rdetoolkit init commands.
Also activate when the user mentions RDE, structured processing, NIMS, materials data,
research data registration, or any rdetoolkit module.
license: MIT
metadata:
author: nims-mdpf
version: "1.0"
docs: https://nims-mdpf.github.io/rdetoolkit/
repository: https://github.com/nims-mdpf/rdetoolkit
RDEToolKit — Structured Program Development Guide
RDEToolKit is a Python framework by NIMS (National Institute for Materials Science) that automates research data registration into RDE. It handles directory scaffolding, file validation, metadata extraction, thumbnail generation, and graph creation — so you only write the domain-specific data transformation logic.
Docs: https://nims-mdpf.github.io/rdetoolkit/ Repo: https://github.com/nims-mdpf/rdetoolkit
Quick Start
1. Initialize a project
pip install rdetoolkit
rdetoolkit init # or: python3 -m rdetoolkit init
This generates the standard layout:
container/
├── main.py
├── requirements.txt
├── modules/
└── data/
├── inputdata/ # Place experimental data here
├── invoice/
│ └── invoice.json
└── tasksupport/
├── invoice.schema.json
└── metadata-def.json
2. Write a dataset function (recommended signature)
from rdetoolkit.models.rde2types import RdeDatasetPaths
def dataset(paths: RdeDatasetPaths) -> None:
# Read input from paths.inputdata
# Write outputs to paths.struct
...
3. Wire the entry point
import rdetoolkit
from modules.my_module import dataset
rdetoolkit.workflows.run(custom_dataset_function=dataset)
4. Run locally
python3 main.py
Critical Rules — Always Follow These
Use rdetoolkit APIs, Do NOT Reinvent
Research data files often use legacy encodings (Shift_JIS, EUC-JP, CP932).
Standard Python open() / json.load() will crash on these files.
Always use rdetoolkit's encoding-aware functions.
File I/O (rdetoolkit.fileops)
| Task | ✅ Use this | ❌ Never do this |
|---|---|---|
| Read JSON | rdetoolkit.fileops.read_from_json_file(path) |
json.load(open(path)) |
| Write JSON | rdetoolkit.fileops.write_to_json_file(path, data) |
json.dump(data, open(path, 'w')) |
| Detect encoding | rdetoolkit.fileops.detect_encoding(path) |
Raw chardet.detect() |
# ✅ CORRECT — handles Shift_JIS, EUC-JP, CP932 transparently
from rdetoolkit.fileops import read_from_json_file, write_to_json_file
metadata = read_from_json_file(paths.meta / "metadata.json")
write_to_json_file(paths.struct / "output.json", result)
# ❌ WRONG — will raise UnicodeDecodeError on legacy-encoded files
import json
with open(paths.meta / "metadata.json") as f:
metadata = json.load(f)
CSV-to-Graph (rdetoolkit.graph)
For simple XY-axis graphs from CSV data, use csv2graph before writing matplotlib code. It generates publication-ready plots in one call.
from rdetoolkit.graph import csv2graph
# Generates XY line graph from CSV and saves to output directory
csv2graph(csv_path, output_dir)
See references/preferred-apis.md for full options and examples.
Metadata Writing (rdetoolkit.models.metadata.Meta)
ALWAYS use the Meta class to write metadata.json. Do NOT write it manually with json.dump().
from rdetoolkit.rde2util import Meta
def save_metadata(metadata: dict[str, str], metadata_def_json_path, save_path):
meta = Meta(metadata_def_json_path)
meta.assign_vals(metadata) # All values MUST be strings
meta.writefile(str(save_path))
Error Handling (Result Type — REQUIRED)
All helper functions in structured processing MUST use the Result type for error handling. Do NOT wrap the entire dataset() function in a single try/except block.
from rdetoolkit.result import Result, Success, Failure
def parse_data(filepath: Path) -> Result[pd.DataFrame, str]:
try:
# ... parsing logic ...
return Success(df)
except Exception as e:
return Failure(f"Failed to parse: {e}")
def dataset(paths: RdeDatasetPaths) -> None:
result = parse_data(paths.inputdata / "data.csv")
if result.is_failure():
raise RuntimeError(result.error)
df = result.unwrap()
# ❌ WRONG: Giant try/except hides all errors
def dataset(paths: RdeDatasetPaths) -> None:
try:
# ... 100 lines ...
except Exception as e:
print(f"Error: {e}")
Dataset Function Signature
# ✅ RECOMMENDED — single-argument style (v1.4+)
from rdetoolkit.models.rde2types import RdeDatasetPaths
def dataset(paths: RdeDatasetPaths) -> None:
...
# ⚠️ LEGACY — two-argument style (still works, but do not use for new code)
from rdetoolkit.models.rde2types import RdeInputDirPaths, RdeOutputResourcePath
def dataset(inputdata: RdeInputDirPaths, output: RdeOutputResourcePath) -> None:
...
Path Access
Use the RdeDatasetPaths attributes. Do NOT hardcode paths.
| Attribute | Purpose |
|---|---|
paths.inputdata |
Input data directory |
paths.struct |
Structured output directory |
paths.meta |
Metadata directory |
paths.thumbnail |
Thumbnail output directory |
paths.raw |
Raw file copy destination |
paths.invoice |
Invoice file path |
paths.tasksupport |
Task support files directory |
Processing Modes
Choose the mode that matches your data registration scenario.
Set it in rdeconfig.yaml under system.extended_mode.
| Mode | Config value | When to use |
|---|---|---|
| Invoice | (default, no config needed) | Single data file, basic registration |
| ExcelInvoice | ExcelInvoice |
Batch registration with per-item metadata in Excel |
| MultiDataTile | MultiDataTile |
Multiple files sharing the same metadata |
| RDEFormat | RDEFormat |
Pre-formatted RDE data, system integration |
Mode selection flowchart
How many files per registration?
├── One file → Invoice mode (default)
└── Multiple files
├── Each file needs different metadata?
│ ├── Yes → ExcelInvoice mode
│ └── No (shared metadata) → MultiDataTile mode
└── Data already in RDE format? → RDEFormat mode
Configuration example
# rdeconfig.yaml
system:
extended_mode: 'MultiDataTile' # or 'ExcelInvoice', 'RDEFormat'
save_raw: true
magic_variable: true
save_thumbnail_image: true
See references/modes.md for detailed mode descriptions and examples.
CLI Workflow — Correct Order Matters
Template editing and validation MUST follow this sequence. Running them out of order causes confusing validation errors.
Step 1: Edit templates (in this order)
data/tasksupport/invoice.schema.json— Define the schema firstdata/tasksupport/metadata-def.json— Configure metadata definitionsdata/invoice/invoice.json— Fill in values conforming to the schema
Step 2: Validate (in this order)
# 1. Check schema syntax itself
rdetoolkit validate invoice-schema data/tasksupport/invoice.schema.json
# 2. Check invoice conforms to schema
rdetoolkit validate invoice data/invoice/invoice.json \
--schema data/tasksupport/invoice.schema.json
# 3. Check metadata definition
rdetoolkit validate metadata-def data/tasksupport/metadata-def.json
# 4. Full project validation (all of the above at once)
rdetoolkit validate all
Step 3: Run structured processing
python3 main.py
See references/cli-workflow.md for all CLI commands and CI/CD integration.
Project Structure Reference
container/
├── main.py # Entry point: calls workflows.run()
├── requirements.txt # Additional Python dependencies
├── modules/
│ └── my_module.py # Your dataset() function lives here
├── rdeconfig.yaml # Optional: mode & behavior config
└── data/
├── inputdata/
│ └── <your experimental data>
├── invoice/
│ └── invoice.json # Data registration metadata
└── tasksupport/
├── invoice.schema.json # JSON Schema for invoice validation
└── metadata-def.json # Metadata field definitions
Building Structured Processing Autonomously
When asked to create a new RDE structured processing program, follow this sequence:
- Analyze the user's input data file format and identify extractable metadata
- Create
metadata-def.json— define fields with bilingual names (ja/en) and types - Create
invoice.schema.json— define the registration form schema - Create
invoice.json— fill values conforming to the schema - Implement
dataset()function — parse data, save metadata via Meta class, create structured CSV, generate plots - Wire
main.py—rdetoolkit.workflows.run(custom_dataset_function=dataset) - Validate —
rdetoolkit validate all, thenpython3 main.py
Each helper function in the dataset module MUST return a Result type.
Metadata MUST be saved via the Meta class (not manual JSON writes).
File I/O MUST use rdetoolkit.fileops.
If the user specifies a directory structure or coding pattern, follow their instructions. Otherwise, use the default patterns described here.
See references/building-structured-processing.md for the complete pattern with full code examples, directory specifications, metadata-def.json format, Meta class usage, Result-type error handling, and a submission checklist.
Common Mistakes and Fixes
| Symptom | Cause | Fix |
|---|---|---|
UnicodeDecodeError reading JSON |
Using json.load() directly |
Use rdetoolkit.fileops.read_from_json_file() |
Validation error on invoice.json |
Edited invoice before defining schema | Edit invoice.schema.json first, then invoice.json |
extended_mode not recognized |
Typo in config value | Must be exactly ExcelInvoice, MultiDataTile, or RDEFormat |
| Missing output files after run | Writing to wrong directory | Use paths.struct from RdeDatasetPaths, not hardcoded paths |
| Graph not generated | Using matplotlib manually for simple XY | Try rdetoolkit.graph.csv2graph() first |
| metadata.json missing or malformed | Writing JSON manually | Use Meta class: meta.assign_vals() + meta.writefile() |
| Errors silently swallowed | Giant try/except around dataset() | Use Result type in helpers, check .is_failure() per step |
References
- Full documentation: https://nims-mdpf.github.io/rdetoolkit/
- API reference: https://nims-mdpf.github.io/rdetoolkit/en/api/
- Repository: https://github.com/nims-mdpf/rdetoolkit
- Contributing guide: https://github.com/nims-mdpf/rdetoolkit/blob/main/CONTRIBUTING.md
- 5-mode templates: https://github.com/nims-mdpf/RDE_rdetoolkit_5mode_templates
Reference files in this skill
- references/building-structured-processing.md — Complete guide for building structured processing from scratch (dataset function pattern, metadata writing, Result-type error handling, directory specs, checklist)
- references/preferred-apis.md — Detailed fileops and csv2graph usage patterns
- references/modes.md — Deep dive into each processing mode
- references/cli-workflow.md — Complete CLI reference and CI/CD integration
- references/config.md — Configuration file specification (rdeconfig.yaml, pyproject.toml)