name: tinker
description: Run LLM post-training on Tinker with CPU-side orchestration and remote GPU execution. Use when preparing or launching SFT, DPO, or PPO-style runs, checkpointing, sampling checkpoints, or resuming long-running jobs through Hermes Research Agent.
version: 1.0.0
author: Hermes Research Agent
license: MIT
metadata:
  hermes:
    tags: [tinker, training, sft, dpo, ppo, checkpoints, lora]

Tinker
Tinker is the only compute backend in Hermes Research Agent v1.
Requirements
- TINKER_API_KEY must be set.
- Use tinker_posttrain for lifecycle management instead of ad hoc shell commands when possible.
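
A quick pre-flight check for the API key requirement could look like the sketch below. The helper name require_tinker_api_key and the error message are illustrative assumptions, not part of Tinker or Hermes; only the TINKER_API_KEY variable name comes from this skill.

```python
import os


def require_tinker_api_key(env=None):
    """Return the key, or raise if TINKER_API_KEY is missing or blank.

    Hypothetical helper: `env` defaults to the process environment and
    can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get("TINKER_API_KEY", "").strip()
    if not key:
        raise RuntimeError("TINKER_API_KEY must be set before starting a Tinker run")
    return key
```

Calling it once before validate_config fails fast on a missing key instead of partway through a run.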
Supported methods
- sft: dataset rows need prompt and completion
- dpo: dataset rows need prompt, chosen, and rejected
- ppo: dataset rows need prompt, completion, token-level logprobs, and either advantage or reward
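
The per-method field requirements above can be checked locally before a run is submitted. The following is a minimal sketch, assuming each dataset row is a plain dict keyed by the field names listed; the function name row_problems and the message wording are hypothetical, and the check looks only at field presence, not at types or token alignment.

```python
# Required fields per method, taken from the list above.
REQUIRED_FIELDS = {
    "sft": ("prompt", "completion"),
    "dpo": ("prompt", "chosen", "rejected"),
    "ppo": ("prompt", "completion", "logprobs"),
}


def row_problems(method, row):
    """Return a list of human-readable problems for one dataset row."""
    if method not in REQUIRED_FIELDS:
        return [f"unsupported method: {method}"]
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS[method]
        if name not in row
    ]
    # PPO rows additionally need at least one of advantage or reward.
    if method == "ppo" and "advantage" not in row and "reward" not in row:
        problems.append("missing field: advantage or reward")
    return problems


print(row_problems("dpo", {"prompt": "hi", "chosen": "a"}))
# prints ['missing field: rejected']
```

An empty list means the row has every field the method needs.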
Default operating pattern
- Validate the config with tinker_posttrain(action="validate_config", ...).
- Start the run with tinker_posttrain(action="start_run", ...).
- Let research_loop(action="monitor_run", ...) manage long-running polling and resumption.
- Use tinker_posttrain(action="sample_checkpoint", ...) for quick qualitative checks.
- Use tinker_posttrain(action="download_checkpoint", ...) only when the project needs local checkpoint artifacts.
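
The ordering above can be written down as data, which is handy for logging or reviewing a plan before anything is launched. This sketch only lists tool and action names in order; it does not call the tools, which are invoked by the agent rather than from Python. The function default_plan and its needs_local_checkpoint flag are hypothetical, and the remaining tool arguments are elided in this skill, so they are left out here as well.

```python
def default_plan(needs_local_checkpoint=False):
    """Return the default sequence of (tool, action) pairs."""
    plan = [
        ("tinker_posttrain", "validate_config"),
        ("tinker_posttrain", "start_run"),
        ("research_loop", "monitor_run"),
        ("tinker_posttrain", "sample_checkpoint"),
    ]
    # Downloading is optional: only when local artifacts are needed.
    if needs_local_checkpoint:
        plan.append(("tinker_posttrain", "download_checkpoint"))
    return plan


for tool, action in default_plan():
    print(f"{tool}(action={action!r}, ...)")
```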
Run hygiene
- Always tie the run to an experiment id and a written hypothesis.
- Save checkpoints regularly.
- Record the outcome with research_state(action="record_result", ...).
- If approval mode is active, expect start_run, resume_run, and stop_run to require an approval record.
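
The approval rule can be expressed as a small predicate, for example to decide whether to request an approval record before attempting an action. This is a sketch under assumptions: the names APPROVAL_GATED_ACTIONS and needs_approval are hypothetical, and how approval mode is detected or how approval records are created is not covered by this skill.

```python
# Actions that require an approval record when approval mode is active,
# as listed above.
APPROVAL_GATED_ACTIONS = frozenset({"start_run", "resume_run", "stop_run"})


def needs_approval(action, approval_mode_active):
    """Return True if this action should wait for an approval record."""
    return bool(approval_mode_active) and action in APPROVAL_GATED_ACTIONS
```

Actions such as validate_config or sample_checkpoint fall through as not gated under this rule.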