name: big-job description: Run long-running shell commands (builds, tests, data pipelines, scraping, training runs, etc.) as detached background jobs via systemd transient units, with journal-based output capture and lifecycle management. Use this skill whenever the user asks you to run something that will take more than a few seconds — large test suites, long builds, batch processing, data migrations, ML training, or any command the user explicitly wants running "in the background". Also use it when you need to run multiple heavy commands in parallel without blocking. Trigger on phrases like "run in background", "long running", "don't wait for it", "kick off a build", "run tests and come back", "start a training run", "batch process", or any task where blocking the conversation would be annoying.
Big Jobs
Run heavy or long-running shell commands as detached systemd transient services. Each job gets its own user unit with journal-based output, exit code tracking, and OOM protection. All units use the bigjob- prefix for easy filtering.
Naming convention
Every job unit is named bigjob-YYYYMMDD-HHMMSS-jobname where:
- The timestamp is the current datetime (evaluate it yourself, do NOT use shell
$(date ...)) jobnameis a short kebab-case label describing the task (e.g.train,build-frontend,run-tests)
Example: bigjob-20260305-143021-train
Use JOBNAME as shorthand for the full unit name in commands below.
Commands
Start a job
By default, protect the machine with a hard cgroup memory cap: reserve 24 GiB for the OS, desktop, Docker, databases, and agent processes, then let the job use the rest. If the job exceeds the cap, systemd/kernel OOM-kills the job's unit instead of letting the whole machine thrash.
MEMORY_RESERVE_GB=24
TOTAL_MEMORY_GB=$(awk '/MemTotal/ {print int($2/1024/1024)}' /proc/meminfo)
MEMORY_MAX_GB=$((TOTAL_MEMORY_GB - MEMORY_RESERVE_GB))
[ "$MEMORY_MAX_GB" -lt 4 ] && MEMORY_MAX_GB=4
systemd-run --user \
--unit=JOBNAME \
--slice=bigjobs.slice \
--remain-after-exit \
-p MemoryAccounting=yes \
-p MemoryMax="${MEMORY_MAX_GB}G" \
-p MemorySwapMax=0 \
-p OOMPolicy=kill \
-p OOMScoreAdjust=500 \
-p StandardOutput=journal \
-p StandardError=journal \
--working-directory=/absolute/path/to/project \
-E PYTHONUNBUFFERED=1 \
-- command arg1 arg2
MemoryMaxis a hard cgroup limit set to installed RAM minusMEMORY_RESERVE_GB; on a 122 GiB machine the default cap is about 98 GiBMemorySwapMax=0prevents the job cgroup from using swap if swap exists; with swap disabled globally, this is still harmless and explicitOOMPolicy=killmakes memory failure kill the unit rather than leave children behindOOMScoreAdjust=500biases kernel OOM selection toward big jobs over interactive/system processes--slice=bigjobs.slicegroups heavy jobs together for easier inspection and future policy--remain-after-exitkeeps the unit visible after the command finishes so we can check status and read output- Output goes to the systemd journal, queryable with
journalctl - The unit is fully detached — survives agent restarts
After starting, tell the user the unit name, then immediately start waiting for the job (see below). Never just start a job and stop — always follow up with the wait command in the same turn.
Wait for job (always do this after starting)
while [ "$(systemctl --user show -p SubState --value JOBNAME)" = "running" ]; do sleep 2; done; \
SUB=$(systemctl --user show -p SubState --value JOBNAME); \
RESULT=$(systemctl --user show -p Result --value JOBNAME); \
CODE=$(systemctl --user show -p ExecMainStatus --value JOBNAME); \
TAIL=$(journalctl --user -u JOBNAME -o cat --no-pager | tail -80); \
printf 'SubState=%s Result=%s ExecMainStatus=%s\n%s\n' "$SUB" "$RESULT" "$CODE" "$TAIL"
This polls until the command finishes, reports the final state/result/exit code, then dumps the last 80 lines of output. This must always be run as a background Bash call immediately after starting a job. The agent will be notified when the job finishes. If the job exceeds its memory cap, expect a nonzero status and often Result=oom-kill or an exit status such as 137.
Check status
systemctl --user show -p SubState -p Result -p ExecMainStatus -p MemoryCurrent -p MemoryPeak JOBNAME
Prints named properties:
SubState:running|exited|failedResult:success|exit-code|oom-kill| other systemd resultExecMainStatus: the exit code (0 = success)MemoryCurrent: current cgroup memory usage when availableMemoryPeak: peak cgroup memory usage when available
Read output
# Last 50 lines
journalctl --user -u JOBNAME -o cat --no-pager | tail -50
# All output
journalctl --user -u JOBNAME -o cat --no-pager
# Follow (live stream)
journalctl --user -u JOBNAME -o cat -f
Kill a job
# Graceful (SIGTERM)
systemctl --user kill JOBNAME
# Force (SIGKILL)
systemctl --user kill -s KILL JOBNAME
# Stop entirely (also removes the transient unit)
systemctl --user stop JOBNAME
List jobs
systemctl --user list-units 'bigjob-*' --all --no-pager
Shows all bigjob units with their state (running/exited/failed).
Clean up finished jobs
# Stop a specific finished unit (removes it since it's transient)
systemctl --user stop JOBNAME
# Stop all finished (exited) bigjob units
systemctl --user list-units 'bigjob-*' --all --no-pager --plain \
| awk '$4=="exited" {print $1}' \
| xargs -r systemctl --user stop
# Reset any failed units
systemctl --user reset-failed 'bigjob-*'
Guidelines
- Always tell the user the unit name after starting
- Always wait immediately — after starting a job, always run the wait loop as a background Bash call in the same turn. Never start a job without also starting to wait for it
- Default to background for anything > 30 seconds
- Check exit codes — don't assume success; read output on failure
- Use
tailfor output — output can be huge; don't dump everything unless needed - Memory safety — keep the default all-but-24G cap unless the user explicitly asks for a different reserve; increase
MEMORY_RESERVE_GBwhen the desktop, databases, Docker, or vLLM services must stay responsive - Working directory — always pass
--working-directorywith an absolute path - Reconnection — use
systemctl --user list-units 'bigjob-*'to find jobs after restart