name: authoring-challenges
description: >-
Author, review, or revise pwn.college challenges and curriculum in this repo — any
module, any archetype (templated web/service, interpreted "checker", compiled SUID
binary, or legacy). Use when creating a new challenge, level, module, or dojo;
designing a curriculum sequence; writing DESCRIPTION.md / module.yml / dojo.yml;
authoring challenge code (Jinja2 templates, Python checkers, C/asm, shell); or fixing
challenge mechanics (flag delivery, SUID/exec-suid, ASLR, victims, visibility).
Encodes hard-won rules that previously required user correction: one concept per
level, extend the common templates instead of hand-rolling, deterministic solves with
the flag granted only on a real solve, verified-only learner-facing claims, minimum
moving parts, and testing ONLY under nix develop.
Authoring pwn.college challenges
Forward-looking rules for creating/revising challenges and curriculum in this repo with minimal back-and-forth. The behavioral rules (§1, §4, §5, §6) are universal across modules; the mechanics (§2, §3) adapt per archetype. Every rule exists because skipping it caused a real user correction. Read this fully, then open the reference for your archetype.
0. Orient before you build
- Identify the archetype first (see §7) and extend the module's common templates
instead of hand-rolling. The single biggest way to keep a challenge minimal and
idiomatic is to
{% extends %}/{% include %}the existingcommon/template (flask.py.j2,cmdi.py.j2,sqli-pw.py.j2,common/check,secret-value-checker.py,common/Dockerfile.j2, …) and set variables in{% block setup %}withsuper(). - Confirm the target module. New levels go in the active module the task points at,
in dojo/module order. Never author in
challenges/legacy/(it pulls prebuilt binaries from external dojos — not where new content goes). If placement is ambiguous, state your chosen path in one line and proceed. - Read 2–3 sibling levels first — their
DESCRIPTION.md,challenge/,module.yml, and tests. The pattern you need almost always already exists. - Never declare a mechanism impossible until you've checked how siblings do it. The
blessed pattern (a
bin/gdbexec-suid wrapper, asubmit-number, a victim script, a seccomp allowance inruntime/) is in the repo, not a blank slate or the user's head.
1. Pedagogy — the most-corrected area (universal)
- One new concept per level. Count what a level introduces and compare to its neighbors; if it teaches more than one, split the surplus into its own level — don't hide the overload in prose. (A level that bundled five concepts as a "capstone" had to be torn apart.)
- Respect sequencing. Don't introduce advanced topics (e.g. ASLR) early, even as a
footnote/
NOTE. They get their own much-later level. - The phenomenon you teach must be REAL, not manufactured by your own scaffolding.
If the lesson only "works" because of a wrapper you added (e.g. faking a tool's
behavior with
env -i), the design is wrong — redesign, don't ship the contrivance. - No unverified claims in learner-facing text. Never assert how a tool behaves unless you've tested it. (Shipping the false "GDB does not pass your shell's environment" in a DESCRIPTION was a hard correction.) A 3-line empirical check is cheap.
- When a design feels contrived, stop and offer 2–4 genuinely different designs ranked by honesty/cleanliness — don't keep patching and defending the contrivance.
- Match the house voice, scaled to the module. Beginner modules (computing-101,
linux-luminarium) explain everything warmly; advanced modules are terse. Read siblings
to calibrate; keep DESCRIPTIONs to the length and cadence of their neighbors. For the
per-challenge DESCRIPTION structure (recap → one new concept + why → success
condition → how-to + win) and a writing checklist, see
references/curriculum-and-conventions.md→ "Writing the DESCRIPTION.md".
2. Determinism & flag delivery (the rule that got over-generalized — read carefully)
- The intended solution must be deterministic — it works every time, no guessing. Disable nondeterminism that would break the intended technique (e.g. ASLR off for a fixed-address lesson). What the user objected to was a solve whose target value was random and whose flag came through a side channel, not randomness as such.
- Randomness is legitimate and often desirable — as long as it's not a coin-flip in
the solve path: seeded build-time secrets (endpoint/parameter/table names via
random_names.j2, admin passwords), and per-instance/per-run secret values the learner discovers through the intended technique (e.g.secret_value = random.randint(...)that the checker knows and validates). These prevent memorized/hardcoded solutions. Builds are seeded (CHALLENGE_SEED) so they're reproducible. - Grant the flag only on a real solve, via the archetype's normal mechanism — don't
bolt on indirection the design doesn't need:
- SUID binary: open
/flagpre-drop and print it when the success condition is met. - Interpreted checker: set
give_flag = True(the harness prints/flag), or have the learner's own code emit it (give_flag = False). - Web/service: the app reveals
/flagon successful exploit (admin login, injection, victim exfil). submit-number: the blessed pattern for "read a value, submit it" levels — use it when that is the lesson; don't graft it onto a challenge that should hand over the flag directly.
- SUID binary: open
3. Minimum moving parts & code style (universal)
- Fewest parts that satisfy the literal spec. Don't add wrappers, launchers, extra
binaries,
privileged: true, or indirection the spec doesn't need. (bin/gdbwrappers andsubmit-numberare fine when the archetype calls for them — the error was adding one to a challenge that didn't.) If a constraint seems to need extra machinery, surface it and ask. - Write it review-ready on the first pass (first-draft code was once ~1.6× too long
and sent back as "really verbose and weird"). Applies to every language:
- C:
<err.h>(err/errx/warn), notperror+return ladders; factor repeated read/write & open-then-dup2into helpers; comment the why. Don't reimplement what the kernel gives you (inherited FDs surviveexecve— no memfd copy of/flag). - Python checkers / Jinja / shell: extend the common base, set a few
{% block setup %}vars, override only what differs; don't paste a whole service when a block override suffices. Shell: compact[ cond ] || { echo msg >&2; exit 1; }guards.
- C:
4. Testing & integrity — non-negotiable (universal)
- Run
pwnshop testONLY insidenix develop. Outside it, pwnshop silently falls through to the host dockerd (stock seccomp) andpersonality/ASLR challenges fail withpersonality: Operation not permitted. That EPERM means "enter the dev shell," not "patch pwnshop/the challenge." See [[feedback_fix_at_devshell_layer]]. tests_private/test_*.shruns the actual solve and confirms the real flag — build the reference solution, run it, andgrep "pwn.college{".tests_publicchecks functionality. Test files must be executable.- Never claim tests pass without running them and reading the output; qualify a pass with its environment. Per the global rule, all tests must pass at all times — there is no "preexisting flakiness."
- A challenge that passes only because of an invisible ambient runtime affordance (the
patched dockerd behind
DOCKER_HOST) is a latent trap — make the dependency explicit. - When
pwnshopflakes undernix develop(transient build/network blip), retry the same command — don't pivot to the host daemon or hand-rolleddocker runfor a green.
5. Git & communication (universal)
- On an unpushed WIP branch, just amend an obvious fix — no permission menus.
- State git persistence precisely: working-tree vs committed vs pushed. Never let "Done" imply more durability than exists.
- Never attribute an uncommitted change to "the user" as fact — show evidence (mtime,
git log) and say what you actually know. Commit/push only when asked.
6. Universal conventions quick-reference
- Curriculum wiring: a dojo has one
dojo.yml(name/type/id/award/orderedmodules:); each sub-module has amodule.ymllisting orderedresources:—lecture(video/playlist/slides),markdown(content), andchallenge(id/name/ optional description/visibility). Order = display order. - Hide a not-yet-released challenge via
visibility: { start: "<ISO-8601 w/ tz>" }on itsmodule.ymlentry (far-future = hidden). Validate withtools/dojo/parse-dojo-yml. - Every challenge ships
challenge/Dockerfile.j2— usually{% include "…/common/Dockerfile.j2" %}(override{% block additional_packages %}/etc.). .setupruns at build time (compile, then delete source);.init/.init.j2runs at container start (per-instance files/perms/services). Flag is always at/flag.- SUID: compiled binary →
chmod 4755in.setup; interpreted script → shebang#!/usr/bin/exec-suid -- /usr/bin/python3 -I(or--real/--environ=nonevariants). tests_private/**is git-crypt-encrypted per module (.gitattributes:**/tests_private/** filter=git-crypt-<MODULE>). Borrow the key from a sibling checkout under~/pwncollege/challenges*/.git/git-crypt/keys/instead of prompting for a GPG passphrase.
7. Archetypes — pick one, then open its reference
- Templated web / service (Flask, sqli, cmdi, path-traversal, xss; victims) →
references/service-and-web-challenges.md - Interpreted "checker" (asm/ELF graded by
common/check+chal.py; memory/pointer viasecret-value-checker.py;submit-number) →references/interpreted-checker-challenges.md - Compiled SUID binary (deterministic addresses,
personality+memfd, gdb-under-ptrace wrapper, cross-context sentinels) →references/binary-challenge-recipes.md - Curriculum scaffolding & house style (dojo.yml/module.yml, resources, visibility,
Dockerfile/.setup/.init, exec-suid, git-crypt, DESCRIPTION voice) →
references/curriculum-and-conventions.md - Legacy (
challenges/legacy/**) pulls prebuilt binaries from external dojos viaDockerfile.j2+ a stubtest_legacy.sh.j2; don't author new content here.
Recurring failure modes to self-check against
- Hand-rolled what a common template already provides (didn't
extends/include). - Added complexity nobody asked for (extra binary, wrapper,
privileged, indirection). - Made an unverified factual claim in learner text or to the user.
- Made the solve nondeterministic, or gated the flag behind a guess / needless side channel.
- Crammed more than one concept into a level, or mismatched the module's voice.
- Didn't consult siblings /
runtime/before building or before giving up. - Claimed success (tests, push) without verifying, or attributed a change to the user.