name: infra-advisor
description: Use when the user describes a goal at the CERN/HEP infrastructure level and wants to know WHICH services to stitch (e.g. "I want distributed analysis on Open Data and to train an ML model"). Returns a 2–4-service stack with steps and pointers to the matching execution skill (rucio, reana-workflows, physlite-basics, …). Bundles a service catalogue (reference/catalog.yaml), pre-cooked recipes (reference/recipes.md), and digests on GPU access, SWAN+HTCondor scale-out, and columnar frameworks. Does NOT cover physics methodology, single-tool command-level help, or running specific code (use the dedicated skill named by the recommendation). Disambiguator phrase: CERN service stack composer.
data_scope: both
experiment: all
Scope
Use this skill when the user is orienting, not executing. Signals:
- They describe a goal in English ("I want to …") rather than asking for a command.
- The goal spans multiple infra concerns: data access and compute and ML/reproducibility.
- They don't yet know what to pick between lxplus, SWAN, Binder, REANA, coffea-casa, ml.cern.ch, etc.
Don't use this skill when:
- The user already picked a tool and needs command-level help → use the dedicated skill (rucio, reana, reana-workflows, atlas-opendata, cern-opendata, physlite-basics, …).
- The question is physics methodology (how to measure σ·BR, which background to simulate) → use sm-analyses/atlas-notebooks.
- The user wants a single fact about one service (URL, auth type) → read reference/catalog.yaml directly, no advisory needed.
Reference files
Read lazily. All one level deep from this SKILL.md.
- reference/catalog.yaml — structured service catalogue. One record per tool (purpose, audience, auth, entry URL, when-to-use, when-to-avoid, integrations, matching skill). Grep or read it.
- reference/recipes.md — pre-cooked stacks for common compound goals (public exploration, distributed analysis, reproducibility, ML training + serving, non-public datasets).
- reference/cern-gpus.md — distilled from https://clouddocs.web.cern.ch/gpu_overview.html. Read when the goal needs a GPU (training, vGPU, CUDA debugging). Routes between lxplus-gpu, SWAN (T4), HTCondor GPU, ml.cern.ch, OpenStack g*.
- reference/swan-htcondor.md — distilled from https://swan.docs.cern.ch/condor/intro/. Read when the user wants to develop in a SWAN notebook and scale out to the CERN HTCondor pool from the same session (interactive-plus-batch).
- reference/columnar-frameworks.md — uproot/awkward vs coffea vs (distributed) RDataFrame. Read when the user asks "which framework" or when a recipe needs to name one.
Pick the right file by intent: recipes.md when the user's goal
matches a named compound recipe; catalog.yaml when composing bespoke;
the three digest files when the topic (GPUs, SWAN scale-out,
columnar frameworks) is the centre of the question.
Drift policy
The three digest files pin their canonical upstream URL at the top, plus the date they were last refreshed. If you suspect drift (user reports a command that doesn't work, or the digest is older than ~6 months), WebFetch the canonical URL and compare before answering from the digest. Do not bulk-re-fetch — only when you have a concrete reason to suspect the note is stale.
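The drift rule above can be sketched as a small check. This is a minimal illustration, not part of the skill: the function name `is_stale` and the 183-day threshold (an approximation of "about six months") are assumptions, and reading the refresh date out of a digest header is left out because the header format is not specified here.

```python
from datetime import date

# Assumption: "about six months" is approximated as 183 days.
STALE_AFTER_DAYS = 183


def is_stale(last_refreshed: date, today: date, user_reported_breakage: bool = False) -> bool:
    """Return True when the digest should be compared against its upstream URL."""
    age_days = (today - last_refreshed).days
    # Either trigger is enough: a concrete user report, or an old digest.
    return user_reported_breakage or age_days > STALE_AFTER_DAYS


print(is_stale(date(2025, 1, 10), today=date(2025, 3, 1)))  # False
print(is_stale(date(2024, 6, 1), today=date(2025, 3, 1)))   # True
```

Only a `True` result justifies a fetch of the canonical URL; a fresh digest with no reported breakage is answered from as-is.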
Intake: five questions to triangulate
Before recommending, confirm these five axes. Ask any that aren't obvious from the user's message — skip the ones already answered.
- Data source — ATLAS Open Data release? CERN Open Data record (CMS/LHCb/ALICE)? Non-public experiment data? User's own files?
- Audience / auth tier — Public user (no CERN account)? CERN account (lxplus/SWAN/GitLab)? ATLAS/CMS member (grid, Rucio)?
- Scale — Laptop-minutes? A few hours on one node? Hundreds of core-hours distributed? GPU for ML?
- Reproducibility bar — Throw-away exploration? Paper-grade pinned pipeline (DOI, container digests, declared outputs)? Teaching material someone else will re-run?
- Output shape — Plots/notebook? Skimmed ntuples back to the user? A trained model + inference endpoint? A workflow that ships alongside a paper?
If the user gave you enough to answer three of five, proceed. Ask one clarifying question rather than five — keep friction low.
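As a rough sketch of the three-of-five rule, the intake can be modelled as a record with one optional field per axis. The class name `Intake`, its field names, and the returned strings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Intake:
    # One field per intake axis; None means "not yet known".
    data_source: Optional[str] = None
    auth_tier: Optional[str] = None
    scale: Optional[str] = None
    reproducibility: Optional[str] = None
    output_shape: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of the axes the user has not answered yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def next_step(self) -> str:
        gaps = self.missing()
        # Three of five answered (at most two gaps) is enough to proceed.
        if len(gaps) <= 2:
            return "proceed"
        # Otherwise ask about a single axis, not all of them.
        return f"ask about: {gaps[0]}"


print(Intake(data_source="ATLAS Open Data", auth_tier="public", scale="laptop").next_step())
print(Intake(data_source="ATLAS Open Data").next_step())
```

The first call prints `proceed`; the second prints `ask about: auth_tier`, because only one axis is known and the sketch asks about the first gap in declaration order.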
Recommendation format
Produce a short, scannable block with four parts:
- TL;DR stack — one line naming the 2–4 services to stitch.
- Why this stack — 2–3 bullets tying each service to a specific intake answer ("public data → no Rucio needed", "ML training → GPU → ml.cern.ch Kubeflow, not SWAN").
- Steps — numbered, concrete, each step names the tool and the exit artifact. Link the matching skill when Claude has one (rucio, reana-workflows, etc.) so the user can drill down.
- Caveats — auth pre-reqs, quotas, obvious failure modes. Keep to 2–3 bullets.
End with a concrete call to action: one CLI command or one URL the user should open next.
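One way to picture the four-part block plus the closing call to action is a small renderer. The function `render_recommendation` and the example values passed to it are hypothetical; only the section labels and the 2–4 service bound come from the format described above.

```python
def render_recommendation(stack, why, steps, caveats, next_action):
    """Render the four-part recommendation followed by one call to action."""
    if not 2 <= len(stack) <= 4:
        raise ValueError("stack should name 2-4 services")
    lines = ["TL;DR stack: " + " + ".join(stack), "", "Why this stack:"]
    lines += [f"- {item}" for item in why]
    lines += ["", "Steps:"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines += ["", "Caveats:"]
    lines += [f"- {item}" for item in caveats]
    lines += ["", f"Next: {next_action}"]
    return "\n".join(lines)


# Illustrative values only; a real answer is built from the intake.
print(render_recommendation(
    stack=["ATLAS Open Data", "Binder"],
    why=["public data, so no Rucio needed", "classroom scale, so no batch system"],
    steps=["Pick a notebook from ATLAS Open Data", "Launch it on Binder and export the plots"],
    caveats=["Binder sessions are temporary; save outputs before closing"],
    next_action="open https://opendata.atlas.cern",
))
```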
Decision cheatsheet
Use this to prune the catalogue fast. Full details in
reference/recipes.md.
| Goal | Default stack |
|---|---|
| Classroom / first look at HEP data | ATLAS Open Data + Binder |
| Interactive notebook, CERN user | SWAN + EOS + CVMFS |
| Scripted local analysis, any user | uproot/coffea + Docker image |
| Scalable Python analysis over Open Data | CERN AF (SWAN + HTCondor / lxplus / ml.cern.ch / REANA) or coffea-casa |
| ROOT/C++ shop scaling the same code local → batch | (distributed) RDataFrame on SWAN + HTCondor |
| Dev-in-notebook, run-in-batch | SWAN + HTCondor (Dask-HTCondor) → move_to_batch |
| Declarative, reproducible pipeline | REANA + GitLab + container digest |
| Paper reproducibility + DOI | REANA + CVMFS + Zenodo |
| Non-public dataset discovery / transfer | Rucio |
| Grid submission, ATLAS member | PanDA (on top of Rucio) |
| Large batch on CERN resources | HTCondor @ CERN |
| GPU sanity check / CUDA debug (interactive) | lxplus-gpu (SSH) |
| ML training in a notebook (small/medium model) | SWAN T4 GPU |
| ML training + inference endpoint | ml.cern.ch (Kubeflow) |
| A100 / custom CUDA stack, multi-week lease | OpenStack g4.* flavor (ticket) |
| "Quick and cheap" answer, no CERN account | Colab + atlas-opendata skill |
For the framework dimension ("which columnar library"), consult
reference/columnar-frameworks.md — it picks between uproot/awkward,
coffea, and (distributed) RDataFrame.
On "the Analysis Facility"
When a user says "the analysis facility" at CERN, they usually mean the integrated CERN stack (lxplus + SWAN + HTCondor batch + REANA + ml.cern.ch + Rucio + EOS + CVMFS) presented as one facility — not any single product. The cern-af entry in catalog.yaml captures this umbrella.
coffea-casa is a separate, external analysis facility (Dask on Kubernetes tuned for coffea). It's one instance of the AF idea, not THE AF. Recommend the CERN AF as the default for CERN users; recommend coffea-casa when the user is already in the coffea ecosystem and wants zero setup.
If the goal spans two rows of the decision cheatsheet (e.g. "distributed analysis and ML"), compose the two stacks into one: that's the whole point of this skill. Example in reference/recipes.md#distributed-analysis-plus-ml.
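Composing two rows can be sketched as an order-preserving union with a size cap. The `CHEATSHEET` dictionary below is a hand-copied, simplified subset of the table (an assumption for illustration, not the contents of recipes.md), and `compose` is a hypothetical helper.

```python
# Simplified subset of the decision cheatsheet, keyed by goal.
CHEATSHEET = {
    "dev-in-notebook, run-in-batch": ["SWAN", "HTCondor"],
    "ml training + inference endpoint": ["ml.cern.ch"],
    "interactive notebook, cern user": ["SWAN", "EOS", "CVMFS"],
}


def compose(*goals, max_services=4):
    """Merge the default stacks of several goals, dropping duplicates."""
    stack = []
    for goal in goals:
        for service in CHEATSHEET[goal]:
            if service not in stack:
                stack.append(service)
    if len(stack) > max_services:
        # More than four services: merge steps or split into phases instead.
        raise ValueError(f"{len(stack)} services is too many; split into phases")
    return stack


print(compose("dev-in-notebook, run-in-batch", "ml training + inference endpoint"))
# ['SWAN', 'HTCondor', 'ml.cern.ch']
```

Asking for all three goals at once yields five distinct services and raises `ValueError`, which mirrors the advice to split an oversized stack into phases.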
Workflow
- Listen for intake answers in the user's first message. Fill in gaps with one focused question if needed.
- Match a recipe in reference/recipes.md if the goal is close to a named one; otherwise compose from catalog.yaml.
- Produce the recommendation in the four-part format above.
- Hand off to the tool-specific skill for execution. E.g. if the stack says "declare with REANA", load reana-workflows next.
Pitfalls
- Don't overload the user. Three tools is often the right answer; five is never. If the stack has more than four services, merge steps or split into phases.
- Respect the audience tier. Don't recommend Rucio or lxplus to a classroom user; don't recommend Binder for a paper-grade pipeline.
- Check integrations. catalog.yaml lists integrates_with: per service — if you propose two services that don't, flag the seam (e.g. "you'll need to rucio download locally before reana-client upload").
- Links can rot. Give the canonical top-level URL (https://reana.cern.ch, https://opendata.atlas.cern) plus the matching skill name — let the dedicated skill handle command-level detail.
- Don't do physics. This skill is infra-only. Push physics methodology questions to sm-analyses or a reference.
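The integration check can be sketched as a pairwise scan over the proposed stack. The in-memory `CATALOG` below is a hypothetical stand-in for records loaded from catalog.yaml: only the `integrates_with` key comes from the text above, and the listed links are invented for illustration.

```python
from itertools import combinations

# Hypothetical records; the real ones live in reference/catalog.yaml.
CATALOG = {
    "rucio": {"integrates_with": ["panda"]},
    "reana": {"integrates_with": ["gitlab", "cvmfs"]},
    "gitlab": {"integrates_with": ["reana"]},
}


def find_seams(stack, catalog):
    """Return service pairs where neither side lists the other as an integration."""
    seams = []
    for a, b in combinations(stack, 2):
        a_links = catalog.get(a, {}).get("integrates_with", [])
        b_links = catalog.get(b, {}).get("integrates_with", [])
        if b not in a_links and a not in b_links:
            seams.append((a, b))
    return seams


print(find_seams(["rucio", "reana", "gitlab"], CATALOG))
# [('rucio', 'reana'), ('rucio', 'gitlab')]
```

Each returned pair is a seam to call out in the Caveats part of the recommendation, ideally with the manual step that bridges it.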
Verification
A successful use of this skill ends with the user holding:
- a named stack (2–4 services),
- the first concrete action (one URL or one CLI command), and
- the name of the next skill to load (rucio, reana-workflows, atlas-opendata, …) when they're ready to execute.
If the user walks away still unsure which service to start with, the recommendation was too broad — compress.