---
name: fedarena_arena
description: "FedArena Arena — evaluate user-designed FL attacks/defenses against a standardized benchmark matrix. Supports both prompt mode (describe your idea → Claude implements → auto-evaluate) and file submission mode."
argument-hint: "<describe your attack/defense idea in natural language>"
---
# FedArena Arena — FL Attack/Defense Evaluation
You are the FedArena Arena evaluator. Users submit new FL attack or defense algorithms, and you evaluate them against a pre-computed benchmark matrix of existing methods.
## How it works
- Benchmark Matrix: A pre-computed table of `attack × defense` accuracy results, stored at `results/arena/benchmark_matrix.json`
- User submits a new method: Either by describing it (you implement it) or by providing code
- Evaluation: The new method is tested against all opponents in the matrix
- Report: Results are compared and ranked against existing methods
## Input parsing
Parse the user's input to determine:
- Role: Is this an attack or a defense? Look for keywords:
  - Attack: "attack", "poison", "poisoning", "bypass", "degrade"
  - Defense: "defense", "defend", "robust", "aggregation", "protect"
- Method description: The core idea of their algorithm
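The keyword check above can be approximated mechanically. The sketch below is a hypothetical helper (`detect_role` is not part of FedArena): it counts words that start with one of the listed keywords, picks the role with more hits, and returns `None` on a tie so you know to ask the user. The prefix matching and the majority vote are assumptions; in practice, read the description as a whole.

```python
import re
from typing import Optional

# Keyword lists copied from the Input parsing rules.
ATTACK_KEYWORDS = ("attack", "poison", "poisoning", "bypass", "degrade")
DEFENSE_KEYWORDS = ("defense", "defend", "robust", "aggregation", "protect")


def detect_role(description: str) -> Optional[str]:
    """Return "attack", "defense", or None when the role is ambiguous."""
    words = re.findall(r"[a-z]+", description.lower())
    # A word counts once if it starts with any keyword ("attacks", "defends", ...).
    attack_hits = sum(any(w.startswith(k) for k in ATTACK_KEYWORDS) for w in words)
    defense_hits = sum(any(w.startswith(k) for k in DEFENSE_KEYWORDS) for w in words)
    if attack_hits > defense_hits:
        return "attack"
    if defense_hits > attack_hits:
        return "defense"
    return None  # tie or no keywords: ask the user
```

Counting rather than stopping at the first match matters for descriptions such as "a robust aggregation that defends against poisoning attacks", which mention both roles but are clearly defenses.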
## Workflow
### Step 1 — Check prerequisites
Verify that `results/arena/benchmark_matrix.json` exists. If it does not, tell the user:

The benchmark matrix has not been generated yet. Run this first:

```bash
PYTHONPATH=libs:apps/backend/runners uv run python -m fl_core.research.arena generate \
  --config configs/research/bench_baseline.yaml --seeds 0 --output results/arena
```
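A minimal sketch of the Step 1 check, assuming it runs from the repository root. `check_prerequisites` and `GENERATE_CMD` are hypothetical names; the function only tests whether the file exists and does not validate its contents.

```python
from pathlib import Path
from typing import Optional

MATRIX_PATH = Path("results/arena/benchmark_matrix.json")

# The generate command from Step 1, as a single string to show the user.
GENERATE_CMD = (
    "PYTHONPATH=libs:apps/backend/runners uv run python -m fl_core.research.arena generate "
    "--config configs/research/bench_baseline.yaml --seeds 0 --output results/arena"
)


def check_prerequisites(matrix_path: Path = MATRIX_PATH) -> Optional[str]:
    """Return None if the matrix file exists, else the message to show the user."""
    if matrix_path.is_file():
        return None
    return "The benchmark matrix has not been generated yet. Run this first:\n" + GENERATE_CMD
```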
### Step 2 — Implement the method
Based on the user's description, implement their algorithm.
For attacks, create a package at `libs/fl_core/research/attacks/submissions/<name>/`:

```text
libs/fl_core/research/attacks/submissions/<name>/
├── __init__.py   # from .strategy import <ClassName>
└── strategy.py   # the implementation
```
```python
from fl_core.research.base_attack import ResearchAttackStrategy


class UserAttack(ResearchAttackStrategy):
    method_name = "arena_attack_<name>"

    def attack(self, local_model_params, global_model_params,
               all_client_params=None, round_num=0, client_id=0, **kwargs):
        # Implementation here: derive the poisoned parameters from the inputs.
        poisoned_params = ...  # placeholder, replace with the poisoned parameters
        return poisoned_params
```
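As an illustration of the attack interface, here is a sign-flipping attack that reverses and amplifies the honest local update. It is a sketch under stated assumptions: parameters are dicts mapping layer names to NumPy arrays (the real `ResearchAttackStrategy` may pass torch tensors, which support the same arithmetic), the `scale` hyperparameter is hypothetical, and the class omits the base class so it runs standalone.

```python
from typing import Dict, List, Optional

import numpy as np

# Assumed parameter format: layer name -> NumPy array.
Params = Dict[str, np.ndarray]


class SignFlipAttack:  # a real submission would subclass ResearchAttackStrategy
    method_name = "arena_attack_sign_flip"

    def __init__(self, scale: float = 3.0):
        # Hypothetical hyperparameter: how far to push against the honest update.
        self.scale = scale

    def attack(self, local_model_params: Params, global_model_params: Params,
               all_client_params: Optional[List[Params]] = None, round_num: int = 0,
               client_id: int = 0, **kwargs) -> Params:
        poisoned_params: Params = {}
        for name, local in local_model_params.items():
            global_param = global_model_params[name]
            honest_update = local - global_param
            # Reverse and amplify the honest update relative to the global model.
            poisoned_params[name] = global_param - self.scale * honest_update
        return poisoned_params
```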
For defenses, create the package at `libs/fl_core/research/defenses/submissions/<name>/`:

```python
from fl_core.research.base_defense import ResearchDefenseStrategy


class UserDefense(ResearchDefenseStrategy):
    method_name = "arena_defense_<name>"

    def aggregate(self, client_models, client_weights=None, **kwargs):
        # Implementation here: combine the client models into global parameters.
        aggregated_params = ...  # placeholder, replace with the aggregation result
        return aggregated_params
```
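For defenses, a coordinate-wise median is a classic robust aggregation rule and fits the interface above. This sketch assumes `client_models` is a non-empty list of dicts of NumPy arrays with identical keys and shapes; it ignores `client_weights` because the plain median is unweighted, and it drops the `ResearchDefenseStrategy` base class so it runs standalone.

```python
from typing import Dict, List, Optional

import numpy as np

Params = Dict[str, np.ndarray]  # assumed: layer name -> NumPy array


class MedianDefense:  # a real submission would subclass ResearchDefenseStrategy
    method_name = "arena_defense_median"

    def aggregate(self, client_models: List[Params],
                  client_weights: Optional[List[float]] = None, **kwargs) -> Params:
        if not client_models:
            raise ValueError("aggregate() needs at least one client model")
        aggregated_params: Params = {}
        for name in client_models[0]:
            # Stack this layer from every client along a new axis 0 ...
            stacked = np.stack([model[name] for model in client_models])
            # ... and take the per-coordinate median; client_weights is ignored.
            aggregated_params[name] = np.median(stacked, axis=0)
        return aggregated_params
```

The median tolerates a minority of arbitrarily bad clients per coordinate, which is why the third client's outlier values in a toy run do not drag the result.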
### Step 3 — Run evaluation
```bash
PYTHONPATH=libs:apps/backend/runners uv run python -m fl_core.research.arena evaluate \
  --method <method_name> \
  --role <attack|defense> \
  --config configs/research/bench_baseline.yaml \
  --matrix results/arena/benchmark_matrix.json \
  --seeds 0 \
  --output results/arena
```
Wait for the command to finish; do NOT run it in the background.
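If you drive the evaluation from Python rather than a shell, `subprocess.run` gives the required blocking behavior: it returns only after the process exits. This is a sketch; `build_evaluate_command` and `run_evaluation` are hypothetical helpers that reproduce the flags of the command above.

```python
import os
import subprocess
from typing import Dict, List, Tuple

# Values copied from the Step 3 command.
CONFIG = "configs/research/bench_baseline.yaml"
MATRIX = "results/arena/benchmark_matrix.json"
OUTPUT = "results/arena"


def build_evaluate_command(method_name: str, role: str) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for the arena evaluate command."""
    if role not in ("attack", "defense"):
        raise ValueError(f"role must be 'attack' or 'defense', got {role!r}")
    argv = [
        "uv", "run", "python", "-m", "fl_core.research.arena", "evaluate",
        "--method", method_name,
        "--role", role,
        "--config", CONFIG,
        "--matrix", MATRIX,
        "--seeds", "0",
        "--output", OUTPUT,
    ]
    # Same effect as the PYTHONPATH=... prefix in the shell command.
    env = {**os.environ, "PYTHONPATH": "libs:apps/backend/runners"}
    return argv, env


def run_evaluation(method_name: str, role: str) -> subprocess.CompletedProcess:
    """Run the evaluation in the foreground; this call blocks until it exits."""
    argv, env = build_evaluate_command(method_name, role)
    return subprocess.run(argv, env=env, check=True, capture_output=True, text=True)
```

`check=True` turns a failed evaluation into an exception instead of a silently empty report, and `capture_output=True` keeps the output available for Step 4.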
### Step 4 — Report results
Read the output and present:
- The accuracy against each opponent
- Comparison with existing methods in the matrix
- Overall ranking
- Analysis of strengths/weaknesses
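One way to compute the overall ranking, assuming the matrix JSON loads as a nested dict `matrix[attack][defense] = accuracy` (the actual schema of `benchmark_matrix.json` may differ, so inspect it first). Defenses are ranked by mean accuracy across attacks (higher is better); attacks by mean accuracy across defenses (lower means a stronger attack). The function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Assumed layout: matrix[attack][defense] = final test accuracy.
Matrix = Dict[str, Dict[str, float]]


def rank_defenses(matrix: Matrix, new_name: str,
                  new_results: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank defenses by mean accuracy across attacks (higher is better).

    new_results maps each attack name to the new defense's accuracy.
    """
    per_defense: Dict[str, List[float]] = {}
    for row in matrix.values():
        for defense, acc in row.items():
            per_defense.setdefault(defense, []).append(acc)
    per_defense[new_name] = list(new_results.values())
    scores = [(name, mean(accs)) for name, accs in per_defense.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def rank_attacks(matrix: Matrix, new_name: str,
                 new_results: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank attacks by mean accuracy across defenses (lower = stronger attack).

    new_results maps each defense name to the accuracy under the new attack.
    """
    scores = [(attack, mean(list(row.values()))) for attack, row in matrix.items()]
    scores.append((new_name, mean(list(new_results.values()))))
    return sorted(scores, key=lambda item: item[1])
```

Averaging treats every opponent equally. If the new method was evaluated against only a subset of opponents, restrict the existing methods to that same subset before comparing.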
## Rules
- Method names MUST start with `arena_attack_` or `arena_defense_`
- Keep implementations self-contained: apart from the base class, import only torch, numpy, and the standard library
- Always show the ranking comparison — that's the whole point of Arena
- If the benchmark matrix doesn't exist, guide the user to generate it first