name: giskard-checks-new-check description: Guides implementation of new or changed eval checks in the giskard-checks package—Check subclasses, discriminated kind registration, public exports, and tests. Use when adding or modifying built-in checks, LLM-based judges, @Check.register kinds, serialization of TestCase or Scenario, or when editing files under libs/giskard-checks/src/giskard/checks related to checks.
giskard-checks — new check
Scope
Library code lives under libs/giskard-checks/. Follow existing patterns; use absolute imports inside giskard.checks. Python 3.12+, Pydantic v2.
Read project rules when relevant: libs/giskard-checks/.cursor/rules/02-project.mdc, 03-development.mdc.
Classify the check
| Kind | Module | Base class |
|---|---|---|
| Deterministic (string, numeric, custom logic on trace) | src/giskard/checks/builtin/ |
Check (often shared helpers in existing builtins) |
| Uses LLM to score or classify | src/giskard/checks/judges/ |
BaseLLMCheck |
For one-off or experimental logic, users can use from_fn / FnCheck without a new class—only add a new type when it is a reusable, serializable primitive.
JSONPath fields (key and *_key)
Naming and typing (required)
Any Check field named key or whose name ends with _key is a JSONPath into the trace. It must be annotated with JSONPathStr, or JSONPathStr | None, or JSONPathStr | MISSING (import MISSING from pydantic.experimental.missing_sentinel). Otherwise tests/core/test_jsonpath_enforcement.py fails.
Inside the library, import from ..core.extraction (module giskard.checks.core.extraction).
Path syntax
- Every path must start with
trace.(validated at model construction viajsonpath_ng). - Resolution runs against
{"trace": trace.model_dump()}. Prefertrace.last.outputs,trace.last.inputs,trace.last.metadata.*over brittle indices;trace.lastis the usual default for “current turn” data.
Reading values
resolve(trace, key)— Use when the value always comes from the trace (e.g. comparisonkey).provided_or_resolve(trace, key=..., value=...)— Use when the user may pass an inline value or fall back to a path (e.g.answer+answer_key). Ifvalue is not MISSING, that wins; otherwise the path is evaluated.
Optional inline-or-path fields: type as T | MISSING = MISSING (and JSONPathStr | MISSING = MISSING for optional *_key fields). Pass fields directly to provided_or_resolve; validate combinations with a @model_validator if only one of (inline, *_key) may be set (see StringMatching / ComparisonCheck). Do not use None to mean "extract from trace"—omit the field so it stays MISSING. Explicit None is only valid when comparing against or supplying None as a real value (e.g. Equals(expected_value=None)).
Failures and types
- Missing matches become
NoMatch. Check withisinstance(x, NoMatch)and returnCheckResult.failurewith a message that names the field/key. - Some paths return a list (multiple matches or list-producing JSONPath). See
resolveincore/extraction.pyif the check must treat collections differently.
Defaults
Use Field(default="trace.last.outputs", ...) (or another sensible default) so typical scenarios need no extra configuration.
More detail: reference.md.
Implementation checklist
Kind string — Pick a unique snake_case discriminator. Search the repo for
@Check.register("to avoid duplicates.Class — Subclass
Check[...]orBaseLLMCheck[...]from..core.check/..judges.base. Use PydanticFieldfor config. For non-LLM checks, implementasync def run(self, trace: TraceType) -> CheckResult(useCheckResult.success/CheckResult.failure; put extra data indetails=when useful).Registration — Decorate with
@Check.register("your_kind")on the class definition.Wire imports so the kind is registered — Discriminated unions only know subclasses that were imported. Add an import of the new module to the appropriate package
__init__.py(e.g.builtin/__init__.pyre-exports;judges/__init__.pyfor judges). Updatesrc/giskard/checks/__init__.pyexports and__all__if the type is public API.JSONPath fields — Follow the section above for every
key/*_key; run tests sotest_jsonpath_enforcementpasses.Tests — Add
tests/builtin/test_<feature>.pyor extend an existing file (e.g.test_judge.py). Mirror package layout. Usepytestand async tests whererunis async. Cover pass, fail, and edge cases (missing values, wrong types).Validate — From the repository root (this monorepo uses the root
Makefile):make test-unit PACKAGE=giskard-checks— unit tests for that libmake check— lint, format, compat, typecheck, etc.
Use
make test PACKAGE=giskard-checksif changes must coexist with functional tests.make test-unit PACKAGE=giskard-checksincludestests/core/test_jsonpath_enforcement.py.
Serialization note
After model_dump(), model_validate() needs every custom Check subclass already imported. If deserialization tests fail with an unknown kind, ensure the defining module is imported on the test (or via package __init__) before validation.
LLM checks
For BaseLLMCheck, implement get_prompt and respect output_type / structured parsing as in existing judges. Details: reference.md.