name: evaluate-evidence
description: Evaluate engineering artifacts against skill markers and write evidence rows for Landmark.
Evaluate Evidence
Evaluate engineering artifacts against an agent-aligned engineering standard and write evidence rows that Landmark presents.
Trigger
The prompt must contain "evaluate" and one of:
- Person scope: "for {email}" calls GetUnscoredArtifacts({ email })
- Team scope: "for direct reports of {email}" calls GetUnscoredArtifacts({ manager_email: email })
- Org scope: "for all" calls GetUnscoredArtifacts({ org: true })
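The scope rules above could be sketched as a small parser. This is only an illustration: `parse_scope` and the `EMAIL` pattern are hypothetical names, not part of the skill, and the email pattern is deliberately loose.

```python
import re
from typing import Optional

# Loose email pattern, assumed for illustration only.
EMAIL = r"[\w.+-]+@[\w.-]+"

def parse_scope(prompt: str) -> Optional[dict]:
    """Return the GetUnscoredArtifacts argument for a prompt, or None if the skill should not trigger."""
    if "evaluate" not in prompt.lower():
        return None
    # Team scope is checked first because its phrase also starts with "for".
    team = re.search(rf"\bfor direct reports of ({EMAIL})", prompt, re.IGNORECASE)
    if team:
        return {"manager_email": team.group(1).rstrip(".")}
    person = re.search(rf"\bfor ({EMAIL})", prompt, re.IGNORECASE)
    if person:
        return {"email": person.group(1).rstrip(".")}
    if re.search(r"\bfor all\b", prompt, re.IGNORECASE):
        return {"org": True}
    return None
```

Returning `None` for a prompt without a recognizable scope keeps the skill from running with a guessed scope.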
Procedure
- Parse the scope from the prompt.
- Call GetUnscoredArtifacts with the parsed scope. If the result is empty, report "no unscored artifacts" and exit.
- For each artifact returned:
  a. Call GetArtifact with the artifact's id to get its detail.
  b. Call GetPerson with the artifact's author email to get their profile (discipline, level, track).
  c. Call GetMarkersForProfile with the profile to get the markers the engineer is expected to demonstrate. Each line is tab-separated: skill_id\tlevel_id\tmarker_text.
  d. Evaluate the artifact against each returned marker:
     - Determine matched (boolean): does the artifact demonstrate this marker?
     - Write a 1-3 sentence rationale explaining your reasoning. Rows with matched: false are valid; write them to document what was checked and not found.
  e. Call WriteEvidence once per marker with: artifact_id, skill_id, level_id, marker_text, matched, rationale, and provenance: 'agent_attested'. The provenance argument tags the row as agent-judged so downstream consumers can distinguish it from human-attested rows. Issue the WriteEvidence calls for multiple markers in parallel for throughput.
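As a rough sketch of steps c through e, the tab-separated marker lines can be parsed and turned into WriteEvidence arguments as shown here. `Marker`, `parse_markers`, `build_rows`, and the `judge` callback are hypothetical helpers introduced for illustration; the actual judgment is made by the agent, not by code.

```python
from typing import Callable, List, NamedTuple, Tuple

class Marker(NamedTuple):
    skill_id: str
    level_id: str
    marker_text: str

def parse_markers(raw: str) -> List[Marker]:
    """Parse GetMarkersForProfile output: one tab-separated marker per line."""
    markers = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # maxsplit=2 keeps any tab inside marker_text intact.
        skill_id, level_id, marker_text = line.split("\t", 2)
        markers.append(Marker(skill_id, level_id, marker_text))
    return markers

def build_rows(
    artifact_id: str,
    markers: List[Marker],
    judge: Callable[[Marker], Tuple[bool, str]],
) -> List[dict]:
    """Build one WriteEvidence argument dict per marker, including unmatched ones."""
    rows = []
    for marker in markers:
        matched, rationale = judge(marker)  # assumed to return (matched, rationale)
        rows.append({
            "artifact_id": artifact_id,
            "skill_id": marker.skill_id,
            "level_id": marker.level_id,
            "marker_text": marker.marker_text,  # verbatim, never paraphrased
            "matched": matched,
            "rationale": rationale,
            "provenance": "agent_attested",
        })
    return rows
```

Building all rows before writing makes it straightforward to issue the WriteEvidence calls in parallel afterwards.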
Constraints
- Every skill_id + marker_text pair must come verbatim from GetMarkersForProfile; never invent or paraphrase markers.
- Every row must have non-null rationale and level_id.
- Do not re-evaluate artifacts that already have evidence; they will not appear in GetUnscoredArtifacts.
- Evaluation is idempotent: WriteEvidence upserts on (artifact_id, skill_id, level_id, marker_text).
- The provenance argument is always 'agent_attested' for rows this skill writes. Other values exist for other producers; this skill does not emit them.
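The constraints above can be illustrated with a pre-write check and an in-memory stand-in for the upsert. This is a sketch under assumptions: `validate_row`, `evidence_key`, and `upsert` are hypothetical, and the real WriteEvidence storage is not described in this document.

```python
from typing import Dict, Set, Tuple

def evidence_key(row: dict) -> Tuple[str, str, str, str]:
    """The upsert key named in the constraints."""
    return (row["artifact_id"], row["skill_id"], row["level_id"], row["marker_text"])

def validate_row(row: dict, allowed: Set[Tuple[str, str]]) -> None:
    """Raise ValueError if a row breaks one of the constraints.

    `allowed` holds the (skill_id, marker_text) pairs returned by GetMarkersForProfile.
    """
    if (row["skill_id"], row["marker_text"]) not in allowed:
        raise ValueError("marker must come verbatim from GetMarkersForProfile")
    if row.get("rationale") is None or row.get("level_id") is None:
        raise ValueError("rationale and level_id must be non-null")
    if row.get("provenance") != "agent_attested":
        raise ValueError("this skill only writes agent_attested rows")

def upsert(store: Dict[tuple, dict], row: dict) -> None:
    """In-memory stand-in for the upsert: the same key overwrites instead of duplicating."""
    store[evidence_key(row)] = row
```

Because the key excludes matched and rationale, writing the same marker for the same artifact twice replaces the earlier row rather than adding a second one.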
Multi-source note
GetArtifact returns different structures per source type. Evaluate based on
what the artifact contains:
- Pull requests: title, description, diff context, review thread
- Reviews: review body, comments, verdict
- Commits: commit message, changed files
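One way to handle the per-source structures listed above is a lookup from source type to the fields worth reading. Every key name here (`source_type`, `diff_context`, `review_thread`, and so on) is an assumption made for illustration; the actual shape of a GetArtifact result is not specified in this document.

```python
# Hypothetical field names per source type; adjust to the real GetArtifact output.
FIELDS_BY_SOURCE = {
    "pull_request": ("title", "description", "diff_context", "review_thread"),
    "review": ("body", "comments", "verdict"),
    "commit": ("message", "changed_files"),
}

def evaluable_fields(artifact: dict) -> dict:
    """Return only the fields relevant to the artifact's source type."""
    source = artifact.get("source_type")  # assumed discriminator key
    if source not in FIELDS_BY_SOURCE:
        raise ValueError(f"unknown source type: {source!r}")
    # Skip fields that are absent or null so the evaluation sees only real content.
    return {
        name: artifact[name]
        for name in FIELDS_BY_SOURCE[source]
        if artifact.get(name) is not None
    }
```

Raising on an unknown source type surfaces a new artifact kind instead of evaluating it against the wrong fields.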
After all artifacts are evaluated, report the count of evidence rows written and exit.