name: ai-ready-data
description: Assess and optimize data for AI workloads across platforms. Scan estates for prioritization, assess assets against profiles, and guide remediation.
AI-Ready Data
Assess data products for AI-readiness and remediate gaps. Each requirement is a self-contained directory with three markdown files per platform: check.md (context + SQL returning a 0–1 score), diagnostic.md (context + detail SQL), and fix.md (context + remediation SQL and/or organizational guidance). Each file co-locates all relevant context — constraints, gotchas, variant selection guidance, and platform-specific notes — directly above the SQL it applies to. The manifest (requirements/requirements.yaml) provides lightweight metadata for profile-load time. Every assessment has exactly six stages named after the six factors of AI-ready data — use these exact names everywhere (reports, plans, tasks): Clean, Contextual, Consumable, Current, Correlated, Compliant.
What This Skill Does
Three phases, light to deep:
- Scan: Lightweight estate-level sweep across many schemas. Produces a comparative readiness view for prioritization. Uses the scan profile.
- Assess: Deep evaluation of specific assets against a profile (RAG, agents, training, feature-serving, or custom). Scores each requirement, reports pass/fail.
- Remediate: For failing requirements, present platform-specific fixes, get approval, execute, verify.
Conversation Flow
1. Platform → user selects platform
2. Discovery → guided, explore, or scan (estate-level)
3. Profile → user picks a profile or selects individual requirements
4. Adjustments → apply overrides (skip/set/add)
5. Coverage → show what's runnable vs N/A before executing
6. Assess → execute checks, score, report
7. Report → present results, offer to save standardized report to filesystem
8. Remediate → platform-specific fixes for failures
Step 1: Platform
Ask the user what platform their data is on. Supported platforms:
snowflake
Load the platform reference from platforms/ — either platforms/{PLATFORM}.md or platforms/{platform}/{PLATFORM}.md. This is your reference for all platform-specific behavior during this session.
Step 2: Discovery
Discovery has three modes: guided (user already knows what to assess), explore (user wants to understand the landscape first), and scan (estate-level sweep for prioritization). Ask the user which fits:
How would you like to scope this assessment?
1. I know which database/schema/tables to assess
2. Help me explore what's available first
3. Scan my data estate for AI readiness
Mode 1: Guided Discovery
Ask the user:
- What database?
- What schema?
- What tables? All tables in the schema, or specific ones?
This establishes the scope for the assessment. No SQL is executed during guided discovery.
Mode 2: Explore-First Discovery
Run read-only reconnaissance queries against the platform to help the user understand their data landscape before choosing targets. Present findings progressively — don't dump everything at once.
Step 2a: Database landscape. List accessible databases with table and view counts. On Snowflake, query INFORMATION_SCHEMA.DATABASES or SHOW DATABASES. Present a summary:
Available databases:
ANALYTICS 42 schemas 1,204 tables
RAW_INGESTION 12 schemas 387 tables
ML_FEATURES 5 schemas 89 tables
Ask the user to pick a database (or multiple).
Step 2b: Schema inventory. For the selected database(s), enumerate schemas with structural signals:
ANALYTICS schemas:
Schema Tables Views Dynamic Tables Streams Has Tags
PRODUCT_METRICS 34 12 3 2 yes
USER_BEHAVIOR 28 4 0 0 no
RAW_EVENTS 112 0 0 0 no
Structural signals to surface (platform-dependent):
- Table and view counts
- Presence of dynamic tables (indicates transformation pipelines)
- Presence of streams (indicates change tracking)
- Presence of tags (indicates governance investment)
- Presence of masking policies (indicates compliance posture)
- Approximate row counts or storage size where available cheaply
Step 2c: Narrow scope. Based on what the user sees, help them select the database, schema, and tables for assessment. Users may choose all tables in a schema or pick specific ones.
All reconnaissance queries are read-only. No mutating operations during discovery.
Mode 3: Scan (Estate-Level)
Scan produces a comparative readiness view across many schemas within a database (or across databases). It uses the scan profile — a lightweight subset of requirements that are all schema-scoped and fast to execute.
Step 2s-a: Scope. Ask the user for the database (or account-level). No schema or table selection — the scan covers all schemas.
Step 2s-b: Execute scan profile. Load profiles/scan.yaml. For each schema in the database, run the scan profile's requirements. All checks in the scan profile must be schema-scoped to allow batch execution across many schemas.
Step 2s-c: Present portfolio view. Score each schema and present a comparative ranking:
Estate Scan — {platform} — {DATABASE}
Schema Readiness Contextual Consumable Current Compliant
──────── ───────── ────────── ────────── ─────── ─────────
PRODUCT_METRICS 4.2 4.5 3.0 5.0 4.5
USER_BEHAVIOR 3.1 2.0 3.5 4.0 3.0
RAW_EVENTS 1.4 1.0 1.0 2.5 1.0
{N} schemas scanned. {H} above readiness threshold (≥ 3.5).
Step 2s-d: Drill down. After presenting the scan, offer the user a choice:
Options:
1. Assess a specific schema in depth (pick from the list)
2. Export scan results
3. Done
If the user picks a schema, transition to the full Assess flow (Step 3 onward) with that schema pre-scoped. The scan phase flows naturally into the assess phase.
Step 3: Profile
Load each profile YAML from profiles/ and count its requirements so the menu shows accurate numbers, then ask the user what they want to assess:
What would you like to assess?
1. RAG readiness ({N} requirements)
2. Feature serving readiness ({N} requirements)
3. Training readiness ({N} requirements)
4. Agent readiness ({N} requirements)
5. Full assessment (all {N} requirements)
6. Let me pick specific requirements
If the user picks a built-in profile, load profiles/{name}.yaml.
If the user picks "full assessment," include all requirements from requirements/requirements.yaml with default thresholds of 0.80.
Option 6: Fine-Grained Requirement Picker
If the user wants to pick specific requirements, dynamically generate the catalog from requirements/requirements.yaml at runtime. Read every entry, group by factor, number sequentially, and present in this format:
Select the requirements you want to assess (comma-separated numbers, or "all" within a factor):
─── {Factor Name} ───
{N}. {requirement_key} {description}
...
─── {Next Factor} ───
...
Use the factor order: Clean, Contextual, Consumable, Current, Correlated, Compliant. Number requirements sequentially across all factors starting at 1.
Users can respond with:
- Specific numbers: 1, 2, 13, 34, 52 (picks those)
- Factor groups: all Clean, all Compliant (picks every requirement in those factors)
- Ranges: 1-12, 52-62
- Mix: all Clean, 13, 34-36, 52
For each selected requirement, apply a default threshold of 0.80. The user can adjust thresholds in Step 4 (Adjustments).
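The numbering and selection rules above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `number_catalog` and `parse_selection` are hypothetical helper names, and the catalog is assumed to be a plain mapping from factor name to requirement keys already read from the manifest.

```python
import re
from typing import Dict, List, Tuple

FACTOR_ORDER = ["Clean", "Contextual", "Consumable", "Current", "Correlated", "Compliant"]


def number_catalog(by_factor: Dict[str, List[str]]) -> Dict[int, Tuple[str, str]]:
    """Number requirements sequentially across all factors, starting at 1."""
    catalog: Dict[int, Tuple[str, str]] = {}
    n = 1
    for factor in FACTOR_ORDER:
        for key in by_factor.get(factor, []):
            catalog[n] = (factor, key)
            n += 1
    return catalog


def parse_selection(text: str, catalog: Dict[int, Tuple[str, str]]) -> List[str]:
    """Resolve numbers, ranges, and 'all <Factor>' tokens to requirement keys."""
    known_factors = {f.lower() for f in FACTOR_ORDER}
    chosen = set()
    for token in (t.strip() for t in text.split(",")):
        if not token:
            continue
        group = re.fullmatch(r"all\s+(\w+)", token, flags=re.IGNORECASE)
        span = re.fullmatch(r"(\d+)\s*-\s*(\d+)", token)
        if group:
            factor = group.group(1).lower()
            if factor not in known_factors:
                raise ValueError(f"unknown factor in {token!r}")
            chosen.update(n for n, (f, _) in catalog.items() if f.lower() == factor)
        elif span:
            chosen.update(range(int(span.group(1)), int(span.group(2)) + 1))
        elif token.isdigit():
            chosen.add(int(token))
        else:
            raise ValueError(f"unrecognized selection: {token!r}")
    unknown = sorted(n for n in chosen if n not in catalog)
    if unknown:
        raise ValueError(f"numbers not in catalog: {unknown}")
    # Return keys in catalog order so the assessment runs factor by factor.
    return [catalog[n][1] for n in sorted(chosen)]
```

For example, with a three-entry catalog whose first two entries belong to Clean, the input `all Clean, 3` resolves to all three requirement keys in catalog order. Rejecting unknown numbers outright, rather than ignoring them, is a design choice of this sketch.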
Step 4: Adjustments
After loading the profile, offer three adjustment verbs:
- skip <requirement>: Exclude a requirement entirely.
- set <requirement> <threshold>: Override a threshold (e.g., set chunk_readiness 0.70).
- add <requirement> <threshold>: Include a requirement not in the profile.
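One way to read the three verbs is as operations on a requirement-to-threshold mapping. The sketch below assumes that representation; `apply_adjustment` is a hypothetical helper, and the error handling (rejecting `set` on an absent requirement and `add` on a present one) is an interpretation of the verb descriptions rather than documented behavior.

```python
from typing import Dict


def apply_adjustment(thresholds: Dict[str, float], command: str) -> Dict[str, float]:
    """Return a new requirement -> threshold mapping with one adjustment applied."""
    parts = command.split()
    if not parts:
        raise ValueError("empty adjustment")
    verb, args = parts[0].lower(), parts[1:]
    result = dict(thresholds)  # never mutate the loaded profile in place

    if verb == "skip" and len(args) == 1:
        if args[0] not in result:
            raise KeyError(f"{args[0]} is not in the profile")
        del result[args[0]]
    elif verb in ("set", "add") and len(args) == 2:
        name, threshold = args[0], float(args[1])
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be within [0.0, 1.0]")
        if verb == "set" and name not in result:
            raise KeyError(f"{name} is not in the profile; use add")
        if verb == "add" and name in result:
            raise ValueError(f"{name} is already in the profile; use set")
        result[name] = threshold
    else:
        raise ValueError(f"unrecognized adjustment: {command!r}")
    return result
```

Returning a new mapping keeps the loaded profile untouched, which matches the idea that overrides apply only to the current run.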
Step 5: Coverage Summary
Before executing, intersect the selected requirements with what the platform can actually run. For each requirement, check if requirements/{key}/{platform}/check.md exists.
Present the coverage summary:
{Profile} Assessment — {platform} — {DATABASE}.{SCHEMA}
Selected: {N} requirements
Runnable: {R}
Not available: {N-R} (no implementation for this platform)
- {requirement_key}: no {platform} implementation
- ...
Proceed?
Checkpoint: User confirms before execution begins.
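The coverage intersection amounts to a file-existence test per requirement. A minimal sketch, assuming the skill root directory is available as a path (`split_coverage` is a hypothetical helper name):

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def split_coverage(root: Path, platform: str, keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split requirement keys into (runnable, not available) for one platform."""
    runnable: List[str] = []
    unavailable: List[str] = []
    for key in keys:
        # A requirement is runnable only if its check file exists for this platform.
        check_file = root / "requirements" / key / platform / "check.md"
        (runnable if check_file.is_file() else unavailable).append(key)
    return runnable, unavailable
```

The lengths of the two lists map to the Runnable and Not available counts in the summary, and the second list supplies the per-requirement lines beneath them.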
Step 6: Assess
Load requirements/requirements.yaml once at session start. This manifest provides lightweight metadata: description, factor, scope, and implementations.
For each stage in order (Clean, Contextual, Consumable, Current, Correlated, Compliant), for each requirement:
- Look up the requirement entry in requirements/requirements.yaml for metadata (scope).
- Read requirements/{requirement_name}/{platform}/check.md. This file contains all context (constraints, gotchas, variant guidance) and one or more SQL blocks.
- Use the context in the file to determine which SQL block to execute (e.g., sampled vs full scan based on row count, primary vs variant based on available platform features).
- Substitute {{ placeholder }} values from the user's scope context and the SQL block itself (database, schema, asset, column, plus any requirement-specific values documented in the check file's context section).
- Execute the SQL. Read the value result (float 0.0–1.0, where 1.0 is perfect).
- Compare value >= threshold to determine pass/fail.
- If no implementation exists for this platform, report N/A.
SQL blocks within markdown files use {{ placeholder }} syntax for variable substitution. The scope field in the manifest tells you whether the check is schema-scoped, table-scoped, or column-scoped:
- Schema-scoped (only database, schema): run once per schema.
- Table-scoped (includes asset): run per table, aggregate results.
- Column-scoped (includes column): run per column, aggregate results.
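Placeholder substitution as described above can be sketched with a single regular expression. This is an illustration only: `render_sql` is a hypothetical helper, and failing on a missing value (instead of leaving the placeholder in the SQL) is a choice made for this sketch.

```python
import re
from typing import Dict

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_sql(template: str, values: Dict[str, str]) -> str:
    """Replace every {{ name }} placeholder; fail loudly if a value is missing."""
    needed = {m.group(1) for m in PLACEHOLDER.finditer(template)}
    missing = sorted(needed - set(values))
    if missing:
        raise KeyError(f"no value supplied for placeholders: {missing}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


# Illustrative template; the real SQL lives in each requirement's check.md.
template = "SELECT COUNT(*) FROM t WHERE UPPER(table_schema) = UPPER('{{ schema }}')"
rendered = render_sql(template, {"schema": "product_metrics"})
```

Because values are spliced into SQL text, a real implementation would also want to validate identifiers before substitution; that step is omitted here.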
Scoring conventions
- Range. value is a float in [0.0, 1.0] where 1.0 is perfect. All requirements are gte-direction: pass when value >= threshold.
- N/A (empty denominator). When the check has nothing to measure (no in-scope tables, no in-scope columns, no events in the window, etc.), the SQL must return NULL via NULLIF(denominator, 0). The orchestrator renders NULL as N/A in reports and treats it as neither pass nor fail. Never emit a hard-coded 1.0 or 0.0 fallback, because either one silently inflates or deflates dashboards.
- Casing. All information_schema filters must wrap both sides in UPPER(...) (e.g. UPPER(table_schema) = UPPER('{{ schema }}')) so callers can pass identifiers in any case.
- Determinism. LIMIT on account_usage scans (access_history, query_history, …) must be paired with a stable ORDER BY, typically ORDER BY query_start_time DESC or ORDER BY start_time DESC, so repeated runs return the same score.
- LIKE vs REGEXP_LIKE. Prefer REGEXP_LIKE(col, pattern) for name pattern matching. LIKE '%_X' is unsafe because _ is a single-character wildcard that matches unintended names (e.g. USERAID matches %_ID).
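Two of these conventions are easy to see in a small experiment. The sketch below uses Python's built-in SQLite purely as a stand-in for the target platform (an assumption; dialects differ in detail), with `re.fullmatch` standing in for a whole-string regular-expression match. `status` is a hypothetical helper.

```python
import re
import sqlite3
from typing import Optional


def status(value: Optional[float], threshold: float) -> str:
    """NULL (None) is N/A: neither pass nor fail. Otherwise gte-direction."""
    if value is None:
        return "N/A"
    return "PASS" if value >= threshold else "FAIL"


conn = sqlite3.connect(":memory:")

# Empty denominator: NULLIF turns 0 into NULL, so the whole score becomes NULL.
empty_score = conn.execute("SELECT 0 * 1.0 / NULLIF(0, 0)").fetchone()[0]

# In LIKE, '_' matches any single character, so USERAID matches '%_ID'.
like_hit = conn.execute("SELECT 'USERAID' LIKE '%_ID'").fetchone()[0]
conn.close()

# In a regular expression, '_' is a literal underscore, so the same name is rejected.
regex_hit = re.fullmatch(r".*_ID", "USERAID") is not None
regex_ok = re.fullmatch(r".*_ID", "USER_ID") is not None
```

Here `empty_score` comes back as `None`, which `status` reports as N/A instead of forcing a pass or fail, and the LIKE pattern accepts `USERAID` while the regular expression does not.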
Step 7: Report
Present results in conversation first, then offer to save:
{Profile} Assessment — {platform} — {DATABASE}.{SCHEMA}
{Stage Name} {PASS/FAIL}
"{why}"
{requirement} {value} (need >= {threshold}) {PASS/FAIL}
Summary: {N} of {total} stages passing ({M} of {R} requirements passing)
Save Report to Filesystem
After presenting results in conversation, offer to write a standardized report file:
Would you like me to save this report to your filesystem?
Default path: ./ai-ready-report-{DATABASE}-{SCHEMA}-{YYYY-MM-DD}.md
(You can specify a different path)
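The default path pattern can be built from the assessment scope and the run date. A minimal sketch, assuming the date is rendered in ISO format (`default_report_path` is a hypothetical helper name):

```python
from datetime import date
from typing import Optional


def default_report_path(database: str, schema: str, on: Optional[date] = None) -> str:
    """Build ./ai-ready-report-{DATABASE}-{SCHEMA}-{YYYY-MM-DD}.md."""
    day = (on or date.today()).isoformat()  # isoformat() yields YYYY-MM-DD
    return f"./ai-ready-report-{database}-{schema}-{day}.md"
```

Passing the date explicitly keeps the function deterministic, which is convenient when comparing reports from repeated runs.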
If the user confirms, write the report in the following standardized format. The report must be self-contained — a reader who wasn't present for the conversation should understand the full context, results, and next steps.
Report format:
# AI-Ready Data Assessment
| Field | Value |
|---------------|------------------------------------|
| Date | {YYYY-MM-DD} |
| Platform | {platform} |
| Database | {DATABASE} |
| Schema | {SCHEMA} |
| Tables | {table count} ({list or "all"}) |
| Profile | {profile name or "custom"} |
| Requirements | {selected count} of {manifest count} |
| Runnable | {runnable count} |
## Summary
{N} of {total} stages passing. {M} of {R} requirements passing.
{1-3 sentence narrative: what's strong, what's the biggest gap, what's the highest-ROI fix.}
## Results by Stage
### {Stage Name} — {PASS/FAIL}
> {why}
| Requirement | Score | Threshold | Status |
|-------------|-------|-----------|--------|
| {key} | {val} | {thresh} | {P/F} |
{Repeat for each stage}
## Failing Requirements
{For each failing requirement, one subsection:}
### {requirement_key} — {value} (need >= {threshold})
- **Factor:** {factor}
- **What it measures:** {description from manifest}
- **Scope:** {schema/table/column}
- **Constraints:** {any constraints from manifest, or "None"}
## Recommended Next Steps
{Prioritized list of remediation actions, ordered by impact.
Group by effort level: quick wins first, then medium effort, then larger investments.
Reference specific requirements and their current scores.}
---
*Generated by ai-ready-data assessment. Re-run to track progress.*
If the assessment was run against multiple tables, include per-table breakdowns under each stage where relevant (table-scoped and column-scoped checks).
If the user ran a custom selection (option 6), note which requirements were included and which were excluded under the metadata table.
Checkpoint: Options: remediate (fix gaps), tell-me-more (run diagnostics), done (stop).
Diagnostics
When the user wants detail on a failing requirement, read requirements/{requirement_name}/{platform}/diagnostic.md. The file contains context explaining what the diagnostic measures and one or more SQL blocks. Use the context to select the appropriate SQL, substitute placeholders, execute, and present the results. If the file doesn't exist, explain that diagnostics aren't available for this requirement on this platform.
Remediation Workflow
Process failing stages in order. For each stage:
Present Stage Context
Stage: {Stage Name}
Why: {why}
Failing requirements:
{requirement}: {value} (need >= {threshold})
Load Fix Operations
For each failing requirement:
- Read requirements/{requirement_name}/{platform}/fix.md. This file contains all context (constraints, preconditions, delegation notes) and one or more remediation options, each with its own SQL block and/or organizational guidance.
- Use the context to determine which remediation option(s) to present. Some fixes are executable SQL; others are organizational process guidance for humans.
- Substitute {{ placeholder }} values in the SQL blocks.
- Check the platform reference for delegation targets. If the fix references a delegated workflow, follow it.
Present Remediation Plan
Show the substituted implementation, affected objects, and any constraints.
Checkpoint: Options: approve (execute), skip (next stage), modify (edit SQL), tell-me-more (diagnostics), abort (stop).
Execute with Idempotency Guards
Before executing non-idempotent operations, check the platform reference for idempotency guards. Run the guard first; skip the operation if the desired state already exists.
Skipped guards are not failures — the desired state already exists. Never use CREATE OR REPLACE unless the platform documentation explicitly says it's safe for that operation.
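The guard-then-execute pattern can be sketched as below. SQLite and the index example are stand-ins chosen so the snippet runs anywhere; the real guard queries come from the platform reference, and `run_with_guard` is a hypothetical helper.

```python
import sqlite3


def run_with_guard(conn: sqlite3.Connection, guard_sql: str, operation_sql: str) -> str:
    """Run operation_sql only when guard_sql reports the desired state is absent."""
    already_done = conn.execute(guard_sql).fetchone()[0]
    if already_done:
        return "skipped"  # not a failure: the desired state already exists
    conn.execute(operation_sql)
    return "executed"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")

# Guard: does the index already exist? Operation: create it (non-idempotent).
guard = "SELECT COUNT(*) FROM sqlite_master WHERE type = 'index' AND name = 'idx_orders_id'"
operation = "CREATE INDEX idx_orders_id ON orders (id)"

first = run_with_guard(conn, guard, operation)
second = run_with_guard(conn, guard, operation)
conn.close()
```

The first call executes the operation and the second is skipped, so re-running a remediation plan does not fail on objects that an earlier run already created.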
Verify
Re-run the platform check implementation for each requirement in the stage. Show before/after:
{Stage Name} — remediation complete
{requirement}:
Before: {old_value}
After: {new_value}
Status: {PASS/FAIL}
Proceed or Finish
Move to the next failing stage. After all stages:
Remediation Complete
Stage Before After
───── ────── ─────
{Stage Name} FAIL PASS
{Stage Name} FAIL PASS
What changed:
{Stage}: {one-line summary}
Overrides
Overrides are applied in memory for the current run. For repeatability, overrides can be saved as a custom profile using extends:
name: my-rag-profile
extends: rag
overrides:
  skip:
    - embedding_coverage
  set:
    chunk_readiness: { min: 0.70 }
  add:
    row_access_policy: { min: 0.50 }
When loading a profile with extends, first load the base profile, then apply overrides.
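The load order (base first, then overrides) can be sketched on already-parsed profiles. The shape of a base profile is an assumption here: it is flattened to a `requirements` mapping of key to `{min: threshold}`, whereas real profiles are organized into six stages. `resolve_profile` is a hypothetical helper, and cycle detection is left out.

```python
from typing import Any, Dict


def resolve_profile(profile: Dict[str, Any], profiles: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Resolve 'extends' by loading the base first, then applying overrides."""
    base_name = profile.get("extends")
    if base_name is None:
        return {k: dict(v) for k, v in profile.get("requirements", {}).items()}

    resolved = resolve_profile(profiles[base_name], profiles)
    overrides = profile.get("overrides", {})
    for key in overrides.get("skip", []):
        resolved.pop(key, None)
    for key, spec in overrides.get("set", {}).items():
        if key not in resolved:
            raise KeyError(f"cannot set {key}: not in base profile")
        resolved[key] = dict(spec)
    for key, spec in overrides.get("add", {}).items():
        resolved[key] = dict(spec)
    return resolved


# Illustrative base profile; the thresholds are placeholders, not real defaults.
profiles = {
    "rag": {"requirements": {"embedding_coverage": {"min": 0.8}, "chunk_readiness": {"min": 0.8}}},
}
custom = {
    "name": "my-rag-profile",
    "extends": "rag",
    "overrides": {
        "skip": ["embedding_coverage"],
        "set": {"chunk_readiness": {"min": 0.70}},
        "add": {"row_access_policy": {"min": 0.50}},
    },
}
effective = resolve_profile(custom, profiles)
```

Applying skip before set and add mirrors the order the verbs are listed in; since the three operate on different keys in this example, the order does not change the result.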
Constraints
- Read-only during assessment. Never execute mutating operations during scan or assess phases.
- Fix operations require approval. Execute only with explicit user consent per stage.
- Never batch without consent. Present the plan first, execute stage-by-stage with approval.
- Surface all constraints. Show constraints from the fix file's context section before executing fix operations.
- No credentials in output. Connection strings stay in environment variables.
- Read platform docs first. Load the platform reference from platforms/ before executing any operations.
- Use capability gating. If the platform doesn't support an operation, return N/A with reason.
Requirement Directory Convention
Each requirement has a directory under requirements/ containing platform-specific implementations as markdown files. The manifest (requirements/requirements.yaml) provides lightweight metadata needed at profile-load time. All detailed context — constraints, gotchas, variant selection guidance, platform-specific notes — lives in the markdown files themselves, co-located directly above the SQL they apply to.
| File | Purpose |
|---|---|
| requirements.yaml | Single manifest: lightweight metadata (description, factor, scope, implementations) |
| {requirement_key}/{platform}/check.md | Context + check SQL (read-only, returns normalized 0–1 score) |
| {requirement_key}/{platform}/diagnostic.md | Context + diagnostic SQL (read-only detail drill-down) |
| {requirement_key}/{platform}/fix.md | Context + remediation SQL and/or organizational guidance (mutating, requires approval) |
Markdown file format
Each file follows this structure:
# {Type}: {requirement_key}
{One-line description}
## Context
{Prose: what it measures, constraints, gotchas, platform-specific notes,
variant selection guidance, preconditions. Everything the agent needs
to understand before executing the SQL.}
## SQL
{One or more fenced SQL blocks. If multiple variants or options exist,
use ### subheadings with prose explaining when to use each one.}
A single file can contain multiple SQL implementations. For example, check.md can contain both a full-scan and a sampled variant; fix.md can contain multiple remediation options with different tradeoffs. The agent reads the context to determine which SQL block to use.
Fix files may contain organizational process guidance (not just SQL) — for example, governance decisions, ownership assignments, or data model restructuring advice.
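Given the file structure above, the SQL variants in a file can be collected together with the subheading each one sits under. The sketch assumes variants are introduced by `### ` subheadings and that SQL fences are tagged `sql`; `extract_sql_blocks` is a hypothetical helper. Choosing which variant to run still depends on reading the surrounding prose.

```python
from typing import List, Tuple

FENCE = "`" * 3  # built at runtime so this example does not contain a literal fence


def extract_sql_blocks(markdown: str) -> List[Tuple[str, str]]:
    """Return (variant heading, SQL) pairs in document order."""
    blocks: List[Tuple[str, str]] = []
    heading = ""
    in_sql = False
    buffer: List[str] = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if in_sql:
            if stripped == FENCE:
                blocks.append((heading, "\n".join(buffer)))
                in_sql, buffer = False, []
            else:
                buffer.append(line)
        elif stripped.lower() == FENCE + "sql":
            in_sql = True
        elif line.startswith("### "):
            heading = line[4:].strip()
    return blocks


sample = "\n".join([
    "# Check: chunk_readiness",
    "## SQL",
    "### Full scan",
    FENCE + "sql",
    "SELECT 1 AS value",
    FENCE,
    "### Sampled",
    FENCE + "sql",
    "SELECT 2 AS value",
    FENCE,
])
variants = extract_sql_blocks(sample)
```

A block that appears before any subheading is returned with an empty heading, which covers files that contain a single SQL implementation.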
File Layout
skills/ai-ready-data/
  SKILL.md                      ← You are here
  platforms/                    ← Platform references
    {PLATFORM}.md               ← Capabilities, nuances, permissions, dialect
  profiles/                     ← Assessment profiles
    scan.yaml                   ← Estate-level scan profile (lightweight)
    rag.yaml                    ← RAG readiness profile
    feature-serving.yaml        ← Feature serving readiness profile
    training.yaml               ← Training readiness profile
    agents.yaml                 ← Agents readiness profile
  requirements/                 ← Requirement manifest + implementation directories
    requirements.yaml           ← Single manifest (all requirement metadata)
    {requirement_key}/
      {platform}/
        check.md                ← Context + check SQL
        diagnostic.md           ← Context + diagnostic SQL
        fix.md                  ← Context + remediation SQL/guidance
Adding a New Requirement
- Add an entry to requirements/requirements.yaml with: description, factor, scope, implementations.
- Create the requirements/{name}/{platform}/ directory with three markdown files:
  - check.md (required): context + SQL returning a value score 0–1
  - diagnostic.md (required): context + SQL for detail drill-down
  - fix.md (required): remediation SQL and/or organizational guidance
- Add the requirement to the relevant profile YAML(s) under the matching factor stage.
Adding a New Profile
- Create profiles/{name}.yaml with six stages (Clean, Contextual, Consumable, Current, Correlated, Compliant).
- Select requirements for each stage and set thresholds appropriate for the use case.
- Alternatively, use extends to derive from an existing profile and apply overrides.
Adding a New Platform
- Create platforms/{PLATFORM}.md covering capabilities, dialect, permissions, nuances, idempotency guards, and delegation targets.
- Add requirement files under requirements/{key}/{platform}/.