dfmea - SKILL.md Agent Skill

name: dfmea
description: Perform Design Failure Mode and Effects Analysis (DFMEA) for autonomous systems including robots, autonomous vehicles, drones, and similar safety-critical systems. Use when users want to conduct FMEA, analyze failure modes, identify hazards, assess risks, or develop mitigation strategies for perception stacks, sensor systems, control systems, or other autonomous system components.

Design FMEA for Autonomous Systems

Conduct a systematic, qualitative DFMEA to identify potential failure modes and their effects at the system level.

Workflow

Phase 0: Pre-Analysis Configuration

Before starting the analysis, configure the DFMEA scope and output preferences through user interaction.

Step 0.1: Gather System Context

If the user provides a brief description (e.g., "DFMEA of Perception Stack"), ask clarifying questions:

What sensors/components are in scope? (cameras, LiDAR, radar, IMU, GPS, etc.)
What is the operational domain? (highway, urban, warehouse, outdoor terrain)
What are the key system functions to analyze?
Are there specific failure scenarios of concern?

If the user provides detailed context, proceed directly to Step 0.2.

Step 0.2: Configure Analysis Columns

Ask the user which columns to include in the DFMEA. Present the options:

Standard Columns (always included):

Function ID
Function Name
Guideword
Failure Mode
Effect at System Level

Optional Columns (user selects):

Column	Description	Default
Severity	Severity rating (1-10)	Off
Occurrence	Likelihood rating (1-10)	Off
Detection	Detection rating (1-10)	Off
RPN	Risk Priority Number (S×O×D)	Off
Design Mitigation	Architectural/design countermeasures	On
Testing Mitigation	Verification activities	On
Runtime Mitigation	Operational safeguards	On
Sensors Affected	Which sensors involved	On
Assumptions	Analysis assumptions	Off
Notes	Additional remarks	Off
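When the Severity, Occurrence, Detection, and RPN columns are enabled, the RPN is the product S×O×D. A minimal sketch of that calculation, assuming integer ratings on the 1-10 scale listed above; the function name `risk_priority_number` is hypothetical and not part of the skill's scripts:

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Return S x O x D, rejecting ratings outside the 1-10 scale."""
    for label, rating in (("severity", severity),
                          ("occurrence", occurrence),
                          ("detection", detection)):
        if not 1 <= rating <= 10:
            raise ValueError(f"{label} must be between 1 and 10, got {rating}")
    return severity * occurrence * detection


print(risk_priority_number(8, 3, 5))  # 120
```

With all three ratings on a 1-10 scale, the result ranges from 1 to 1000.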

Step 0.3: Configure Analysis Depth

Ask the user about the desired analysis depth:

Depth Level	Description	Use Case
High-Level	1-2 failure modes per function, major failure categories only	Quick overview, initial scoping
Standard	3-5 failure modes per function, covers main guideword categories	Typical analysis, design reviews
Detailed	5-10+ failure modes per function, exhaustive guideword application	Safety-critical systems, certification

Step 0.4: Configure Output Formats

Ask the user which outputs to generate:

Format	Description	File
JSON	Structured data for processing (always generated)	`dfmea_output.json`
Excel	Formatted spreadsheet with multiple sheets	`dfmea_report.xlsx`
Interactive Explorer	HTML dashboard with visualizations	`dfmea_explorer.html`

Token-Efficient Output: Do NOT output the full DFMEA as verbose markdown tables inline (the short function confirmation table in Phase 1.5 is the exception). Generate files directly and show only a concise summary.

Step 0.5: Configure Project Name

Ask the user for a project name to organize output files:

Question: "What would you like to name this DFMEA project?"

Behavior:

Create a folder with the project name (sanitized for filesystem: lowercase, spaces replaced with hyphens)
All output files will be generated inside this folder
Example: Project name "Highway Autopilot Perception" creates folder highway-autopilot-perception/

Folder Structure:

<project-name>/
├── dfmea_output.json
├── dfmea_report.xlsx
└── dfmea_explorer.html
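The naming rule and folder layout above can be sketched as follows. This is an illustration only: `sanitize_project_name` and `output_paths` are hypothetical helper names, and dropping every character other than lowercase letters, digits, and hyphens is one assumed reading of "sanitized for filesystem".

```python
import re
from pathlib import Path


def sanitize_project_name(name: str) -> str:
    """Lowercase, turn whitespace runs into hyphens, drop other special characters."""
    slug = re.sub(r"\s+", "-", name.strip().lower())
    return re.sub(r"[^a-z0-9-]", "", slug)


def output_paths(project_name: str, root: Path = Path(".")) -> dict[str, Path]:
    """Map each output format to its file inside the project folder."""
    folder = root / sanitize_project_name(project_name)
    return {
        "json": folder / "dfmea_output.json",
        "excel": folder / "dfmea_report.xlsx",
        "explorer": folder / "dfmea_explorer.html",
    }


print(sanitize_project_name("Highway Autopilot Perception"))
# highway-autopilot-perception
```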

Phase 1: Identify Functions

For each component/subsystem, list its primary functions. Functions describe what the system is supposed to do.

Example functions for a perception stack:

Detect obstacles in the environment
Classify detected objects (pedestrian, vehicle, static obstacle)
Track object motion over time
Estimate distance and velocity of objects

Phase 1.5: Confirm Functions with User

IMPORTANT: After identifying functions, present them to the user for confirmation before proceeding with failure mode analysis.

Step 1.5.1: Present Identified Functions

Display the identified functions in a clear table format:

## Identified Functions

| ID | Function Name | Description | Sensors |
|----|---------------|-------------|---------|
| F-01 | [Name] | [Description] | [Related sensors] |
| F-02 | [Name] | [Description] | [Related sensors] |
...

Step 1.5.2: Ask for Confirmation

Use AskUserQuestion to let the user choose how to proceed:

Question: "How would you like to proceed with these functions?"

Options:

Option	Description
Accept all	Proceed with all identified functions
Select specific functions	Choose which functions to include (show multi-select)
Add custom functions	User provides additional functions to include
Modify and confirm	User wants to edit function names/descriptions

Step 1.5.3: Handle User Selection

Based on the user's choice:

Accept all: Proceed directly to Phase 2
Select specific functions:
- Present a multi-select list of all identified functions
- User selects which ones to include
- Only analyze selected functions
Add custom functions:
- Ask user to provide function names and descriptions
- Add these to the function list
- Show updated list and confirm
Modify and confirm:
- Ask user which functions to modify
- Accept edits to names/descriptions
- Show updated list and confirm
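For the "Select specific functions" branch, the filtering step might look like the sketch below. It assumes each function is a dict with an "id" key, as in the JSON output; `select_functions` is a hypothetical helper, and rejecting unknown IDs is a design choice rather than a stated requirement.

```python
def select_functions(functions: list[dict], selected_ids: list[str]) -> list[dict]:
    """Keep only the functions the user selected, preserving the original order."""
    known = {f["id"] for f in functions}
    unknown = [fid for fid in selected_ids if fid not in known]
    if unknown:
        # Surface typos instead of silently analyzing fewer functions.
        raise ValueError(f"Unknown function IDs: {unknown}")
    wanted = set(selected_ids)
    return [f for f in functions if f["id"] in wanted]


functions = [
    {"id": "F-01", "name": "Image Acquisition"},
    {"id": "F-02", "name": "Object Detection"},
    {"id": "F-03", "name": "Lane Detection"},
]
print([f["id"] for f in select_functions(functions, ["F-03", "F-01"])])
# ['F-01', 'F-03']
```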

Example Interaction

## Identified Functions

| ID | Function Name | Description |
|----|---------------|-------------|
| F-01 | Image Acquisition | Capture frames from cameras |
| F-02 | Object Detection | Detect vehicles and obstacles |
| F-03 | Lane Detection | Detect lane markings |
| F-04 | Depth Estimation | Estimate distance using stereo |

**How would you like to proceed?**
○ Accept all (Recommended)
○ Select specific functions
○ Add custom functions
○ Modify and confirm

Phase 2: Apply Guidewords to Generate Failure Modes

Use guidewords systematically to identify how each function can fail. See guidewords.md for the complete list.

Common guideword categories:

Omission: No output, not detected, missing data
Commission: False positive, spurious output
Timing: Too early, too late, intermittent
Value: Incorrect value, out of range, stuck
Spatial: Wrong position, wrong orientation
Degradation: Partial detection, reduced range

Apply guidewords based on the configured depth level.
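One way to apply the guidewords systematically is to enumerate every function/guideword pair and treat each pair as a prompt for a possible failure mode. The sketch below assumes the six categories listed above and the depth labels "high-level", "standard", and "detailed" (only "standard" appears in the JSON schema; the other two strings are assumptions). Restricting high-level analysis to omission, commission, and value follows the Analysis Depth Guidelines. Pruning the candidates down to the configured number of failure modes per function remains an analyst judgment.

```python
from itertools import product

ALL_CATEGORIES = ["Omission", "Commission", "Timing", "Value", "Spatial", "Degradation"]
BROAD_CATEGORIES = ["Omission", "Commission", "Value"]  # used for high-level analysis


def candidate_pairs(function_ids: list[str], depth_level: str) -> list[tuple[str, str]]:
    """Return (function_id, guideword) prompts to consider for failure modes."""
    categories = BROAD_CATEGORIES if depth_level == "high-level" else ALL_CATEGORIES
    return list(product(function_ids, categories))


for function_id, guideword in candidate_pairs(["F-01"], "high-level"):
    print(function_id, guideword)
# F-01 Omission
# F-01 Commission
# F-01 Value
```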

Phase 3: Analyze Effects at System Level

For each failure mode, trace the effect up to the system/vehicle level:

What happens to the immediate subsystem?
What happens to dependent subsystems?
What is the effect on vehicle/robot behavior?
What is the potential harm to users, bystanders, or property?

Phase 4: Develop Mitigations

For each significant failure mode, identify mitigations based on the configured columns:

Category	Description	Examples
Design	Architectural choices that prevent or reduce failure likelihood	Redundancy, diversity, fail-safe defaults, graceful degradation
Testing	Verification activities that detect the failure mode	Fault injection, simulation, field testing, edge case coverage
Runtime	Runtime monitoring and operational constraints	Health monitoring, plausibility checks, operational domain restrictions, fallback behaviors

See mitigation-patterns.md for detailed patterns.

Phase 5: Generate Outputs

Based on the configured output formats, generate files directly without verbose inline output.

Output Generation Steps:

Create the project folder (sanitized project name)
Generate JSON output first (required for Excel and HTML)
Generate Excel report if requested
Generate Interactive Explorer if requested
Display concise summary with metrics and file links
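The ordering of these steps can be sketched as a function that builds the script invocations once the JSON file exists. The script paths and argument order come from the commands shown under Generating Outputs; the function name `generation_commands` and the format keys "excel" and "explorer" are hypothetical.

```python
import sys


def generation_commands(project_dir: str, formats: set[str]) -> list[list[str]]:
    """Build argv lists for the optional outputs; the JSON file must already exist."""
    json_path = f"{project_dir}/dfmea_output.json"
    commands = []
    if "excel" in formats:
        commands.append([sys.executable, "scripts/generate_excel.py",
                         json_path, f"{project_dir}/dfmea_report.xlsx"])
    if "explorer" in formats:
        commands.append([sys.executable, "scripts/generate_explorer.py",
                         json_path, f"{project_dir}/dfmea_explorer.html"])
    return commands


for command in generation_commands("highway-autopilot-perception", {"excel", "explorer"}):
    print(" ".join(command[1:]))
```

Each returned list can be passed to `subprocess.run(command, check=True)`. Using `sys.executable` rather than a bare `python` keeps the scripts on the same interpreter as the caller.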

Example Output Summary (token-efficient):

───────────────────────────────────────────────────────────
  DFMEA Analysis Complete
───────────────────────────────────────────────────────────

  Project     highway-autopilot-perception
  System      Camera-Based Perception Pipeline
  Domain      Highway Driving

───────────────────────────────────────────────────────────
  Analysis Summary
───────────────────────────────────────────────────────────

  Functions Analyzed     10
  Failure Modes          20
  Common-Cause Failures   5
  Mitigations            60

───────────────────────────────────────────────────────────
  Key Findings
───────────────────────────────────────────────────────────

  1. Camera-only perception vulnerable to glare/rain
  2. E2E models lack failure attribution
  3. Stereo depth degrades at distance

───────────────────────────────────────────────────────────
  Generated Files
───────────────────────────────────────────────────────────

  JSON      highway-autopilot-perception/dfmea_output.json
  Excel     highway-autopilot-perception/dfmea_report.xlsx
  Explorer  highway-autopilot-perception/dfmea_explorer.html

───────────────────────────────────────────────────────────

Do NOT output:

Full DFMEA markdown tables inline
Verbose failure mode descriptions
Complete mitigation lists

User can view full details in:

Excel report (formatted, printable)
Interactive Explorer (visual, searchable)
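The counts in the Analysis Summary can be derived from the JSON output rather than tallied by hand. A minimal sketch, assuming the top-level keys of the JSON schema in this document and assuming that "Mitigations" counts every non-empty design, testing, and runtime mitigation field (the source does not define how that number is computed); `summarize` is a hypothetical helper:

```python
MITIGATION_FIELDS = ("design_mitigation", "testing_mitigation", "runtime_mitigation")


def summarize(dfmea: dict) -> dict:
    """Compute the headline metrics shown in the concise summary."""
    failure_modes = dfmea.get("failure_modes", [])
    return {
        "functions_analyzed": len(dfmea.get("functions", [])),
        "failure_modes": len(failure_modes),
        "common_cause_failures": len(dfmea.get("common_cause_failures", [])),
        # Count only mitigation fields that actually contain text.
        "mitigations": sum(
            1 for fm in failure_modes for field in MITIGATION_FIELDS if fm.get(field)
        ),
    }
```

Under this counting rule, 20 failure modes with all three mitigation fields filled give the 60 mitigations shown in the example summary.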

Output Format

JSON Output (Primary)

Generate structured JSON following this schema:

{
    "metadata": {
        "generated_at": "ISO-8601 timestamp",
        "version": "1.0",
        "depth_level": "standard",
        "columns": ["function", "guideword", "failure_mode", "effect", "design_mitigation", "testing_mitigation", "runtime_mitigation"]
    },
    "system_info": {
        "system": "System name",
        "subsystem": "Subsystem name",
        "sensors": "Sensor list",
        "operational_domain": "Domain description",
        "analysis_date": "YYYY-MM-DD"
    },
    "functions": [
        {
            "id": "F-01",
            "name": "Function name",
            "description": "Function description",
            "sensors": "Related sensors",
            "failure_count": 3
        }
    ],
    "failure_modes": [
        {
            "id": "FM-01",
            "function_id": "F-01",
            "function": "Function name",
            "guideword": "Guideword",
            "failure_mode": "Failure mode description",
            "effect": "System-level effect",
            "design_mitigation": "Design mitigation",
            "testing_mitigation": "Testing mitigation",
            "runtime_mitigation": "Runtime mitigation"
        }
    ],
    "common_cause_failures": [
        {
            "id": "CCF-01",
            "cause": "Common cause",
            "affected_sensors": "Affected sensors",
            "effect": "Effect description",
            "mitigation": "Mitigation description"
        }
    ],
    "assumptions": [
        {
            "id": "A-01",
            "category": "Category",
            "assumption": "Assumption text",
            "impact_if_invalid": "Impact description"
        }
    ],
    "key_findings": [
        "Key finding 1",
        "Key finding 2"
    ]
}
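Before passing the JSON to the report scripts, a quick consistency check can catch structural mistakes. The sketch below rests on two assumptions that the schema does not state explicitly: that all seven top-level keys are required, and that each function's failure_count equals the number of failure_modes entries that reference it. `check_dfmea` is a hypothetical helper, not part of the skill's scripts.

```python
from collections import Counter

REQUIRED_TOP_LEVEL = ("metadata", "system_info", "functions", "failure_modes",
                      "common_cause_failures", "assumptions", "key_findings")


def check_dfmea(dfmea: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = [f"missing top-level key: {key}"
                for key in REQUIRED_TOP_LEVEL if key not in dfmea]
    functions = dfmea.get("functions", [])
    function_ids = {f["id"] for f in functions}
    per_function = Counter(fm["function_id"] for fm in dfmea.get("failure_modes", []))
    for function_id in per_function:
        if function_id not in function_ids:
            problems.append(f"failure mode references unknown function: {function_id}")
    for f in functions:
        if f.get("failure_count") != per_function.get(f["id"], 0):
            problems.append(f"failure_count mismatch for {f['id']}")
    return problems
```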

Generating Outputs

All outputs are generated in the project folder specified during configuration.

Creating Project Folder

Before generating outputs, create the project folder:

# Sanitize project name: lowercase, replace spaces with hyphens, remove special chars
PROJECT_NAME="<project-name>"
PROJECT_DIR=$(printf '%s' "$PROJECT_NAME" | tr '[:upper:]' '[:lower:]' | tr ' ' '-' | tr -cd 'a-z0-9-')
mkdir -p "$PROJECT_DIR"

Excel Report

To generate a formatted Excel report, use the generate_excel.py script:

python scripts/generate_excel.py <project-dir>/dfmea_output.json <project-dir>/dfmea_report.xlsx

The generated report includes:

Overview: System info, summary stats, key findings
DFMEA Analysis: Main failure mode table (color-coded mitigations)
Functions: List of analyzed functions
Guidewords: Reference table
Common-Cause Failures: CCF analysis
Mitigation Summary: Mitigations grouped by type
Assumptions: Analysis assumptions

Interactive Explorer

To generate an interactive HTML dashboard, use the generate_explorer.py script:

python scripts/generate_explorer.py <project-dir>/dfmea_output.json <project-dir>/dfmea_explorer.html

The explorer includes:

Overview Tab: Dashboard with metrics, donut charts, mitigation summary
Flow Tab: Interactive node-based visualization of functions and failure modes
Common-Cause Tab: CCF cards with affected sensors and mitigations

Features:

Drag and drop nodes
Zoom and pan
Click failure modes to see details
Minimap navigation

Requirements

Only the Excel script requires openpyxl:

pip install openpyxl
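To fail early with a clear message instead of a traceback from the script, the dependency can be checked before generating the Excel report. A small sketch using the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is hypothetical:

```python
import importlib.util


def missing_packages(required: tuple[str, ...] = ("openpyxl",)) -> list[str]:
    """Return the names of required top-level packages that cannot be imported."""
    return [name for name in required if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Install before generating the Excel report: pip install " + " ".join(missing))
```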

Domain References

For domain-specific knowledge about autonomous system architectures:

See autonomous-systems.md for common subsystems and their interactions

Analysis Depth Guidelines

High-Level Analysis

Focus on top 1-2 most critical failure modes per function
Use broad guideword categories (omission, commission, value)
Suitable for: Initial scoping, executive summaries, early design phase

Standard Analysis

Apply 3-5 failure modes per function
Cover all major guideword categories
Include sensor-specific failures
Suitable for: Design reviews, hazard analysis, most projects

Detailed Analysis

Exhaustive guideword application (5-10+ per function)
Consider environmental sensitivities
Include edge cases and unusual conditions
Document all assumptions
Suitable for: Safety-critical systems, certification, automotive/aerospace

Best Practices

Start with a high-level analysis, then decompose if the user requests more detail
Focus on safety-critical failure modes first
Consider both independent failures and common-cause failures
Note any assumptions about the operational environment
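Validate that each mitigation is independent of the failure mode it addresses (for example, a plausibility check that relies on the failed sensor cannot be trusted to detect that sensor's failure)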
Cross-reference with industry standards (ISO 26262, IEC 61508) when applicable