vmware-monitor

name: vmware-monitor description: > Use this skill for safe, risk-free queries of VMware infrastructure — code-level enforced safety means no destructive operations exist in the codebase. Directly handles: list VMs/hosts/datastores/clusters, check active alarms with remediation hints, view recent events, get VM details (CPU/memory/disks/NICs/snapshots). Always use vmware-monitor when the user asks to "list VMs", "check vSphere alarms", "show host status", "get VM details", "what vSphere events happened", or needs read-only VMware information before making changes. Do NOT use for any write operations — this skill is code-level read-only and cannot modify, create, or delete any resource. For VM modifications use vmware-aiops, for networking use vmware-nsx, for metrics/capacity use vmware-aria. For load balancing/AVI/AKO use vmware-avi. installer: kind: uv package: vmware-monitor allowed-tools: - Bash metadata: {"openclaw":{"requires":{"env":["VMWARE_MONITOR_CONFIG"],"bins":["vmware-monitor"],"config":["/.vmware-monitor/config.yaml","/.vmware-monitor/.env"]},"optional":{"env":["VMWARE_TARGET_PASSWORD","SLACK_WEBHOOK_URL","DISCORD_WEBHOOK_URL"],"bins":["vmware-policy"]},"primaryEnv":"VMWARE_MONITOR_CONFIG","homepage":"https://github.com/zw008/VMware-Monitor","emoji":"📊","os":["macos","linux"]}} compatibility: > vmware-policy auto-installed as Python dependency (provides @vmware_tool decorator and audit logging). All operations audited to ~/.vmware/audit.db. Credentials: Each vCenter/ESXi target requires a per-target password env var in ~/.vmware-monitor/.env following the pattern VMWARE__PASSWORD (e.g., target "vcenter-prod" → VMWARE_VCENTER_PROD_PASSWORD). SLACK_WEBHOOK_URL and DISCORD_WEBHOOK_URL are optional — disabled by default, user-configured only, used solely by the opt-in daemon scanner. Daemon: the background scanner (vmware-monitor daemon start) is user-initiated only, never auto-started. Webhook payloads contain only aggregated alert metadata (alarm counts, event types) — no credentials, IPs, or PII.

VMware Monitor (Read-Only)

Disclaimer: This is a community-maintained open-source project and is not affiliated with, endorsed by, or sponsored by VMware, Inc. or Broadcom Inc. "VMware" and "vSphere" are trademarks of Broadcom. Source code is publicly auditable at github.com/zw008/VMware-Monitor under the MIT license.

Read-only VMware vCenter/ESXi monitoring — 21 MCP tools, zero destructive code.

Code-level safety: This skill contains NO power, create, delete, snapshot, or modify operations. Not disabled — they don't exist in the codebase. Companion skills: vmware-aiops (VM lifecycle), vmware-storage (iSCSI/vSAN), vmware-vks (Tanzu Kubernetes), vmware-nsx (NSX networking), vmware-nsx-security (DFW/firewall), vmware-aria (metrics/alerts/capacity), vmware-avi (AVI/ALB/AKO), vmware-harden (compliance baselines). | vmware-pilot (workflow orchestration) | vmware-policy (audit/policy)

What This Skill Does

All 21 tools are read-only.

Category	Capabilities
Inventory	List VMs, ESXi hosts, datastores, clusters, networks
Health	Active alarms, recent events (filter by severity/time), hardware sensors, host services
Performance	Real-time host & VM CPU/memory/disk/network utilisation (PerfManager)
Capacity	Datastore thin-provisioning over-commit, resource-pool reservation/usage
Infra Health	ESXi certificate expiry, license usage/expiry, NTP configuration health
Snapshots	Inventory-wide snapshot aging & sprawl (flag old snapshots)
Activity	In-flight tasks, active login sessions
VM Details	CPU, memory, disks, NICs, snapshots, guest OS, IP
Scanning	Scheduled alarm/log scanning with Slack/Discord webhooks

Quick Install

uv tool install vmware-monitor
vmware-monitor doctor

When to Use This Skill

List or search VMs, hosts, datastores, clusters
Check active alarms or recent events
Get detailed info about a specific VM
Set up scheduled monitoring with webhook alerts
Any read-only VMware query where safety is paramount

Alarm/Event Output: `suggested_actions` Field

get_alarms and get_events results include a suggested_actions list. Each item is a ready-to-use hint pointing to the correct companion skill and tool:

{
  "alarm_name": "VM CPU Ready High",
  "entity_name": "prod-db-01",
  "suggested_actions": [
    "vmware-aiops: acknowledge_vcenter_alarm(entity_name='prod-db-01', alarm_name='VM CPU Ready High')",
    "vmware-aiops: reset_vcenter_alarm(entity_name='prod-db-01', alarm_name='VM CPU Ready High')"
  ]
}

AI agents (especially smaller local models) can read these hints directly to determine which skill and tool to call next, without needing to reason about skill routing themselves.

Use companion skills for:

Power on/off, deploy, clone, migrate --> vmware-aiops
iSCSI, vSAN, datastore management --> vmware-storage
Tanzu Kubernetes clusters --> vmware-vks
Load balancing, AVI/ALB, AKO, Ingress --> vmware-avi

Related Skills — Skill Routing

User Intent	Recommended Skill
Read-only vSphere monitoring, zero risk	vmware-monitor ← this skill
Storage: iSCSI, vSAN, datastores	vmware-storage
VM lifecycle, deployment, guest ops	vmware-aiops
Tanzu Kubernetes (vSphere 8.x+)	vmware-vks
NSX networking: segments, gateways, NAT	vmware-nsx
NSX security: DFW rules, security groups	vmware-nsx-security
Aria Ops: metrics, alerts, capacity planning	vmware-aria
Multi-step workflows with approval	vmware-pilot
Compliance baselines (CIS / 等保 / PCI-DSS), drift detection, LLM remediation advisor	vmware-harden (`uv tool install vmware-harden`)
Load balancer, AVI, ALB, AKO, Ingress	vmware-avi (`uv tool install vmware-avi`)
Audit log query	vmware-policy (`vmware-audit` CLI)

Common Workflows

Diagnostic investigations: Before running any "why is X failing / down / abnormal" workflow, follow references/investigation-protocol.md. It enforces the four root-cause completeness criteria (falsifiability / sufficiency / necessity / mechanism) and the up-to-three-rounds deepening loop. Since vmware-monitor is read-only, it serves as the data source — actuation belongs to companion skills like vmware-aiops.

Daily Health Check

Judgment: alarms tell you what vCenter has decided is wrong, events tell you what happened. They diverge — an event burst with no alarms often signals a metric threshold miscalibration, not "everything is fine." Read both.

Check alarms --> vmware-monitor health alarms --target prod-vcenter — focus on Red severity AND alarms older than 1 hour (transient ones self-clear)
Review recent events --> vmware-monitor health events --hours 24 --severity warning — look for repeated events from the same entity (a single event is noise; 50 events in an hour is a pattern)
List hosts --> vmware-monitor inventory hosts — flag hosts disconnected, in maintenance mode unexpectedly, or memory > 90%
If connection fails --> run vmware-monitor doctor to diagnose config/network issues

Investigate a Specific VM

Find the VM --> vmware-monitor inventory vms --power-state poweredOff
Get details --> vmware-monitor vm info problem-vm
Check related events --> vmware-monitor health events --hours 48
If VM not found --> verify VM name with vmware-monitor inventory vms --limit 100 or check target with --target <other-vcenter>

Performance Triage ("the cluster feels slow")

Judgment: inventory shows configured capacity (cores, GB); it cannot tell you what is actually hot. Use the real-time perf tools, then narrow.

Rank hosts --> vmware-monitor perf hosts — the busiest host floats to the top (sorted by CPU%)
Rank VMs on the suspect --> vmware-monitor perf vms --limit 25 — find the noisy neighbour
Check for hidden storage pressure --> vmware-monitor capacity datastores — over-commit % > 100 means a thin datastore can fill mid-run even with "free" space showing
Rule out snapshot drag --> vmware-monitor snapshots aging --only-old — old snapshots silently degrade I/O
If perf tools return empty --> the host/VM may be disconnected or powered off (no real-time provider); confirm with inventory hosts / inventory vms

Scheduled-Outage Pre-flight (certs, licenses, time)

Cert expiry --> vmware-monitor infra certs --warn-days 60 — an expired ESXi cert drops host management
License headroom --> vmware-monitor infra licenses — catch over-allocation before it disables features
Time sync --> vmware-monitor infra ntp — healthy: no breaks SSO/Kerberos/log correlation (note: live offset is not exposed by the SOAP API, only config health)

Set Up Continuous Monitoring

Configure webhook in ~/.vmware-monitor/config.yaml
Start daemon --> vmware-monitor daemon start
Daemon scans every 15 min, sends alerts to Slack/Discord

Usage Mode

Scenario	Recommended	Why
Local/small models (Ollama, Qwen)	CLI	~2K tokens vs ~8K for MCP
Cloud models (Claude, GPT-4o)	Either	MCP gives structured JSON I/O
Automated pipelines	MCP	Type-safe parameters, structured output

MCP Tools (21 — all read-only)

Tool	Description
`list_virtual_machines`	List VMs with filtering (power state, sort, limit, `folder_filter` for case-insensitive folder-tree search); each VM includes `folder_path`
`list_esxi_hosts`	ESXi hosts with CPU, memory, version, uptime
`list_all_datastores`	Datastores with capacity, free space, type
`list_all_clusters`	Clusters with host count, DRS/HA status
`list_all_networks`	Networks with attached VM count and accessibility
`get_alarms`	All active/triggered alarms — includes `suggested_actions` remediation hints
`get_events`	Recent events filtered by severity and time — includes `suggested_actions` hints
`get_host_sensors`	Hardware sensor status (temperature/voltage/fan) per host with green/yellow/red health
`get_host_services`	Host service status (running state and startup policy), optionally filtered by host
`vm_info`	Detailed VM info (CPU, memory, disks, NICs, snapshots)
`vm_list_snapshots`	Snapshot list for one VM with nesting hierarchy (read-only)
`host_performance`	Real-time host CPU/mem/disk/net utilisation (PerfManager); busiest first
`vm_performance`	Real-time VM CPU/mem/disk/net utilisation (top 25 by default); powered-on only
`snapshot_aging`	Inventory-wide snapshot sweep with age + sprawl; flags snapshots older than N days
`certificate_status`	Per-host ESXi management certificate expiry (days until expiry, expiring flag)
`license_status`	vCenter/ESXi license inventory with used/total and expiration
`ntp_status`	Per-host NTP config health (servers + ntpd state); live offset not in SOAP API
`datastore_capacity`	Datastore over-commit (provisioned vs capacity); thin-provisioning risk
`resource_pool_usage`	Resource-pool CPU/memory reservation, limit, and current usage
`active_tasks`	In-flight (and recently completed) vCenter tasks with progress/errors
`active_sessions`	Currently authenticated vCenter/ESXi sessions (who is logged in)

All tools are read-only. No tool can modify, create, or delete any resource. Performance/capacity readings are point-in-time samples — this skill retains no history, so it never reports a fabricated "trend" or runway date.

CLI Quick Reference

vmware-monitor inventory vms [--target <t>] [--limit 20] [--power-state poweredOn]
vmware-monitor inventory hosts [--target <t>]
vmware-monitor inventory datastores [--target <t>]
vmware-monitor inventory clusters [--target <t>]
vmware-monitor inventory networks [--target <t>]
vmware-monitor health alarms [--target <t>]
vmware-monitor health events [--hours 24] [--severity warning]
vmware-monitor health sensors [--target <t>]
vmware-monitor health services [--host <esxi>] [--target <t>]
vmware-monitor perf hosts [--host <esxi>] [--target <t>]
vmware-monitor perf vms [--vm <name>] [--limit 25] [--target <t>]
vmware-monitor capacity datastores [--target <t>]
vmware-monitor capacity pools [--target <t>]
vmware-monitor infra certs [--warn-days 30] [--target <t>]
vmware-monitor infra licenses [--target <t>]
vmware-monitor infra ntp [--host <esxi>] [--target <t>]
vmware-monitor snapshots aging [--threshold 30] [--only-old] [--target <t>]
vmware-monitor activity tasks [--active-only] [--target <t>]
vmware-monitor activity sessions [--target <t>]
vmware-monitor vm info <vm-name> [--target <t>]
vmware-monitor scan now [--target <t>]
vmware-monitor daemon start|stop|status
vmware-monitor doctor [--skip-auth]

Full CLI reference: see references/cli-reference.md

Troubleshooting

Alarms returns empty but vCenter shows alarms

The get_alarms tool queries triggered alarms at the root folder level. Some alarms are entity-specific — try checking events instead: get_events --hours 1 --severity info.

"Connection refused" error

Run vmware-monitor doctor to diagnose
Verify target hostname/IP and port (443) in config.yaml
For self-signed certs: set disableSslCertValidation: true

Events returns too many results

Use severity filter: --severity warning (default) filters out info-level events. Use --hours 4 to narrow time range.

VM info shows "guest_os: unknown"

VMware Tools not installed or not running in the guest. Install/start VMware Tools for guest OS detection, IP address, and guest family info.

Doctor passes but commands fail with timeout

vCenter may be under heavy load. Try targeting a specific ESXi host directly instead of vCenter, or increase connection timeout in config.yaml.

Setup

uv tool install vmware-monitor
vmware-monitor init      # guided: prompts for host/user/password, writes config + .env (chmod 600), then verifies

init stores the password grep-safe (obfuscated b64:, never plaintext) and locks .env to 0600. Prefer it over hand-editing. Manual alternative:

mkdir -p ~/.vmware-monitor
cp config.example.yaml ~/.vmware-monitor/config.yaml
cp .env.example ~/.vmware-monitor/.env && chmod 600 ~/.vmware-monitor/.env
# Edit config.yaml (targets) and .env (passwords), then: vmware-monitor doctor

All tools are automatically audited via vmware-policy. Audit logs: vmware-audit log --last 20

Full setup guide, security details, and AI platform compatibility: see references/setup-guide.md

Audit & Safety

All operations are automatically audited via vmware-policy (@vmware_tool decorator):

Every tool call logged to ~/.vmware/audit.db (SQLite, framework-agnostic)
Policy rules enforced via ~/.vmware/rules.yaml (deny rules, maintenance windows, risk levels)
Risk classification: each tool tagged as low/medium/high/critical
View recent operations: vmware-audit log --last 20
View denied operations: vmware-audit log --status denied

vmware-policy is automatically installed as a dependency — no manual setup needed.

License

MIT — github.com/zw008/VMware-Monitor