linux-disaster-recovery - SKILL.md Agent Skill

name: linux-disaster-recovery
description: Restore from GPG-encrypted backups on Debian/Ubuntu and RHEL-family servers (Fedora, RHEL, CentOS Stream, Rocky, Alma, Oracle). Covers MySQL database restore (single DB or full), app file restore, config snapshots, and emergency recovery checklist. Backup/restore logic is portable across both families; recovery-time tooling differs at a few critical points (bootloader paths and GRUB regeneration, initramfs rebuild tooling, and filesystem repair tools); get these wrong and a box won't boot. Backups are AES256 GPG encrypted, stored locally and on Google Drive via rclone. Always confirms before any destructive restore.
license: MIT
metadata:
  author: Peter Bamuhigire
  author_url: techguypeter.com
  author_contact: "+256784464178"

Disaster Recovery

Distro support

Two-family skill. Backup/restore strategy and systemd rescue/emergency targets are identical across both families; recovery-time tooling differs at a few critical points, and getting these wrong can leave a box unbootable. The body of this skill uses Debian/Ubuntu commands; substitute the RHEL-family equivalents from this matrix.

Recovery concept	Debian/Ubuntu	RHEL family
Reinstall packages	`apt install --reinstall`	`dnf reinstall`
Regenerate GRUB	`update-grub`	`grub2-mkconfig -o /boot/grub2/grub.cfg`
GRUB config path	`/boot/grub/grub.cfg`	`/boot/grub2/grub.cfg` (UEFI: `/boot/efi/EFI/<distro>/`)
Rebuild initramfs	`update-initramfs -u`	`dracut -f`
Default root FS / repair	ext4 → `e2fsck`	xfs → `xfs_repair` (xfs can't shrink)
Restore networking	Netplan	NetworkManager/`nmcli`
Restore firewall	`ufw`	`firewalld`
Rescue target	`systemctl rescue`	identical

RHEL-family note: the two recovery actions that most often differ are GRUB regeneration (grub2-mkconfig, not update-grub) and initramfs rebuild (dracut -f, not update-initramfs). Root is usually XFS — use xfs_repair, and remember XFS cannot be shrunk. See docs/multi-distro/plan.md. In sk-* scripts use the common.sh package primitives.
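
To avoid running the wrong family's boot-repair command by habit, the matrix can be encoded as a small lookup. The sketch below is illustrative only: `detect_family` and `boot_repair_commands` are hypothetical helper names (not part of the sk-* scripts or common.sh), and it assumes the distro ships a standard os-release file with `ID` and `ID_LIKE` fields. It prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: map an os-release file to a recovery family,
# then print (never execute) the boot-repair commands from the matrix.

detect_family() {
    local os_release="${1:-/etc/os-release}"
    local id="" id_like=""
    [ -r "$os_release" ] || { echo "unknown"; return 1; }
    # Source in subshells so ID/ID_LIKE do not leak into the caller.
    id=$(. "$os_release" && printf '%s' "${ID:-}")
    id_like=$(. "$os_release" && printf '%s' "${ID_LIKE:-}")
    # Pad with spaces so each token can be matched as a whole word.
    case " $id $id_like " in
        *" debian "*|*" ubuntu "*)           echo "debian" ;;
        *" rhel "*|*" fedora "*|*" centos "*) echo "rhel" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

boot_repair_commands() {
    case "$1" in
        debian) printf '%s\n' "update-initramfs -u" "update-grub" ;;
        rhel)   printf '%s\n' "dracut -f" "grub2-mkconfig -o /boot/grub2/grub.cfg" ;;
        *)      echo "unsupported family: $1" >&2; return 1 ;;
    esac
}
```

Typical use would be `boot_repair_commands "$(detect_family)"`, then reviewing the printed commands before running them by hand from the rescue environment.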

Use when

Data has been lost, corrupted, or overwritten and a restore may be required.
You need to recover databases, application files, or config snapshots from backups.
You need an emergency recovery checklist during a production incident.

Do not use when

The problem is only a service outage or bad config that can be fixed in place; use linux-service-management or linux-troubleshooting.
The task is a routine backup review rather than an actual restore path.
The task is creating backups — building rsync/tar archives, incremental snapshots, or filesystem (LVM/ZFS/Btrfs) snapshots. That now lives in the 13-backup-and-archiving category: linux-rsync-sync (offsite/incremental rsync), linux-archive-integrity (tar create + verify), and linux-filesystem-snapshots (LVM/ZFS/Btrfs). This skill stays focused on restore and emergency recovery.

Required inputs

The incident time window and affected data set.
The candidate backup location, timestamp, and encryption details.
Explicit confirmation that a destructive restore is acceptable.

Workflow

Confirm this is true data loss or corruption, not a recoverable service issue.
Locate the newest safe backup from before the incident.
Follow the matching restore path and confirm destructive impact before execution.
Verify service health, data integrity, and post-restore access before closing the incident.

Quality standards

Restore the smallest correct scope first when possible.
Use timestamps and incident facts to choose the backup, not guesswork.
Verification after restore is mandatory.

Anti-patterns

Restoring over live data without explicit confirmation.
Choosing the latest backup without checking whether it already contains the bad state.
Ending the incident after the restore command without service and data validation.

Outputs

The selected backup and restore path.
The recovery commands and confirmations required.
A post-restore verification summary with any remaining risk.

References

This skill is self-contained. Every command below works on a stock Debian/Ubuntu or RHEL-family server (substitute per the Distro support matrix above). The sk-* scripts in the Optional fast path section at the bottom are convenience wrappers — never required.

Always confirm before restoring. A restore overwrites existing data. Never start a restore until the full word yes has been typed at the prompt; scripted or non-interactive runs must not skip this confirmation.
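
A confirmation gate of this kind can be sketched as a small shell function. This is a minimal illustration, not the prompt used by the sk-* scripts: `confirm_restore` is a hypothetical name, and the sketch assumes that a missing or closed stdin should count as a refusal.

```shell
#!/usr/bin/env bash
# confirm_restore TARGET
# Returns 0 only when exactly "yes" is read from stdin; anything else
# (including "y", "YES", or end-of-file) is treated as a refusal.
confirm_restore() {
    local target="$1" answer=""
    printf 'This will OVERWRITE %s. Type yes to continue: ' "$target" >&2
    # read fails on EOF; fall through with whatever was captured.
    IFS= read -r answer || true
    [ "$answer" = "yes" ]
}
```

A restore command would then be guarded as `confirm_restore "<database>" && zcat backup.sql.gz | mysql -u root -p <database>`, so the import never starts unless the gate passes.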

Step 1: Assess First

# Is this a service crash (restart only) or actual data loss?
sudo systemctl status nginx mysql postgresql php8.3-fpm

# When did it happen?
sudo journalctl --since "2 hours ago" | grep -iE "error|fail|crash" | head -20

Service crash → restart it (linux-service-management), no restore needed. Data loss/corruption → proceed below.

Step 2: Find The Right Backup

# Local backups (7-day retention)
ls -lth ~/backups/mysql/*.gpg 2>/dev/null | head -10

# Google Drive (3-day retention for MySQL), sorted by modification time, newest last
rclone lsf -R --files-only --format "tp" gdrive:<backup-folder> 2>/dev/null | sort | tail -10

# If rclone token expired:
rclone config reconnect gdrive:

Choose the most recent backup taken before the incident began. A newer backup may already contain the corrupted state.
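
Picking the backup by eye is error-prone under pressure, so the selection can be sketched as a function. This is an illustration under stated assumptions: `pick_backup_before` is a hypothetical helper, it relies on GNU find (`-newermt`, `-printf`), and it assumes each file's modification time reflects when the backup was taken, which may not hold for files copied without preserved timestamps.

```shell
#!/usr/bin/env bash
# pick_backup_before DIR CUTOFF
# Prints the newest *.gpg file in DIR whose mtime is at or before CUTOFF.
# CUTOFF is any timestamp GNU find accepts, e.g. "2024-05-02 12:00".
# Prints nothing if no backup predates the cutoff.
pick_backup_before() {
    local dir="$1" cutoff="$2"
    # "! -newermt" keeps files that were NOT modified after the cutoff.
    # "%T@ %p" emits "<epoch seconds> <path>" so we can sort numerically.
    find "$dir" -maxdepth 1 -type f -name '*.gpg' ! -newermt "$cutoff" \
        -printf '%T@ %p\n' | sort -n | tail -n 1 | cut -d' ' -f2-
}
```

For example, `pick_backup_before ~/backups/mysql "2024-05-02 12:00"` would print the last local backup written before midday on 2 May, or nothing if retention has already rotated it away.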

Step 3: Restore

Full restore procedure (decrypt → extract → import): See references/restore-procedures.md.

The procedure, condensed:

# Decrypt (enter passphrase when prompted)
gpg --decrypt backup.sql.gz.gpg > backup.sql.gz

# Inspect size and sanity
gunzip -l backup.sql.gz
zcat backup.sql.gz | head -20

# Stop the services that write to the DB (skip any that are not installed)
sudo systemctl stop nginx apache2 php8.3-fpm

# Restore (confirm first!)
zcat backup.sql.gz | mysql -u root -p <database>

# Restart
sudo systemctl start php8.3-fpm apache2 nginx
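
The "inspect size and sanity" step can be made a hard gate before anything is imported. The sketch below is a heuristic, not a full validation: `verify_dump` is a hypothetical helper, and it assumes a plain-text SQL dump in which a `CREATE`, `INSERT`, `DROP`, or `USE` statement appears within the first 200 lines.

```shell
#!/usr/bin/env bash
# verify_dump FILE
# Returns 0 if FILE is an intact gzip stream whose first 200 lines
# contain at least one SQL-looking statement; returns 1 otherwise.
verify_dump() {
    local file="$1"
    # gzip -t checks the whole compressed stream, including its CRC.
    if ! gzip -t "$file" 2>/dev/null; then
        echo "corrupt or truncated gzip: $file" >&2
        return 1
    fi
    # Heuristic content check; a wrong passphrase or an empty dump
    # would typically fail here or at the gzip test above.
    if ! gzip -dc "$file" | head -n 200 | grep -qiE '^(CREATE|INSERT|DROP|USE) '; then
        echo "no SQL statements found near the start of: $file" >&2
        return 1
    fi
    return 0
}
```

Running `verify_dump backup.sql.gz` after decryption and before stopping services means a bad backup is discovered while the site is still up, not halfway through an import.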

Emergency Checklist

# 1. Stop affected service to prevent further damage
sudo systemctl stop <service>

# 2. Find best backup (Step 2 above)

# 3. Decrypt → restore → verify (references/restore-procedures.md)

# 4. Restart all services
sudo systemctl start nginx mysql php8.3-fpm apache2

# 5. Re-run security audit
sudo bash ~/.claude/skills/scripts/server-audit.sh

# 6. Clean up
rm -rf ~/restore/

Demo/Dev Reset (Git-Tracked SQL Dump Pattern)

Some apps ship a git-tracked SQL dump as the demo DB source of truth. A reset script drops and recreates from that dump:

ls /usr/local/bin/reset-*           # find available reset scripts
sudo reset-<app>-from-git           # requires typing YES
ls /var/backups/<app>/              # safety backup always created first

Optional fast path (when sk-* scripts are installed)

Running sudo install-skills-bin linux-disaster-recovery installs wrappers for the above:

Task	Fast-path script
Verify last backup is usable	`sudo sk-backup-verify`
Guided restore (pick backup, preview, confirm)	`sudo sk-restore-wizard`
MySQL restore from a specific file	`sudo sk-mysql-restore --file <path>`
PostgreSQL restore	`sudo sk-postgres-restore --file <path>`
Site file restore	`sudo sk-site-restore --backup <path> --target <dir>`
Maintenance mode on/off	`sudo sk-emergency-mode on|off`

These are optional wrappers around the commands above.

Scripts

This skill installs the following scripts to /usr/local/bin/. To install:

sudo install-skills-bin linux-disaster-recovery

Script	Source	Core?	Purpose
sk-backup-verify	scripts/sk-backup-verify.sh	yes	Verify last backup age, integrity (tar/gpg check), remote copy reachable via rclone.
sk-mysql-backup	scripts/sk-mysql-backup.sh	yes	Dump all databases with gzip + gpg + rclone upload; rotate local and remote. Refactored to source common.sh and honor standard flags.
sk-mysql-restore	scripts/sk-mysql-restore.sh	no	Guided restore: list backups, pick, download, decrypt, show sizes, confirm, restore.
sk-postgres-backup	scripts/sk-postgres-backup.sh	no	`pg_dump` + compression + gpg + rclone, per database or all, with rotation.
sk-postgres-restore	scripts/sk-postgres-restore.sh	no	Guided PostgreSQL restore from backup file or remote.
sk-site-backup	scripts/sk-site-backup.sh	no	Tar a full site directory, exclude cache/node_modules, gpg, upload via rclone.
sk-site-restore	scripts/sk-site-restore.sh	no	Restore a site backup to original path with permission repair.
sk-config-snapshot	scripts/sk-config-snapshot.sh	no	Snapshot `/etc/` (and other declared dirs) to a git-tracked archive; diff against previous.
sk-restore-wizard	scripts/sk-restore-wizard.sh	no	Interactive guided restore: pick backup set, pick target, preview, confirm, execute.
sk-emergency-mode	scripts/sk-emergency-mode.sh	no	Toggle maintenance mode: drop Nginx to 503 page, stop non-essential services, show live status.