sysadmin

Manage Linux servers with user administration, process control, storage, and system maintenance.

By zangxin75 · Updated 3/1/2026

name: Sysadmin
description: Manage Linux servers with user administration, process control, storage, and system maintenance.
metadata: {"clawdbot":{"emoji":"🖥️","os":["linux","darwin"]}}

System Administration Rules

User Management

  • Create service accounts with useradd --system plus an explicit --shell /usr/sbin/nologin - the --system flag skips the home directory but does not by itself disable the login shell
  • Grant sudo for specific commands, not blanket ALL - principle of least privilege
  • Lock accounts instead of deleting: usermod -L - preserves audit trail and file ownership; it locks only the password, so also expire the account (usermod -e 1) to block SSH key logins
  • SSH keys in ~/.ssh/authorized_keys with restrictive permissions - 600 for the file, 700 for the directory
  • Use visudo to edit sudoers - catches syntax errors before saving, prevents lockout
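
As a minimal sketch of the SSH key permissions above, the following script builds a key directory inside a throwaway temporary directory. The demo_home path and the key text are placeholders, and the account commands at the end are left as comments because they need root; myservice and olduser are hypothetical names.

```shell
#!/bin/sh
# demo_home is a hypothetical stand-in for a real user's home directory.
demo_home=$(mktemp -d)

mkdir -p "$demo_home/.ssh"
chmod 700 "$demo_home/.ssh"                    # directory: owner only

# Placeholder key text for illustration, not a usable key.
printf '%s\n' 'ssh-ed25519 PLACEHOLDER-KEY user@example' \
    > "$demo_home/.ssh/authorized_keys"
chmod 600 "$demo_home/.ssh/authorized_keys"    # file: owner read/write only

ls -ld "$demo_home/.ssh" "$demo_home/.ssh/authorized_keys"

# Root-only steps, shown as comments (myservice and olduser are hypothetical):
#   useradd --system --shell /usr/sbin/nologin myservice
#   usermod -L olduser
#   visudo -c        # check sudoers syntax without opening an editor
```

sshd may refuse a key file when the file or its directory is writable by other users, which is why both modes are set explicitly rather than left to the umask.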

Process Management

  • systemctl for services, not service - systemd is standard on modern distros
  • journalctl -u service -f for live logs - more powerful than tail on log files
  • nice and ionice for background tasks - don't compete with production workloads
  • Kill signals: SIGTERM (15) first, SIGKILL (9) as a last resort - SIGKILL doesn't allow cleanup
  • nohup or screen/tmux for long-running commands - an SSH disconnect sends SIGHUP, which kills processes started from the session
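
The SIGTERM-then-SIGKILL order can be sketched as a small helper. The function name stop_gracefully and its default timeout are assumptions made for this illustration, and the sleep process is only a harmless stand-in for a real service.

```shell
#!/bin/sh
# stop_gracefully PID [TIMEOUT_SECONDS]
# Hypothetical helper: ask politely with SIGTERM, escalate to SIGKILL
# only if the process is still present after the timeout.
stop_gracefully() {
    pid=$1
    timeout=${2:-10}
    kill -TERM "$pid" 2>/dev/null || return 0   # already gone
    waited=0
    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null        # last resort, no cleanup
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Demo: a background sleep stands in for a misbehaving job.
sleep 300 &
demo_pid=$!
stop_gracefully "$demo_pid" 5
wait "$demo_pid" 2>/dev/null
demo_status=$?
echo "exit status: $demo_status"
```

A process ended by a signal reports 128 plus the signal number, so a clean SIGTERM exit shows up as 143 and a forced SIGKILL as 137.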

File Systems and Storage

  • df -h for disk usage, du -sh * to find culprits - check before the disk fills completely
  • lsof +D /path finds processes using a directory - needed before unmounting
  • ncdu for interactive disk usage - faster than repeated du commands
  • Mount options matter: noexec, nosuid for security on data partitions
  • Resize filesystems with care: growing is safe, shrinking risks data loss - always back up first
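
A minimal sketch of the df-then-du workflow, run against a temporary directory so it is safe to try anywhere. The directory names small and large and the file sizes are made up for the demo.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
mkdir "$demo_dir/small" "$demo_dir/large"

# Random data, so filesystem compression cannot hide the size difference.
dd if=/dev/urandom of="$demo_dir/small/a.bin" bs=1024 count=16   2>/dev/null
dd if=/dev/urandom of="$demo_dir/large/b.bin" bs=1024 count=2048 2>/dev/null

# Step 1: how full is the filesystem that holds this directory?
df -h "$demo_dir"

# Step 2: which entry uses the most space? Sizes in KiB, largest last.
largest=$(cd "$demo_dir" && du -sk -- * | sort -n | tail -n 1 | cut -f 2)
echo "largest entry: $largest"
```

Using du -sk with a numeric sort avoids depending on sort -h, which not every sort implementation provides.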

Logs and Monitoring

  • logrotate prevents disk fill - configure size limits and retention
  • Centralize logs to an external system - local logs are lost if the server dies
  • /var/log/auth.log (Debian/Ubuntu) or /var/log/secure (RHEL family) for login attempts - watch for brute force
  • dmesg for kernel messages - hardware errors and OOM kills appear here
  • Monitor inode usage, not just disk space - many small files exhaust inodes
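
One way to watch both block and inode usage is to filter df output against a threshold. This is a sketch only: check_usage is a hypothetical helper, the 90 percent limit is arbitrary, and the inode call assumes GNU df, where -P and -i can be combined.

```shell
#!/bin/sh
# check_usage THRESHOLD
# Reads `df -P` style output on stdin and prints each mount point whose
# use percentage (column 5) is at or above THRESHOLD.
# Assumes mount points contain no spaces.
check_usage() {
    awk -v limit="$1" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= limit) print $6 ": " $5
        }'
}

echo "Filesystems at or above 90% of blocks:"
df -P | check_usage 90

echo "Filesystems at or above 90% of inodes (GNU df assumed):"
df -Pi 2>/dev/null | check_usage 90
```

Run from cron or a monitoring agent, a check like this can raise a warning well before writes start failing.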

Permissions and Security

  • chmod 600 for secrets, 640 for configs, 644 for public files - world-writable is almost never correct
  • Sticky bit on shared directories (chmod +t) - users can only delete their own files
  • setfacl for complex permissions - when traditional owner/group/other isn't enough
  • chattr +i makes files immutable - even root can't modify them without removing the flag
  • SELinux/AppArmor in enforcing mode - permissive mode logs violations but doesn't block them
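
The permission modes above can be tried safely in a temporary directory. The file names below are invented for the demo; the find command at the end is one common way to audit for world-writable files.

```shell
#!/bin/sh
demo=$(mktemp -d)
touch "$demo/secret.key" "$demo/app.conf" "$demo/index.html"

chmod 600 "$demo/secret.key"    # secrets: owner read/write only
chmod 640 "$demo/app.conf"      # configs: owner write, group read
chmod 644 "$demo/index.html"    # public: world readable, owner writable

# Shared directory: everyone may create files, the sticky bit (leading 1)
# restricts deletion to each file's owner.
mkdir "$demo/shared"
chmod 1777 "$demo/shared"

ls -l "$demo"

# Audit: list regular files that anyone can write to (should print nothing).
world_writable=$(find "$demo" -type f -perm -0002)
echo "world-writable files: ${world_writable:-none}"
```

The same find expression pointed at a real data partition gives a quick list of files worth reviewing.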

Package Management

  • apt update before apt upgrade - upgrade without update uses stale package lists
  • Unattended security updates: unattended-upgrades - critical patches shouldn't wait
  • Pin package versions in production - unexpected upgrades cause unexpected outages
  • Remove unused packages: apt autoremove - reduces attack surface and disk usage
  • Know your package manager: apt/yum/dnf/pacman - commands differ, concepts similar
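
To illustrate "commands differ, concepts similar", the sketch below maps each package manager to its refresh-and-upgrade command and only prints it, so nothing is installed. Both function names are hypothetical, and apt-get is used here as the script-friendly counterpart of apt.

```shell
#!/bin/sh
# detect_pkg_manager: print the first known package manager found on PATH.
detect_pkg_manager() {
    for pm in apt-get dnf yum pacman; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    return 1
}

# upgrade_command NAME: print (not run) the refresh-and-upgrade command.
upgrade_command() {
    case "$1" in
        apt-get) echo "apt-get update && apt-get upgrade" ;;
        dnf)     echo "dnf upgrade --refresh" ;;
        yum)     echo "yum update" ;;
        pacman)  echo "pacman -Syu" ;;
        *)       echo "unknown package manager: $1" >&2; return 1 ;;
    esac
}

if pm=$(detect_pkg_manager); then
    echo "On this host, run as root: $(upgrade_command "$pm")"
else
    echo "No known package manager found"
fi
```

Printing the command instead of executing it keeps the upgrade a deliberate step, which fits the advice to avoid unexpected upgrades in production.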

Backups

  • Test restores regularly - backups that can't be restored are worthless
  • Include package lists and configs, not just data - recreating the environment is painful
  • Offsite backups are mandatory - local backups don't survive disk failure or ransomware
  • Back up before any risky change - "I'll just quickly edit" is famous last words
  • Document the restore procedure - a 3am disaster is the wrong time to figure it out
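
A restore test can be as small as the sketch below: archive a directory, restore it somewhere else, and compare the two trees. The paths and the sample config line are invented for the demo, and tar with gzip is just one possible backup tool.

```shell
#!/bin/sh
src=$(mktemp -d)     # stands in for the data being protected
work=$(mktemp -d)    # stands in for backup storage plus a restore target

echo "listen_port=8080" > "$src/app.conf"
mkdir "$src/data"
echo "row 1" > "$src/data/table.txt"

# Back up.
tar -czf "$work/backup.tar.gz" -C "$src" .

# Restore into an empty directory, never over the original.
mkdir "$work/restore"
tar -xzf "$work/backup.tar.gz" -C "$work/restore"

# Verify: the restored tree must match the source exactly.
if diff -r "$src" "$work/restore" >/dev/null; then
    restore_ok=yes
else
    restore_ok=no
fi
echo "restore verified: $restore_ok"

# On Debian-family hosts a package list can be saved alongside the data:
#   dpkg --get-selections > packages.list
```

Scheduling a check like this turns "we have backups" into "we have restores that were verified recently".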

Performance

  • top/htop for a live view, vmstat for trends - understand the baseline before diagnosing
  • iotop for disk I/O bottlenecks - slow disks are often misdiagnosed as CPU problems
  • Load average: up to about 1.0 per core is healthy - consistently higher means tasks are queuing (on Linux the figure also counts tasks blocked on I/O)
  • Swap usage isn't inherently bad - but consistent swapping indicates memory shortage
  • sar for historical data - retroactively diagnose what happened during an incident
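
The per-core load rule of thumb is simple arithmetic, sketched here. load_per_core is a hypothetical helper, and reading /proc/loadavg assumes Linux, so that step is guarded.

```shell
#!/bin/sh
# load_per_core LOAD CORES: print the load average divided by core count.
load_per_core() {
    awk -v avg="$1" -v n="$2" 'BEGIN { printf "%.2f\n", avg / n }'
}

# Number of online CPUs; fall back to 1 if getconf is unavailable.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

if [ -r /proc/loadavg ]; then
    load1=$(cut -d ' ' -f 1 /proc/loadavg)    # 1-minute load average
    echo "cores: $cores, load: $load1, per core: $(load_per_core "$load1" "$cores")"
fi

# Worked example: a load of 6.0 on a 4-core host.
load_per_core 6 4
```

In the worked example the result is 1.50, meaning that on average there is half a task per core waiting beyond what the CPUs can run at once.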

Networking Basics

  • ss -tulpn shows listening ports - netstat is deprecated
  • ip addr and ip route replace ifconfig and route - learn the new tools
  • Check both host firewall and cloud security groups - traffic blocked at either level fails
  • /etc/hosts for local overrides - quick testing without DNS changes
  • curl -v shows full connection details - headers, timing, TLS handshake
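
As a rough sketch, the script below runs the inspection commands only where they are installed, then shows how a hosts-file override resolves. lookup_hosts is a hypothetical helper, and the demo file, host name, and 203.0.113.10 documentation address are invented so that the real /etc/hosts is never touched.

```shell
#!/bin/sh
# Listening sockets and routes, only if the tools are present.
# Adding -p to ss shows owning processes but generally needs root.
if command -v ss >/dev/null 2>&1; then ss -tuln; fi
if command -v ip >/dev/null 2>&1; then ip route; fi

# lookup_hosts NAME [FILE]: print the address a hosts-format file maps
# NAME to, ignoring comments. FILE defaults to /etc/hosts.
lookup_hosts() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                      # strip comments
        { for (i = 2; i <= NF; i++)
              if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Demo against a throwaway file in hosts format.
demo_hosts=$(mktemp)
printf '%s\n' \
    '127.0.0.1 localhost' \
    '# staging override' \
    '203.0.113.10 app.example.test  # temporary' > "$demo_hosts"

lookup_hosts app.example.test "$demo_hosts"
```

The first matching line wins, which mirrors how an override in /etc/hosts typically takes precedence over DNS when the resolver is configured to consult files first.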

Common Mistakes

  • Running services as root - one exploit owns the system
  • No monitoring until something breaks - reactive is expensive
  • Editing config without backup - cp file file.bak takes two seconds
  • Rebooting to "fix" issues - masks the problem, and it will return
  • Ignoring disk space warnings - 100% full causes cascading failures
  • Forgetting timezone configuration - logs from different servers don't correlate
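
The "cp file file.bak" habit can be wrapped in a tiny helper that timestamps each copy. backup_file is a hypothetical name, and the config file and its settings are invented for the demo.

```shell
#!/bin/sh
# backup_file PATH: copy PATH to PATH.bak.<timestamp>, preserving mode and
# times, and print the name of the copy.
backup_file() {
    backup_path="$1.bak.$(date +%Y%m%d%H%M%S)"
    cp -p -- "$1" "$backup_path" && echo "$backup_path"
}

demo_conf=$(mktemp)
echo "max_connections=100" > "$demo_conf"

saved=$(backup_file "$demo_conf")           # back up first
echo "max_connections=500" > "$demo_conf"   # then make the risky edit

diff "$saved" "$demo_conf"                  # review exactly what changed

cp -p -- "$saved" "$demo_conf"              # roll back if the change misbehaves
```

Timestamped names keep earlier backups from being overwritten when the same file is edited more than once.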
Install via CLI
npx skills add https://github.com/zangxin75/openclaw-skills --skill sysadmin
Repository Details
Branch: main
Path: SKILL.md