name: Sysadmin
description: Manage Linux servers with user administration, process control, storage, and system maintenance.
metadata: {"clawdbot":{"emoji":"🖥️","os":["linux","darwin"]}}
System Administration Rules
User Management
- Create service accounts with the `--system` flag and a `nologin` shell: no home directory, no interactive login
- `sudo` with specific commands, not blanket ALL: principle of least privilege
- Lock accounts instead of deleting: `usermod -L` preserves audit trail and file ownership (it locks only the password, so also expire the account if SSH keys are in use)
- SSH keys in `~/.ssh/authorized_keys` with restrictive permissions: 600 for file, 700 for directory
- `visudo` to edit sudoers: catches syntax errors before saving, prevents lockout
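
The SSH permission rule above can be checked mechanically. The following is a minimal sketch, not a definitive tool: the function name `check_ssh_perms` is hypothetical, and it assumes GNU coreutils `stat -c` (on macOS/BSD the equivalent would be `stat -f %Lp`).

```shell
# Sketch: verify that an .ssh directory is 700 and authorized_keys is 600.
# Assumes GNU `stat -c %a`; check_ssh_perms is an illustrative name.
check_ssh_perms() {
    ssh_dir=$1
    dir_mode=$(stat -c %a "$ssh_dir") || return 2
    file_mode=$(stat -c %a "$ssh_dir/authorized_keys") || return 2
    if [ "$dir_mode" = "700" ] && [ "$file_mode" = "600" ]; then
        echo "ok"
    else
        # Report the modes actually found so they can be corrected
        echo "bad: dir=$dir_mode file=$file_mode"
        return 1
    fi
}
```

A typical call would be `check_ssh_perms "$HOME/.ssh"`; a non-zero exit status signals that the permissions need tightening.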
Process Management
- `systemctl` for services, not `service`: systemd is standard on modern distros
- `journalctl -u service -f` for live logs: more powerful than tail on log files
- `nice` and `ionice` for background tasks: don't compete with production workloads
- Kill signals: SIGTERM (15) first, SIGKILL (9) last resort, because SIGKILL doesn't allow cleanup
- `nohup` or `screen`/`tmux` for long-running commands: SSH disconnect kills regular processes
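
The SIGTERM-then-SIGKILL order above can be sketched as a small helper. This is an illustration under assumptions: `stop_gracefully` is a hypothetical name, and the default 10-second grace period is an arbitrary choice, not a recommendation from any tool.

```shell
# Sketch: ask a process to exit with SIGTERM, escalate to SIGKILL only
# if it is still present after a grace period (seconds, default 10).
stop_gracefully() {
    target=$1
    grace=${2:-10}
    kill -TERM "$target" 2>/dev/null || return 0    # already gone
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        # kill -0 only tests for existence; note that it also succeeds
        # for a zombie that its parent has not reaped yet
        kill -0 "$target" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done
    echo "pid $target still present after ${grace}s, sending SIGKILL" >&2
    kill -KILL "$target" 2>/dev/null || true
}
```

For services managed by systemd, `systemctl stop` already applies a similar escalation, so a helper like this is mainly relevant for ad-hoc processes.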
File Systems and Storage
- `df -h` for disk usage, `du -sh *` to find culprits: check before disk fills completely
- `lsof +D /path` finds processes using a directory: needed before unmounting
- `ncdu` for interactive disk usage: faster than repeated du commands
- Mount options matter: `noexec,nosuid` for security on data partitions
- Resize filesystems with care: grow is safe, shrink risks data loss, so always back up first
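
Checking usage before the disk fills can be automated with a threshold test. The sketch below is one possible shape: `disk_usage_pct` and `check_disk` are hypothetical names, and the parsing assumes the POSIX `df -P` layout with no spaces in the filesystem name.

```shell
# Sketch: print the used-capacity percentage of the filesystem holding a path.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Sketch: warn when usage reaches a limit (percent). Names are illustrative.
check_disk() {
    pct=$(disk_usage_pct "$1")
    case $pct in
        ''|*[!0-9]*) return 2 ;;    # df reported no usable number
    esac
    if [ "$pct" -ge "$2" ]; then
        echo "WARN $1 at ${pct}%"
        return 1
    fi
    echo "OK $1 at ${pct}%"
}
```

Something like `check_disk /var 85` could run from cron or a monitoring agent; the 85 here is only an example threshold.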
Logs and Monitoring
- `logrotate` prevents disk fill: configure size limits and retention
- Centralize logs to external system: local logs lost if server dies
- `/var/log/auth.log` or `/var/log/secure` for login attempts: watch for brute force
- `dmesg` for kernel messages: hardware errors, OOM kills appear here
- Monitor inode usage (`df -i`), not just disk space: many small files exhaust inodes
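
Watching for brute force in the auth log can start with a simple count of failed password attempts per source address. This is a rough sketch: `failed_logins_by_ip` is a hypothetical name, and it assumes the usual OpenSSH message form `Failed password for ... from <address> port ...`.

```shell
# Sketch: count "Failed password" lines per source address in an auth log.
# Prints "<count> <address>" with the most frequent source first.
failed_logins_by_ip() {
    grep 'Failed password' "$1" |
        awk '{
            # Scan from the end so a user literally named "from" is not mistaken
            for (i = NF - 1; i >= 1; i--)
                if ($i == "from") { print $(i + 1); break }
        }' |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

On Debian-style systems this might be run as `failed_logins_by_ip /var/log/auth.log`; on RHEL-style systems the file would be `/var/log/secure`.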
Permissions and Security
- `chmod 600` for secrets, `640` for configs, `644` for public: world-writable is almost never correct
- Sticky bit on shared directories (`chmod +t`): users can only delete their own files
- `setfacl` for complex permissions: when traditional owner/group/other isn't enough
- `chattr +i` makes files immutable: even root can't modify without removing flag
- SELinux/AppArmor in enforcing mode: permissive logs but doesn't protect
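
Since world-writable files are almost never correct, it can help to list them. A minimal sketch, assuming a POSIX `find`; the function name `find_world_writable` is hypothetical:

```shell
# Sketch: list regular files under a directory that anyone can write to.
# -perm -0002 matches files with the "other write" bit set;
# -xdev keeps the search on one filesystem.
find_world_writable() {
    find "$1" -xdev -type f -perm -0002
}
```

Running `find_world_writable /etc` should normally print nothing; any output is worth reviewing.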
Package Management
- `apt update` before `apt upgrade`: upgrade without update uses stale package lists
- Unattended security updates with `unattended-upgrades`: critical patches shouldn't wait
- Pin package versions in production: unexpected upgrades cause unexpected outages
- Remove unused packages with `apt autoremove`: reduces attack surface and disk usage
- Know your package manager (apt/yum/dnf/pacman): commands differ, concepts similar
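
To illustrate "commands differ, concepts similar", the sketch below maps each package manager named above to its usual full-upgrade command. The function names are hypothetical, and the commands are only printed, never executed.

```shell
# Sketch: print the usual "refresh and upgrade" command for a package manager.
upgrade_command() {
    case $1 in
        apt)    echo "apt update && apt upgrade" ;;
        dnf)    echo "dnf upgrade --refresh" ;;
        yum)    echo "yum update" ;;
        pacman) echo "pacman -Syu" ;;
        *)      echo "unknown package manager: $1" >&2; return 1 ;;
    esac
}

# Sketch: report the first of these package managers found on PATH.
detect_pkg_manager() {
    for mgr in apt dnf yum pacman; do
        if command -v "$mgr" >/dev/null 2>&1; then
            echo "$mgr"
            return 0
        fi
    done
    return 1
}
```

Combining the two, `upgrade_command "$(detect_pkg_manager)"` prints the command appropriate for the current host.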
Backups
- Test restores regularly: backups that can't restore are worthless
- Include package lists and configs, not just data: recreating environment is painful
- Offsite backups mandatory: local backups don't survive disk failure or ransomware
- Backup before any risky change: "I'll just quickly edit" famous last words
- Document restore procedure: 3am disaster is wrong time to figure it out
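
A restore test can be as small as unpacking the archive somewhere else and comparing it with the source. The sketch below assumes plain `tar` and `diff -r`; `backup_and_verify` is a hypothetical name, and the comparison covers file contents only, not ownership or permissions.

```shell
# Sketch: archive a directory, restore it to a scratch location, and
# confirm the restored tree matches the source.
# The archive path must be outside the source directory.
backup_and_verify() {
    src=$1
    archive=$2
    tar -cf "$archive" -C "$src" . || return 1
    restore_dir=$(mktemp -d) || return 1
    if tar -xf "$archive" -C "$restore_dir" &&
        diff -r "$src" "$restore_dir" >/dev/null; then
        status=0
    else
        status=1
    fi
    rm -rf "$restore_dir"
    return "$status"
}
```

A real setup would likely add compression, an offsite copy, and a periodic restore onto a separate machine; this only shows the verification step.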
Performance
- `top`/`htop` for live view, `vmstat` for trends: understand baseline before diagnosing
- `iotop` for disk I/O bottlenecks: slow disk often blamed on CPU
- Load average: up to about 1.0 per core is healthy; consistently higher means queuing
- Swap usage isn't inherently bad, but consistent swapping indicates memory shortage
- `sar` for historical data: retroactively diagnose what happened during incident
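
The per-core load rule is simple arithmetic: divide the load average by the number of cores. A small sketch, with hypothetical function names; the second helper is Linux-specific because it reads `/proc/loadavg` and calls `nproc`.

```shell
# Sketch: normalise a load average by core count, two decimal places.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Sketch (Linux only): 1-minute load per core for the current host.
current_load_per_core() {
    read -r one_minute _rest < /proc/loadavg
    load_per_core "$one_minute" "$(nproc)"
}
```

For example, a load average of 6 on a 4-core machine gives `load_per_core 6 4`, which prints `1.50`: more runnable work than cores, so some of it is queuing.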
Networking Basics
- `ss -tulpn` shows listening ports: `netstat` is deprecated
- `ip addr` and `ip route` replace `ifconfig` and `route`: learn the new tools
- Check both host firewall and cloud security groups: traffic blocked at either level fails
- `/etc/hosts` for local overrides: quick testing without DNS changes
- `curl -v` shows full connection details: request and response headers, TLS handshake (add `-w` for timing)
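
The output of `ss` is easy to post-process when only the port numbers matter. The sketch below reads `ss -tln` output on standard input; `listening_ports` is a hypothetical name, and the parsing assumes the local address is the fourth column, which holds for `ss -tln` but not when `-u` adds a leading Netid column.

```shell
# Sketch: extract unique listening TCP ports from `ss -tln` output on stdin.
# The port is whatever follows the last ":" in the local address column,
# which also handles IPv6 forms such as [::]:80.
listening_ports() {
    awk '$1 == "LISTEN" { n = split($4, parts, ":"); print parts[n] }' |
        sort -n -u
}
```

Typical use would be `ss -tln | listening_ports`, for instance to compare the result against the ports the host firewall and cloud security group are expected to allow.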
Common Mistakes
- Running services as root: one exploit owns the system
- No monitoring until something breaks: reactive is expensive
- Editing config without backup: `cp file file.bak` takes two seconds
- Rebooting to "fix" issues: masks the problem, it'll return
- Ignoring disk space warnings: 100% full causes cascading failures
- Forgetting timezone configuration: logs from different servers don't correlate
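
The config-backup habit can be wrapped in a one-line helper. This is a sketch with a hypothetical name, `backup_file`; the timestamp suffix is an added convention (not from the list above) so that repeated edits don't overwrite an earlier backup.

```shell
# Sketch: copy a file to <name>.bak.<timestamp> before editing it.
# cp -p preserves mode and timestamps; the backup path is printed.
backup_file() {
    bak="$1.bak.$(date +%Y%m%dT%H%M%S)"
    cp -p "$1" "$bak" && echo "$bak"
}
```

Before editing, something like `backup_file /etc/ssh/sshd_config` leaves a dated copy next to the original that can be restored with a plain `cp`.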