Linux ServerOps — Playbook of a SysAdmin

🛠️ Ground-up Manual Runbook (root SSH)

This prescriptive runbook is written for a fresh VPS/VM where you have an SSH root session. It is intentionally copy-pasteable: run the commands as shown, adapting hostnames, keys, and device names. Debian/Ubuntu and Rocky Linux examples are given side-by-side when they differ.

1) Verify system

cat /etc/os-release; uname -a; uptime

Why: Confirm distro and kernel to choose correct packages and compatibility.

What it does: Prints OS release, kernel version, and uptime.

Cautions: None — read-only.

Notes: If kernel is very old, consider reprovisioning with an LTS image.
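
Because later steps branch on Debian/Ubuntu versus Rocky, a small helper can make that choice explicit in scripts. This is a minimal sketch: the function name distro_family and the labels it prints ("debian", "rhel", "unknown") are assumptions made for this example, not standard identifiers.

```shell
# Classify os-release content (read from stdin) as "debian", "rhel", or "unknown".
# The family labels are chosen for this sketch only.
distro_family() {
  awk -F= '
    $1 == "ID" || $1 == "ID_LIKE" {
      gsub(/"/, "", $2)          # values may or may not be quoted
      ids = ids " " $2
    }
    END {
      if (ids ~ /(debian|ubuntu)/) print "debian"
      else if (ids ~ /(rhel|rocky|centos|fedora)/) print "rhel"
      else print "unknown"
    }'
}
```

Usage: distro_family < /etc/os-release, then pick apt or dnf based on the printed label.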

2) Set hostname and timezone

hostnamectl set-hostname web1.example.com
timedatectl set-timezone Etc/UTC
timedatectl status

Why: Hostname identifies machine in logs and monitoring; timezone ensures consistent timestamps.

What it does: Sets system hostname and timezone, updates systemd metadata.

Cautions: Changing hostname may affect SSL/Certbot configurations and monitoring tags — update those after change.

Notes: Use UTC for servers unless a strong reason to use local zone.

3) Update packages

# Debian/Ubuntu
apt update && apt upgrade -y && apt autoremove -y

# Rocky Linux
dnf upgrade -y && dnf autoremove -y

Why: Apply security updates and bug fixes before installing services.

What it does: Updates package lists and upgrades installed packages.

Cautions: On production with critical services, avoid immediate full upgrades without testing — prefer staged windows.

Notes: For unattended systems, enable automatic security updates after an initial manual verification.

4) Install core tools

# Debian/Ubuntu
apt install -y sudo curl wget git vim htop unzip rsync lsof

# Rocky Linux
dnf install -y epel-release
dnf install -y sudo curl wget git vim htop unzip rsync lsof

Why: Provide essential debugging and admin tools for later steps.

What it does: Installs commonly used CLI utilities.

Cautions: Minimal risk; installing EPEL on Rocky adds community packages — verify EPEL source.

Notes: Add build-essential or gcc only if compiling locally.

5) Create `deploy` user and SSH keys

adduser deploy
usermod -aG sudo deploy   # Debian/Ubuntu
usermod -aG wheel deploy   # Rocky

mkdir -p /home/deploy/.ssh && chmod 700 /home/deploy/.ssh
echo 'ssh-rsa AAAA... your-key' > /home/deploy/.ssh/authorized_keys
chmod 600 /home/deploy/.ssh/authorized_keys
chown -R deploy:deploy /home/deploy/.ssh

Why: Avoid daily root usage; enforce key-based auth for the admin account.

What it does: Adds a non-root user with sudo privileges and installs provided public key.

Cautions: Ensure the public key is correct; a wrong key or wrong permissions on ~/.ssh will lock you out once root and password logins are disabled.

Notes: Test SSH into a new session before disabling root.

6) Disable root SSH (after confirming `deploy` sudo works)

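# Match the directives even when they are commented out, as they are on many stock images
sed -i -E 's/^#?[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin no/' /etc/ssh/sshd_config
sed -i -E 's/^#?[[:space:]]*PasswordAuthentication[[:space:]].*/PasswordAuthentication no/' /etc/ssh/sshd_config
# Files in /etc/ssh/sshd_config.d/ can override these values; check them too
sshd -t && systemctl reload sshd   # validate first; use "ssh" if the sshd unit is not found (Debian/Ubuntu)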

Why: Reduces attack surface by preventing direct root SSH logins and password-based brute force.

What it does: Updates SSHD config to disable root and password auth, reloads service.

Cautions: If deploy lacks sudo or key is wrong, you may lose access — always verify first.

Notes: Consider retaining a temporary escape (console access) until confirmed.
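
To confirm the edit took effect before closing the root session, it helps to read back what the file now says. The sketch below relies on sshd using the first value it finds for a keyword; the helper name sshd_first_value is hypothetical, and the parser deliberately ignores Match blocks, Include files, and the Keyword=value form, so treat sshd -T as the authoritative view.

```shell
# Print the first uncommented value of an sshd_config keyword read from stdin.
# Keywords are matched case-insensitively; prints "(unset)" when absent.
sshd_first_value() {
  awk -v key="$1" '
    BEGIN { want = tolower(key) }
    /^[[:space:]]*#/ { next }                              # skip comment lines
    tolower($1) == want { print $2; found = 1; exit }      # first match wins
    END { if (!found) print "(unset)" }'
}
```

Usage: sshd_first_value PermitRootLogin < /etc/ssh/sshd_config should print no. For the merged, effective configuration run sshd -T | grep -i -E 'permitrootlogin|passwordauthentication'.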

7) Create swapfile

fallocate -l 4G /swapfile || dd if=/dev/zero of=/swapfile bs=1M count=4096
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab

Why: Provides memory headroom for spikes and prevents OOM kills on small VMs.

What it does: Creates and enables a swapfile and adds it to fstab for persistence.

Cautions: Swap on SSDs adds some write wear but is acceptable for moderate use; sustained heavy swapping is a sign that the VM needs more RAM.

Notes: Adjust size based on RAM and workload; monitor swapon --show.
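
For monitoring, swap usage can be reduced to a single percentage. A minimal sketch, assuming input in the format of /proc/meminfo; the helper name swap_used_pct is made up for this example.

```shell
# Print the percentage of swap in use, given /proc/meminfo content on stdin.
# Prints 0 when no swap is configured.
swap_used_pct() {
  awk '
    $1 == "SwapTotal:" { total = $2 }
    $1 == "SwapFree:"  { free = $2 }
    END {
      if (total > 0) printf "%d\n", (total - free) * 100 / total
      else print 0
    }'
}
```

Usage: swap_used_pct < /proc/meminfo. A value that stays high over time suggests resizing the VM rather than growing the swapfile.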

8) Mount attached block volume (example `/dev/vdb`)

Warning: verify device names before running.

parted -s /dev/vdb mklabel gpt mkpart primary 0% 100%
mkfs.xfs /dev/vdb1
mkdir -p /var/lib/mydata
echo '/dev/vdb1 /var/lib/mydata xfs defaults,noatime 0 2' >> /etc/fstab
mount -a

Why: Keep data (DB, logs) on separate volumes for snapshotting and performance.

What it does: Partitions, formats, and mounts a block device to a chosen path.

Cautions: Ensure /dev/vdb is the correct device — formatting the wrong device destroys data.

Notes: Use lsblk and blkid to confirm devices beforehand; use xfs for large files and consistent performance.
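
Device names such as /dev/vdb1 can change when volumes are attached or detached, so an fstab entry keyed on the filesystem UUID is more robust than one keyed on the device path. The sketch below builds such a line with the same mount options used above; fstab_entry is a hypothetical helper name.

```shell
# Build an fstab line keyed on a filesystem UUID.
# $1 = UUID, $2 = mount point, $3 = filesystem type (defaults to xfs)
fstab_entry() {
  printf 'UUID=%s %s %s defaults,noatime 0 2\n' "$1" "$2" "${3:-xfs}"
}
```

Usage: fstab_entry "$(blkid -s UUID -o value /dev/vdb1)" /var/lib/mydata >> /etc/fstab, followed by mount -a to confirm the entry parses.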

9) Firewall

# Debian/Ubuntu (ufw)
apt install -y ufw
ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp
ufw allow 80,443/tcp
ufw --force enable

# Rocky Linux (firewalld)
dnf install -y firewalld
systemctl enable --now firewalld
firewall-cmd --permanent --add-service=ssh
firewall-cmd --permanent --add-service=http
firewall-cmd --permanent --add-service=https
firewall-cmd --reload

Why: Limit exposed services to only what’s required.

What it does: Sets default deny and opens SSH/HTTP/S ports.

Cautions: If you change SSH port or enable complex rules, keep an active session to avoid lockout.

Notes: For multi-node setups, open private network ranges as needed.

10) Apply kernel tuning (sysctl)

cat <<'EOF' > /etc/sysctl.d/99-serverops.conf
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.core.somaxconn = 1024
fs.file-max = 200000
vm.swappiness = 10
EOF

sysctl --system

Why: Improve TCP handling and file descriptor limits for server workloads.

What it does: Writes tuned kernel parameters and applies them system-wide.

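Cautions: Values depend on workload; don’t set extremely high values without capacity planning. Check the running values first (for example, sysctl net.core.somaxconn fs.file-max): recent kernels default net.core.somaxconn to 4096, so writing 1024 would lower it, and fs.file-max is often already far above 200000.

Notes: Watch socket states with ss -s and per-process descriptor limits with ulimit -n, and adjust as necessary.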
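
Since a tuning file can accidentally lower a value the kernel already sets higher, it can be worth comparing the file against the running system before applying it. This is a rough sketch: the helper names are hypothetical, only single-integer settings are compared, and the dots-to-slashes mapping does not hold for keys that embed dotted interface names.

```shell
# Map a sysctl key to its /proc/sys path (dots become slashes).
sysctl_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Succeed when proposed value $2 is lower than current value $1.
# Empty, non-numeric, or multi-value settings are never flagged.
would_lower() {
  for n in "$1" "$2"; do
    case "$n" in
      ''|*[!0-9]*) return 1 ;;
    esac
  done
  [ "$2" -lt "$1" ]
}

# Read "key = value" lines on stdin and report settings that would be lowered.
report_lowered() {
  while IFS='=' read -r key value; do
    key=$(printf '%s' "$key" | tr -d '[:space:]')
    value=$(printf '%s' "$value" | tr -d '[:space:]')
    path=$(sysctl_path "$key")
    [ -f "$path" ] && [ -r "$path" ] || continue   # skips comments and unknown keys
    current=$(cat "$path")
    if would_lower "$current" "$value"; then
      echo "$key: running value $current is higher than proposed $value"
    fi
  done
}
```

Usage: report_lowered < /etc/sysctl.d/99-serverops.conf before running sysctl --system; no output means nothing would be reduced.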

11) Fail2Ban

# Debian/Ubuntu
apt install -y fail2ban

# Rocky (EPEL)
dnf install -y fail2ban

# Both
systemctl enable --now fail2ban

cat <<'EOF' > /etc/fail2ban/jail.local
[sshd]
enabled = true
port = ssh
logpath = /var/log/auth.log
maxretry = 5
EOF
systemctl restart fail2ban

Why: Automatically block repeated malicious login attempts.

What it does: Watches auth logs and bans IPs exceeding failed attempts.

Cautions: Ensure log path matches distro (Rocky may use /var/log/secure) — adjust logpath accordingly.

Notes: Monitor banned IPs with fail2ban-client status sshd.
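
To make the maxretry idea concrete, the sketch below counts failed password attempts per source IP in sshd log text. It illustrates the counting only and is not how Fail2Ban is implemented: it ignores the time window and only matches lines containing "Failed password". The name repeat_offenders is invented for this example.

```shell
# Print source IPs with at least $1 "Failed password" lines in the log text on stdin.
repeat_offenders() {
  awk -v max="$1" '
    /Failed password/ {
      # the address is the field that follows the word "from"
      for (i = 1; i < NF; i++)
        if ($i == "from") { count[$(i + 1)]++; break }
    }
    END { for (ip in count) if (count[ip] >= max) print ip }' | sort
}
```

Usage: repeat_offenders 5 < /var/log/auth.log (or /var/log/secure on Rocky) gives a quick view of which addresses a jail with maxretry = 5 would be acting on.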

12) PostgreSQL quick start

# Debian/Ubuntu
apt install -y postgresql postgresql-contrib
systemctl enable --now postgresql

# Rocky
dnf install -y postgresql-server postgresql-contrib
postgresql-setup --initdb
systemctl enable --now postgresql

sudo -u postgres psql -c "CREATE USER deploy WITH PASSWORD 'strongpassword';"
sudo -u postgres psql -c "CREATE DATABASE myapp OWNER deploy;"

Why: Bootstraps a database for application use.

What it does: Installs and initializes PostgreSQL, creates a user and DB.

Cautions: Replace 'strongpassword' with a secure secret and store it in a vault; don’t expose DB port publicly.

Notes: Configure pg_hba.conf to restrict connections to private networks only.

13) Backup script (pg_dump -> restic)

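Create /usr/local/bin/pg-backup.sh and schedule it with cron:

cat <<'EOF' > /usr/local/bin/pg-backup.sh
#!/bin/bash
# Stop at the first error so a failed or partial dump is never uploaded
set -euo pipefail
umask 077   # the dump includes role password hashes; keep it root-only
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
OUT="/tmp/pgdump-${TIMESTAMP}.sql"
trap 'rm -f "${OUT}"' EXIT   # remove the local dump on success or failure
sudo -u postgres pg_dumpall > "${OUT}"
restic backup "${OUT}" --tag db-backup
EOF
chmod +x /usr/local/bin/pg-backup.sh
# Log output instead of discarding it so failed runs are visible
(crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/pg-backup.sh >>/var/log/pg-backup.log 2>&1") | crontab -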

Why: Capture logical backups and push to remote durable storage.

What it does: Dumps all PostgreSQL databases, backs up with restic, and removes the local dump.

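Cautions: Cron runs with a minimal environment, so set the restic repository and credentials (for example RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE) inside the script or in a file it sources; test restores.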

Notes: For large DBs prefer base backups + WAL shipping for PITR.
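
A cheap sanity check before trusting a dump is to confirm that pg_dumpall reached the end: its plain-text output finishes with a "PostgreSQL database cluster dump complete" comment. The sketch below checks for that trailer. The helper name dump_is_complete is hypothetical, and passing this check shows only that the dump finished, not that it restores cleanly.

```shell
# Succeed when the file is non-empty and ends with the pg_dumpall completion trailer.
# $1 = path to a plain-text pg_dumpall output file
dump_is_complete() {
  [ -s "$1" ] && tail -n 5 "$1" | grep -q 'PostgreSQL database cluster dump complete'
}
```

Usage: inside the backup script, dump_is_complete "${OUT}" || exit 1 placed before the restic backup line would stop a truncated dump from being uploaded.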

14) Node Exporter for metrics

VERSION="1.5.0"
curl -LO https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz
tar xzf node_exporter-${VERSION}.linux-amd64.tar.gz
mv node_exporter-${VERSION}.linux-amd64/node_exporter /usr/local/bin/
cat <<'EOF' > /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
User=nobody
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now node_exporter

Why: Expose host metrics to Prometheus for performance and capacity monitoring.

What it does: Installs node_exporter binary and runs it as a systemd service.

Cautions: Node exporter exposes system metrics — firewall and ACLs should restrict access to the monitoring network.

Notes: Scrape via Prometheus with a job that targets the instance IP.
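
Because the binary is downloaded and then run as a service, verifying its checksum before installing it is a reasonable precaution. A minimal sketch, assuming the expected SHA-256 digest is taken from the checksum file published alongside the release; verify_sha256 is a hypothetical helper name.

```shell
# Succeed when the SHA-256 digest of a file matches the expected hex string.
# $1 = file to check, $2 = expected digest
verify_sha256() {
  actual=$(sha256sum "$1" | awk '{ print $1 }')
  [ "$actual" = "$2" ]
}
```

Usage: verify_sha256 node_exporter-${VERSION}.linux-amd64.tar.gz "<digest from the release page>" || exit 1, run before the tar step.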

15) Nginx + Certbot

# Debian/Ubuntu
apt install -y nginx certbot python3-certbot-nginx

# Rocky
dnf install -y nginx certbot python3-certbot-nginx

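systemctl enable --now nginx
# Debian/Ubuntu layout. Rocky has no sites-available/sites-enabled: write the file to
# /etc/nginx/conf.d/myapp.conf instead and skip the ln step.
cat <<'EOF' > /etc/nginx/sites-available/myapp.conf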
server {
  listen 80;
  server_name example.com www.example.com;

  location / {
    proxy_pass http://127.0.0.1:3000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }
}
EOF
ln -sf /etc/nginx/sites-available/myapp.conf /etc/nginx/sites-enabled/myapp.conf || true
nginx -t && systemctl reload nginx
certbot --nginx -d example.com -d www.example.com

Why: Terminate TLS and reverse-proxy to application processes.

What it does: Installs Nginx, configures a proxy vhost, and obtains a Let’s Encrypt certificate.

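Cautions: Certbot needs ports 80/443 open; ensure DNS points to the server before requesting certs. Wildcard certificates require the DNS-01 challenge. On Rocky with SELinux enforcing, nginx is typically denied connections to the upstream on 127.0.0.1:3000 until they are allowed with setsebool -P httpd_can_network_connect 1.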

Notes: Use nginx -t to validate config and certbot renew --dry-run to test renewals.
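
When several apps share one host, generating the proxy vhost from parameters avoids copy-paste drift. The sketch below emits the same server block as above for given server names and a local port; render_vhost is a hypothetical helper name, and the output path is left to the caller because it differs between distros.

```shell
# Print an nginx reverse-proxy server block.
# $1 = space-separated server names, $2 = local upstream port
render_vhost() {
  cat <<EOF
server {
  listen 80;
  server_name $1;

  location / {
    proxy_pass http://127.0.0.1:$2;
    proxy_set_header Host \$host;
    proxy_set_header X-Real-IP \$remote_addr;
    proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
  }
}
EOF
}
```

Usage: render_vhost "example.com www.example.com" 3000 > /etc/nginx/sites-available/myapp.conf, then nginx -t before reloading.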

16) systemd unit for your app

cat <<'EOF' > /etc/systemd/system/myapp.service
[Unit]
Description=MyApp
After=network.target

[Service]
User=deploy
WorkingDirectory=/srv/myapp
ExecStart=/usr/bin/node /srv/myapp/dist/index.js
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now myapp

Why: Manage the application process reliably with automatic restarts and logs.

What it does: Adds a systemd service which starts the app and restarts on failure.

Cautions: Keep WorkingDirectory and ExecStart paths correct; inspect journalctl -u myapp on failures.

Notes: For zero-downtime deploys use socket activation or a rolling restart strategy across replicas.

17) Snapshot and restore testing

Create a provider snapshot after provisioning and note the ID. Perform a monthly test restore to a throwaway instance.

Why: Snapshots allow rapid recovery from catastrophic failures.

What it does: Captures the disk state to a provider snapshot.

Cautions: Snapshots are not a substitute for off-site backups; store backups in separate regions/storage when possible.

Notes: Automate snapshot creation and retention via provider API.

18) Troubleshooting quick commands

journalctl -u myapp -f
systemctl status myapp
ss -tulpen | grep 3000
df -h
free -m
top

Why: Quick diagnostics to assess service health, listening sockets, and resource usage.

What it does: Streams logs, checks unit status, inspects sockets and resource usage.

Cautions: These commands are read-only and safe on production. top and htop are interactive, so in scripts use batch mode instead (top -b -n 1).

Notes: Capture journalctl -u myapp --since "1 hour ago" for post-incident analysis.
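
Full disks are a common cause of failed services, so a scripted check on df output is a useful companion to the commands above. A minimal sketch, assuming POSIX-format output from df -P and mount points without spaces; full_filesystems is a hypothetical helper name.

```shell
# Read `df -P` output on stdin; print mount points at or above $1 percent used.
full_filesystems() {
  awk -v limit="$1" '
    NR > 1 {                      # skip the header row
      pct = $5
      sub(/%/, "", pct)           # "92%" -> "92"
      if (pct + 0 >= limit + 0) print $6
    }'
}
```

Usage: df -P | full_filesystems 90 prints nothing when every filesystem is below 90% used.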

19) Recovery & restore

Use the provider console and a recovery ISO or serial console if the instance fails to boot. Mount the root disk from the recovery environment and fix the affected user's ~/.ssh/authorized_keys or /etc/ssh/sshd_config.
Restore data: run restic restore on a new instance, then reconfigure networking.

Why: Prepare for kernel panic, boot failures or accidental lockouts.

What it does: Provides manual steps to access disks and recover files or restore backups.

Cautions: When restoring, ensure networking, hostnames and SSH keys are reconfigured to avoid confusion.

Notes: Keep a documented recovery procedure with provider-specific console links.

20) Handoff checklist

Confirm deploy can sudo and SSH
Confirm firewall rules, TLS auto-renewal, backups and snapshot IDs
Document IPs, secrets location, and escalation contacts

Why: Ensure operational readiness before marking a server as production.

What it does: Validates access, security, backups and documentation are in place.

Cautions: Don’t skip restore tests — a green backup indicator is not a successful restore.

Notes: Store runbook and credentials in a team-accessible vault and link to incident playbooks.

🚀 Provisioning & First Boot

Essentials:

Use provider images: Ubuntu LTS, Debian Stable, or a currently supported Rocky Linux release
Bootstrap with cloud-init when available, otherwise run the manual runbook above
Prefer small, immutable images and consistent build artifacts from CI

#cloud-config
# Example user-data (optional). The first line must be exactly "#cloud-config".
users:
  - name: deploy
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa AAAA... your-key
packages:
  - git
  - curl
runcmd:
  - [ sh, -lc, "echo 'Provisioned' > /etc/motd" ]

🛡️ Initial Hardening

Start hardening early — many failures are configuration mistakes on day one. Minimum steps:

Create a non-root admin user and disable root SSH after verifying sudo access.
Enforce SSH key authentication and disable passwords.
Enable automatic security updates where appropriate.

Examples (Debian/Ubuntu vs Rocky):

# Create admin user
adduser deploy
usermod -aG sudo deploy    # Debian/Ubuntu
usermod -aG wheel deploy    # Rocky

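# Disable root login and passwords (the patterns also match commented-out defaults)
sed -i -E 's/^#?[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin no/' /etc/ssh/sshd_config
sed -i -E 's/^#?[[:space:]]*PasswordAuthentication[[:space:]].*/PasswordAuthentication no/' /etc/ssh/sshd_config
sshd -t && systemctl reload sshd   # use "ssh" if the sshd unit is not found (Debian/Ubuntu)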

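# Auto-updates (Debian/Ubuntu)
apt install -y unattended-upgrades && dpkg-reconfigure --priority=low unattended-upgrades
# Auto-updates (Rocky); /etc/dnf/automatic.conf controls whether updates are applied or only downloaded
dnf install -y dnf-automatic && systemctl enable --now dnf-automatic.timer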

Add Fail2Ban and auditd, and keep SELinux or AppArmor in enforcing mode (Rocky uses SELinux; Debian and Ubuntu use AppArmor).

💾 Storage, Filesystems & Swap

Place databases and logs on separate block volumes. Use xfs or ext4 per workload.
Prefer swapfiles on cloud VMs for resize flexibility.

Example:

mkfs.xfs /dev/vdb1
mkdir -p /var/lib/postgresql
echo '/dev/vdb1 /var/lib/postgresql xfs defaults,noatime 0 2' >> /etc/fstab
mount -a

fallocate -l 4G /swapfile && chmod 600 /swapfile && mkswap /swapfile && swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab

🌐 Networking & DNS

Use provider DNS or external DNS (Cloudflare/Route53). Reserve floating/reserved IPs for failover.
Use private networking between VMs when possible to keep traffic off public interfaces.

Reverse proxy + TLS (Nginx + Certbot):

# Debian/Ubuntu
apt install -y nginx certbot python3-certbot-nginx

# Rocky
dnf install -y nginx certbot python3-certbot-nginx

systemctl enable --now nginx
certbot --nginx -d example.com -d www.example.com

📈 Observability & Monitoring

Expose metrics via Node Exporter (Prometheus) and centralize logs with Vector/Fluentd or a managed logging service.
Lightweight options: Netdata or Glances for small VPS.

Node Exporter quick install (example):

VERSION="1.5.0"
curl -LO https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz
tar xzf node_exporter-${VERSION}.linux-amd64.tar.gz
mv node_exporter-${VERSION}.linux-amd64/node_exporter /usr/local/bin/

🔁 Backups & Disaster Recovery

Use provider snapshots for fast restores and restic/borg for file-level backups to S3-compatible storage.
Backup databases using logical dumps and WAL shipping for point-in-time recovery.

Restic example:

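# restic reads the repository from RESTIC_REPOSITORY; it also needs RESTIC_PASSWORD_FILE and S3 credentials
export RESTIC_REPOSITORY=s3:s3.amazonaws.com/my-bucket
restic init
# Prefer backing up a pg_dump file; a file-level copy of a running cluster is not a consistent backup
restic backup /var/lib/postgresql --tag db-2026-02-08
restic forget --prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12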

🗂️ Logging & Rotation

Rotate local logs with logrotate and forward to a central log store. Example config for app logs:

/var/log/myapp/*.log {
  daily
  rotate 14
  compress
  missingok
  notifempty
  copytruncate
}

✅ Security & Compliance

Enforce least privilege, centralize secrets (Vault / KMS), run Trivy/Lynis scans, and apply CIS benchmarks where needed.

Quick scan:

curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
trivy fs /

🚚 Service Management & Deployment

Use systemd units for app services; prefer immutable artifacts from CI when possible.

Example unit:

[Unit]
Description=MyApp
After=network.target

[Service]
User=deploy
WorkingDirectory=/srv/myapp
ExecStart=/usr/bin/node dist/index.js
Restart=on-failure

[Install]
WantedBy=multi-user.target

🐳 Containers on VMs

Use Podman (rootless) or Docker where appropriate. Pull by digest in production and run minimal containers.

podman pull docker.io/myorg/myapp@sha256:...
podman run -d --name myapp -p 3000:3000 --restart=always docker.io/myorg/myapp@sha256:...
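
A deploy script can enforce the pull-by-digest rule by rejecting image references that are not pinned. The sketch below checks only the shape of the reference (a trailing @sha256: followed by 64 lowercase hex characters); is_digest_pinned is a hypothetical helper name, and the check does not confirm that the digest exists in the registry.

```shell
# Succeed when an image reference ends in a full sha256 digest.
# $1 = image reference, e.g. registry/org/name@sha256:<64 hex chars>
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}
```

Usage: is_digest_pinned "$IMAGE" || { echo "refusing to deploy unpinned image: $IMAGE" >&2; exit 1; } placed before the podman run step, where IMAGE is a variable assumed to hold the reference.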

⚖️ Scaling & Networking Patterns

Terminate TLS at the LB or reverse proxy. Use health checks and autoscaling for stateless services. Keep state on managed DBs or networked block storage.

🗺️ Provider Notes

DigitalOcean / Linode: snapshots, floating IPs, user-data
Hetzner: strong IO and volumes
Scaleway / UpCloud / Hostinger / RackNerd: check private networking and snapshot APIs

Always validate provider capabilities before relying on them.

🆘 Incident Response & Runbooks

Document steps to access serial console, restore from snapshots, and contact escalation points. Keep a tested post-mortem template and retention policy.

📋 Best Practices Checklist

✅ Use immutable artifacts from CI
✅ Automate provisioning when possible (cloud-init optional)
✅ Enforce SSH keys and disable root login
✅ Enable automated security updates
✅ Centralize logs and metrics
✅ Test backups and restores regularly
✅ Apply least-privilege and network segmentation
✅ Monitor health, disk, and performance

🖥️ Cockpit — web-based server admin (optional)

Why: Cockpit provides a lightweight web UI for server management (systemd, logs, networking, containers) which complements CLI workflows and is useful for visual inspection and quick ops.

Install (Debian/Ubuntu):

apt update
apt install -y cockpit
systemctl enable --now cockpit.socket
# allow firewall
ufw allow 9090/tcp

Install (Rocky Linux):

dnf install -y cockpit
systemctl enable --now cockpit.socket
firewall-cmd --permanent --add-port=9090/tcp
firewall-cmd --reload

What it does: exposes a browser UI on port 9090 for user management, system logs, journal, storage, networking, and container inspection (Podman integration available).

Cautions:

Cockpit is a convenience UI — do not rely on it for automation or as the sole access path. Keep strong auth, reserve access to admin networks, or proxy behind an internal LB with mTLS.
Exposing Cockpit directly to the Internet without additional controls is unsafe.

Notes:

Integrates with podman on systems that have it installed; install cockpit-podman for container UI.
Use provider firewall or SSH tunnel to restrict access to trusted IPs.

🚀 Bonus: Self-hosted PaaS setup & management

Why: A lightweight self-hosted PaaS streamlines app deployment (git push or image deploy), manages TLS, and provides simple scaling for small teams without full K8s complexity.

Options covered: Dokku, Coolify, and lightweight k3s (Kubernetes) for production-grade deployments.

Dokku (easy git-based PaaS)

Install (Ubuntu recommended):

# Prepare (Ubuntu)
apt update && apt install -y docker.io
curl -sSL https://raw.githubusercontent.com/dokku/dokku/v0.28.5/bootstrap.sh | sudo DOKKU_TAG=v0.28.5 bash

Usage:

Create an app on the server: dokku apps:create myapp
Add remote on developer machine: git remote add dokku dokku@server:myapp
Deploy: git push dokku main

Cautions:

Dokku relies on Docker; ensure Docker is secured and updated.
Single-node Dokku is fine for small apps, but for HA use multiple nodes and external DBs/storage.

Notes:

Use Dokku plugins for Postgres, redis, and Let’s Encrypt integration.

Coolify (self-hosted v4)

Use the official v4 installer or the documented manual deployment from the Coolify docs. Official docs and installers: https://coolify.io/self-hosted and https://coolify.io/docs/installation

Note

The Coolify v4 bash installer is the recommended deployment method, and it is best run on a freshly provisioned VM/VPS.

Quick install (recommended — official installer):

# Official v4 installer (runs required containers and setup)
curl -fsSL https://cdn.coollabs.io/coolify/install.sh | sudo bash

Notes on prerequisites and alternatives:

The installer will detect Docker and required components; if you prefer manual setup, follow the repository/docker-compose manifest in the Coolify docs.
For production, configure an external PostgreSQL database and S3-compatible storage instead of relying on embedded volumes.

Usage:

Visit the Coolify web UI (default port configured by the installer) to finish setup and create the admin account.
Deploy apps via Git, Docker image, or connected repos; Coolify manages TLS (Let’s Encrypt), environment variables, and basic scaling.

Cautions & best practices:

Do not expose the management UI directly to the public internet — restrict access with firewall rules, VPN, or an auth proxy (OAuth, mTLS).
Back up Coolify’s data and metadata (or use external DB/S3). Regularly test restores.
Monitor resource usage; Coolify runs background workers and may need more than minimal VM specs for production workloads.

References and further reading:

Official self-hosted docs and install guide: https://coolify.io/self-hosted
GitHub repo: https://github.com/coollabsio/coolify

k3s (lightweight Kubernetes) — production-ready PaaS foundation

Install single-node k3s quickly:

curl -sfL https://get.k3s.io | sh -
# kubectl is available at /usr/local/bin/kubectl (or use k3s kubectl)

Basic app deploy:

kubectl create deployment myapp --image=myorg/myapp:latest
kubectl expose deployment myapp --type=NodePort --port=80 --target-port=3000

Ingress & TLS:

k3s deploys Traefik as its default ingress controller. Either use it as shipped or disable it and install ingress-nginx; add cert-manager for Let’s Encrypt certificates.

Cautions:

k3s reduces K8s complexity but still requires operational knowledge: storage (Longhorn), backups (Velero), and RBAC.
For single-node, there’s limited resilience — use multi-node clusters for production.

Management & backups:

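Back up the k3s datastore (SQLite by default on a single node, or embedded etcd) and your Kubernetes manifests. Note that kubectl get all -o yaml omits resources such as ConfigMaps, Secrets, and Ingresses, so export those explicitly or keep manifests in version control. Store images in a registry.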

When to choose which:

Dokku / Coolify: fastest for web apps and small teams; minimal infra work.
k3s: when you need Kubernetes features, multi-service orchestration, and more control over networking and storage.

Final cautions for self-hosted PaaS:

Plan for backups of app data and PaaS metadata (e.g., Dokku plugin databases, Coolify configs, k3s etcd).
Monitor node resource usage; PaaS control planes can be noisy on small VMs.
Secure CLIs and dashboard ports with firewall rules, VPNs or auth proxies.

🎯 Conclusion

This playbook leads with a practical, ground-up runbook and follows it with focused reference sections that prioritize action and clarity.