08.02.2026 • 15 min read

Linux ServerOps — Playbook of a SysAdmin


🛠️ Ground-up Manual Runbook (root SSH)

This prescriptive runbook is written for a fresh VPS/VM where you have an SSH root session. It is intentionally copy-pasteable: run the commands as shown, adapting hostnames, keys, and device names. Debian/Ubuntu and Rocky Linux examples are given side-by-side when they differ.

1) Verify system

cat /etc/os-release; uname -a; uptime

Why: Confirm distro and kernel to choose correct packages and compatibility.

What it does: Prints OS release, kernel version, and uptime.

Cautions: None — read-only.

Notes: If the kernel is very old, consider reprovisioning from a current LTS image.
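Scripts that follow this runbook on both distro families need to know which branch to take. Below is a minimal sketch. It assumes the standard ID and ID_LIKE fields of /etc/os-release, and os_family is a hypothetical helper name, not a system command.

```shell
#!/usr/bin/env bash
# os_family [FILE]: print "debian", "rhel", or "unknown" based on os-release.
# Hypothetical helper; FILE defaults to /etc/os-release.
os_family() {
  local file="${1:-/etc/os-release}" id="" id_like=""
  [ -r "$file" ] || { echo unknown; return 1; }
  # os-release lines are KEY=value with optional quotes; read only the two fields we need
  id=$(sed -n 's/^ID=//p' "$file" | tr -d '"')
  id_like=$(sed -n 's/^ID_LIKE=//p' "$file" | tr -d '"')
  case " $id $id_like " in
    *" debian "*|*" ubuntu "*) echo debian ;;
    *" rhel "*|*" fedora "*|*" centos "*|*" rocky "*) echo rhel ;;
    *) echo unknown; return 1 ;;
  esac
}
# Usage: case "$(os_family)" in debian) apt install -y ufw ;; rhel) dnf install -y firewalld ;; esac
```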

2) Set hostname and timezone

hostnamectl set-hostname web1.example.com
timedatectl set-timezone Etc/UTC
timedatectl status

Why: Hostname identifies machine in logs and monitoring; timezone ensures consistent timestamps.

What it does: Sets system hostname and timezone, updates systemd metadata.

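Cautions: Add the new name to /etc/hosts so it resolves locally, otherwise sudo warns "unable to resolve host". Update monitoring tags and anything else that references the old name; TLS certificates follow DNS names, not the system hostname.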

Notes: Use UTC on servers unless there is a strong reason to use a local time zone.
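A typo in the hostname propagates into logs and monitoring. Below is a small pre-flight check. It is a sketch that assumes you want RFC 1123 style names: letters, digits and hyphens, labels of at most 63 characters, and 253 characters in total. valid_hostname is a hypothetical helper.

```shell
#!/usr/bin/env bash
# valid_hostname NAME: succeed if NAME looks like an RFC 1123 host name or FQDN.
# Hypothetical pre-flight check to run before hostnamectl set-hostname.
valid_hostname() {
  local name="$1" label
  local charset='^[A-Za-z0-9.-]+$'
  local label_re='^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$'
  local -a labels
  # overall length and allowed characters (also rejects spaces and newlines)
  (( ${#name} >= 1 && ${#name} <= 253 )) || return 1
  [[ $name =~ $charset ]] || return 1
  # a leading or trailing dot would yield an empty or silently dropped label
  [[ $name != .* && $name != *. ]] || return 1
  IFS=. read -ra labels <<< "$name"
  for label in "${labels[@]}"; do
    # each label: 1-63 chars, starting and ending with a letter or digit
    [[ $label =~ $label_re ]] || return 1
  done
  return 0
}
# Usage: valid_hostname web1.example.com && hostnamectl set-hostname web1.example.com
```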

3) Update packages

# Debian/Ubuntu
apt update && apt upgrade -y && apt autoremove -y

# Rocky Linux
dnf upgrade -y && dnf autoremove -y

Why: Apply security updates and bug fixes before installing services.

What it does: Updates package lists and upgrades installed packages.

Cautions: On production with critical services, avoid immediate full upgrades without testing — prefer staged windows.

Notes: For unattended systems, enable automatic security updates after an initial manual verification.
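Kernel and libc updates only take effect after a reboot. Ubuntu, and Debian systems with the corresponding package hooks, signal this with /var/run/reboot-required. On Rocky, dnf needs-restarting -r from dnf-plugins-core exits with status 1 when a reboot is needed. The sketch below covers the Debian/Ubuntu side only. The ROOT argument is a hypothetical addition that lets the check run against a test directory.

```shell
#!/usr/bin/env bash
# reboot_required [ROOT]: succeed if the Debian/Ubuntu reboot flag file exists.
# ROOT is a hypothetical prefix (default: the real root) used for testing.
reboot_required() {
  local root="${1:-}"
  if [ -e "$root/var/run/reboot-required" ]; then
    # Ubuntu also records which packages asked for the reboot
    [ -r "$root/var/run/reboot-required.pkgs" ] && cat "$root/var/run/reboot-required.pkgs"
    return 0
  fi
  return 1
}
# Usage: reboot_required && echo "reboot needed"
```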

4) Install core tools

# Debian/Ubuntu
apt install -y sudo curl wget git vim htop unzip rsync lsof

# Rocky Linux
dnf install -y epel-release
dnf install -y sudo curl wget git vim htop unzip rsync lsof

Why: Provide essential debugging and admin tools for later steps.

What it does: Installs commonly used CLI utilities.

Cautions: Minimal risk; installing EPEL on Rocky adds community packages — verify EPEL source.

Notes: Add build-essential (Debian/Ubuntu) or gcc and make (Rocky) only if you compile locally.
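Package names differ slightly between distros, so it is worth confirming that the tools actually landed on PATH. Below is a minimal check. missing_tools is a hypothetical helper.

```shell
#!/usr/bin/env bash
# missing_tools TOOL...: print every tool that is not on PATH; fail if any is missing.
missing_tools() {
  local tool rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}
# Usage: missing_tools sudo curl wget git vim htop unzip rsync lsof
```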

5) Create deploy user and SSH keys

adduser deploy
usermod -aG sudo deploy   # Debian/Ubuntu
usermod -aG wheel deploy   # Rocky

mkdir -p /home/deploy/.ssh && chmod 700 /home/deploy/.ssh
echo 'ssh-rsa AAAA... your-key' > /home/deploy/.ssh/authorized_keys
chmod 600 /home/deploy/.ssh/authorized_keys
chown -R deploy:deploy /home/deploy/.ssh

Why: Avoid daily root usage; enforce key-based auth for the admin account.

What it does: Adds a non-root user with sudo privileges and installs provided public key.

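Cautions: Double-check the key and the permissions. A wrong key, a typo in the path, or loose permissions can lock you out, because sshd's StrictModes rejects key files that are group- or world-writable.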

Notes: Test SSH into a new session before disabling root.
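Before you rely on the new account, you can check the permissions above mechanically. The sketch below uses GNU stat and assumes the exact modes this step sets: 700 on .ssh and 600 on authorized_keys. check_ssh_dir is hypothetical. It does not parse the keys themselves; ssh-keygen -lf /home/deploy/.ssh/authorized_keys prints one fingerprint per valid key.

```shell
#!/usr/bin/env bash
# check_ssh_dir HOME [USER]: verify the modes (and optionally the owner) set in step 5.
# Hypothetical helper; relies on GNU stat (-c).
check_ssh_dir() {
  local user="${2:-}"
  local dir="$1/.ssh" keys="$1/.ssh/authorized_keys"
  [ "$(stat -c '%a' "$dir" 2>/dev/null)" = "700" ] || { echo "expected mode 700 on $dir"; return 1; }
  [ "$(stat -c '%a' "$keys" 2>/dev/null)" = "600" ] || { echo "expected mode 600 on $keys"; return 1; }
  # -s: exists and is non-empty (a comment-only file would still pass)
  [ -s "$keys" ] || { echo "$keys is empty"; return 1; }
  # both paths must be owned by USER when one is given
  if [ -n "$user" ] && [ "$(stat -c '%U' "$dir" "$keys" | sort -u)" != "$user" ]; then
    echo "$dir or $keys is not owned by $user"; return 1
  fi
  return 0
}
# Usage: check_ssh_dir /home/deploy deploy
```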

6) Disable root SSH (after confirming deploy sudo works)

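# -E with #? also rewrites commented-out defaults such as "#PermitRootLogin prohibit-password"
sed -i -E 's/^#?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sed -i -E 's/^#?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
# Validate, then reload (the unit is "ssh" on Debian/Ubuntu, "sshd" on Rocky)
sshd -t && { systemctl reload ssh 2>/dev/null || systemctl reload sshd; }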

Why: Reduces attack surface by preventing direct root SSH logins and password-based brute force.

What it does: Updates SSHD config to disable root and password auth, reloads service.

Cautions: If deploy lacks sudo or key is wrong, you may lose access — always verify first.

Notes: Keep a fallback until the new login is confirmed: the provider's web console, or a second root session left open.
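The sed edits only change lines that already exist in sshd_config, and sshd uses the first value it reads for each keyword. An earlier Include of a drop-in file (for example under /etc/ssh/sshd_config.d/) can therefore win silently. sshd -T prints the effective configuration as lowercase keyword/value pairs. The hypothetical sshd_hardened filter below is a sketch that fails unless both settings end up as no.

```shell
#!/usr/bin/env bash
# sshd_hardened: read `sshd -T` output on stdin and fail unless root login and
# password authentication are both "no". Hypothetical helper.
sshd_hardened() {
  awk '
    $1 == "permitrootlogin"        { root = $2 }
    $1 == "passwordauthentication" { pass = $2 }
    END {
      if (root != "no") { print "permitrootlogin is " root;        bad = 1 }
      if (pass != "no") { print "passwordauthentication is " pass; bad = 1 }
      exit bad
    }'
}
# Usage (as root): sshd -T | sshd_hardened
```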

7) Create swapfile

fallocate -l 4G /swapfile || dd if=/dev/zero of=/swapfile bs=1M count=4096
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab

Why: Provides memory headroom for spikes and reduces the risk of OOM kills on small VMs.

What it does: Creates and enables a swapfile and adds it to fstab for persistence.

Cautions: Swap on SSDs adds some wear but is acceptable for occasional use; sustained heavy swapping means the VM needs more RAM.

Notes: Adjust the size to RAM and workload, and check usage with swapon --show and free -h.
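If you re-run this step on a host that already has swap, swapon fails and a second fstab line is appended. Below is a sketch of a guard based on /proc/swaps, which has a header line followed by one active swap device or file per line. swap_active is a hypothetical helper.

```shell
#!/usr/bin/env bash
# swap_active FILE [SWAPS]: succeed if FILE is listed as active swap.
# SWAPS defaults to /proc/swaps; the override exists for testing.
swap_active() {
  local file="$1" swaps="${2:-/proc/swaps}"
  # skip the header line, compare the first column exactly
  awk -v f="$file" 'NR > 1 && $1 == f { found = 1 } END { exit !found }' "$swaps"
}
# Usage: swap_active /swapfile || { fallocate -l 4G /swapfile && chmod 600 /swapfile && mkswap /swapfile && swapon /swapfile; }
```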

8) Mount attached block volume (example /dev/vdb)

Warning: verify device names before running.

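parted -s /dev/vdb mklabel gpt mkpart primary 0% 100%
mkfs.xfs /dev/vdb1
mkdir -p /var/lib/mydata
# Reference the filesystem by UUID, since /dev/vdX names can change between boots;
# nofail lets the VM still boot if the volume is detached
echo "UUID=$(blkid -s UUID -o value /dev/vdb1) /var/lib/mydata xfs defaults,noatime,nofail 0 2" >> /etc/fstab
mount -a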

Why: Keep data (DB, logs) on separate volumes for snapshotting and performance.

What it does: Partitions, formats, and mounts a block device to a chosen path.

Cautions: Ensure /dev/vdb is the correct device — formatting the wrong device destroys data.

Notes: Use lsblk and blkid to confirm devices beforehand; use xfs for large files and consistent performance.
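The echo ... >> /etc/fstab pattern appends a duplicate line every time it is re-run. Below is a sketch of an idempotent alternative that assumes standard six-field fstab lines. It keys on the mount point, or on the source for swap entries, because those all use none as their mount point. fstab_add is a hypothetical helper.

```shell
#!/usr/bin/env bash
# fstab_add "SPEC MOUNTPOINT TYPE OPTIONS DUMP PASS" [FSTAB]
# Hypothetical helper: append the entry only if no uncommented line already uses
# the same mount point (or, for swap entries, the same source).
fstab_add() {
  local entry="$1" fstab="${2:-/etc/fstab}"
  # ENVIRON avoids awk -v escape processing (fstab encodes spaces as \040)
  if E="$entry" awk '
      BEGIN { split(ENVIRON["E"], n, " "); col = (n[3] == "swap") ? 1 : 2; key = n[col] }
      $1 !~ /^#/ && $col == key { found = 1 }
      END { exit !found }' "$fstab"; then
    echo "already present: $entry"
    return 0
  fi
  printf '%s\n' "$entry" >> "$fstab"
}
# Usage: fstab_add "UUID=$(blkid -s UUID -o value /dev/vdb1) /var/lib/mydata xfs defaults,noatime,nofail 0 2" && mount -a
```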

9) Firewall

# Debian/Ubuntu (ufw)
apt install -y ufw
ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp
ufw allow 80,443/tcp
ufw --force enable

# Rocky Linux (firewalld)
dnf install -y firewalld
systemctl enable --now firewalld
firewall-cmd --permanent --add-service=ssh
firewall-cmd --permanent --add-service=http
firewall-cmd --permanent --add-service=https
firewall-cmd --reload

Why: Limit exposed services to only what’s required.

What it does: Sets default deny and opens SSH/HTTP/S ports.

Cautions: If you change SSH port or enable complex rules, keep an active session to avoid lockout.

Notes: For multi-node setups, open private network ranges as needed.
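A default-deny policy is easier to reason about when you know what is actually listening. The sketch below reads ss -ltnH output (TCP listeners, no header). It prints non-loopback listeners on ports outside an allow list. It assumes the local address is the fourth column, and unexpected_listeners is a hypothetical helper.

```shell
#!/usr/bin/env bash
# unexpected_listeners "PORT..." : read `ss -ltnH` output on stdin and print
# non-loopback TCP listeners whose port is not in the allowed list.
unexpected_listeners() {
  awk -v allowed="$1" '
    BEGIN { n = split(allowed, a, " "); for (i = 1; i <= n; i++) ok[a[i]] = 1 }
    {
      addr = $4
      port = addr; sub(/.*:/, "", port)       # text after the last colon
      host = addr; sub(/:[^:]*$/, "", host)   # text before it
      # loopback listeners are not reachable from outside
      if (host ~ /^127\./ || host == "[::1]" || host == "::1") next
      if (!(port in ok)) print addr
    }'
}
# Usage: ss -ltnH | unexpected_listeners "22 80 443"
```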

10) Apply kernel tuning (sysctl)

cat <<'EOF' > /etc/sysctl.d/99-serverops.conf
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
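# Kernels 5.4+ already default to 4096; never set this below the current value
net.core.somaxconn = 4096
# System-wide cap; the kernel default scales with RAM, so check `sysctl fs.file-max` first
fs.file-max = 200000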
vm.swappiness = 10
EOF

sysctl --system

Why: Improve TCP handling and file descriptor limits for server workloads.

What it does: Writes tuned kernel parameters and applies them system-wide.

Cautions: Values depend on workload; don’t set extremely high values without capacity planning.

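Notes: Watch listen-queue pressure with ss -ltn (on listening sockets, Recv-Q is the current accept queue and Send-Q the configured backlog) and per-process descriptor limits with ulimit -n. fs.file-max does not raise the per-process limit; for services, set LimitNOFILE= in the systemd unit.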
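After sysctl --system, confirm that the values actually took effect. A later file such as /etc/sysctl.conf can override them, and so can a key whose kernel module was not yet loaded. The sketch below compares a conf file against /proc/sys, where a key like net.core.somaxconn maps to /proc/sys/net/core/somaxconn. sysctl_drift is hypothetical, and its PROC_SYS argument exists only for testing.

```shell
#!/usr/bin/env bash
# sysctl_drift CONF [PROC_SYS]: print every "key = value" in CONF whose live value
# under /proc/sys differs; return 1 if any differ. Hypothetical helper.
sysctl_drift() {
  local conf="$1" procsys="${2:-/proc/sys}" key want have rc=0
  local -a parts
  while IFS='=' read -r key want || [ -n "$key" ]; do
    key="${key//[[:space:]]/}"
    case "$key" in ''|\#*|\;*) continue ;; esac   # blank lines and comments
    # collapse whitespace so "32768 60999" matches the kernel's tab-separated output
    read -ra parts <<< "$want"; want="${parts[*]}"
    have=""
    if [ -r "$procsys/${key//.//}" ]; then
      read -ra parts < "$procsys/${key//.//}"; have="${parts[*]}"
    fi
    if [ "$have" != "$want" ]; then
      echo "$key: want '$want', have '${have:-<missing>}'"
      rc=1
    fi
  done < "$conf"
  return "$rc"
}
# Usage: sysctl_drift /etc/sysctl.d/99-serverops.conf
```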

11) Fail2Ban

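# Debian/Ubuntu
apt install -y fail2ban

# Rocky (EPEL)
dnf install -y fail2ban

cat <<'EOF' > /etc/fail2ban/jail.local
[sshd]
enabled = true
port = ssh
# Debian/Ubuntu: /var/log/auth.log; Rocky: /var/log/secure.
# On journal-only systems (no auth log file), use "backend = systemd" instead of logpath.
logpath = /var/log/auth.log
maxretry = 5
EOF
# Enable on boot on both distros, and restart so jail.local is loaded
systemctl enable fail2ban
systemctl restart fail2ban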

Why: Automatically block repeated malicious login attempts.

What it does: Watches auth logs and bans IPs exceeding failed attempts.

Cautions: Ensure log path matches distro (Rocky may use /var/log/secure) — adjust logpath accordingly.

Notes: Monitor banned IPs with fail2ban-client status sshd.
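For alerting, you can pull the ban count out of fail2ban-client status sshd. The sketch below assumes the usual tree-style output with a "Currently banned:" line. f2b_banned is a hypothetical helper.

```shell
#!/usr/bin/env bash
# f2b_banned: read `fail2ban-client status <jail>` output on stdin and print the
# "Currently banned" count (prints nothing if that line is absent).
f2b_banned() {
  awk -F':' '/Currently banned/ { gsub(/[[:space:]]/, "", $2); print $2 }'
}
# Usage: fail2ban-client status sshd | f2b_banned
```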

12) PostgreSQL quick start

# Debian/Ubuntu
apt install -y postgresql postgresql-contrib
systemctl enable --now postgresql

# Rocky
dnf install -y postgresql-server postgresql-contrib
postgresql-setup --initdb
systemctl enable --now postgresql

sudo -u postgres psql -c "CREATE USER deploy WITH PASSWORD 'strongpassword';"
sudo -u postgres psql -c "CREATE DATABASE myapp OWNER deploy;"

Why: Bootstraps a database for application use.

What it does: Installs and initializes PostgreSQL, creates a user and DB.

Cautions: Replace 'strongpassword' with a secure secret and store it in a vault; don’t expose DB port publicly.

Notes: Configure pg_hba.conf to restrict connections to private networks only.
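Rather than making up the password by hand, you can use a minimal generator. gen_secret is a hypothetical helper that keeps only letters and digits from /dev/urandom. Generate the secret and store it in your vault. Then set it from an interactive sudo -u postgres psql session with \password deploy, so the password does not end up in shell history.

```shell
#!/usr/bin/env bash
# gen_secret [LENGTH]: print a random alphanumeric secret (default 32 characters).
gen_secret() {
  local len="${1:-32}"
  # LC_ALL=C keeps tr byte-oriented; head stops reading once LENGTH chars are out
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$len"
  echo
}
# Usage: gen_secret 40
```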

13) Backup script (pg_dump -> restic)

Create /usr/local/bin/pg-backup.sh and cron:

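cat <<'EOF' > /usr/local/bin/pg-backup.sh
#!/bin/bash
set -euo pipefail
umask 077    # the dump holds every database and role definition; keep it root-only
# cron starts with an almost empty environment, so load RESTIC_REPOSITORY,
# RESTIC_PASSWORD, etc. from a root-only file (this path is an example)
source /etc/restic.env
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
OUT=/tmp/pgdump-${TIMESTAMP}.sql
trap 'rm -f "${OUT}"' EXIT    # remove the local dump even if a step fails
sudo -u postgres pg_dumpall > "${OUT}"
restic backup "${OUT}" --tag db-backup
EOF
chmod 700 /usr/local/bin/pg-backup.sh
(crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/pg-backup.sh >>/var/log/pg-backup.log 2>&1") | crontab -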

Why: Capture logical backups and push to remote durable storage.

What it does: Dumps all PostgreSQL databases, backs up with restic, and removes the local dump.

Cautions: Ensure restic repository credentials are configured in env vars or config; test restores.

Notes: For large DBs prefer base backups + WAL shipping for PITR.
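Cron does not alert anyone when the job fails, so a freshness check gives monitoring something to test. The sketch below assumes you extend the backup script to touch a marker file after restic succeeds. The marker path and backup_is_fresh are hypothetical. A 26-hour threshold leaves some slack for a daily 02:00 run.

```shell
#!/usr/bin/env bash
# backup_is_fresh MARKER MAX_AGE_HOURS: succeed if MARKER exists and was modified
# within the last MAX_AGE_HOURS hours. Hypothetical helper; uses GNU stat.
backup_is_fresh() {
  local marker="$1" max_hours="$2" now mtime
  [ -e "$marker" ] || { echo "no marker: $marker"; return 1; }
  now=$(date +%s)
  mtime=$(stat -c %Y "$marker")   # modification time in epoch seconds
  if (( now - mtime > max_hours * 3600 )); then
    echo "last successful backup is $(( (now - mtime) / 3600 ))h old"
    return 1
  fi
}
# In pg-backup.sh, after restic:  touch /var/lib/pg-backup/last-success
# From monitoring:                backup_is_fresh /var/lib/pg-backup/last-success 26 || echo "ALERT: stale backup"
```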

14) Node Exporter for metrics

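VERSION="1.5.0"
BASE=https://github.com/prometheus/node_exporter/releases/download/v${VERSION}
curl -LO ${BASE}/node_exporter-${VERSION}.linux-amd64.tar.gz
curl -LO ${BASE}/sha256sums.txt
# Verify the tarball against the published checksums before installing it
sha256sum --check --ignore-missing sha256sums.txt
tar xzf node_exporter-${VERSION}.linux-amd64.tar.gz
mv node_exporter-${VERSION}.linux-amd64/node_exporter /usr/local/bin/
# SELinux (Rocky): mv keeps the source file's label, so reset it
command -v restorecon >/dev/null && restorecon -v /usr/local/bin/node_exporter
cat <<'EOF' > /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
User=nobody
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now node_exporter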

Why: Expose host metrics to Prometheus for performance and capacity monitoring.

What it does: Installs node_exporter binary and runs it as a systemd service.

Cautions: Node exporter exposes system metrics — firewall and ACLs should restrict access to the monitoring network.

Notes: Scrape it from Prometheus with a job that targets the instance IP on port 9100, node_exporter's default.
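Before you wire up Prometheus, you can confirm that the exporter answers locally. The sketch below extracts one unlabelled sample from the text exposition format. metric_value is hypothetical; node_load1 is one of node_exporter's standard load-average metrics.

```shell
#!/usr/bin/env bash
# metric_value NAME: read Prometheus text exposition on stdin and print the value of
# the unlabelled sample NAME; fail if it is absent. "#" lines (HELP/TYPE) never match.
metric_value() {
  awk -v name="$1" '$1 == name { print $2; found = 1; exit } END { exit !found }'
}
# Usage: curl -s http://127.0.0.1:9100/metrics | metric_value node_load1
```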

15) Nginx + Certbot

# Debian/Ubuntu
apt install -y nginx certbot python3-certbot-nginx
CONF=/etc/nginx/sites-available/myapp.conf

# Rocky (certbot packages come from EPEL; nginx there reads /etc/nginx/conf.d/*.conf)
dnf install -y nginx certbot python3-certbot-nginx
CONF=/etc/nginx/conf.d/myapp.conf
setsebool -P httpd_can_network_connect 1   # let SELinux allow nginx to proxy to 127.0.0.1:3000

systemctl enable --now nginx
cat <<'EOF' > "${CONF}"
server {
  listen 80;
  server_name example.com www.example.com;

  location / {
    proxy_pass http://127.0.0.1:3000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
  }
}
EOF
# Debian/Ubuntu only: enable the site
ln -sf /etc/nginx/sites-available/myapp.conf /etc/nginx/sites-enabled/myapp.conf
nginx -t && systemctl reload nginx
certbot --nginx -d example.com -d www.example.com

Why: Terminate TLS and reverse-proxy to application processes.

What it does: Installs Nginx, configures a proxy vhost, and obtains a Let’s Encrypt certificate.

Cautions: The default HTTP-01 challenge needs port 80 reachable and DNS already pointing at this server before you request certificates. Wildcard certificates require the DNS-01 challenge instead.

Notes: Use nginx -t to validate config and certbot renew --dry-run to test renewals.
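Certbot installs a renewal timer, but an independent expiry check catches broken renewals. The sketch below uses GNU date, and days_until is hypothetical. As a built-in alternative, openssl x509 -checkend SECONDS -noout -in FILE answers only yes or no.

```shell
#!/usr/bin/env bash
# days_until DATE [NOW_EPOCH]: whole days from NOW (default: current time) until DATE,
# negative once DATE has passed. DATE is anything GNU date -d accepts.
days_until() {
  local end now
  end=$(date -d "$1" +%s) || return 1
  now="${2:-$(date +%s)}"
  echo $(( (end - now) / 86400 ))
}
# Usage:
#   end=$(openssl x509 -enddate -noout -in /etc/letsencrypt/live/example.com/cert.pem | cut -d= -f2)
#   [ "$(days_until "$end")" -ge 14 ] || echo "certificate expires soon"
```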

16) systemd unit for your app

cat <<'EOF' > /etc/systemd/system/myapp.service
[Unit]
Description=MyApp
After=network.target

[Service]
User=deploy
WorkingDirectory=/srv/myapp
ExecStart=/usr/bin/node /srv/myapp/dist/index.js
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now myapp

Why: Manage the application process reliably with automatic restarts and logs.

What it does: Adds a systemd service which starts the app and restarts on failure.

Cautions: Keep WorkingDirectory and ExecStart paths correct; inspect journalctl -u myapp on failures.

Notes: For zero-downtime deploys use socket activation or a rolling restart strategy across replicas.
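systemctl restart returns once the process is started, not when the app is ready to serve. In a rolling restart, each node can wait for the app to answer before the next node is touched. Below is a sketch of a generic retry loop. retry is hypothetical, and the health URL in the usage line assumes the app answers on 127.0.0.1:3000 as in the Nginx example.

```shell
#!/usr/bin/env bash
# retry TRIES DELAY CMD...: run CMD up to TRIES times, sleeping DELAY seconds between
# attempts; return 0 on the first success, 1 if every attempt fails.
retry() {
  local tries="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= tries; i++ )); do
    "$@" && return 0
    (( i < tries )) && sleep "$delay"
  done
  return 1
}
# Usage: systemctl restart myapp && retry 10 2 curl -fsS -o /dev/null http://127.0.0.1:3000/
```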

17) Snapshot and restore testing

  • Create a provider snapshot after provisioning and note the ID. Perform a monthly test restore to a throwaway instance.

Why: Snapshots allow rapid recovery from catastrophic failures.

What it does: Captures the disk state to a provider snapshot.

Cautions: Snapshots are not a substitute for off-site backups; store backups in separate regions/storage when possible.

Notes: Automate snapshot creation and retention via provider API.

18) Troubleshooting quick commands

journalctl -u myapp -f
systemctl status myapp
ss -tulpen | grep 3000
df -h
free -m
top

Why: Quick diagnostics to assess service health, listening sockets, and resource usage.

What it does: Streams logs, checks unit status, inspects sockets and resource usage.

Cautions: top and htop only read state and are safe on production, but they are interactive; in scripts use top -b -n 1 instead.

Notes: Capture journalctl -u myapp --since "1 hour ago" for post-incident analysis.
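During an incident, capture these outputs once with a timestamp instead of re-running them by hand. The sketch below writes each command's output to its own file. collect_diag is hypothetical, and the default unit name myapp comes from the examples above.

```shell
#!/usr/bin/env bash
# collect_diag OUTDIR [UNIT]: save this step's diagnostics into OUTDIR for later
# analysis. Hypothetical helper; UNIT defaults to myapp.
collect_diag() {
  local out="$1" unit="${2:-myapp}" cmd
  [ -n "$out" ] || { echo "usage: collect_diag OUTDIR [UNIT]" >&2; return 2; }
  mkdir -p "$out" || return 1
  date -u > "$out/collected_at.txt"
  # commands without quoted arguments: $cmd is word-split on purpose, "df -h" -> df_-h.txt
  for cmd in "uptime" "df -h" "free -m" "ss -tulpen" "top -b -n 1"; do
    $cmd > "$out/${cmd// /_}.txt" 2>&1 || echo "failed: $cmd" >> "$out/errors.txt"
  done
  # commands with quoted arguments run individually; output is kept even if the unit is inactive
  systemctl status "$unit" --no-pager > "$out/status.txt" 2>&1 || true
  journalctl -u "$unit" --since "1 hour ago" --no-pager > "$out/journal.txt" 2>&1 || true
  return 0
}
# Usage: collect_diag "/root/incident-$(date -u +%Y%m%dT%H%M%SZ)"
```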

19) Recovery & restore

  • Use the provider console and a recovery ISO or serial console if the instance fails to boot. Mount the root disk from the recovery environment and fix /home/deploy/.ssh/authorized_keys, /etc/ssh/sshd_config, or other broken configs.
  • Restore data: run restic restore latest --target <dir> on a new instance, then reconfigure networking.

Why: Prepare for kernel panic, boot failures or accidental lockouts.

What it does: Provides manual steps to access disks and recover files or restore backups.

Cautions: When restoring, ensure networking, hostnames and SSH keys are reconfigured to avoid confusion.

Notes: Keep a documented recovery procedure with provider-specific console links.
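The backup script in step 13 stores dumps as /tmp/pgdump-<UTC timestamp>.sql, and ISO-8601 UTC timestamps sort in chronological order. After restic restore latest --tag db-backup --target /srv/restore, you can therefore pick the newest dump by name. The sketch below does that; newest_dump and the /srv/restore path are hypothetical.

```shell
#!/usr/bin/env bash
# newest_dump DIR: print the newest pgdump-<timestamp>.sql in DIR.
# Relies on the UTC ISO-8601 timestamps in the file names sorting chronologically.
newest_dump() {
  local dir="$1" latest="" f
  for f in "$dir"/pgdump-*.sql; do
    # glob results come back sorted, so the last existing match is the newest;
    # -e skips the literal pattern left behind when nothing matches
    [ -e "$f" ] && latest="$f"
  done
  [ -n "$latest" ] || { echo "no pgdump-*.sql in $dir" >&2; return 1; }
  echo "$latest"
}
# Usage (root reads the file, so the dump can stay root-only):
#   sudo -u postgres psql postgres < "$(newest_dump /srv/restore/tmp)"
```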

20) Handoff checklist

  • Confirm deploy can sudo and SSH
  • Confirm firewall rules, TLS auto-renewal, backups and snapshot IDs
  • Document IPs, secrets location, and escalation contacts

Why: Ensure operational readiness before marking a server as production.

What it does: Validates access, security, backups and documentation are in place.

Cautions: Don’t skip restore tests — a green backup indicator is not a successful restore.

Notes: Store runbook and credentials in a team-accessible vault and link to incident playbooks.
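You can turn the checklist into a repeatable script that prints PASS or FAIL for each item. Below is a sketch. check and FAILED are hypothetical names, and the commands in the usage comments are examples to adapt to your setup.

```shell
#!/usr/bin/env bash
# check DESC CMD...: run CMD quietly and print PASS/FAIL; failures are counted in FAILED.
FAILED=0
check() {
  local desc="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $desc"
  else
    echo "FAIL  $desc"
    FAILED=$((FAILED + 1))
  fi
}
# Usage (as root):
#   check "deploy is in sudo/wheel" sh -c 'id -nG deploy | grep -qwE "sudo|wheel"'
#   check "nginx config is valid"   nginx -t
#   check "TLS renewal dry run"     certbot renew --dry-run
#   check "ufw is active"           sh -c 'ufw status | grep -q "Status: active"'
#   exit $(( FAILED > 0 ))
```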


🚀 Provisioning & First Boot

Essentials:

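  • Use provider images: Ubuntu LTS, Debian Stable, or Rocky Linux
  • Bootstrap with cloud-init when available, otherwise run the manual runbook above
  • Prefer small, immutable images and consistent build artifacts from CI

#cloud-config
# Optional example. cloud-init only parses user-data as cloud-config when the
# first line is exactly "#cloud-config".
# Listing users without "- default" replaces the image's default user (e.g. ubuntu).
users:
  - name: deploy
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    shell: /bin/bash          # some distros default new users to /bin/sh
    ssh_authorized_keys:
      - ssh-rsa AAAA... your-key
packages:
  - git
  - curl
runcmd:
  - [ sh, -lc, "echo 'Provisioned' > /etc/motd" ]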
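cloud-init only treats user-data as cloud-config when its first line is exactly #cloud-config. A file that starts with any other comment is not parsed as cloud-config. Below is a sketch of a pre-upload check; is_cloud_config is hypothetical. Recent cloud-init releases can also validate the whole file with cloud-init schema --config-file FILE.

```shell
#!/usr/bin/env bash
# is_cloud_config FILE: succeed if FILE's first line is exactly "#cloud-config".
is_cloud_config() {
  local first
  [ -r "$1" ] || return 1
  first=$(head -n 1 "$1")
  # tolerate Windows line endings from files edited on a desktop
  [ "${first%$'\r'}" = "#cloud-config" ]
}
# Usage: is_cloud_config user-data.yaml || echo "first line must be #cloud-config"
```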

🛡️ Initial Hardening

Start hardening early — many failures are configuration mistakes on day one. Minimum steps:

  • Create a non-root admin user and disable root SSH after verifying sudo access.
  • Enforce SSH key authentication and disable passwords.
  • Enable automatic security updates where appropriate.

Examples (Debian/Ubuntu vs Rocky):

# Create admin user
adduser deploy
usermod -aG sudo deploy    # Debian/Ubuntu
usermod -aG wheel deploy    # Rocky

# Disable root login and passwords
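# -E with #? also rewrites commented-out defaults such as "#PermitRootLogin prohibit-password"
sed -i -E 's/^#?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sed -i -E 's/^#?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
# Validate, then reload (the unit is "ssh" on Debian/Ubuntu, "sshd" on Rocky)
sshd -t && { systemctl reload ssh 2>/dev/null || systemctl reload sshd; }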

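# Auto-updates
# Debian/Ubuntu
apt install -y unattended-upgrades && dpkg-reconfigure --priority=low unattended-upgrades
# Rocky: dnf-automatic.timer only installs updates if apply_updates = yes is set in
# /etc/dnf/automatic.conf; dnf-automatic-install.timer downloads and installs them
dnf install -y dnf-automatic && systemctl enable --now dnf-automatic-install.timer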

Add Fail2Ban, auditd, and keep SELinux/AppArmor enforced (Rocky uses SELinux; Debian uses AppArmor).
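To confirm that the mandatory access control layer is actually on, read the state the kernel exposes under /sys. Below is a read-only sketch; mac_status and its SYSFS override are hypothetical. getenforce (Rocky) and aa-status (Debian/Ubuntu) give more detail, including per-profile AppArmor modes.

```shell
#!/usr/bin/env bash
# mac_status [SYSFS]: report SELinux / AppArmor state from sysfs (read-only).
# SYSFS defaults to /sys; override it to test against a fake tree.
mac_status() {
  local sys="${1:-/sys}"
  if [ -r "$sys/fs/selinux/enforce" ]; then
    # SELinux: "1" = enforcing, "0" = permissive
    if [ "$(cat "$sys/fs/selinux/enforce")" = "1" ]; then echo "selinux: enforcing"; return 0; fi
    echo "selinux: permissive"; return 1
  fi
  if [ -r "$sys/module/apparmor/parameters/enabled" ]; then
    # AppArmor: "Y" means the LSM is active (profiles may still be in complain mode)
    if [ "$(cat "$sys/module/apparmor/parameters/enabled")" = "Y" ]; then echo "apparmor: enabled"; return 0; fi
    echo "apparmor: disabled"; return 1
  fi
  echo "no SELinux or AppArmor detected"; return 1
}
# Usage: mac_status
```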


💾 Storage, Filesystems & Swap

  • Place databases and logs on separate block volumes. Use xfs or ext4 per workload.
  • Prefer swapfiles on cloud VMs for resize flexibility.

Example:

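mkfs.xfs /dev/vdb1
mkdir -p /var/lib/postgresql    # Rocky: /var/lib/pgsql
# Mount before installing PostgreSQL; mounting over a populated directory hides its contents
echo "UUID=$(blkid -s UUID -o value /dev/vdb1) /var/lib/postgresql xfs defaults,noatime,nofail 0 2" >> /etc/fstab
mount -a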

fallocate -l 4G /swapfile && chmod 600 /swapfile && mkswap /swapfile && swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab

🌐 Networking & DNS

  • Use provider DNS or external DNS (Cloudflare/Route53). Reserve floating/reserved IPs for failover.
  • Use private networking between VMs when possible to keep traffic off public interfaces.

Reverse proxy + TLS (Nginx + Certbot):

# Debian/Ubuntu
apt install -y nginx certbot python3-certbot-nginx

# Rocky
dnf install -y nginx certbot python3-certbot-nginx

systemctl enable --now nginx
certbot --nginx -d example.com -d www.example.com
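Certbot's HTTP-01 challenge goes to whatever address the domain resolves to. Checking DNS first avoids failed validations that count against Let's Encrypt rate limits. The sketch below uses the system resolver via getent; dns_points_here is hypothetical. /etc/hosts entries can mask public DNS, so dig +short example.com @1.1.1.1 is a useful cross-check.

```shell
#!/usr/bin/env bash
# dns_points_here NAME IP: succeed if NAME resolves (via the system resolver) to IP.
dns_points_here() {
  local name="$1" ip="$2"
  # getent ahosts prints "ADDRESS SOCKTYPE [CANONNAME]" lines for IPv4 and IPv6
  getent ahosts "$name" | awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }'
}
# Usage: dns_points_here example.com 203.0.113.10 && certbot --nginx -d example.com
```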

📈 Observability & Monitoring

  • Expose metrics via Node Exporter (Prometheus) and centralize logs with Vector/Fluentd or a managed logging service.
  • Lightweight options: Netdata or Glances for small VPS.

Node Exporter quick install (example):

VERSION="1.5.0"
curl -LO https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz
tar xzf node_exporter-${VERSION}.linux-amd64.tar.gz
mv node_exporter-${VERSION}.linux-amd64/node_exporter /usr/local/bin/

🔁 Backups & Disaster Recovery

  • Use provider snapshots for fast restores and restic/borg for file-level backups to S3-compatible storage.
  • Backup databases using logical dumps and WAL shipping for point-in-time recovery.

Restic example:

restic init -r s3:s3.amazonaws.com/my-bucket
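# A running PostgreSQL data directory is not consistent on disk; stream a logical dump instead
sudo -u postgres pg_dumpall | restic backup --stdin --stdin-filename pg_dumpall.sql --tag db-2026-02-08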
restic forget --prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12
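restic reads its repository and password from RESTIC_REPOSITORY and RESTIC_PASSWORD, plus AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY for S3. Cron jobs and systemd timers do not inherit your shell's environment. The sketch below writes those variables into a root-only file that scripts can source. write_env_file and /etc/restic.env are hypothetical names.

```shell
#!/usr/bin/env bash
# write_env_file PATH KEY=VALUE...: write one "export KEY=VALUE" line per pair to PATH
# with mode 600. Hypothetical helper; printf %q quotes each value so secrets with
# spaces or $ survive a later `source PATH`.
write_env_file() {
  local path="$1" pair
  shift
  (
    umask 077            # a newly created file starts as 600
    : > "$path"
    chmod 600 "$path"    # also tighten a pre-existing file
    for pair in "$@"; do
      printf 'export %s=%q\n' "${pair%%=*}" "${pair#*=}" >> "$path"
    done
  )
}
# Usage: write_env_file /etc/restic.env RESTIC_REPOSITORY=s3:s3.amazonaws.com/my-bucket "RESTIC_PASSWORD=$PW_FROM_VAULT"
#        then in scripts: source /etc/restic.env
```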

🗂️ Logging & Rotation

  • Rotate local logs with logrotate and forward them to a central log store. Example config for app logs (e.g. /etc/logrotate.d/myapp):

/var/log/myapp/*.log {
  daily
  rotate 14
  compress
  missingok
  notifempty
  copytruncate
}

✅ Security & Compliance

  • Enforce least privilege, centralize secrets (Vault / KMS), run Trivy/Lynis scans, and apply CIS benchmarks where needed.

Quick scan:

curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
trivy rootfs /    # rootfs mode targets a host root filesystem; fs mode is aimed at project directories

🚚 Service Management & Deployment

  • Use systemd units for app services; prefer immutable artifacts from CI when possible.

Example unit:

[Unit]
Description=MyApp
After=network.target

[Service]
User=deploy
WorkingDirectory=/srv/myapp
ExecStart=/usr/bin/node dist/index.js
Restart=on-failure

[Install]
WantedBy=multi-user.target

🐳 Containers on VMs

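  • Use Podman (rootless) or Docker where appropriate. Pull by digest in production and run minimal containers.

podman pull docker.io/myorg/myapp@sha256:...
# Use the fully qualified name; short names depend on registries.conf search settings.
# --restart=always only survives reboots if podman-restart.service is enabled.
podman run -d --name myapp -p 3000:3000 --restart=always docker.io/myorg/myapp@sha256:...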
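Pulling by digest only helps if every deploy actually uses a digest. Below is a sketch of a gate that a deploy script could run before podman run. It accepts only references that end in @sha256: plus 64 lowercase hex characters. is_digest_pinned is hypothetical.

```shell
#!/usr/bin/env bash
# is_digest_pinned REF: succeed if an image reference ends in @sha256:<64 hex chars>.
is_digest_pinned() {
  local re='@sha256:[0-9a-f]{64}$'
  [[ $1 =~ $re ]]
}
# Usage: is_digest_pinned "$IMAGE" || { echo "refusing unpinned image: $IMAGE" >&2; exit 1; }
```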

⚖️ Scaling & Networking Patterns

  • Terminate TLS at the LB or reverse proxy. Use health checks and autoscaling for stateless services. Keep state on managed DBs or networked block storage.

🗺️ Provider Notes

  • DigitalOcean / Linode: snapshots, floating IPs, user-data
  • Hetzner: strong IO and volumes
  • Scaleway / UpCloud / Hostinger / RackNerd: check private networking and snapshot APIs

Always validate provider capabilities before relying on them.


🆘 Incident Response & Runbooks

  • Document steps to access serial console, restore from snapshots, and contact escalation points. Keep a tested post-mortem template and retention policy.

📋 Best Practices Checklist

  • Use immutable artifacts from CI
  • Automate provisioning when possible (cloud-init optional)
  • Enforce SSH keys and disable root login
  • Enable automated security updates
  • Centralize logs and metrics
  • Test backups and restores regularly
  • Apply least-privilege and network segmentation
  • Monitor health, disk, and performance

🖥️ Cockpit — web-based server admin (optional)

Why: Cockpit provides a lightweight web UI for server management (systemd, logs, networking, containers) which complements CLI workflows and is useful for visual inspection and quick ops.

Install (Debian/Ubuntu):

apt update
apt install -y cockpit
systemctl enable --now cockpit.socket
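# allow only your admin network (10.0.0.0/8 is an example), never the whole Internet
ufw allow from 10.0.0.0/8 to any port 9090 proto tcp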

Install (Rocky Linux):

dnf install -y cockpit
systemctl enable --now cockpit.socket
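# Rocky's default zone may already allow the cockpit service publicly; limit it to an
# admin network instead (10.0.0.0/8 is an example)
firewall-cmd --permanent --remove-service=cockpit
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/8" service name="cockpit" accept'
firewall-cmd --reload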

What it does: exposes a browser UI on port 9090 for user management, system logs, journal, storage, networking, and container inspection (Podman integration available).

Cautions:

  • Cockpit is a convenience UI; do not rely on it for automation or as the sole access path. Keep strong authentication, restrict access to admin networks, or proxy it behind an internal LB with mTLS.
  • Exposing Cockpit directly to the Internet without additional controls is unsafe.

Notes:

  • Integrates with podman on systems that have it installed; install cockpit-podman for container UI.
  • Use provider firewall or SSH tunnel to restrict access to trusted IPs.

🚀 Bonus: Self-hosted PaaS setup & management

Why: A lightweight self-hosted PaaS streamlines app deployment (git push or image deploy), manages TLS, and provides simple scaling for small teams without full K8s complexity.

Options covered: Dokku, Coolify, and lightweight k3s (Kubernetes) for production-grade deployments.

Dokku (easy git-based PaaS)

Install (Ubuntu recommended):

# Prepare (Ubuntu)
apt update && apt install -y docker.io
curl -sSL https://raw.githubusercontent.com/dokku/dokku/v0.28.5/bootstrap.sh | sudo DOKKU_TAG=v0.28.5 bash

Usage:

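  • Register your SSH public key with Dokku (from the developer machine): cat ~/.ssh/id_ed25519.pub | ssh root@server dokku ssh-keys:add admin
  • Create an app on the server: dokku apps:create myapp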
  • Add remote on developer machine: git remote add dokku dokku@server:myapp
  • Deploy: git push dokku main

Cautions:

  • Dokku relies on Docker; ensure Docker is secured and updated.
  • Single-node Dokku is fine for small apps, but for HA use multiple nodes and external DBs/storage.

Notes:

  • Use Dokku plugins for PostgreSQL, Redis, and Let’s Encrypt integration.

Coolify (self-hosted v4)


Use the official v4 installer or the documented manual deployment from the Coolify docs. Official docs and installers: https://coolify.io/self-hosted and https://coolify.io/docs/installation

Note

The Coolify v4 install script is the recommended way to deploy, ideally on a freshly provisioned VM/VPS.

Quick install (recommended — official installer):

# Official v4 installer (runs required containers and setup)
curl -fsSL https://cdn.coollabs.io/coolify/install.sh | sudo bash

Notes on prerequisites and alternatives:

  • The installer will detect Docker and required components; if you prefer manual setup, follow the repository/docker-compose manifest in the Coolify docs.
  • For production, configure an external PostgreSQL database and S3-compatible storage instead of relying on embedded volumes.

Usage:

  • Open the Coolify web UI (http://<server-ip>:8000 by default) to finish setup and create the admin account.
  • Deploy apps via Git, Docker image, or connected repos; Coolify manages TLS (Let’s Encrypt), environment variables, and basic scaling.

Cautions & best practices:

  • Do not expose the management UI directly to the public internet; restrict access with firewall rules, a VPN, or an auth proxy (OAuth, mTLS). Ports published by Docker bypass ufw rules, so enforce this with the provider firewall or a VPN.
  • Back up Coolify’s data and metadata (or use external DB/S3). Regularly test restores.
  • Monitor resource usage; Coolify runs background workers and may need more than minimal VM specs for production workloads.


k3s (lightweight Kubernetes) — production-ready PaaS foundation

Install single-node k3s quickly:

curl -sfL https://get.k3s.io | sh -
# kubectl is available at /usr/local/bin/kubectl (or use k3s kubectl)

Basic app deploy:

kubectl create deployment myapp --image=myorg/myapp:latest
kubectl expose deployment myapp --type=NodePort --port=80 --target-port=3000

Ingress & TLS:

  • k3s ships Traefik as its default ingress controller (deployed from a bundled Helm chart). To use ingress-nginx instead, install k3s with --disable traefik. Add cert-manager for Let’s Encrypt certificates.

Cautions:

  • k3s reduces K8s complexity but still requires operational knowledge: storage (Longhorn), backups (Velero), and RBAC.
  • For single-node, there’s limited resilience — use multi-node clusters for production.

Management & backups:

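  • Back up the cluster datastore: embedded SQLite by default on a single k3s server, or etcd when started with --cluster-init (k3s etcd-snapshot save covers the latter). Keep manifests in Git; kubectl get all -o yaml misses ConfigMaps, Secrets, Ingresses and PersistentVolumeClaims, so it is not a complete export. Store images in a registry.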

When to choose which:

  • Dokku / Coolify: fastest for web apps and small teams; minimal infra work.
  • k3s: when you need Kubernetes features, multi-service orchestration, and more control over networking and storage.

Final cautions for self-hosted PaaS:

  • Plan for backups of app data and PaaS metadata (e.g., Dokku plugin databases, Coolify configs, k3s etcd).
  • Monitor node resource usage; PaaS control planes can be noisy on small VMs.
  • Secure CLIs and dashboard ports with firewall rules, VPNs or auth proxies.

🎯 Conclusion

This playbook leads with a practical, ground-up runbook, followed by focused sections that prioritize action and clarity.