
🛠️ Ground-up Manual Runbook (root SSH)
This prescriptive runbook is written for a fresh VPS/VM where you have an SSH root session. It is intentionally copy-pasteable: run the commands as shown, adapting hostnames, keys, and device names. Debian/Ubuntu and Rocky Linux examples are given side-by-side when they differ.
1) Verify system
cat /etc/os-release; uname -a; uptime
Why: Confirm distro and kernel to choose correct packages and compatibility.
What it does: Prints OS release, kernel version, and uptime.
Cautions: None — read-only.
Notes: If kernel is very old, consider reprovisioning with an LTS image.
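Later steps branch on distro, so the check above can be scripted. A minimal sketch, assuming a hypothetical helper name (detect_pkg_family) and the usual ID / ID_LIKE fields in os-release:

```shell
# detect_pkg_family is a hypothetical helper, not a standard tool.
# It maps an os-release file to the package manager used in this runbook.
detect_pkg_family() {
  local file="${1:-/etc/os-release}" ids
  # Source the file in a subshell so its variables do not leak into the caller.
  ids=$( . "$file" && printf '%s %s' "${ID:-}" "${ID_LIKE:-}" )
  case " $ids " in
    *" debian "*|*" ubuntu "*) echo apt ;;
    *" rhel "*|*" rocky "*|*" fedora "*) echo dnf ;;
    *) echo unknown; return 1 ;;
  esac
}
```

Usage: run detect_pkg_family with no argument on the server and pick the Debian/Ubuntu or Rocky variant of each step from the result.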
2) Set hostname and timezone
hostnamectl set-hostname web1.example.com
timedatectl set-timezone Etc/UTC
timedatectl status
Why: Hostname identifies machine in logs and monitoring; timezone ensures consistent timestamps.
What it does: Sets system hostname and timezone, updates systemd metadata.
Cautions: Changing hostname may affect SSL/Certbot configurations and monitoring tags — update those after change.
Notes: Use UTC for servers unless a strong reason to use local zone.
3) Update packages
# Debian/Ubuntu
apt update && apt upgrade -y && apt autoremove -y
# Rocky Linux
dnf upgrade -y && dnf autoremove -y
Why: Apply security updates and bug fixes before installing services.
What it does: Updates package lists and upgrades installed packages.
Cautions: On production with critical services, avoid immediate full upgrades without testing — prefer staged windows.
Notes: For unattended systems, enable automatic security updates after an initial manual verification.
4) Install core tools
# Debian/Ubuntu
apt install -y sudo curl wget git vim htop unzip rsync lsof
# Rocky Linux
dnf install -y epel-release
dnf install -y sudo curl wget git vim htop unzip rsync lsof
Why: Provide essential debugging and admin tools for later steps.
What it does: Installs commonly used CLI utilities.
Cautions: Minimal risk; installing EPEL on Rocky adds community packages — verify EPEL source.
Notes: Add build-essential or gcc only if compiling locally.
5) Create deploy user and SSH keys
adduser deploy
usermod -aG sudo deploy # Debian/Ubuntu
usermod -aG wheel deploy # Rocky
mkdir -p /home/deploy/.ssh && chmod 700 /home/deploy/.ssh
echo 'ssh-rsa AAAA... your-key' > /home/deploy/.ssh/authorized_keys
chmod 600 /home/deploy/.ssh/authorized_keys
chown -R deploy:deploy /home/deploy/.ssh
Why: Avoid daily root usage; enforce key-based auth for the admin account.
What it does: Adds a non-root user with sudo privileges and installs provided public key.
Cautions: Ensure the public key is correct; a wrong key or wrong permissions can lock you out.
Notes: Test SSH into a new session before disabling root.
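Before testing the login, it can help to confirm the file modes that this step sets. A rough sketch, assuming a hypothetical helper name (check_ssh_perms) and GNU stat:

```shell
# check_ssh_perms is a hypothetical helper: it verifies the modes this runbook
# sets (700 on .ssh, 600 on authorized_keys) and that the key file is non-empty.
check_ssh_perms() {
  local ssh_dir="$1"
  local keys="$ssh_dir/authorized_keys"
  [ -d "$ssh_dir" ] || { echo "missing directory: $ssh_dir" >&2; return 1; }
  [ -s "$keys" ] || { echo "missing or empty: $keys" >&2; return 1; }
  # stat -c is the GNU coreutils form shipped by Debian/Ubuntu and Rocky.
  [ "$(stat -c '%a' "$ssh_dir")" = "700" ] || { echo "$ssh_dir is not mode 700" >&2; return 1; }
  [ "$(stat -c '%a' "$keys")" = "600" ] || { echo "$keys is not mode 600" >&2; return 1; }
  echo "ok"
}
```

Usage: check_ssh_perms /home/deploy/.ssh, then open a second terminal and log in as deploy while the root session stays open.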
6) Disable root SSH (after confirming deploy sudo works)
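# '#?' also rewrites the commented-out defaults that a stock sshd_config ships with
sed -i -E 's/^#?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sed -i -E 's/^#?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sshd -t && systemctl reload sshd # validate first; on Debian/Ubuntu the unit may be named ssh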
Why: Reduces attack surface by preventing direct root SSH logins and password-based brute force.
What it does: Updates SSHD config to disable root and password auth, reloads service.
Cautions: If deploy lacks sudo or key is wrong, you may lose access — always verify first.
Notes: Consider retaining a temporary escape (console access) until confirmed.
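A stock sshd_config may have these directives commented out or missing entirely, so a plain substitution can silently do nothing. A minimal sketch of a helper that handles both cases; set_sshd_option is a hypothetical name, and it does not understand Match blocks:

```shell
# set_sshd_option is a hypothetical helper. It rewrites an existing directive
# (commented out or not) or appends it when the file has none.
# Assumption: the file has no Match blocks that an appended line could land in.
set_sshd_option() {
  local key="$1" value="$2" file="${3:-/etc/ssh/sshd_config}"
  if grep -Eq "^[#[:space:]]*${key}[[:space:]]" "$file"; then
    sed -i -E "s/^[#[:space:]]*${key}[[:space:]].*/${key} ${value}/" "$file"
  else
    printf '%s %s\n' "$key" "$value" >> "$file"
  fi
}
```

Usage: set_sshd_option PermitRootLogin no, then sshd -t before reloading. Drop-in files under /etc/ssh/sshd_config.d/, where the distro uses them, may override the main file, so sshd -T | grep -i permitrootlogin is a useful final check of the effective value.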
7) Create swapfile
fallocate -l 4G /swapfile || dd if=/dev/zero of=/swapfile bs=1M count=4096
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab
Why: Provides memory headroom for spikes and prevents OOM kills on small VMs.
What it does: Creates and enables a swapfile and adds it to fstab for persistence.
Cautions: Swap on SSDs adds some write wear but is acceptable for moderate use; treat sustained heavy swapping as a sign the VM is undersized.
Notes: Adjust size based on RAM and workload; monitor swapon --show.
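Re-running this step appends a duplicate fstab entry each time. One way to make the append idempotent, sketched with a hypothetical helper name (ensure_line):

```shell
# ensure_line is a hypothetical helper: append a line to a file only if an
# identical line is not already present.
ensure_line() {
  local line="$1" file="$2"
  # -x: whole-line match, -F: treat the line as a fixed string, not a regex.
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Usage: ensure_line '/swapfile none swap sw 0 0' /etc/fstab. The same helper works for the volume entry in the next step.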
8) Mount attached block volume (example /dev/vdb)
Warning: verify device names before running.
parted -s /dev/vdb mklabel gpt mkpart primary 0% 100%
mkfs.xfs /dev/vdb1
mkdir -p /var/lib/mydata
echo '/dev/vdb1 /var/lib/mydata xfs defaults,noatime 0 2' >> /etc/fstab
mount -a
Why: Keep data (DB, logs) on separate volumes for snapshotting and performance.
What it does: Partitions, formats, and mounts a block device to a chosen path.
Cautions: Ensure /dev/vdb is the correct device — formatting the wrong device destroys data.
Notes: Use lsblk and blkid to confirm devices beforehand; use xfs for large files and consistent performance.
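Device names such as /dev/vdb can change between boots or after attaching another volume, so an fstab entry keyed by filesystem UUID is usually more robust. A sketch, assuming a hypothetical helper name (fstab_uuid_line) and the added nofail option, which is not in the entry above:

```shell
# fstab_uuid_line is a hypothetical helper that formats an fstab entry keyed by
# filesystem UUID. nofail (an assumption added here) lets the system finish
# booting even if the volume is detached.
fstab_uuid_line() {
  local uuid="$1" mountpoint="$2" fstype="${3:-xfs}"
  [ -n "$uuid" ] || { echo "empty UUID" >&2; return 1; }
  printf 'UUID=%s %s %s defaults,noatime,nofail 0 2\n' "$uuid" "$mountpoint" "$fstype"
}

# Intended use on the server (not executed here):
#   uuid=$(blkid -s UUID -o value /dev/vdb1)
#   fstab_uuid_line "$uuid" /var/lib/mydata >> /etc/fstab && mount -a
```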
9) Firewall
# Debian/Ubuntu (ufw)
apt install -y ufw
ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp
ufw allow 80,443/tcp
ufw --force enable
# Rocky Linux (firewalld)
dnf install -y firewalld
systemctl enable --now firewalld
firewall-cmd --permanent --add-service=ssh
firewall-cmd --permanent --add-service=http
firewall-cmd --permanent --add-service=https
firewall-cmd --reload
Why: Limit exposed services to only what’s required.
What it does: Sets default deny and opens SSH/HTTP/S ports.
Cautions: If you change SSH port or enable complex rules, keep an active session to avoid lockout.
Notes: For multi-node setups, open private network ranges as needed.
10) Apply kernel tuning (sysctl)
cat <<'EOF' > /etc/sysctl.d/99-serverops.conf
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.core.somaxconn = 1024
fs.file-max = 200000
vm.swappiness = 10
EOF
sysctl --system
Why: Improve TCP handling and file descriptor limits for server workloads.
What it does: Writes tuned kernel parameters and applies them system-wide.
Cautions: Values depend on workload; don’t set extremely high values without capacity planning.
Notes: Monitor ss and ulimit -n and adjust as necessary.
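Some keys may be rejected or overridden by a later file, so it is worth confirming the live values after sysctl --system. A rough sketch with hypothetical helper names (list_sysctl_settings, check_sysctl), assuming single-value settings like the ones above:

```shell
# list_sysctl_settings is a hypothetical helper: it prints "key value" pairs
# from a sysctl.d file, skipping blank lines and comments.
list_sysctl_settings() {
  sed -E -e 's/[[:space:]]*=[[:space:]]*/ /' -e '/^[[:space:]]*([#;]|$)/d' "$1"
}

# check_sysctl compares each wanted value with the live kernel value.
# Assumption: single-value keys only; multi-value keys need extra handling.
check_sysctl() {
  list_sysctl_settings "$1" | while read -r key want; do
    have=$(sysctl -n "$key" 2>/dev/null)
    [ "$have" = "$want" ] || echo "MISMATCH $key: want=$want have=${have:-unset}"
  done
}
```

Usage: check_sysctl /etc/sysctl.d/99-serverops.conf prints nothing when every value matches.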
11) Fail2Ban
# Debian/Ubuntu
apt install -y fail2ban
# Rocky (EPEL)
dnf install -y fail2ban
systemctl enable --now fail2ban
cat <<'EOF' > /etc/fail2ban/jail.local
[sshd]
enabled = true
port = ssh
logpath = /var/log/auth.log
maxretry = 5
EOF
systemctl restart fail2ban
Why: Automatically block repeated malicious login attempts.
What it does: Watches auth logs and bans IPs exceeding failed attempts.
Cautions: Ensure logpath matches the distro: Rocky logs SSH auth to /var/log/secure, and hosts that log only to the journal need backend = systemd instead of a logpath.
Notes: Monitor banned IPs with fail2ban-client status sshd.
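The right logpath can be picked by checking which auth log actually exists. A minimal sketch, assuming a hypothetical helper name (pick_auth_log); the optional argument is a root prefix used only for testing:

```shell
# pick_auth_log is a hypothetical helper: it prints the auth log path that
# exists on this host, or "journal" when neither flat file is present
# (in which case backend = systemd is the setting to use in jail.local).
pick_auth_log() {
  local root="${1:-}" f
  for f in /var/log/auth.log /var/log/secure; do
    [ -f "$root$f" ] && { echo "$f"; return 0; }
  done
  echo "journal"
}
```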
12) PostgreSQL quick start
# Debian/Ubuntu
apt install -y postgresql postgresql-contrib
systemctl enable --now postgresql
# Rocky
dnf install -y postgresql-server postgresql-contrib
postgresql-setup --initdb
systemctl enable --now postgresql
sudo -u postgres psql -c "CREATE USER deploy WITH PASSWORD 'strongpassword';"
sudo -u postgres psql -c "CREATE DATABASE myapp OWNER deploy;"
Why: Bootstraps a database for application use.
What it does: Installs and initializes PostgreSQL, creates a user and DB.
Cautions: Replace 'strongpassword' with a secure secret and store it in a vault; don’t expose DB port publicly.
Notes: Configure pg_hba.conf to restrict connections to private networks only.
13) Backup script (pg_dump -> restic)
Create /usr/local/bin/pg-backup.sh and cron:
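cat <<'EOF' > /usr/local/bin/pg-backup.sh
#!/bin/bash
# Abort on any error so a failed dump is never uploaded as a good backup.
set -euo pipefail
umask 077 # the dump contains role password hashes; keep it private
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
OUT="/tmp/pgdump-${TIMESTAMP}.sql"
trap 'rm -f "${OUT}"' EXIT # always remove the local dump, even on failure
sudo -u postgres pg_dumpall > "${OUT}"
restic backup "${OUT}" --tag db-backup
EOF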
chmod +x /usr/local/bin/pg-backup.sh
(crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/pg-backup.sh >> /var/log/pg-backup.log 2>&1") | crontab -
Why: Capture logical backups and push to remote durable storage.
What it does: Dumps all PostgreSQL databases, backs up with restic, and removes the local dump.
Cautions: Ensure restic repository credentials are configured in env vars or config; test restores.
Notes: For large DBs prefer base backups + WAL shipping for PITR.
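A truncated dump can still upload successfully, so a cheap sanity check before the restic step is useful. A sketch, assuming a hypothetical helper name (dump_looks_complete) and the trailer comment that pg_dumpall normally writes at the end of its output:

```shell
# dump_looks_complete is a hypothetical pre-upload check for pg_dumpall output.
# Assumption: it relies on the trailer comment pg_dumpall writes when it finishes.
dump_looks_complete() {
  local dump="$1"
  [ -s "$dump" ] || { echo "dump is missing or empty: $dump" >&2; return 1; }
  tail -n 5 "$dump" | grep -q 'PostgreSQL database cluster dump complete' || {
    echo "dump has no completion trailer: $dump" >&2
    return 1
  }
}
```

This is only a smoke test; a periodic restore into a scratch database remains the real verification.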
14) Node Exporter for metrics
VERSION="1.5.0"
curl -LO https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz
tar xzf node_exporter-${VERSION}.linux-amd64.tar.gz
mv node_exporter-${VERSION}.linux-amd64/node_exporter /usr/local/bin/
cat <<'EOF' > /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
User=nobody
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now node_exporter
Why: Expose host metrics to Prometheus for performance and capacity monitoring.
What it does: Installs node_exporter binary and runs it as a systemd service.
Cautions: Node exporter exposes system metrics — firewall and ACLs should restrict access to the monitoring network.
Notes: Scrape via Prometheus with a job that targets the instance IP.
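The tarball above is installed without any integrity check. A minimal sketch of a checksum comparison, assuming a hypothetical helper name (verify_sha256) and that the release publishes a sha256sums.txt file next to the tarball:

```shell
# verify_sha256 is a hypothetical helper: it looks up a file's expected hash
# in a "hash  name" list and compares it with the locally computed hash.
verify_sha256() {
  local file="$1" sums="$2" expected actual
  expected=$(awk -v name="$(basename "$file")" '$2 == name { print $1 }' "$sums")
  [ -n "$expected" ] || { echo "no checksum listed for $file" >&2; return 1; }
  actual=$(sha256sum "$file" | awk '{ print $1 }')
  [ "$expected" = "$actual" ]
}

# Intended use (not executed here; the sha256sums.txt URL is an assumption):
#   curl -LO https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/sha256sums.txt
#   verify_sha256 node_exporter-${VERSION}.linux-amd64.tar.gz sha256sums.txt || exit 1
```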
15) Nginx + Certbot
# Debian/Ubuntu
apt install -y nginx certbot python3-certbot-nginx
# Rocky
dnf install -y nginx certbot python3-certbot-nginx
systemctl enable --now nginx
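# Debian/Ubuntu layout shown. Rocky has no sites-available: write the file to /etc/nginx/conf.d/myapp.conf there and skip the symlink.
cat <<'EOF' > /etc/nginx/sites-available/myapp.conf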
server {
listen 80;
server_name example.com www.example.com;
location / {
proxy_pass http://127.0.0.1:3000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}
EOF
ln -sf /etc/nginx/sites-available/myapp.conf /etc/nginx/sites-enabled/myapp.conf || true
nginx -t && systemctl reload nginx
certbot --nginx -d example.com -d www.example.com
Why: Terminate TLS and reverse-proxy to application processes.
What it does: Installs Nginx, configures a proxy vhost, and obtains a Let’s Encrypt certificate.
Cautions: Certbot needs ports 80/443 open; ensure DNS points to the server before requesting certs. For automation, use DNS challenge for wildcard certs.
Notes: Use nginx -t to validate config and certbot renew --dry-run to test renewals.
16) systemd unit for your app
cat <<'EOF' > /etc/systemd/system/myapp.service
[Unit]
Description=MyApp
After=network.target
[Service]
User=deploy
WorkingDirectory=/srv/myapp
ExecStart=/usr/bin/node /srv/myapp/dist/index.js
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now myapp
Why: Manage the application process reliably with automatic restarts and logs.
What it does: Adds a systemd service which starts the app and restarts on failure.
Cautions: Keep WorkingDirectory and ExecStart paths correct; inspect journalctl -u myapp on failures.
Notes: For zero-downtime deploys use socket activation or a rolling restart strategy across replicas.
17) Snapshot and restore testing
- Create a provider snapshot after provisioning and note the ID. Perform a monthly test restore to a throwaway instance.
Why: Snapshots allow rapid recovery from catastrophic failures.
What it does: Captures the disk state to a provider snapshot.
Cautions: Snapshots are not a substitute for off-site backups; store backups in separate regions/storage when possible.
Notes: Automate snapshot creation and retention via provider API.
18) Troubleshooting quick commands
journalctl -u myapp -f
systemctl status myapp
ss -tulpen | grep 3000
df -h
free -m
top
Why: Quick diagnostics to assess service health, listening sockets, and resource usage.
What it does: Streams logs, checks unit status, inspects sockets and resource usage.
Cautions: top and htop are interactive; they are safe to run by hand on production, but use a batch form (for example top -b -n 1) in scripts.
Notes: Capture journalctl -u myapp --since "1 hour ago" for post-incident analysis.
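For scripted checks, df output can be reduced to a single number and compared against a threshold. A rough sketch with hypothetical helper names (disk_usage_pct, disk_alert); the 90% default is an arbitrary assumption:

```shell
# disk_usage_pct is a hypothetical helper: print the used percentage of the
# filesystem holding the given path as a bare integer.
disk_usage_pct() {
  # -P forces the POSIX one-line-per-filesystem format so column 5 is the percentage.
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_alert returns non-zero when usage reaches the limit (default 90).
disk_alert() {
  local path="${1:-/}" limit="${2:-90}" pct
  pct=$(disk_usage_pct "$path")
  if [ "$pct" -ge "$limit" ]; then
    echo "WARN: $path at ${pct}% (limit ${limit}%)"
    return 1
  fi
  echo "OK: $path at ${pct}%"
}
```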
19) Recovery & restore
- Use provider console and recovery ISO/serial if instance fails to boot. Mount root disk from recovery environment and fix ~/.ssh/authorized_keys (for example /home/deploy/.ssh/authorized_keys) or broken configs.
- Restore data: restic restore to a new instance and reconfigure networking.
Why: Prepare for kernel panic, boot failures or accidental lockouts.
What it does: Provides manual steps to access disks and recover files or restore backups.
Cautions: When restoring, ensure networking, hostnames and SSH keys are reconfigured to avoid confusion.
Notes: Keep a documented recovery procedure with provider-specific console links.
20) Handoff checklist
- Confirm deploy can sudo and SSH
- Confirm firewall rules, TLS auto-renewal, backups and snapshot IDs
- Document IPs, secrets location, and escalation contacts
Why: Ensure operational readiness before marking a server as production.
What it does: Validates access, security, backups and documentation are in place.
Cautions: Don’t skip restore tests — a green backup indicator is not a successful restore.
Notes: Store runbook and credentials in a team-accessible vault and link to incident playbooks.
🚀 Provisioning & First Boot
Essentials:
- Use provider images: Ubuntu LTS, Debian Stable, or a current Rocky Linux release
- Bootstrap with cloud-init when available, otherwise run the manual runbook above
- Prefer small, immutable images and consistent build artifacts from CI
#cloud-config
# Example user-data (optional). The line above must be the first line of the file.
users:
  - name: deploy
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa AAAA... your-key
packages:
  - git
  - curl
runcmd:
  - [ sh, -lc, "echo 'Provisioned' > /etc/motd" ]
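cloud-init only treats user-data as cloud-config when the first line is exactly #cloud-config, which is easy to break with a stray comment or blank line. A minimal pre-flight sketch, assuming a hypothetical helper name (has_cloud_config_header):

```shell
# has_cloud_config_header is a hypothetical helper: it succeeds only when the
# first line of the user-data file is exactly "#cloud-config".
has_cloud_config_header() {
  [ "$(head -n 1 "$1")" = "#cloud-config" ]
}
```

Where the cloud-init CLI is installed, cloud-init schema --config-file user-data.yaml gives a fuller validation; user-data.yaml is a placeholder file name.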
🛡️ Initial Hardening
Start hardening early — many failures are configuration mistakes on day one. Minimum steps:
- Create a non-root admin user and disable root SSH after verifying sudo access.
- Enforce SSH key authentication and disable passwords.
- Enable automatic security updates where appropriate.
Examples (Debian/Ubuntu vs Rocky):
# Create admin user
adduser deploy
usermod -aG sudo deploy # Debian/Ubuntu
usermod -aG wheel deploy # Rocky
# Disable root login and passwords ('#?' also matches commented-out defaults)
sed -i -E 's/^#?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sed -i -E 's/^#?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sshd -t && systemctl reload sshd
# Auto-updates (Debian/Ubuntu)
apt install -y unattended-upgrades && dpkg-reconfigure --priority=low unattended-upgrades
# Auto-updates (Rocky)
dnf install -y dnf-automatic && systemctl enable --now dnf-automatic.timer
Add Fail2Ban and auditd, and keep SELinux/AppArmor enforcing (Rocky uses SELinux; Debian/Ubuntu use AppArmor).
💾 Storage, Filesystems & Swap
- Place databases and logs on separate block volumes. Use xfs or ext4 per workload.
- Prefer swapfiles on cloud VMs for resize flexibility.
Example:
mkfs.xfs /dev/vdb1
mkdir -p /var/lib/postgresql
echo '/dev/vdb1 /var/lib/postgresql xfs defaults,noatime 0 2' >> /etc/fstab
mount -a
fallocate -l 4G /swapfile && chmod 600 /swapfile && mkswap /swapfile && swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab
🌐 Networking & DNS
- Use provider DNS or external DNS (Cloudflare/Route53). Reserve floating/reserved IPs for failover.
- Use private networking between VMs when possible to keep traffic off public interfaces.
Reverse proxy + TLS (Nginx + Certbot):
# Debian/Ubuntu
apt install -y nginx certbot python3-certbot-nginx
# Rocky
dnf install -y nginx certbot python3-certbot-nginx
systemctl enable --now nginx
certbot --nginx -d example.com -d www.example.com
📈 Observability & Monitoring
- Expose metrics via Node Exporter (Prometheus) and centralize logs with Vector/Fluentd or a managed logging service.
- Lightweight options: Netdata or Glances for small VPS.
Node Exporter quick install (example):
VERSION="1.5.0"
curl -LO https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz
tar xzf node_exporter-${VERSION}.linux-amd64.tar.gz
mv node_exporter-${VERSION}.linux-amd64/node_exporter /usr/local/bin/
🔁 Backups & Disaster Recovery
- Use provider snapshots for fast restores and restic/borg for file-level backups to S3-compatible storage.
- Backup databases using logical dumps and WAL shipping for point-in-time recovery.
Restic example:
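# Assumes RESTIC_PASSWORD and the S3 credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) are already exported
export RESTIC_REPOSITORY=s3:s3.amazonaws.com/my-bucket
restic init
restic backup /var/lib/postgresql --tag db-2026-02-08
restic forget --prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12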
🗂️ Logging & Rotation
- Rotate local logs with logrotate and forward to a central log store. Example config for app logs:
/var/log/myapp/*.log {
daily
rotate 14
compress
missingok
notifempty
copytruncate
}
✅ Security & Compliance
- Enforce least privilege, centralize secrets (Vault / KMS), run Trivy/Lynis scans, and apply CIS benchmarks where needed.
Quick scan:
curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
trivy fs /
🚚 Service Management & Deployment
- Use systemd units for app services; prefer immutable artifacts from CI when possible.
Example unit:
[Unit]
Description=MyApp
After=network.target
[Service]
User=deploy
WorkingDirectory=/srv/myapp
ExecStart=/usr/bin/node dist/index.js
Restart=on-failure
[Install]
WantedBy=multi-user.target
🐳 Containers on VMs
- Use Podman (rootless) or Docker where appropriate. Pull by digest in production and run minimal containers.
podman pull docker.io/myorg/myapp@sha256:...
podman run -d --name myapp -p 3000:3000 --restart=always docker.io/myorg/myapp@sha256:...
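A deploy script can refuse image references that are not pinned by digest. A minimal sketch, assuming a hypothetical helper name (is_digest_pinned):

```shell
# is_digest_pinned is a hypothetical helper: it succeeds only when the image
# reference ends in @sha256: followed by a full 64-character hex digest.
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}
```

Usage: is_digest_pinned "$IMAGE" || { echo "refusing unpinned image" >&2; exit 1; }, where IMAGE is whatever variable the deploy script uses.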
⚖️ Scaling & Networking Patterns
- Terminate TLS at the LB or reverse proxy. Use health checks and autoscaling for stateless services. Keep state on managed DBs or networked block storage.
🗺️ Provider Notes
- DigitalOcean / Linode: snapshots, floating IPs, user-data
- Hetzner: strong IO and volumes
- Scaleway / Upcloud / Hostinger / Racknerd: check private networking and snapshot APIs
Always validate provider capabilities before relying on them.
🆘 Incident Response & Runbooks
- Document steps to access serial console, restore from snapshots, and contact escalation points. Keep a tested post-mortem template and retention policy.
📋 Best Practices Checklist
✅ Use immutable artifacts from CI
✅ Automate provisioning when possible (cloud-init optional)
✅ Enforce SSH keys and disable root login
✅ Enable automated security updates
✅ Centralize logs and metrics
✅ Test backups and restores regularly
✅ Apply least-privilege and network segmentation
✅ Monitor health, disk, and performance
🖥️ Cockpit — web-based server admin (optional)
Why: Cockpit provides a lightweight web UI for server management (systemd, logs, networking, containers) which complements CLI workflows and is useful for visual inspection and quick ops.
Install (Debian/Ubuntu):
apt update
apt install -y cockpit
systemctl enable --now cockpit.socket
# allow firewall
ufw allow 9090/tcp
Install (Rocky Linux):
dnf install -y cockpit
systemctl enable --now cockpit.socket
firewall-cmd --permanent --add-port=9090/tcp
firewall-cmd --reload
What it does: exposes a browser UI on port 9090 for user management, system logs, journal, storage, networking, and container inspection (Podman integration available).
Cautions:
- Cockpit is a convenience UI — do not rely on it for automation or as the sole access path. Keep strong auth, reserve access to admin networks, or proxy behind an internal LB with mTLS.
- Exposing Cockpit directly to the Internet without additional controls is unsafe.
Notes:
- Integrates with podman on systems that have it installed; install cockpit-podman for container UI.
- Use provider firewall or SSH tunnel to restrict access to trusted IPs.
🚀 Bonus: Self-hosted PaaS setup & management
Why: A lightweight self-hosted PaaS streamlines app deployment (git push or image deploy), manages TLS, and provides simple scaling for small teams without full K8s complexity.
Options covered: Dokku, Coolify, and lightweight k3s (Kubernetes) for production-grade deployments.
Dokku (easy git-based PaaS)
Install (Ubuntu recommended):
# Prepare (Ubuntu)
apt update && apt install -y docker.io
curl -sSL https://raw.githubusercontent.com/dokku/dokku/v0.28.5/bootstrap.sh | sudo DOKKU_TAG=v0.28.5 bash
Usage:
- Create an app on the server: dokku apps:create myapp
- Add remote on developer machine: git remote add dokku dokku@server:myapp
- Deploy: git push dokku main
Cautions:
- Dokku relies on Docker; ensure Docker is secured and updated.
- Single-node Dokku is fine for small apps, but for HA use multiple nodes and external DBs/storage.
Notes:
- Use Dokku plugins for Postgres, redis, and Let’s Encrypt integration.
Coolify (self-hosted v4)
Use the official v4 installer or the documented manual deployment from the Coolify docs. Official docs and installers: https://coolify.io/self-hosted and https://coolify.io/docs/installation
Note: The Coolify v4 bash installer is the recommended way to deploy, and it is best run on a freshly provisioned VM/VPS.
Quick install (recommended — official installer):
# Official v4 installer (runs required containers and setup)
curl -fsSL https://cdn.coollabs.io/coolify/install.sh | sudo bash
Notes on prerequisites and alternatives:
- The installer will detect Docker and required components; if you prefer manual setup, follow the repository/docker-compose manifest in the Coolify docs.
- For production, configure an external PostgreSQL database and S3-compatible storage instead of relying on embedded volumes.
Usage:
- Visit the Coolify web UI (default port configured by the installer) to finish setup and create the admin account.
- Deploy apps via Git, Docker image, or connected repos; Coolify manages TLS (Let’s Encrypt), environment variables, and basic scaling.
Cautions & best practices:
- Do not expose the management UI directly to the public internet — restrict access with firewall rules, VPN, or an auth proxy (OAuth, mTLS).
- Back up Coolify’s data and metadata (or use external DB/S3). Regularly test restores.
- Monitor resource usage; Coolify runs background workers and may need more than minimal VM specs for production workloads.
References and further reading:
- Official self-hosted docs and install guide: https://coolify.io/self-hosted
- GitHub repo: https://github.com/coollabsio/coolify
k3s (lightweight Kubernetes) — production-ready PaaS foundation
Install single-node k3s quickly:
curl -sfL https://get.k3s.io | sh -
# kubectl is available at /usr/local/bin/kubectl (or use k3s kubectl)
Basic app deploy:
kubectl create deployment myapp --image=myorg/myapp:latest
kubectl expose deployment myapp --type=NodePort --port=80 --target-port=3000
Ingress & TLS:
- Use the Traefik ingress controller that k3s deploys by default, or configure ingress-nginx instead; add cert-manager for Let’s Encrypt.
Cautions:
- k3s reduces K8s complexity but still requires operational knowledge: storage (Longhorn), backups (Velero), and RBAC.
- For single-node, there’s limited resilience — use multi-node clusters for production.
Management & backups:
- Back up etcd (or embedded datastore) and Kubernetes manifests. Use kubectl get all -o yaml to export resources (all omits ConfigMaps, Secrets and Ingresses, so export those separately), and store images in a registry.
When to choose which:
- Dokku / Coolify: fastest for web apps and small teams; minimal infra work.
- k3s: when you need Kubernetes features, multi-service orchestration, and more control over networking and storage.
Final cautions for self-hosted PaaS:
- Plan for backups of app data and PaaS metadata (e.g., Dokku plugin databases, Coolify configs, k3s etcd).
- Monitor node resource usage; PaaS control planes can be noisy on small VMs.
- Secure CLIs and dashboard ports with firewall rules, VPNs or auth proxies.
🎯 Conclusion
This playbook leads with a practical, ground-up runbook and follows with focused sections that prioritize action and clarity.