
Four VMs from one prompt: three PVE nodes + a QDevice witness, Corosync wiring baked in
May 9, 2026
A single-node Proxmox box is the entry point. The day you stop being willing to lose state when that one box goes down, you need three nodes: the minimum for reliable Corosync quorum and the first shape that survives a node loss with VMs still running. The Proxmox docs are blunt: “If you are interested in High Availability, you need to have at least three nodes for reliable quorum” (Cluster Manager). It is vote math, not marketing: Corosync needs a strict majority of votes to stay writable, and in a two-node cluster one vote out of two is never a majority. Lose a node, or just the link between the two, and whatever is left freezes rather than risk split-brain, which is the trap this shape dodges.
This post walks through that exact shape as an OpenFactory build prompt: four buildable Debian Trixie VMs — three PVE nodes plus a QDevice witness — from a single prompt, with /etc/pve/corosync.conf already shaped, the cluster nodelist agreed across all three, and a mock PVE API reporting quorate: 1. Real pvecm create / pvecm add is the deploy-time step on top; you rehearse the quorum wiring before any hardware is racked.
- pve-1, pve-2, pve-3 (10.81.0.11–13:8006): three PVE hosts, each with a per-node /etc/pve/corosync.conf listing all three ring0 addresses + the QDevice tie-breaker, mock :5404 and :5405 Corosync listeners, and a mock cluster-status JSON reporting quorate: 1.
- pbs-witness (10.81.0.20:5403): QDevice / corosync-qnetd shape that breaks the tie on 2-node split-brain scenarios. Comes with a runbook explaining the real pvecm qdevice setup deploy step.
- /etc/pve/storage.cfg on each PVE node, pre-populated with zfspool: rpool-data as the replication target. Swap to Ceph via the next post in the series if you want full HA storage.
- corosync.conf (nodelist, quorum-device config, transport) is baked into all three PVE node ISOs. No copy-paste between hosts at deploy.

Three PVE nodes in a row, QDevice below. PVE↔PVE on :8006 (API) and :5404/:5405 (Corosync ring); all three to the QDevice on :5403 for tie-breaking. Lab subnet 10.81.0.0/24. Every vote and ring link in the diagram is a port the build group actually checks.
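To make "baked in" concrete, here is a minimal sketch of how one nodelist could be rendered into the same corosync.conf text for every node. The addresses, cluster name, and quorum-device settings are the ones the build prompt in this post specifies; the helper name `render_corosync_conf` is hypothetical, and this is not a description of how OpenFactory generates the file. On a real cluster, pvecm writes the file for you.

```python
# Sketch only: values mirror the build prompt; render_corosync_conf is a hypothetical helper.
NODES = [("pve-1", 1, "10.81.0.11"), ("pve-2", 2, "10.81.0.12"), ("pve-3", 3, "10.81.0.13")]
QDEVICE_HOST = "10.81.0.20"


def render_corosync_conf(nodes=NODES, qdevice_host=QDEVICE_HOST, cluster_name="pve-cluster"):
    """Return corosync.conf text; every node receives the same content."""
    node_blocks = "".join(
        "  node {\n"
        f"    name: {name}\n"
        f"    nodeid: {nodeid}\n"
        f"    ring0_addr: {addr}\n"
        "  }\n"
        for name, nodeid, addr in nodes
    )
    # One vote per node plus one for the QDevice, which yields the prompt's expected_votes: 4.
    expected_votes = len(nodes) + 1
    return (
        "totem {\n"
        "  version: 2\n"
        f"  cluster_name: {cluster_name}\n"
        "  transport: knet\n"
        "  interface {\n    linknumber: 0\n  }\n"
        "}\n\n"
        "quorum {\n"
        "  provider: corosync_votequorum\n"
        f"  expected_votes: {expected_votes}\n"
        "  device {\n"
        "    model: net\n"
        "    votes: 1\n"
        "    net {\n"
        "      tls: on\n"
        f"      host: {qdevice_host}\n"
        "      algorithm: ffsplit\n"
        "    }\n"
        "  }\n"
        "}\n\n"
        "nodelist {\n" + node_blocks + "}\n"
    )


if __name__ == "__main__":
    print(render_corosync_conf())
```

Rendering from a single list is the point of the design: the three nodes cannot disagree about the nodelist because none of them holds its own copy of the inputs.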
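Corosync assigns one vote per node and demands a strict majority to keep /etc/pve writable. Walk the cases: with three votes the threshold is two, so losing one node leaves two votes and the cluster stays quorate, while losing two leaves a single vote and the survivor's /etc/pve goes read-only.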
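The same cases can be written out as arithmetic. This is a minimal sketch that assumes one vote per node and leaves the QDevice out; `quorum_threshold` and `is_quorate` are illustrative names, not Corosync APIs.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Strict majority: more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(expected_votes: int, votes_present: int) -> bool:
    """True when the votes still visible reach the majority threshold."""
    return votes_present >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # Compare the two-node and three-node shapes as nodes drop out one by one.
    for expected in (2, 3):
        need = quorum_threshold(expected)
        for lost in range(expected):
            present = expected - lost
            state = "quorate" if is_quorate(expected, present) else "frozen"
            print(f"{expected} votes, {lost} lost: {present} present, need {need} -> {state}")
```

The two-node row is the one to notice: the threshold is 2 out of 2, so any single loss freezes the cluster, while three nodes tolerate exactly one.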
One honest caveat: for an odd cluster that already has natural majority, the QDevice is generally unnecessary — it shines on 2-node setups where it supplies the third vote (quorum explained). We wire it in so you can test the witness path; with three healthy nodes you may simply drop it.
Corosync is sensitive to latency and jitter, not bandwidth. The official requirement is latencies under 5 ms (LAN performance) between every node, and Proxmox recommends a dedicated physical NIC for cluster traffic — a plain 1 Gbit link is enough (Cluster Manager). Skip it and the failure mode is nasty: a backup or migration saturates the shared link, Corosync packets miss their timeout, nodes think a peer vanished, and a healthy cluster fences itself. So split link0 onto its own NIC plus a redundant link1 over knet, so one cable pull doesn't drop the ring.
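A quick way to get a feel for the link before trusting it with cluster traffic is to sample connect times between nodes. The sketch below is an assumption-heavy stand-in: it times TCP connects to the :8006 listener, which is only a rough proxy for what Corosync sees over knet, and the helper names `connect_ms` and `probe` are hypothetical. The 5 ms ceiling and the peer addresses come from this post.

```python
import socket
import statistics
import time

PEERS = ["10.81.0.12", "10.81.0.13"]  # lab aliases from this post, as seen from pve-1
LIMIT_MS = 5.0  # latency ceiling quoted in the text above


def connect_ms(host: str, port: int, timeout: float = 1.0) -> float:
    """Time one TCP connect in milliseconds (a rough stand-in for link RTT)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def probe(host: str, port: int = 8006, samples: int = 20) -> dict:
    """Collect several samples and report median, worst case, and jitter."""
    times = [connect_ms(host, port) for _ in range(samples)]
    return {
        "median_ms": statistics.median(times),
        "max_ms": max(times),
        "jitter_ms": statistics.pstdev(times),
        # Judge the link by its worst sample, since one late packet is what trips a timeout.
        "ok": max(times) < LIMIT_MS,
    }
```

Calling `probe(peer)` for each entry in `PEERS` while a backup or migration is running is the interesting experiment: the median may look fine while `max_ms` and `jitter_ms` show the spikes that a shared link produces.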
Quorum keeps the cluster writable; it does not move your VMs on its own. For a guest to restart elsewhere it must be enrolled in HA (ha-manager add) and its disk has to exist on another node, via Ceph or ZFS replication. When a node drops, the HA stack waits out the fence timeout, then restarts the affected HA-managed VMs on a survivor from the replicated disk. As of Proxmox VE 9.2 (May 2026) the cluster resource scheduler also runs a dynamic load balancer that live-migrates HA-enrolled guests to even out CPU and memory pressure, closing the last big gap to vSphere DRS — though it only touches guests already in HA.
Paste this verbatim into the chat builder at console.openfactory.tech. Nothing above or below it — the builder expects the prompt body to start at the “Build a compact multi-node lab…” line.
Build a compact multi-node lab named `proxmox-3node-cluster`.
Output discipline: keep the plan small. Use one startup script per node, about 25 shell lines or less. Do not install `pve-manager`, `corosync`, `pve-cluster`, `pmxcfs`, or any Proxmox apt repos at build time. The cluster shape is mocked via deployment-time config templates and Python stdlib listeners — real `pvecm create` / `pvecm add` runs at provisioning on top of installed Proxmox VE hosts. Write deployment-time config examples and tiny Python stdlib or shell compatibility stubs only. The goal is a buildable preparation lab, not a production Proxmox install.
## Topology
Create 4 buildable `debian-trixie` nodes, all `x86_64`, SSH enabled, DHCP/default route intact with lab aliases, firewall disabled, DNS `1.1.1.1` and `8.8.8.8`, user `ops` password `pve-cluster-ops` in `sudo`. Every recipe must set top-level `test_config` to `{ "enabled": false, "tests": [] }`.
- `pve-1`: role `pve-host`, 4 GB RAM, 32 GB disk, alias `10.81.0.11/24`, x `110`, y `100`
- `pve-2`: role `pve-host`, 4 GB RAM, 32 GB disk, alias `10.81.0.12/24`, x `350`, y `100`
- `pve-3`: role `pve-host`, 4 GB RAM, 32 GB disk, alias `10.81.0.13/24`, x `590`, y `100`
- `pbs-witness`: role `qdevice-witness`, 2 GB RAM, 16 GB disk, alias `10.81.0.20/24`, x `350`, y `280`
Connections: `pve-1`, `pve-2`, `pve-3` to each other on `:8006` (PVE API) and `:5404/5405` (Corosync ring); all three to `pbs-witness:5403` (QDevice tie-breaker).
## Common Recipe Requirements
All nodes: features `headless`, `ssh`; packages `openssh-server`, `python3`, `curl`, `jq`, `iproute2`, `netcat-openbsd`, `ca-certificates`. Each startup script adds the alias with `IFACE=$(ip route show default | awk '{print $5; exit}')`, `ip link set "$IFACE" up || true`, and `ip addr add <alias> dev "$IFACE" || true`. If `os.startup_scripts[].after` is present, it must be the string `"network-online.target"`, not an array. Do not install `pve-manager`, `proxmox-backup-server`, `ceph`, `truenas-scale`, or any related apt packages — they are source-ISO deploys handled at provisioning time, not at build time.
## Node Requirements
All three `pve-1`, `pve-2`, `pve-3` share the same compatibility-service shape with different identity payloads. Each:
- Creates `/etc/pve/{nodes/<self>,storage,qemu-server,lxc,priv}` mode `0750 ops:ops`.
- Writes `/etc/pve/corosync.conf` with `totem { version: 2, cluster_name: pve-cluster, transport: knet, interface { linknumber: 0 } }`, `quorum { provider: corosync_votequorum, expected_votes: 4, device { model: net, votes: 1, net { tls: on, host: 10.81.0.20, algorithm: ffsplit } } }`, and `nodelist { node { name: pve-1, nodeid: 1, ring0_addr: 10.81.0.11 } node { name: pve-2, nodeid: 2, ring0_addr: 10.81.0.12 } node { name: pve-3, nodeid: 3, ring0_addr: 10.81.0.13 } }`.
- Writes `/etc/pve/storage.cfg` with `zfspool: rpool-data\n pool rpool/data\n content images,rootdir\n sparse 1` (intent: replicated ZFS).
- Adds a Python stdlib HTTP service on `0.0.0.0:8006` exposing:
- `GET /api2/json/version` -> `200 {"data":{"version":"compat-1.0","release":"pve-compat","repoid":"<node-id>"}}`
- `GET /api2/json/cluster/status` -> `200 {"data":[{"type":"cluster","name":"pve-cluster","nodes":3,"quorate":1},{"type":"node","name":"pve-1","online":1,"id":"node/pve-1","nodeid":1},{"type":"node","name":"pve-2","online":1,"id":"node/pve-2","nodeid":2},{"type":"node","name":"pve-3","online":1,"id":"node/pve-3","nodeid":3}]}`
- `GET /api2/json/nodes/<self>/status` -> `200 {"data":{"uptime":3600,"loadavg":["0.05","0.05","0.05"],"cpu":0.05}}`
- Adds Python stdlib TCP listeners on `0.0.0.0:5404` and `0.0.0.0:5405` accepting connections (no Corosync protocol needed; just proves the ports listen).
- Registers `pve-compat.service`.
`pbs-witness`: features `headless`, `ssh`. Add a Python stdlib service on `0.0.0.0:5403` accepting TCP connections (mock QDevice / corosync-qnetd) plus an HTTP `:9165/metrics` listener returning `qdevice_compat_up 1`. Register `qdevice-compat.service`. Write `/root/qdevice-runbook.md` documenting that real deployment installs `corosync-qnetd` from the Debian repos and registers each PVE node via `pvecm qdevice setup 10.81.0.20`.
## Scenario
Emit exactly one group scenario named `proxmox-3node-cluster-validation`. Put `custom_tests[].assertions[]` inside the scenario entry; leave `scenarios[].tests` empty. Every assertion needs `on_vm`. Use only `port_listening`, `command_output`, and `http_responds`; do not emit `vm_boots`, `network_reachable`, or `service_running`.
- `Cluster ports listen`: `port_listening` for `:8006`, `:5404`, `:5405` on each of `pve-1`, `pve-2`, `pve-3`; `port_listening` for `pbs-witness:5403`.
- `All three nodes report quorate`: on each pve-* node, `curl -fsS http://localhost:8006/api2/json/cluster/status | jq -e '.data[] | select(.type == "cluster") | .quorate == 1' >/dev/null && echo quorate`.
- `Per-node status`: on `pve-1`, `curl -fsS http://localhost:8006/api2/json/nodes/pve-1/status | jq -e '.data.uptime | type == "number"' >/dev/null && echo pve-1-status-ok` and similarly for `pve-2` and `pve-3`.
- `corosync.conf has all three nodes`: on each pve-* node, `grep -c 'ring0_addr: 10.81.0' /etc/pve/corosync.conf | awk '{exit ($1>=3)?0:1}' && echo corosync-nodelist`.
- `All nodes reach the QDevice`: on each pve-* node, `nc -z -w 5 10.81.0.20 5403 && echo qdevice-reachable`.
- `Mesh reachability`: on `pve-1`, `nc -z -w 5 10.81.0.12 8006 && nc -z -w 5 10.81.0.13 8006 && echo peers-reachable`.
Preserve warnings that real Proxmox VE installs on each node, `pvecm create pve-cluster` and `pvecm add 10.81.0.11` on the joining members, Corosync redundant ring (link0/link1) on a dedicated cluster network, QDevice TLS keys via `pvecm qdevice setup`, shared or replicated storage (Ceph or ZFS replication) so HA can fail VMs over, `ha-manager add` per VM, real fencing, dedicated NICs for cluster vs VM vs migration traffic, and `10.81.0.0/24` lab aliasing are deployment-time concerns.
Driving OpenFactory from an AI agent instead of the browser? The same flow is exposed through the OpenFactory MCP server: submit the prompt programmatically, get the build-plan preview back, and call create_build / start_vm on the resulting recipes. Single-image builds go straight through the openfactory CLI.
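For readers curious what the prompt is asking the builder to produce (this is not part of the text to paste), the cluster-status endpoint and the quorate assertion can be sketched with the Python standard library. The payload follows the prompt; the class and function names are hypothetical, and the code the builder actually emits may look quite different.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

NODES = ["pve-1", "pve-2", "pve-3"]

# Same shape as the /api2/json/cluster/status payload described in the prompt.
CLUSTER_STATUS = {
    "data": [{"type": "cluster", "name": "pve-cluster", "nodes": 3, "quorate": 1}]
    + [
        {"type": "node", "name": n, "online": 1, "id": f"node/{n}", "nodeid": i}
        for i, n in enumerate(NODES, start=1)
    ]
}


class CompatHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api2/json/cluster/status":
            body = json.dumps(CLUSTER_STATUS).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the stub quiet


def is_quorate(host: str, port: int) -> bool:
    """Rough Python equivalent of the jq check in the scenario."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/api2/json/cluster/status")
        data = json.load(conn.getresponse())["data"]
    finally:
        conn.close()
    return any(e["type"] == "cluster" and e.get("quorate") == 1 for e in data)


if __name__ == "__main__":
    # Port 0 picks a free local port for the demo; the prompt binds 0.0.0.0:8006.
    server = ThreadingHTTPServer(("127.0.0.1", 0), CompatHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print("quorate:", is_quorate("127.0.0.1", server.server_port))
    server.shutdown()
```

Because the stub always answers quorate: 1, a passing check proves only that the port, path, and JSON shape are wired correctly, not that any real vote was counted.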
The prompt produces a buildable preparation lab — the right topology, the right ports listening, deployment-time config templates dropped in the right places, and tiny compatibility services that prove the wiring works. A few things still sit outside the recipe and need operator attention before this carries real load:
- Real cluster config. Proxmox VE writes the real /etc/pve/corosync.conf after pvecm create pve-cluster / pvecm add 10.81.0.11.
- Cluster network. The lab aliases every node onto one 10.81.0.0/24; production should isolate link0 on its own NIC plus a redundant link1.
- Shared or replicated storage. ZFS replication (pvesr) is the cheapest; Ceph is the most resilient.
- ha-manager add per VM. HA isn't automatic per-VM; you opt each guest into the HA group with ha-manager add vm:200 --group ha-default.
- QDevice. Install corosync-qnetd on the witness host; run pvecm qdevice setup 10.81.0.20 on each PVE node. TLS keys are generated during that setup.

If the next thing you want is real HA storage (survive a node loss with zero data motion at fail-over), see the Proxmox + Ceph cluster post. If backup is the bigger gap, the Proxmox + PBS post wires deduplicated incremental backups to a dedicated PBS target with off-site sync. Coming back from the entry point? See the single-node Proxmox lab.
Why not two nodes plus the QDevice? The witness supplies a third vote, so the cluster keeps quorum through one failure, but the surviving node is then the only place left to land guests. Three real nodes give HA somewhere to actually restart the VMs: the first shape that is HA in practice, not just on paper.
When quorum does break, pvecm status shows total and expected votes and which nodes are visible; the last resort, pvecm expected 1, forces one survivor writable — dangerous if the “dead” nodes are alive on a split network, since you have just authorized two writers. Rehearsing the healthy nodelist here means you know what good looks like before you are staring at broken.
Rolling this out across a regulated or multi-site fleet? The Enterprise & GxP page covers fleet rollouts and audit trails, pricing lays out the tiers, and the prompt builds the cluster shape now at console.openfactory.tech.
OpenFactory's free flow is for browsing. Persistent VMs, SSH access, snapshots, your own ISO, and fleet deployment live on a paid plan.