
Build a 3-Node Proxmox HA Cluster on OpenFactory

Four VMs from one prompt: three PVE nodes + a QDevice witness, Corosync wiring baked in

June 8, 2026

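A single-node Proxmox box is the entry point. The day you are no longer willing to lose state when that one box goes down, you need three nodes. That's the smallest cluster in which a majority of Corosync votes (two of three) survives the loss of any one node, and the first shape that can bring VMs back up on the survivors once HA and shared or replicated storage are in place.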
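The "three nodes" rule falls out of Corosync votequorum's majority arithmetic. Here is a minimal sketch, assuming one vote per node (the default) and none of the special votequorum modes such as two_node, wait_for_all, or last_man_standing; the function names are illustrative and not part of any Proxmox or Corosync API.

```python
# Sketch of votequorum's strict-majority rule. Assumes one vote per node
# and no two_node / wait_for_all / last_man_standing options.

def quorum_threshold(expected_votes: int) -> int:
    """Votes a partition needs to be quorate: a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    return votes_present >= quorum_threshold(expected_votes)


# Two nodes: quorum is 2, so losing either node loses quorum.
print(is_quorate(1, 2))      # False
# Three nodes: quorum is 2, so any single node can fail.
print(is_quorate(2, 3))      # True
# Three nodes plus a one-vote QDevice (expected_votes 4): quorum is 3.
print(is_quorate(2 + 1, 4))  # True: one node down, witness still up
print(is_quorate(2, 4))      # False: one node and the witness both down
```

The last line is worth keeping in mind for this lab's shape: a one-vote witness on three nodes raises the bar to three votes, so a node loss and a witness outage at the same time cost quorum.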

This post walks through that exact shape as an OpenFactory build prompt: four buildable Debian Trixie VMs — three PVE nodes plus a QDevice witness — from a single prompt, with /etc/pve/corosync.conf already shaped, the cluster nodelist agreed across all three, and a mock PVE API reporting quorate: 1. Real pvecm create / pvecm add is the deploy-time step on top.

What you'll build

pve-1, pve-2, pve-3 (10.81.0.11–13:8006): three PVE hosts, each with a per-node /etc/pve/corosync.conf listing all three ring0 addresses plus the QDevice tie-breaker, mock TCP listeners on :5404 and :5405 standing in for the Corosync ring (real Corosync traffic is UDP), and a mock cluster-status JSON reporting quorate: 1.
pbs-witness (10.81.0.20:5403): QDevice / corosync-qnetd shape that breaks the tie in 2-node split-brain scenarios. Comes with a runbook explaining the real pvecm qdevice setup deploy step.
Replicated-storage intent: /etc/pve/storage.cfg on each PVE node is pre-populated with zfspool: rpool-data as the replication target. The next post in the series swaps in Ceph if you want fully shared HA storage.

Why build it on OpenFactory

The ISO is the spec. The whole corosync.conf — nodelist, quorum-device config, transport — is baked into all three PVE node ISOs. No copy-paste between hosts at deploy.
Scenario assertions ride along. The build group fails closed if any PVE node stops reporting quorate, if a node's corosync.conf lists fewer than three ring0 addresses, or if any PVE host can't reach the QDevice. You don't deploy and discover quorum is broken at 3am.
QDevice in the recipe, not an afterthought. A witness vote earns its keep on even node counts, where a clean 50/50 split leaves neither half with a majority and the QDevice decides which half keeps quorum. Wiring it from day one means it is already reachable if the cluster ever runs at two or four nodes.
Peer reachability proved at build. pve-1 verifies it can reach both peers on :8006 before the group reports built.
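The scenario's nodelist assertion only counts ring0_addr lines. A stricter drift check parses each node's nodelist and compares them across hosts. This is a minimal sketch, not something OpenFactory emits: parse_nodelist, nodelist_drift, and render are hypothetical helpers, and the regex assumes the simple one-key-per-line corosync.conf layout used in this lab (no nested blocks inside node { }).

```python
import re

# One "node { ... }" block. "nodelist {" never matches because "list"
# sits between "node" and the brace.
NODE_BLOCK = re.compile(r"\bnode\s*\{([^{}]*)\}")
KEY_VALUE = re.compile(r"(\w+)\s*:\s*(\S+)")


def parse_nodelist(conf_text: str) -> list[tuple[int, str, str]]:
    """Return sorted (nodeid, name, ring0_addr) tuples from a corosync.conf."""
    nodes = []
    for body in NODE_BLOCK.findall(conf_text):
        fields = dict(KEY_VALUE.findall(body))
        nodes.append((int(fields["nodeid"]), fields["name"], fields["ring0_addr"]))
    return sorted(nodes)


def nodelist_drift(confs: dict[str, str], expected_nodes: int = 3) -> list[str]:
    """Hosts whose nodelist is short or differs from the first host's."""
    parsed = {host: parse_nodelist(text) for host, text in confs.items()}
    reference = next(iter(parsed.values()))
    return [
        host for host, nodes in parsed.items()
        if len(nodes) < expected_nodes or nodes != reference
    ]


def render(addrs: list[str]) -> str:
    """Build a tiny corosync.conf in the lab's shape (illustration only)."""
    nodes = "".join(
        f"  node {{\n    name: pve-{i}\n    nodeid: {i}\n    ring0_addr: {addr}\n  }}\n"
        for i, addr in enumerate(addrs, start=1)
    )
    return f"totem {{\n  cluster_name: pve-cluster\n}}\nnodelist {{\n{nodes}}}\n"


good = render(["10.81.0.11", "10.81.0.12", "10.81.0.13"])
drifted = render(["10.81.0.11", "10.81.0.12", "10.81.0.31"])
print(nodelist_drift({"pve-1": good, "pve-2": good, "pve-3": drifted}))
# ['pve-3']
```

Comparing parsed tuples rather than raw text means whitespace or key-order differences don't count as drift, while a changed nodeid, name, or address does.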

Topology

Three PVE nodes in a row, QDevice below. PVE↔PVE on :8006 (API), :5404/:5405 (Corosync ring); all three to the QDevice on :5403 for tie-breaking. Lab subnet 10.81.0.0/24.
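The topology can be written down as data and probed the way the scenario's nc -z checks do. A minimal sketch with hypothetical names (expected_edges, tcp_port_open, check_from). A TCP connect only proves the lab's mock listeners: real Corosync ring traffic is UDP and will not answer a TCP probe.

```python
import itertools
import socket

PVE_NODES = {"pve-1": "10.81.0.11", "pve-2": "10.81.0.12", "pve-3": "10.81.0.13"}
WITNESS = ("pbs-witness", "10.81.0.20")
PEER_PORTS = (8006, 5404, 5405)  # PVE API plus the mocked Corosync ring ports
QDEVICE_PORT = 5403


def expected_edges():
    """(source, destination, destination_ip, port) for every link in the topology."""
    edges = []
    for (src, _), (dst, dst_ip) in itertools.permutations(PVE_NODES.items(), 2):
        for port in PEER_PORTS:
            edges.append((src, dst, dst_ip, port))
    for src in PVE_NODES:
        edges.append((src, WITNESS[0], WITNESS[1], QDEVICE_PORT))
    return edges


def tcp_port_open(ip, port, timeout=5.0):
    """True if a TCP handshake succeeds, like `nc -z -w 5 ip port`."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_from(source):
    """Probe every edge that starts at `source`; run this on that node."""
    return {
        (dst, port): tcp_port_open(ip, port)
        for src, dst, ip, port in expected_edges()
        if src == source
    }
```

Running check_from("pve-1") on pve-1 returns a map of seven (destination, port) probes; any False points at a listener or route that isn't up.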

The prompt

Paste this verbatim into the chat builder at console.openfactory.tech. Nothing above or below it — the builder expects the prompt body to start at the “Build a compact multi-node lab…” line.

Build a compact multi-node lab named `proxmox-3node-cluster`.

Output discipline: keep the plan small. Use one startup script per node, about 25 shell lines or less. Do not install `pve-manager`, `corosync`, `pve-cluster`, `pmxcfs`, or any Proxmox apt repos at build time. The cluster shape is mocked via deployment-time config templates and Python stdlib listeners — real `pvecm create` / `pvecm add` runs at provisioning on top of installed Proxmox VE hosts. Write deployment-time config examples and tiny Python stdlib or shell compatibility stubs only. The goal is a buildable preparation lab, not a production Proxmox install.

## Topology

Create 4 buildable `debian-trixie` nodes, all `x86_64`, SSH enabled, DHCP/default route intact with lab aliases, firewall disabled, DNS `1.1.1.1` and `8.8.8.8`, user `ops` password `pve-cluster-ops` in `sudo`. Every recipe must set top-level `test_config` to `{ "enabled": false, "tests": [] }`.

- `pve-1`: role `pve-host`, 4 GB RAM, 32 GB disk, alias `10.81.0.11/24`, x `110`, y `100`
- `pve-2`: role `pve-host`, 4 GB RAM, 32 GB disk, alias `10.81.0.12/24`, x `350`, y `100`
- `pve-3`: role `pve-host`, 4 GB RAM, 32 GB disk, alias `10.81.0.13/24`, x `590`, y `100`
- `pbs-witness`: role `qdevice-witness`, 2 GB RAM, 16 GB disk, alias `10.81.0.20/24`, x `350`, y `280`

Connections: `pve-1`, `pve-2`, `pve-3` to each other on `:8006` (PVE API) and `:5404/5405` (Corosync ring); all three to `pbs-witness:5403` (QDevice tie-breaker).

## Common Recipe Requirements

All nodes: features `headless`, `ssh`; packages `openssh-server`, `python3`, `curl`, `jq`, `iproute2`, `netcat-openbsd`, `ca-certificates`. Each startup script adds the alias with `IFACE=$(ip route show default | awk '{print $5; exit}')`, `ip link set "$IFACE" up || true`, and `ip addr add <alias> dev "$IFACE" || true`. If `os.startup_scripts[].after` is present, it must be the string `"network-online.target"`, not an array. Do not install `pve-manager`, `proxmox-backup-server`, `ceph`, `truenas-scale`, or any related apt packages — they are source-ISO deploys handled at provisioning time, not at build time.

## Node Requirements

All three `pve-1`, `pve-2`, `pve-3` share the same compatibility-service shape with different identity payloads. Each:

- Creates `/etc/pve/{nodes/<self>,storage,qemu-server,lxc,priv}` mode `0750 ops:ops`.
- Writes `/etc/pve/corosync.conf` with `totem { version: 2, cluster_name: pve-cluster, transport: knet, interface { linknumber: 0 } }`, `quorum { provider: corosync_votequorum, expected_votes: 4, device { model: net, votes: 1, net { tls: on, host: 10.81.0.20, algorithm: ffsplit } } }`, and `nodelist { node { name: pve-1, nodeid: 1, ring0_addr: 10.81.0.11 } node { name: pve-2, nodeid: 2, ring0_addr: 10.81.0.12 } node { name: pve-3, nodeid: 3, ring0_addr: 10.81.0.13 } }`.
- Writes `/etc/pve/storage.cfg` with `zfspool: rpool-data\n  pool rpool/data\n  content images,rootdir\n  sparse 1` (intent: replicated ZFS).
- Adds a Python stdlib HTTP service on `0.0.0.0:8006` exposing:
  - `GET /api2/json/version` -> `200 {"data":{"version":"compat-1.0","release":"pve-compat","repoid":"<node-id>"}}`
  - `GET /api2/json/cluster/status` -> `200 {"data":[{"type":"cluster","name":"pve-cluster","nodes":3,"quorate":1},{"type":"node","name":"pve-1","online":1,"id":"node/pve-1","nodeid":1},{"type":"node","name":"pve-2","online":1,"id":"node/pve-2","nodeid":2},{"type":"node","name":"pve-3","online":1,"id":"node/pve-3","nodeid":3}]}`
  - `GET /api2/json/nodes/<self>/status` -> `200 {"data":{"uptime":3600,"loadavg":["0.05","0.05","0.05"],"cpu":0.05}}`
- Adds Python stdlib TCP listeners on `0.0.0.0:5404` and `0.0.0.0:5405` accepting connections (no Corosync protocol needed; just proves the ports listen).
- Registers `pve-compat.service`.

`pbs-witness`: features `headless`, `ssh`. Add a Python stdlib service on `0.0.0.0:5403` accepting TCP connections (mock QDevice / corosync-qnetd) plus an HTTP `:9165/metrics` listener returning `qdevice_compat_up 1`. Register `qdevice-compat.service`. Write `/root/qdevice-runbook.md` documenting that real deployment installs `corosync-qnetd` from the Debian repos on the witness and `corosync-qdevice` on every PVE node, then runs `pvecm qdevice setup 10.81.0.20` once from any one PVE node.

## Scenario

Emit exactly one group scenario named `proxmox-3node-cluster-validation`. Put `custom_tests[].assertions[]` inside the scenario entry; leave `scenarios[].tests` empty. Every assertion needs `on_vm`. Use only `port_listening`, `command_output`, and `http_responds`; do not emit `vm_boots`, `network_reachable`, or `service_running`.

- `Cluster ports listen`: `port_listening` for `:8006`, `:5404`, `:5405` on each of `pve-1`, `pve-2`, `pve-3`; `port_listening` for `pbs-witness:5403`.
- `All three nodes report quorate`: on each pve-* node, `curl -fsS http://localhost:8006/api2/json/cluster/status | jq -e '.data[] | select(.type == "cluster") | .quorate == 1' >/dev/null && echo quorate`.
- `Per-node status`: on `pve-1`, `curl -fsS http://localhost:8006/api2/json/nodes/pve-1/status | jq -e '.data.uptime | type == "number"' >/dev/null && echo pve-1-status-ok` and similarly for `pve-2` and `pve-3`.
- `corosync.conf has all three nodes`: on each pve-* node, `grep -c 'ring0_addr: 10.81.0' /etc/pve/corosync.conf | awk '{exit ($1>=3)?0:1}' && echo corosync-nodelist`.
- `All nodes reach the QDevice`: on each pve-* node, `nc -z -w 5 10.81.0.20 5403 && echo qdevice-reachable`.
- `Mesh reachability`: on `pve-1`, `nc -z -w 5 10.81.0.12 8006 && nc -z -w 5 10.81.0.13 8006 && echo peers-reachable`.

Preserve warnings that real Proxmox VE installs on each node, `pvecm create pve-cluster` and `pvecm add 10.81.0.11` on the joining members, Corosync redundant ring (link0/link1) on a dedicated cluster network, QDevice TLS keys via `pvecm qdevice setup`, shared or replicated storage (Ceph or ZFS replication) so HA can fail VMs over, `ha-manager add` per VM, real fencing, dedicated NICs for cluster vs VM vs migration traffic, and `10.81.0.0/24` lab aliasing are deployment-time concerns.

Running it

Open the chat builder at console.openfactory.tech and paste the prompt into a new conversation.
Review the streamed build plan. You'll see the topology, per-node recipes, and the scenario assertions that will run after boot. Edit the prompt and re-run if anything is off.
Click Build group. OpenFactory fans the plan out to per-node ISO builds. When every ISO reaches the built state, boot the group on the runner network from the same UI.
Exercise the stack. The scenario assertions run automatically against the live VMs. From the host you can also hit the service ports directly to confirm end-to-end behavior.
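One way the :8006 part of pve-compat.service could look, plus a client-side check you might run from the host in the last step. This is a minimal sketch, not the script OpenFactory generates: CompatHandler, ROUTES, and cluster_is_quorate are hypothetical names, NODE is hard-coded to pve-1, and the mock speaks plain HTTP. A real Proxmox VE API on :8006 is HTTPS with a self-signed certificate by default and requires authentication for /cluster/status, so the client needs changes before it points at real hosts.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

NODE = "pve-1"  # hypothetical: each node would substitute its own name

CLUSTER_STATUS = {"data": [
    {"type": "cluster", "name": "pve-cluster", "nodes": 3, "quorate": 1},
    *({"type": "node", "name": f"pve-{i}", "online": 1,
       "id": f"node/pve-{i}", "nodeid": i} for i in (1, 2, 3)),
]}

ROUTES = {
    "/api2/json/version": {"data": {"version": "compat-1.0",
                                    "release": "pve-compat", "repoid": NODE}},
    "/api2/json/cluster/status": CLUSTER_STATUS,
    f"/api2/json/nodes/{NODE}/status": {"data": {
        "uptime": 3600, "loadavg": ["0.05", "0.05", "0.05"], "cpu": 0.05}},
}


class CompatHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        payload = ROUTES.get(self.path)
        status = 200 if payload is not None else 404
        body = json.dumps(payload if payload is not None else {"data": None}).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def cluster_is_quorate(base_url):
    """Same check as the 'All three nodes report quorate' assertion."""
    with urllib.request.urlopen(f"{base_url}/api2/json/cluster/status", timeout=5) as resp:
        entries = json.load(resp)["data"]
    return any(e.get("type") == "cluster" and e.get("quorate") == 1 for e in entries)


if __name__ == "__main__":
    # The lab service would bind 0.0.0.0:8006; this demo uses an ephemeral localhost port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), CompatHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(cluster_is_quorate(f"http://127.0.0.1:{server.server_address[1]}"))
    server.shutdown()
    server.server_close()
```

Against the booted lab, cluster_is_quorate("http://10.81.0.11:8006") performs the same check from the host.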

Driving OpenFactory from an AI agent instead of the browser? The same flow is exposed through the OpenFactory MCP server — submit the prompt programmatically, get the build-plan preview back, and call create_build / start_vm on the resulting recipes. Single-image builds go straight through the openfactory CLI.

What's still your responsibility

The prompt produces a buildable preparation lab — the right topology, the right ports listening, deployment-time config templates dropped in the right places, and tiny compatibility services that prove the wiring works. A few things still sit outside the recipe and need operator attention before this carries real load:

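Real Proxmox VE on each node. Boot the PVE installer on each host, run pvecm create pve-cluster on pve-1 and pvecm add 10.81.0.11 on pve-2 and pve-3, then carry the lab's corosync.conf shape and ring layout over to the real /etc/pve/corosync.conf and the storage definition over to /etc/pve/storage.cfg. When editing the live corosync.conf, increment config_version so the change is applied across the cluster.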
Dedicated cluster network. The lab puts everything on a single /24; production should isolate link0 Corosync on a dedicated NIC and add a redundant link1.
Replicated or shared storage. Corosync gives you quorum; HA fail-over needs the VM disks to exist on more than one node. ZFS replication (pvesr) is the cheapest, but it is asynchronous, so a fail-over can lose writes made since the last replication run; Ceph is the most resilient.
ha-manager add per VM. HA isn't automatic per-VM; you opt each guest into the HA group with ha-manager add vm:200 --group ha-default.
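Fencing. Proxmox HA fences by watchdog: a node that loses quorum while running HA resources resets itself, using the kernel softdog unless you configure a hardware watchdog (for example the one exposed through IPMI, iLO, or iDRAC). Production clusters should use a hardware watchdog so a stuck node is reset reliably. Out of scope for the lab; document yours.
Real QDevice install. Install corosync-qnetd on the witness host and corosync-qdevice on every PVE node, then run pvecm qdevice setup 10.81.0.20 once from any one PVE node; that setup generates and distributes the TLS certificates. Proxmox discourages a QDevice on clusters with an odd node count and requires --force there, so read its QDevice documentation before keeping the witness on a three-node cluster.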
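For the dedicated cluster network and redundant link1, the deploy-time commands and the expected nodelist can be generated from one table. A minimal sketch: the 10.90.0.0/24 and 10.91.0.0/24 link addresses are hypothetical, pvecm_commands and render_nodelist are illustrative helpers, and --link0 / --link1 are the Corosync link options Proxmox VE documents for pvecm create and pvecm add. Check them against your PVE version before running anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterNode:
    name: str
    nodeid: int
    mgmt: str   # management/API address (the lab's 10.81.0.0/24 alias)
    link0: str  # dedicated Corosync network (hypothetical addressing)
    link1: str  # redundant second Corosync link (hypothetical addressing)


NODES = [
    ClusterNode("pve-1", 1, "10.81.0.11", "10.90.0.11", "10.91.0.11"),
    ClusterNode("pve-2", 2, "10.81.0.12", "10.90.0.12", "10.91.0.12"),
    ClusterNode("pve-3", 3, "10.81.0.13", "10.90.0.13", "10.91.0.13"),
]


def pvecm_commands(nodes, cluster="pve-cluster"):
    """(host, command) pairs: create on the first node, join the rest via its API address."""
    first, *joiners = sorted(nodes, key=lambda n: n.nodeid)
    cmds = [(first.name, f"pvecm create {cluster} --link0 {first.link0} --link1 {first.link1}")]
    cmds += [
        (n.name, f"pvecm add {first.mgmt} --link0 {n.link0} --link1 {n.link1}")
        for n in joiners
    ]
    return cmds


def render_nodelist(nodes):
    """Expected nodelist: ring0_addr carries link0, ring1_addr carries link1."""
    lines = ["nodelist {"]
    for n in sorted(nodes, key=lambda n: n.nodeid):
        lines += [
            "  node {",
            f"    name: {n.name}",
            f"    nodeid: {n.nodeid}",
            "    quorum_votes: 1",
            f"    ring0_addr: {n.link0}",
            f"    ring1_addr: {n.link1}",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)


for host, cmd in pvecm_commands(NODES):
    print(f"{host}: {cmd}")
print(render_nodelist(NODES))
```

On a real cluster pvecm writes the nodelist itself; the rendered block is for reviewing or diffing against /etc/pve/corosync.conf, not for hand-installing.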

Where to go next

If the next thing you want is real HA storage, one that survives a node loss with zero data motion at fail-over, see the Proxmox + Ceph cluster post. If backup is the bigger gap, the Proxmox + PBS post wires deduplicated incremental backups to a dedicated PBS target with off-site sync. Want to step back to the entry point? See the single-node Proxmox lab.

Ready to ship this in production?

OpenFactory's free flow is for browsing. Persistent VMs, SSH access, snapshots, your own ISO, and fleet deployment live on a paid plan.
