
Four VMs from one prompt: three PVE nodes + a QDevice witness, Corosync wiring baked in
June 8, 2026
A single-node Proxmox box is the entry point. The day you stop being willing to lose state when that one box goes down, you need three nodes. Three is the smallest cluster in which a Corosync majority survives the loss of one node, so it is the first shape that keeps VMs running through a node failure.
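The three-node claim comes down to majority arithmetic. Here is a minimal sketch, assuming one vote per node and a simple-majority rule; the function names are illustrative and not part of any Proxmox or Corosync API.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest strict majority of the configured votes."""
    return total_votes // 2 + 1


def survives_loss(total_votes: int, lost_votes: int) -> bool:
    """True if the votes that remain still form a majority."""
    return total_votes - lost_votes >= votes_needed(total_votes)


if __name__ == "__main__":
    for nodes in (2, 3):
        state = "quorate" if survives_loss(nodes, 1) else "not quorate"
        print(f"{nodes} nodes, one lost: {state}")
```

With two nodes the majority is 2 of 2, so a single failure costs quorum; with three nodes the majority is 2 of 3, so the cluster can lose one node and stay quorate.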
This post walks through that exact shape as an OpenFactory build prompt: four buildable Debian Trixie VMs — three PVE nodes plus a QDevice witness — from a single prompt, with /etc/pve/corosync.conf already shaped, the cluster nodelist agreed across all three, and a mock PVE API reporting quorate: 1. Real pvecm create / pvecm add is the deploy-time step on top.
- pve-1, pve-2, pve-3 (10.81.0.11–13:8006): three PVE hosts, each with a per-node /etc/pve/corosync.conf listing all three ring0 addresses plus the QDevice tie-breaker, mock :5404 and :5405 Corosync listeners, and a mock cluster-status JSON reporting quorate: 1.
- pbs-witness (10.81.0.20:5403): QDevice / corosync-qnetd shape that breaks the tie on 2-node split-brain scenarios. Comes with a runbook explaining the real pvecm qdevice setup deploy step.
- /etc/pve/storage.cfg on each PVE node, pre-populated with zfspool: rpool-data as the replication target. Swap to Ceph via the next post in the series if you want full HA storage.
- corosync.conf (nodelist, quorum-device config, transport) is baked into all three PVE node ISOs. No copy-paste between hosts at deploy.

Topology: three PVE nodes in a row, QDevice below. PVE↔PVE on :8006 (API), :5404/:5405 (Corosync ring); all three to the QDevice on :5403 for tie-breaking. Lab subnet 10.81.0.0/24.
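To make the baked-in corosync.conf concrete, here is a rough sketch that renders the file from the node list. The addresses and settings are taken from the build prompt in this post; `render_corosync_conf` is a hypothetical helper, not something OpenFactory or Proxmox ships, and the file the builder actually generates may be laid out differently.

```python
# Values below come from the build prompt; the function name is hypothetical.
NODES = [("pve-1", 1, "10.81.0.11"), ("pve-2", 2, "10.81.0.12"), ("pve-3", 3, "10.81.0.13")]
QDEVICE_HOST = "10.81.0.20"


def render_corosync_conf(nodes=NODES, qdevice_host=QDEVICE_HOST):
    """Return corosync.conf text: totem, quorum (with QDevice), nodelist."""
    node_blocks = "".join(
        "  node {\n"
        f"    name: {name}\n"
        f"    nodeid: {nodeid}\n"
        f"    ring0_addr: {addr}\n"
        "  }\n"
        for name, nodeid, addr in nodes
    )
    # One vote per node plus one for the QDevice, as in the prompt.
    expected_votes = len(nodes) + 1
    return (
        "totem {\n"
        "  version: 2\n"
        "  cluster_name: pve-cluster\n"
        "  transport: knet\n"
        "  interface {\n"
        "    linknumber: 0\n"
        "  }\n"
        "}\n"
        "quorum {\n"
        "  provider: corosync_votequorum\n"
        f"  expected_votes: {expected_votes}\n"
        "  device {\n"
        "    model: net\n"
        "    votes: 1\n"
        "    net {\n"
        "      tls: on\n"
        f"      host: {qdevice_host}\n"
        "      algorithm: ffsplit\n"
        "    }\n"
        "  }\n"
        "}\n"
        "nodelist {\n"
        f"{node_blocks}"
        "}\n"
    )


if __name__ == "__main__":
    print(render_corosync_conf(), end="")
```

Because every node renders from the same list, the nodelist is identical on all three hosts, which is the property the validation scenario's `grep -c 'ring0_addr: 10.81.0'` check relies on.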
Paste this verbatim into the chat builder at console.openfactory.tech. Nothing above or below it — the builder expects the prompt body to start at the “Build a compact multi-node lab…” line.
Build a compact multi-node lab named `proxmox-3node-cluster`.
Output discipline: keep the plan small. Use one startup script per node, about 25 shell lines or less. Do not install `pve-manager`, `corosync`, `pve-cluster`, `pmxcfs`, or any Proxmox apt repos at build time. The cluster shape is mocked via deployment-time config templates and Python stdlib listeners — real `pvecm create` / `pvecm add` runs at provisioning on top of installed Proxmox VE hosts. Write deployment-time config examples and tiny Python stdlib or shell compatibility stubs only. The goal is a buildable preparation lab, not a production Proxmox install.
## Topology
Create 4 buildable `debian-trixie` nodes, all `x86_64`, SSH enabled, DHCP/default route intact with lab aliases, firewall disabled, DNS `1.1.1.1` and `8.8.8.8`, user `ops` password `pve-cluster-ops` in `sudo`. Every recipe must set top-level `test_config` to `{ "enabled": false, "tests": [] }`.
- `pve-1`: role `pve-host`, 4 GB RAM, 32 GB disk, alias `10.81.0.11/24`, x `110`, y `100`
- `pve-2`: role `pve-host`, 4 GB RAM, 32 GB disk, alias `10.81.0.12/24`, x `350`, y `100`
- `pve-3`: role `pve-host`, 4 GB RAM, 32 GB disk, alias `10.81.0.13/24`, x `590`, y `100`
- `pbs-witness`: role `qdevice-witness`, 2 GB RAM, 16 GB disk, alias `10.81.0.20/24`, x `350`, y `280`
Connections: `pve-1`, `pve-2`, `pve-3` to each other on `:8006` (PVE API) and `:5404/5405` (Corosync ring); all three to `pbs-witness:5403` (QDevice tie-breaker).
## Common Recipe Requirements
All nodes: features `headless`, `ssh`; packages `openssh-server`, `python3`, `curl`, `jq`, `iproute2`, `netcat-openbsd`, `ca-certificates`. Each startup script adds the alias with `IFACE=$(ip route show default | awk '{print $5; exit}')`, `ip link set "$IFACE" up || true`, and `ip addr add <alias> dev "$IFACE" || true`. If `os.startup_scripts[].after` is present, it must be the string `"network-online.target"`, not an array. Do not install `pve-manager`, `proxmox-backup-server`, `ceph`, `truenas-scale`, or any related apt packages — they are source-ISO deploys handled at provisioning time, not at build time.
## Node Requirements
All three `pve-1`, `pve-2`, `pve-3` share the same compatibility-service shape with different identity payloads. Each:
- Creates `/etc/pve/{nodes/<self>,storage,qemu-server,lxc,priv}` mode `0750 ops:ops`.
- Writes `/etc/pve/corosync.conf` with `totem { version: 2, cluster_name: pve-cluster, transport: knet, interface { linknumber: 0 } }`, `quorum { provider: corosync_votequorum, expected_votes: 4, device { model: net, votes: 1, net { tls: on, host: 10.81.0.20, algorithm: ffsplit } } }`, and `nodelist { node { name: pve-1, nodeid: 1, ring0_addr: 10.81.0.11 } node { name: pve-2, nodeid: 2, ring0_addr: 10.81.0.12 } node { name: pve-3, nodeid: 3, ring0_addr: 10.81.0.13 } }`.
- Writes `/etc/pve/storage.cfg` with `zfspool: rpool-data\n pool rpool/data\n content images,rootdir\n sparse 1` (intent: replicated ZFS).
- Adds a Python stdlib HTTP service on `0.0.0.0:8006` exposing:
- `GET /api2/json/version` -> `200 {"data":{"version":"compat-1.0","release":"pve-compat","repoid":"<node-id>"}}`
- `GET /api2/json/cluster/status` -> `200 {"data":[{"type":"cluster","name":"pve-cluster","nodes":3,"quorate":1},{"type":"node","name":"pve-1","online":1,"id":"node/pve-1","nodeid":1},{"type":"node","name":"pve-2","online":1,"id":"node/pve-2","nodeid":2},{"type":"node","name":"pve-3","online":1,"id":"node/pve-3","nodeid":3}]}`
- `GET /api2/json/nodes/<self>/status` -> `200 {"data":{"uptime":3600,"loadavg":["0.05","0.05","0.05"],"cpu":0.05}}`
- Adds Python stdlib TCP listeners on `0.0.0.0:5404` and `0.0.0.0:5405` accepting connections (no Corosync protocol needed; just proves the ports listen).
- Registers `pve-compat.service`.
`pbs-witness`: features `headless`, `ssh`. Add a Python stdlib service on `0.0.0.0:5403` accepting TCP connections (mock QDevice / corosync-qnetd) plus an HTTP `:9165/metrics` listener returning `qdevice_compat_up 1`. Register `qdevice-compat.service`. Write `/root/qdevice-runbook.md` documenting that real deployment installs `corosync-qnetd` from the Debian repos and registers each PVE node via `pvecm qdevice setup 10.81.0.20`.
## Scenario
Emit exactly one group scenario named `proxmox-3node-cluster-validation`. Put `custom_tests[].assertions[]` inside the scenario entry; leave `scenarios[].tests` empty. Every assertion needs `on_vm`. Use only `port_listening`, `command_output`, and `http_responds`; do not emit `vm_boots`, `network_reachable`, or `service_running`.
- `Cluster ports listen`: `port_listening` for `:8006`, `:5404`, `:5405` on each of `pve-1`, `pve-2`, `pve-3`; `port_listening` for `pbs-witness:5403`.
- `All three nodes report quorate`: on each pve-* node, `curl -fsS http://localhost:8006/api2/json/cluster/status | jq -e '.data[] | select(.type == "cluster") | .quorate == 1' >/dev/null && echo quorate`.
- `Per-node status`: on `pve-1`, `curl -fsS http://localhost:8006/api2/json/nodes/pve-1/status | jq -e '.data.uptime | type == "number"' >/dev/null && echo pve-1-status-ok` and similarly for `pve-2` and `pve-3`.
- `corosync.conf has all three nodes`: on each pve-* node, `grep -c 'ring0_addr: 10.81.0' /etc/pve/corosync.conf | awk '{exit ($1>=3)?0:1}' && echo corosync-nodelist`.
- `All nodes reach the QDevice`: on each pve-* node, `nc -z -w 5 10.81.0.20 5403 && echo qdevice-reachable`.
- `Mesh reachability`: on `pve-1`, `nc -z -w 5 10.81.0.12 8006 && nc -z -w 5 10.81.0.13 8006 && echo peers-reachable`.
Preserve warnings that real Proxmox VE installs on each node, `pvecm create pve-cluster` and `pvecm add 10.81.0.11` on the joining members, Corosync redundant ring (link0/link1) on a dedicated cluster network, QDevice TLS keys via `pvecm qdevice setup`, shared or replicated storage (Ceph or ZFS replication) so HA can fail VMs over, `ha-manager add` per VM, real fencing, dedicated NICs for cluster vs VM vs migration traffic, and `10.81.0.0/24` lab aliasing are deployment-time concerns.

Driving OpenFactory from an AI agent instead of the browser? The same flow is exposed through the OpenFactory MCP server: submit the prompt programmatically, get the build-plan preview back, and call create_build / start_vm on the resulting recipes. Single-image builds go straight through the openfactory CLI.
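For a sense of what the prompt's mock cluster-status endpoint and its quorate assertion amount to, here is a minimal Python stdlib sketch. It is an assumption about the shape, not the code the builder emits: `CompatHandler`, `is_quorate`, and `demo` are hypothetical names, and the demo binds to a free local port rather than :8006.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

NODES = ["pve-1", "pve-2", "pve-3"]

# Same payload shape the prompt specifies for GET /api2/json/cluster/status.
CLUSTER_STATUS = {
    "data": [{"type": "cluster", "name": "pve-cluster", "nodes": len(NODES), "quorate": 1}]
    + [
        {"type": "node", "name": n, "online": 1, "id": f"node/{n}", "nodeid": i}
        for i, n in enumerate(NODES, start=1)
    ]
}


class CompatHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the mock PVE API service."""

    def do_GET(self):
        if self.path == "/api2/json/cluster/status":
            body = json.dumps(CLUSTER_STATUS).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo quiet


def is_quorate(base_url: str) -> bool:
    """Python equivalent of the scenario's curl + jq quorate check."""
    # Empty ProxyHandler: talk to the local mock directly, ignoring proxy env vars.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{base_url}/api2/json/cluster/status", timeout=5) as resp:
        data = json.load(resp)["data"]
    return any(e.get("type") == "cluster" and e.get("quorate") == 1 for e in data)


def demo() -> bool:
    server = HTTPServer(("127.0.0.1", 0), CompatHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return is_quorate(f"http://127.0.0.1:{server.server_address[1]}")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print("quorate" if demo() else "not quorate")
```

The mock always answers quorate: 1, so this proves only that the endpoint is wired up and reachable. It says nothing about real Corosync membership, which exists only after the deploy-time pvecm steps.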
The prompt produces a buildable preparation lab — the right topology, the right ports listening, deployment-time config templates dropped in the right places, and tiny compatibility services that prove the wiring works. A few things still sit outside the recipe and need operator attention before this carries real load:
- Real Proxmox VE on each node. The baked-in file is a template; the live /etc/pve/corosync.conf only exists after pvecm create pve-cluster on the first node and pvecm add 10.81.0.11 on the joining members.
- Dedicated cluster network. The lab aliases every node onto 10.81.0.0/24; production should isolate link0 Corosync on a dedicated NIC and add a redundant link1.
- Shared or replicated storage, so HA can fail VMs over. ZFS replication (pvesr) is the cheapest; Ceph is the most resilient.
- ha-manager add per VM. HA isn't automatic per-VM; you opt each guest into the HA group with ha-manager add vm:200 --group ha-default.
- Real QDevice. Install corosync-qnetd on the witness host and corosync-qdevice on each PVE node, then run pvecm qdevice setup 10.81.0.20 from one cluster node. TLS keys are generated during that setup.

If the next thing you want is real HA storage (surviving a node loss with zero data motion at fail-over), see the Proxmox + Ceph cluster post. If backup is the bigger gap, the Proxmox + PBS post wires deduplicated incremental backups to a dedicated PBS target with off-site sync. Coming back from the entry point? See the single-node Proxmox lab.
OpenFactory's free flow is for browsing. Persistent VMs, SSH access, snapshots, your own ISO, and fleet deployment live on a paid plan.