
Four VMs from one prompt: three Ceph MON/OSD hosts + an RBD client, full HA storage shape
May 14, 2026
The 3-node Proxmox cluster gives you quorum. The 3-node Proxmox + Ceph cluster gives you quorum and distributed storage on the same three machines — pull a node, the VMs keep running, and the data they sit on is already replicated to the other two. It's the aspirational homelab HA shape.
This post walks through it as an OpenFactory build prompt: four buildable Debian Trixie VMs — three Ceph MON+OSD hosts plus an RBD client — from a single prompt, with ceph.conf, a sample CRUSH map, the MON/OSD listen ports, and a mock cluster/ceph/status JSON reporting HEALTH_OK already baked in. Real pveceph init / pveceph mon create / pveceph osd create are the deploy-time steps on top.
Before you commit hardware, do the capacity math. Ceph's default replicated pool is size = 3: every object is written to three OSDs on three different hosts, so the space amplification factor is exactly 3.0, meaning 3 GiB of raw disk stores 1 GiB of your data (Ceph docs). A nine-drive, 12 TB-raw cluster gives you roughly 4 TB usable at most, and you should stop filling it before that ceiling because Ceph starts warning at 85% full by default. That feels expensive next to a single RAID box, but you're buying the ability to lose a whole node mid-write and keep serving I/O, which is the entire point. This lab bakes those defaults (size = 3, min_size = 2) into ceph.conf so the shape you build matches the shape you deploy.
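The capacity arithmetic above is easy to script before buying drives. This is a minimal sketch, not anything OpenFactory or Ceph ships: the function name `usable_capacity_tb` and the `fill_ratio` parameter are made up for illustration, and the 0.85 ceiling is an assumption that mirrors Ceph's default nearfull warning ratio.

```python
def usable_capacity_tb(raw_tb: float, size: int = 3, fill_ratio: float = 1.0) -> float:
    """Usable capacity of a replicated pool.

    raw_tb     -- total raw disk across all OSDs, in TB
    size       -- replica count (osd_pool_default_size)
    fill_ratio -- fraction of the pool you allow yourself to fill (assumption)
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    if not 0 < fill_ratio <= 1:
        raise ValueError("fill_ratio must be in (0, 1]")
    # Every object is stored `size` times, so raw space is divided by `size`.
    return raw_tb / size * fill_ratio


if __name__ == "__main__":
    raw = 12.0  # the nine-drive, 12 TB-raw example from the text
    print("nominal usable TB:", usable_capacity_tb(raw))
    print("usable TB at an 85% fill ceiling:", round(usable_capacity_tb(raw, fill_ratio=0.85), 2))
```

With the numbers from the text, the nominal figure is 4 TB and the 85% ceiling brings it down to about 3.4 TB, which is the number worth planning around.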
- pve-mon-1, pve-mon-2, pve-mon-3 (10.82.0.11–13): each runs the MON+OSD shape, with /etc/ceph/ceph.conf carrying a shared fsid and the mon_host list, mock MON listeners on :6789 (v1) and :3300 (v2), mock OSD listeners on :6800/:6801, and a mock PVE API exposing cluster/ceph/status with HEALTH_OK and three OSDs in/up.
- /etc/ceph/crushmap.txt with three hosts, one OSD each, and a replicated_rule that picks one OSD per host (the policy that makes a 3-replica pool survive a host loss).
- pve-client (10.82.0.20:9283): an RBD-client VM with a matching ceph.conf and a mock ceph-mgr Prometheus exporter, plus a runbook documenting rbd create rbd-vm-disks/vm-100-disk-0 and how to wire it into PVE storage.cfg.

- ceph.conf, CRUSH map, MON list, and the keyring placeholder are baked into every node ISO. No "did I run ceph-deploy on the right host" ambiguity at deploy.
- Validation fails if HEALTH_OK isn't reported on every MON, if the monmap doesn't list three MONs, or if the OSD count drops below three up/in.
- osd_pool_default_size = 3, min_size = 2, public + cluster networks pre-set, host-spread CRUSH rule: the configuration choices people forget until 2am.
- The pve-client VM proves the topology is reachable from outside the MON quorum before you trust real VM disks to the pool.
Three MON+OSD hosts in a row, RBD client below. Lab subnet 10.82.0.0/24. MON traffic on :6789/:3300, OSD traffic on :6800/:6801, mgr exporter on :9283.
In a real deployment, client-to-MON traffic (ports :6789/:3300) and OSD replication/heartbeat traffic (the :6800-:7300 range) sit on two physically separate NICs: the public network and the cluster network. The diagram below shows the buildable lab shape, which collapses both onto one subnet; the dashed split is the production split you graduate to.
Figure: the lab shares one /24; production splits the dashed cluster-net replication traffic onto its own 10 GbE+ NIC.
Paste this verbatim into the chat builder at console.openfactory.tech. Nothing above or below it: the builder expects the prompt body to start at the “Build a compact multi-node lab…” line.
Build a compact multi-node lab named `proxmox-ceph-cluster`.
Output discipline: keep the plan small. Use one startup script per node, about 25 shell lines or less. Do not install `ceph-mon`, `ceph-osd`, `ceph-mgr`, `pve-manager`, the Ceph apt repos, or any kernel `rbd` / `cephfs` modules at build time. The Ceph + PVE cluster is bootstrapped at deploy time via `pveceph init` and `pveceph mon create`; this lab only stages the configs and exposes mock listeners on the right ports. Write deployment-time config examples and tiny Python stdlib or shell compatibility stubs only. The goal is a buildable preparation lab, not a production Proxmox install.
## Topology
Create 4 buildable `debian-trixie` nodes, all `x86_64`, SSH enabled, DHCP/default route intact with lab aliases, firewall disabled, DNS `1.1.1.1` and `8.8.8.8`, user `ops` password `ceph-ops` in `sudo`. Every recipe must set top-level `test_config` to `{ "enabled": false, "tests": [] }`.
- `pve-mon-1`: role `pve-mon-osd`, 6 GB RAM, 48 GB disk, alias `10.82.0.11/24`, x `110`, y `100`
- `pve-mon-2`: role `pve-mon-osd`, 6 GB RAM, 48 GB disk, alias `10.82.0.12/24`, x `350`, y `100`
- `pve-mon-3`: role `pve-mon-osd`, 6 GB RAM, 48 GB disk, alias `10.82.0.13/24`, x `590`, y `100`
- `pve-client`: role `rbd-client`, 2 GB RAM, 16 GB disk, alias `10.82.0.20/24`, x `350`, y `280`
Connections: Three `pve-mon-*` nodes to each other on `:6789` (MON v1) and `:3300` (MON v2), plus `:6800-:6803` for OSDs; `pve-client` to all three MONs on `:6789` and `:3300`.
## Common Recipe Requirements
All nodes: features `headless`, `ssh`; packages `openssh-server`, `python3`, `curl`, `jq`, `iproute2`, `netcat-openbsd`, `ca-certificates`. Each startup script adds the alias with `IFACE=$(ip route show default | awk '{print $5; exit}')`, `ip link set "$IFACE" up || true`, and `ip addr add <alias> dev "$IFACE" || true`. If `os.startup_scripts[].after` is present, it must be the string `"network-online.target"`, not an array. Do not install `pve-manager`, `proxmox-backup-server`, `ceph`, `truenas-scale`, or any related apt packages — they are source-ISO deploys handled at provisioning time, not at build time.
## Node Requirements
All three `pve-mon-1`, `pve-mon-2`, `pve-mon-3` share the same shape with different MON id. Each:
- Creates `/etc/ceph/` mode `0755`, `/var/lib/ceph/{mon,osd}/ceph-<id>` mode `0750 ops:ops`.
- Writes `/etc/ceph/ceph.conf` with `[global]\nfsid = 00000000-0000-0000-0000-c0de1ab12026\nmon_initial_members = pve-mon-1,pve-mon-2,pve-mon-3\nmon_host = 10.82.0.11,10.82.0.12,10.82.0.13\npublic_network = 10.82.0.0/24\ncluster_network = 10.82.0.0/24\nauth_cluster_required = cephx\nauth_service_required = cephx\nauth_client_required = cephx\nosd_pool_default_size = 3\nosd_pool_default_min_size = 2\nosd_pool_default_pg_num = 128`.
- Writes `/etc/ceph/crushmap.txt` describing three hosts (`pve-mon-1`, `pve-mon-2`, `pve-mon-3`), one OSD each, and a `replicated_rule` that picks one OSD per host.
- Writes `/etc/ceph/ceph.client.admin.keyring.example` with `[client.admin]\n key = AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==\n caps mon = "allow *"\n caps osd = "allow *"\n caps mgr = "allow *"` (placeholder; real keyring is generated at deploy).
- Adds a Python stdlib TCP listener on `0.0.0.0:6789` accepting connections (mock MON v1 binary protocol port).
- Adds a Python stdlib TCP listener on `0.0.0.0:3300` (MON v2).
- Adds Python stdlib TCP listeners on `0.0.0.0:6800` and `0.0.0.0:6801` (OSD bind range).
- Adds a Python stdlib HTTP service on `0.0.0.0:8006` exposing `GET /api2/json/cluster/ceph/status` -> `200 {"data":{"health":{"status":"HEALTH_OK"},"fsid":"00000000-0000-0000-0000-c0de1ab12026","monmap":{"mons":[{"name":"pve-mon-1"},{"name":"pve-mon-2"},{"name":"pve-mon-3"}]},"osdmap":{"num_osds":3,"num_up_osds":3,"num_in_osds":3},"pgmap":{"num_pgs":128,"pgs_by_state":[{"state_name":"active+clean","count":128}]}}}` and `GET /metrics` with `ceph_compat_up 1` plus `ceph_health_ok 1`.
- Registers `ceph-compat.service`.
`pve-client`: features `headless`, `ssh`. Write `/etc/ceph/ceph.conf` mirroring the MON list above. Write `/etc/ceph/rbd-vm-disks.example` with `[rbd-vm-disks]\n pool rbd-vm-disks\n size 3\n min_size 2\n application rbd`. Add a Python stdlib HTTP service on `0.0.0.0:9283` (the Ceph mgr Prometheus exporter port) exposing `GET /metrics` with `ceph_client_compat_up 1` and `ceph_mon_targets 3`. Register `ceph-client-compat.service`. Write `/root/rbd-mount-runbook.md` documenting `rbd create rbd-vm-disks/vm-100-disk-0 --size 10G` and the matching PVE storage config.
## Scenario
Emit exactly one group scenario named `proxmox-ceph-cluster-validation`. Put `custom_tests[].assertions[]` inside the scenario entry; leave `scenarios[].tests` empty. Every assertion needs `on_vm`. Use only `port_listening`, `command_output`, and `http_responds`; do not emit `vm_boots`, `network_reachable`, or `service_running`.
- `Cluster ports listen`: on each `pve-mon-*`, `port_listening` for `:6789`, `:3300`, `:6800`, `:6801`, `:8006`; on `pve-client`, `port_listening` for `:9283`.
- `Ceph health is OK on every MON`: on each `pve-mon-*`, `curl -fsS http://localhost:8006/api2/json/cluster/ceph/status | jq -e '.data.health.status == "HEALTH_OK"' >/dev/null && echo health-ok`.
- `Three MONs in the monmap`: on each `pve-mon-*`, `curl -fsS http://localhost:8006/api2/json/cluster/ceph/status | jq -e '.data.monmap.mons | length == 3' >/dev/null && echo monmap-3`.
- `Three OSDs up + in`: on `pve-mon-1`, `curl -fsS http://localhost:8006/api2/json/cluster/ceph/status | jq -e '.data.osdmap.num_up_osds == 3 and .data.osdmap.num_in_osds == 3' >/dev/null && echo osds-up`.
- `ceph.conf agreement across nodes`: on each `pve-mon-*` and on `pve-client`, `grep -q 'mon_host = 10.82.0.11,10.82.0.12,10.82.0.13' /etc/ceph/ceph.conf && echo conf-agreed`.
- `Client reaches all three MONs`: on `pve-client`, `for h in 10.82.0.11 10.82.0.12 10.82.0.13; do nc -z -w 5 $h 6789 || exit 1; nc -z -w 5 $h 3300 || exit 1; done && echo mons-reachable`.
Preserve warnings that real Proxmox VE installation on each node, `pveceph init --network 10.82.0.0/24`, `pveceph mon create` and `pveceph osd create /dev/<disk>` on each host, real Ceph keyring distribution, separate `public_network` vs `cluster_network` on dedicated NICs (10 GbE+), at-least-3-OSD-per-host placement, MGR + MDS + Prometheus exporter daemons, CRUSH map host/rack/dc layers, real `pg_num` autoscaling, snapshot / scrub schedules, and `10.82.0.0/24` lab aliasing are deployment-time concerns.
Driving OpenFactory from an AI agent instead of the browser? The same flow is exposed through the OpenFactory MCP server: submit the prompt programmatically, get the build-plan preview back, and call create_build / start_vm on the resulting recipes. Single-image builds go straight through the openfactory CLI.
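For a sense of what the "tiny Python stdlib" compatibility stubs requested in the prompt might look like, here is a rough sketch of the mock status endpoint together with a Python equivalent of the scenario's jq checks. It is an illustration under assumptions, not the code the builder actually emits: `StatusHandler`, `cluster_looks_healthy`, and `probe` are hypothetical names, the payload is a trimmed copy of the JSON in the prompt, and the demo binds a free loopback port rather than the lab's 0.0.0.0:8006.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

STATUS_PATH = "/api2/json/cluster/ceph/status"

# Trimmed copy of the mock payload described in the prompt.
STATUS = {
    "data": {
        "health": {"status": "HEALTH_OK"},
        "monmap": {"mons": [{"name": f"pve-mon-{i}"} for i in (1, 2, 3)]},
        "osdmap": {"num_osds": 3, "num_up_osds": 3, "num_in_osds": 3},
    }
}


class StatusHandler(BaseHTTPRequestHandler):
    """Serves the single GET path that the validation scenario curls."""

    def do_GET(self):
        if self.path != STATUS_PATH:
            self.send_error(404)
            return
        body = json.dumps(STATUS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the stub quiet


def cluster_looks_healthy(doc: dict) -> bool:
    """Mirrors the scenario's checks: HEALTH_OK, three MONs, three OSDs up and in."""
    data = doc["data"]
    return (
        data["health"]["status"] == "HEALTH_OK"
        and len(data["monmap"]["mons"]) == 3
        and data["osdmap"]["num_up_osds"] == 3
        and data["osdmap"]["num_in_osds"] == 3
    )


def probe(port: int, host: str = "127.0.0.1") -> bool:
    """Fetches the status document over HTTP and applies the checks."""
    with urllib.request.urlopen(f"http://{host}:{port}{STATUS_PATH}", timeout=5) as resp:
        return cluster_looks_healthy(json.load(resp))


if __name__ == "__main__":
    # Port 0 asks the OS for any free port; the lab itself listens on 8006.
    server = ThreadingHTTPServer(("127.0.0.1", 0), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print("healthy:", probe(server.server_address[1]))
    server.shutdown()
    server.server_close()
```

Because the response is static, a passing check only proves that the port, path, and JSON shape are wired correctly. It says nothing about a real Ceph cluster, which is exactly the boundary the prompt draws between build time and deploy time.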
The prompt produces a buildable preparation lab — the right topology, the right ports listening, deployment-time config templates dropped in the right places, and tiny compatibility services that prove the wiring works. A few things still sit outside the recipe and need operator attention before this carries real load:
- pveceph init --network 10.82.0.0/24 once on the first PVE node (the resulting /etc/pve/ceph.conf is shared across the cluster), then pveceph mon create and pveceph osd create /dev/<disk> per host. The lab's ceph.conf shape lines up with the resulting layout.
- The lab collapses the public and cluster networks onto one /24 for buildability. Proxmox recommends a minimum of 10 GbE dedicated to Ceph, and 25 GbE+ for the cluster (replication + heartbeat) network when you run NVMe OSDs, because a single modern NVMe drive can saturate 10 Gbps on its own (PVE Ceph wiki). Keep corosync on its own low-latency 1 GbE link so Ceph's bursty traffic never starves quorum.
- RAM for every OSD (each OSD works within its osd_memory_target but performs better with headroom), plus enough free memory for Ceph to rebalance after a host drops.
- pveceph mgr create (run at least two for an active/standby pair so the dashboard and balancer survive a node loss), pveceph mds create for CephFS, and ceph mgr module enable prometheus for the real :9283 metrics the lab mocks.
- The baked-in keyring is a placeholder; pveceph generates the real one and distributes it under /etc/pve/priv/ceph/.
Can I run Ceph on three nodes, or do I need more? Three is the supported floor: with size = 3 across three hosts you can lose one node and still hold two replicas (min_size = 2) and keep serving reads and writes. The catch is recovery: with only three hosts there's nowhere to re-replicate the lost copy until the node returns, so you run degraded. Four or five hosts let Ceph self-heal back to full redundancy while a node is down.
Why not size = 2 to save disk? Because min_size would have to drop to 1, and a single OSD blip during recovery can then lose the only remaining copy. Proxmox staff explicitly warn against 2/1 pools; the 3.0× cost is the price of not losing data.
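The two answers above come down to a small amount of counting, which the sketch below makes explicit. It is a deliberately simplified model, not how Ceph computes placement: it assumes one replica per host (the host-spread rule the lab's CRUSH map uses) and the worst case where every failed host held a copy. `PoolState` and `after_host_failures` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolState:
    replicas_left: int   # copies of an object still on live hosts
    serves_io: bool      # True while replicas_left >= min_size
    can_self_heal: bool  # True if enough live hosts remain to hold `size` copies


def after_host_failures(hosts: int, size: int, min_size: int, failed: int) -> PoolState:
    """Worst-case state of a host-spread replicated pool after `failed` hosts drop."""
    if not 1 <= min_size <= size <= hosts:
        raise ValueError("need 1 <= min_size <= size <= hosts")
    if not 0 <= failed <= hosts:
        raise ValueError("failed must be between 0 and hosts")
    live_hosts = hosts - failed
    replicas_left = max(size - failed, 0)
    return PoolState(
        replicas_left=replicas_left,
        serves_io=replicas_left >= min_size,
        # Re-replication needs a surviving copy and `size` distinct live hosts.
        can_self_heal=replicas_left >= 1 and live_hosts >= size,
    )


if __name__ == "__main__":
    scenarios = [
        ("3 hosts, size 3 / min_size 2", 3, 3, 2),
        ("4 hosts, size 3 / min_size 2", 4, 3, 2),
        ("3 hosts, size 2 / min_size 2", 3, 2, 2),
        ("3 hosts, size 2 / min_size 1", 3, 2, 1),
    ]
    for label, hosts, size, min_size in scenarios:
        print(label, "->", after_host_failures(hosts, size, min_size, failed=1))
```

Under this model, one lost host leaves the 3/2 pool on three hosts serving I/O with two copies but unable to heal, while a fourth host makes healing possible. The size = 2 rows show the trade-off from the second answer: with min_size = 2 a single failure blocks I/O, and with min_size = 1 the pool keeps serving from a single remaining copy.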
Ceph is the storage half of HA; the cluster control plane is the 3-node Proxmox cluster the MONs sit on top of, and the single-node Proxmox lab is the place to start if you're new to the PVE API shape. The other common storage pattern is Proxmox + TrueNAS-as-a-VM if you already have a ZFS NAS and want PVE to mount its NFS export, and the VMs on a Ceph pool still want deduplicated off-host backups — replication is not a backup. For compliance-grade rollouts, the Enterprise & GxP page and pricing cover the managed path.
OpenFactory's free flow is for browsing. Persistent VMs, SSH access, snapshots, your own ISO, and fleet deployment live on a paid plan.