Build a Proxmox + Ceph HA Cluster on OpenFactory

Four VMs from one prompt: three Ceph MON/OSD hosts + an RBD client, full HA storage shape

June 12, 2026

The 3-node Proxmox cluster gives you quorum. The 3-node Proxmox + Ceph cluster gives you quorum and distributed storage on the same three machines — pull a node, the VMs keep running, and the data they sit on is already replicated to the other two. It's the aspirational homelab HA shape.

This post walks through it as an OpenFactory build prompt: four buildable Debian Trixie VMs — three Ceph MON+OSD hosts plus an RBD client — from a single prompt, with ceph.conf, a sample CRUSH map, the MON/OSD listen ports, and a mock cluster/ceph/status JSON reporting HEALTH_OK already baked in. Real pveceph init / pveceph mon create / pveceph osd create are the deploy-time steps on top.

What you'll build

  • pve-mon-1, pve-mon-2, pve-mon-3 (10.82.0.11–13) — each runs the MON+OSD shape: /etc/ceph/ceph.conf with a shared fsid and the mon_host list, mock MON listeners on :6789 (v1) and :3300 (v2), mock OSD listeners on :6800/:6801, and a mock PVE API exposing cluster/ceph/status with HEALTH_OK and three OSDs in/up.
  • CRUSH map sample /etc/ceph/crushmap.txt with three hosts, one OSD each, and a replicated_rule that picks one OSD per host (the policy that makes a 3-replica pool survive a host loss).
  • pve-client (10.82.0.20:9283) — an RBD-client VM with a matching ceph.conf and a mock ceph-mgr Prometheus exporter, plus a runbook documenting rbd create rbd-vm-disks/vm-100-disk-0 and how to wire it into PVE storage.cfg.
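The mock PVE API on each MON host can be pictured as a very small Python stdlib service. The following is a minimal sketch, not the code OpenFactory generates: the payload mirrors the status document given in the prompt later in this post, while the names `route`, `Handler`, and `serve` are hypothetical and chosen only for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

FSID = "00000000-0000-0000-0000-c0de1ab12026"
MONS = ["pve-mon-1", "pve-mon-2", "pve-mon-3"]

# Static payload mirroring the mock status document from the prompt.
STATUS = {
    "data": {
        "health": {"status": "HEALTH_OK"},
        "fsid": FSID,
        "monmap": {"mons": [{"name": name} for name in MONS]},
        "osdmap": {"num_osds": 3, "num_up_osds": 3, "num_in_osds": 3},
        "pgmap": {
            "num_pgs": 128,
            "pgs_by_state": [{"state_name": "active+clean", "count": 128}],
        },
    }
}


def route(path):
    """Map a request path to (status_code, content_type, body_bytes)."""
    if path == "/api2/json/cluster/ceph/status":
        return 200, "application/json", json.dumps(STATUS).encode()
    if path == "/metrics":
        return 200, "text/plain", b"ceph_compat_up 1\nceph_health_ok 1\n"
    return 404, "text/plain", b"not found\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, ctype, body = route(self.path)
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the stub quiet


def serve(port=8006):
    """Blocking entry point; a systemd unit would call this (hypothetical)."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()


# Exercise the routing without opening a socket.
status, ctype, body = route("/api2/json/cluster/ceph/status")
print(status, json.loads(body)["data"]["health"]["status"])  # 200 HEALTH_OK
```

Keeping the routing in a plain function means the payload can be checked without binding a port. Calling `serve()` would start the blocking listener on :8006.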

Why build it on OpenFactory

  • The ISO is the spec. ceph.conf, CRUSH map, MON list, and the keyring placeholder are baked into every node ISO. No "did I run ceph-deploy on the right host" ambiguity at deploy.
  • Scenario assertions ride along. The build group fails closed if HEALTH_OK isn't reported on every MON, if the monmap doesn't list three MONs, or if the OSD count drops below three up/in.
  • Sane defaults baked in. osd_pool_default_size = 3, osd_pool_default_min_size = 2, public + cluster networks pre-set, and a host-spread CRUSH rule: the configuration choices people forget until 2am.
  • Client wired from day one. The pve-client VM proves the topology is reachable from outside the MON quorum before you trust real VM disks to the pool.
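To see why size 3, min_size 2, and a host-spread rule belong together, here is a deliberately simplified model. It is not the real CRUSH algorithm (which places data pseudo-randomly); it only assumes one replica per host and checks how many replicas survive a host failure. The `osd.N` ids and the function names are assumptions for illustration.

```python
SIZE, MIN_SIZE = 3, 2  # osd_pool_default_size / osd_pool_default_min_size

# One OSD per host, as in the sample CRUSH map. The osd.N ids are assumed.
HOSTS = {
    "pve-mon-1": ["osd.0"],
    "pve-mon-2": ["osd.1"],
    "pve-mon-3": ["osd.2"],
}


def place(hosts, size):
    """Pick one OSD from each of `size` distinct hosts (host failure domain)."""
    if len(hosts) < size:
        raise ValueError("not enough hosts for the requested replica count")
    return {host: hosts[host][0] for host in sorted(hosts)[:size]}


def accepts_io(placement, failed_hosts, min_size):
    """I/O continues while at least min_size replicas are still available."""
    surviving = [osd for host, osd in placement.items() if host not in failed_hosts]
    return len(surviving) >= min_size


placement = place(HOSTS, SIZE)
print(accepts_io(placement, {"pve-mon-2"}, MIN_SIZE))               # True: 2 of 3 replicas left
print(accepts_io(placement, {"pve-mon-2", "pve-mon-3"}, MIN_SIZE))  # False: 1 replica < min_size
```

With one host down the pool still has two replicas and keeps serving I/O. With two hosts down it falls below min_size and I/O pauses rather than risk writing to a single copy.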

Topology

Three MON+OSD hosts in a row, RBD client below. Lab subnet 10.82.0.0/24. MON traffic on :6789/:3300, OSD traffic on :6800/:6801, mgr exporter on :9283.
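The client-to-MON half of this topology can be checked with a small TCP probe, in the same spirit as the `nc -z` loop in the scenario section of the prompt. This is a sketch under stated assumptions: the addresses and ports come from the topology above, while `mon_targets`, `port_open`, and `unreachable` are hypothetical helper names.

```python
import socket

MON_HOSTS = ["10.82.0.11", "10.82.0.12", "10.82.0.13"]
MON_PORTS = [6789, 3300]  # MON v1 and MON v2


def mon_targets():
    """Every (host, port) pair the RBD client must be able to reach."""
    return [(host, port) for host in MON_HOSTS for port in MON_PORTS]


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable(targets, probe=port_open):
    """List the targets that the probe could not connect to."""
    return [target for target in targets if not probe(*target)]


print(len(mon_targets()), "MON endpoints to probe")  # 6 MON endpoints to probe
```

On pve-client, `unreachable(mon_targets())` would return an empty list when all three MONs answer on both ports. The `probe` parameter exists so the logic can be exercised without a live lab.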

The prompt

Paste this verbatim into the chat builder at console.openfactory.tech. Add nothing above or below it: the builder expects the prompt body to start at the “Build a compact multi-node lab…” line.

Build a compact multi-node lab named `proxmox-ceph-cluster`.

Output discipline: keep the plan small. Use one startup script per node, about 25 shell lines or less. Do not install `ceph-mon`, `ceph-osd`, `ceph-mgr`, `pve-manager`, the Ceph apt repos, or any kernel `rbd` / `cephfs` modules at build time. The Ceph + PVE cluster is bootstrapped at deploy time via `pveceph init` and `pveceph mon create`; this lab only stages the configs and exposes mock listeners on the right ports. Write deployment-time config examples and tiny Python stdlib or shell compatibility stubs only. The goal is a buildable preparation lab, not a production Proxmox install.

## Topology

Create 4 buildable `debian-trixie` nodes, all `x86_64`, SSH enabled, DHCP/default route intact with lab aliases, firewall disabled, DNS `1.1.1.1` and `8.8.8.8`, user `ops` password `ceph-ops` in `sudo`. Every recipe must set top-level `test_config` to `{ "enabled": false, "tests": [] }`.

- `pve-mon-1`: role `pve-mon-osd`, 6 GB RAM, 48 GB disk, alias `10.82.0.11/24`, x `110`, y `100`
- `pve-mon-2`: role `pve-mon-osd`, 6 GB RAM, 48 GB disk, alias `10.82.0.12/24`, x `350`, y `100`
- `pve-mon-3`: role `pve-mon-osd`, 6 GB RAM, 48 GB disk, alias `10.82.0.13/24`, x `590`, y `100`
- `pve-client`: role `rbd-client`, 2 GB RAM, 16 GB disk, alias `10.82.0.20/24`, x `350`, y `280`

Connections: Three `pve-mon-*` nodes to each other on `:6789` (MON v1) and `:3300` (MON v2), plus `:6800` and `:6801` for OSDs; `pve-client` to all three MONs on `:6789` and `:3300`.

## Common Recipe Requirements

All nodes: features `headless`, `ssh`; packages `openssh-server`, `python3`, `curl`, `jq`, `iproute2`, `netcat-openbsd`, `ca-certificates`. Each startup script adds the alias with `IFACE=$(ip route show default | awk '{print $5; exit}')`, `ip link set "$IFACE" up || true`, and `ip addr add <alias> dev "$IFACE" || true`. If `os.startup_scripts[].after` is present, it must be the string `"network-online.target"`, not an array. Do not install `pve-manager`, `proxmox-backup-server`, `ceph`, `truenas-scale`, or any related apt packages — they are source-ISO deploys handled at provisioning time, not at build time.

## Node Requirements

All three MON nodes (`pve-mon-1`, `pve-mon-2`, `pve-mon-3`) share the same shape, each with a different MON id. Each:

- Creates `/etc/ceph/` mode `0755`, `/var/lib/ceph/{mon,osd}/ceph-<id>` mode `0750 ops:ops`.
- Writes `/etc/ceph/ceph.conf` with `[global]\nfsid = 00000000-0000-0000-0000-c0de1ab12026\nmon_initial_members = pve-mon-1,pve-mon-2,pve-mon-3\nmon_host = 10.82.0.11,10.82.0.12,10.82.0.13\npublic_network = 10.82.0.0/24\ncluster_network = 10.82.0.0/24\nauth_cluster_required = cephx\nauth_service_required = cephx\nauth_client_required = cephx\nosd_pool_default_size = 3\nosd_pool_default_min_size = 2\nosd_pool_default_pg_num = 128`.
- Writes `/etc/ceph/crushmap.txt` describing three hosts (`pve-mon-1`, `pve-mon-2`, `pve-mon-3`), one OSD each, and a `replicated_rule` that picks one OSD per host.
- Writes `/etc/ceph/ceph.client.admin.keyring.example` with `[client.admin]\n  key = AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==\n  caps mon = "allow *"\n  caps osd = "allow *"\n  caps mgr = "allow *"` (placeholder; real keyring is generated at deploy).
- Adds a Python stdlib TCP listener on `0.0.0.0:6789` accepting connections (mock MON v1 binary protocol port).
- Adds a Python stdlib TCP listener on `0.0.0.0:3300` (MON v2).
- Adds Python stdlib TCP listeners on `0.0.0.0:6800` and `0.0.0.0:6801` (OSD bind range).
- Adds a Python stdlib HTTP service on `0.0.0.0:8006` exposing `GET /api2/json/cluster/ceph/status` -> `200 {"data":{"health":{"status":"HEALTH_OK"},"fsid":"00000000-0000-0000-0000-c0de1ab12026","monmap":{"mons":[{"name":"pve-mon-1"},{"name":"pve-mon-2"},{"name":"pve-mon-3"}]},"osdmap":{"num_osds":3,"num_up_osds":3,"num_in_osds":3},"pgmap":{"num_pgs":128,"pgs_by_state":[{"state_name":"active+clean","count":128}]}}}` and `GET /metrics` with `ceph_compat_up 1` plus `ceph_health_ok 1`.
- Registers `ceph-compat.service`.

`pve-client`: features `headless`, `ssh`. Write `/etc/ceph/ceph.conf` mirroring the MON list above. Write `/etc/ceph/rbd-vm-disks.example` with `[rbd-vm-disks]\n  pool rbd-vm-disks\n  size 3\n  min_size 2\n  application rbd`. Add a Python stdlib HTTP service on `0.0.0.0:9283` (the Ceph mgr Prometheus exporter port) exposing `GET /metrics` with `ceph_client_compat_up 1` and `ceph_mon_targets 3`. Register `ceph-client-compat.service`. Write `/root/rbd-mount-runbook.md` documenting `rbd create rbd-vm-disks/vm-100-disk-0 --size 10G` and the matching PVE storage config.

## Scenario

Emit exactly one group scenario named `proxmox-ceph-cluster-validation`. Put `custom_tests[].assertions[]` inside the scenario entry; leave `scenarios[].tests` empty. Every assertion needs `on_vm`. Use only `port_listening`, `command_output`, and `http_responds`; do not emit `vm_boots`, `network_reachable`, or `service_running`.

- `Cluster ports listen`: on each `pve-mon-*`, `port_listening` for `:6789`, `:3300`, `:6800`, `:6801`, `:8006`; on `pve-client`, `port_listening` for `:9283`.
- `Ceph health is OK on every MON`: on each `pve-mon-*`, `curl -fsS http://localhost:8006/api2/json/cluster/ceph/status | jq -e '.data.health.status == "HEALTH_OK"' >/dev/null && echo health-ok`.
- `Three MONs in the monmap`: on each `pve-mon-*`, `curl -fsS http://localhost:8006/api2/json/cluster/ceph/status | jq -e '.data.monmap.mons | length == 3' >/dev/null && echo monmap-3`.
- `Three OSDs up + in`: on `pve-mon-1`, `curl -fsS http://localhost:8006/api2/json/cluster/ceph/status | jq -e '.data.osdmap.num_up_osds == 3 and .data.osdmap.num_in_osds == 3' >/dev/null && echo osds-up`.
- `ceph.conf agreement across nodes`: on each `pve-mon-*` and on `pve-client`, `grep -q 'mon_host = 10.82.0.11,10.82.0.12,10.82.0.13' /etc/ceph/ceph.conf && echo conf-agreed`.
- `Client reaches all three MONs`: on `pve-client`, `for h in 10.82.0.11 10.82.0.12 10.82.0.13; do nc -z -w 5 $h 6789 || exit 1; nc -z -w 5 $h 3300 || exit 1; done && echo mons-reachable`.

Preserve warnings that real Proxmox VE installation on each node, `pveceph init --network 10.82.0.0/24`, `pveceph mon create` and `pveceph osd create /dev/<disk>` on each host, real Ceph keyring distribution, separate `public_network` vs `cluster_network` on dedicated NICs (10 GbE+), at-least-3-OSD-per-host placement, MGR + MDS + Prometheus exporter daemons, CRUSH map host/rack/dc layers, real `pg_num` autoscaling, snapshot / scrub schedules, and `10.82.0.0/24` lab aliasing are deployment-time concerns.

Running it

  1. Open the chat builder at console.openfactory.tech and paste the prompt into a new conversation.
  2. Review the streamed build plan. You'll see the topology, per-node recipes, and the scenario assertions that will run after boot. Edit the prompt and re-run if anything is off.
  3. Click Build group. OpenFactory fans the plan out to per-node ISO builds. When every ISO reaches built, boot the group on the runner network from the same UI.
  4. Exercise the stack. The scenario assertions run automatically against the live VMs. From the host you can also hit the service ports directly to confirm end-to-end behavior.
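When hitting the service ports by hand in step 4, the three status checks from the scenario can be reproduced in a few lines. The sketch below assumes the mock API speaks plain HTTP on :8006, as the `curl http://localhost:8006/...` assertions in the prompt imply. A real Proxmox VE API on that port uses HTTPS and authentication, so `fetch_status` would not work against it unchanged. The function names are hypothetical.

```python
import http.client
import json

STATUS_PATH = "/api2/json/cluster/ceph/status"


def fetch_status(host, port=8006, timeout=5.0):
    """GET the mock status document from one MON host (plain HTTP assumed)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", STATUS_PATH)
        return json.load(conn.getresponse())
    finally:
        conn.close()


def check_status(doc):
    """Evaluate the same three conditions the scenario asserts with jq."""
    data = doc["data"]
    osdmap = data["osdmap"]
    return {
        "health-ok": data["health"]["status"] == "HEALTH_OK",
        "monmap-3": len(data["monmap"]["mons"]) == 3,
        "osds-up": osdmap["num_up_osds"] == 3 and osdmap["num_in_osds"] == 3,
    }


# Offline sample shaped like the mock payload, so the checks run without a lab.
SAMPLE = {
    "data": {
        "health": {"status": "HEALTH_OK"},
        "monmap": {"mons": [{"name": f"pve-mon-{i}"} for i in (1, 2, 3)]},
        "osdmap": {"num_osds": 3, "num_up_osds": 3, "num_in_osds": 3},
    }
}

for name, ok in check_status(SAMPLE).items():
    print(name, "PASS" if ok else "FAIL")
```

Against the running lab, `check_status(fetch_status("10.82.0.11"))` would apply the same checks to a live node.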

Driving OpenFactory from an AI agent instead of the browser? The same flow is exposed through the OpenFactory MCP server — submit the prompt programmatically, get the build-plan preview back, and call create_build / start_vm on the resulting recipes. Single-image builds go straight through the openfactory CLI.

What's still your responsibility

The prompt produces a buildable preparation lab — the right topology, the right ports listening, deployment-time config templates dropped in the right places, and tiny compatibility services that prove the wiring works. A few things still sit outside the recipe and need operator attention before this carries real load:

  • Real Proxmox VE + Ceph install. Run pveceph install on each PVE node and pveceph init --network 10.82.0.0/24 once on the first node, then pveceph mon create and pveceph osd create /dev/<disk> per host. The lab's ceph.conf shape lines up with the resulting layout.
  • Separate public_network + cluster_network. The lab collapses both onto the same /24 for buildability; production should give each its own NIC (10 GbE+ recommended).
  • More than one OSD per host. The lab stages a single OSD per host, which means one failed disk takes out that host's entire replica. Production homelab Ceph typically wants 2-3 OSDs per host at minimum.
  • MGR + MDS + Prometheus exporter daemons. pveceph mgr create, pveceph mds create for CephFS, ceph mgr module enable prometheus for metrics.
  • Keyring distribution. The lab ships a placeholder admin keyring; pveceph generates the real one and distributes it under /etc/pve/priv/ceph/.
  • Snapshot + scrub schedules. Ceph scrubs by default; tune the schedule to your disk class. RBD snapshots are copy-on-write and cheap to create, so use them.
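As an illustration of the network split mentioned in the list above, the following sketch renders a lab-style ceph.conf with distinct public and cluster subnets. The `10.83.0.0/24` cluster subnet and the `render_ceph_conf` helper are assumptions made for the example. On a real Proxmox VE install, pveceph manages the configuration file itself, so treat this only as a template for staged configs.

```python
import configparser
import io


def render_ceph_conf(public_network, cluster_network):
    """Render a [global] section like the lab's, with split networks."""
    cfg = configparser.ConfigParser()
    cfg["global"] = {
        "fsid": "00000000-0000-0000-0000-c0de1ab12026",
        "mon_initial_members": "pve-mon-1,pve-mon-2,pve-mon-3",
        "mon_host": "10.82.0.11,10.82.0.12,10.82.0.13",
        "public_network": public_network,    # client and MON traffic
        "cluster_network": cluster_network,  # OSD replication traffic
        "osd_pool_default_size": "3",
        "osd_pool_default_min_size": "2",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# 10.83.0.0/24 is a hypothetical second subnet on a dedicated NIC.
conf = render_ceph_conf("10.82.0.0/24", "10.83.0.0/24")
print(conf)
```

The lab keeps both settings on 10.82.0.0/24. Changing only `cluster_network` moves replication traffic to the second subnet while clients keep using the public one.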

Where to go next

Ceph is the storage half of HA; the cluster control plane is the 3-node Proxmox cluster the MONs sit on top of. The other common storage pattern is Proxmox + TrueNAS-as-a-VM if you already have a ZFS NAS and want PVE to mount its NFS export. And the Enterprise & GxP page covers compliance-grade rollouts.

Ready to ship this in production?

OpenFactory's free flow is for browsing. Persistent VMs, SSH access, snapshots, your own ISO, and fleet deployment live on a paid plan.