Build a Proxmox + Ceph HA Cluster on OpenFactory

Four VMs from one prompt: three Ceph MON/OSD hosts + an RBD client, full HA storage shape

May 14, 2026

The 3-node Proxmox cluster gives you quorum. The 3-node Proxmox + Ceph cluster gives you quorum and distributed storage on the same three machines: pull a node, the VMs keep running, and the data they sit on is already replicated to the other two. It's the aspirational homelab HA shape.

This post walks through it as an OpenFactory build prompt: four buildable Debian Trixie VMs (three Ceph MON+OSD hosts plus an RBD client) from a single prompt, with ceph.conf, a sample CRUSH map, the MON/OSD listen ports, and a mock cluster/ceph/status JSON reporting HEALTH_OK already baked in. Real pveceph init / pveceph mon create / pveceph osd create are the deploy-time steps on top.

Before you commit hardware, do the capacity math. Ceph's default replicated pool is size = 3: every object is written to three OSDs on three different hosts, so the space amplification factor is exactly 3.0, meaning 3 GiB of raw disk stores 1 GiB of your data (Ceph docs). A nine-drive, 12 TB-raw cluster therefore gives you roughly 4 TB of nominal usable capacity, and you should stop filling it before you reach that ceiling so Ceph keeps room to recover. That feels expensive next to a single RAID box, but you're buying the ability to lose a whole node mid-write and keep serving I/O, which is the entire point. This lab bakes those defaults (size = 3, min_size = 2) into ceph.conf so the shape you build matches the shape you deploy.
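The arithmetic is simple enough to script before you buy drives. A minimal sketch, assuming a single replicated pool; the function name and the `fill_ratio` parameter are illustrative, and the 0.85 figure is used here as a planning assumption that mirrors Ceph's default nearfull warning threshold:

```python
def usable_capacity(raw_tb: float, size: int = 3, fill_ratio: float = 1.0) -> float:
    """Usable capacity of a replicated pool, in the same unit as raw_tb.

    raw_tb     -- total raw disk across all OSDs
    size       -- replica count (osd_pool_default_size)
    fill_ratio -- fraction of the pool you plan to fill (assumption, not a Ceph setting)
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    if not 0 < fill_ratio <= 1:
        raise ValueError("fill_ratio must be in (0, 1]")
    return raw_tb / size * fill_ratio


# The 12 TB-raw example from the text, with size = 3.
nominal = usable_capacity(12)
# Same cluster, planning to stop at 85% full (assumed planning target).
planned = usable_capacity(12, fill_ratio=0.85)

print(f"nominal usable: {nominal:.1f} TB")
print(f"planned usable: {planned:.1f} TB")
```

Run it with your own raw capacity and replica count to see how much space you are actually budgeting for data.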

What you'll build

pve-mon-1, pve-mon-2, pve-mon-3 (10.82.0.11–13). Each runs the MON+OSD shape: /etc/ceph/ceph.conf with a shared fsid and the mon_host list, mock MON listeners on :6789 (v1) and :3300 (v2), mock OSD listeners on :6800/:6801, and a mock PVE API exposing cluster/ceph/status with HEALTH_OK and three OSDs in/up.
CRUSH map sample: /etc/ceph/crushmap.txt with three hosts, one OSD each, and a replicated_rule that picks one OSD per host (the policy that makes a 3-replica pool survive a host loss).
pve-client (10.82.0.20:9283): an RBD-client VM with a matching ceph.conf and a mock ceph-mgr Prometheus exporter, plus a runbook documenting rbd create rbd-vm-disks/vm-100-disk-0 and how to wire it into PVE storage.cfg.
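The mock listeners are deliberately thin: they accept a TCP connection and speak no Ceph protocol at all. A minimal sketch of what such a stub could look like, using only the Python standard library. The class and function names are hypothetical, and the demo binds an ephemeral localhost port rather than the lab's :6789/:3300/:6800/:6801 so it runs anywhere:

```python
import socket
import socketserver
import threading


class AcceptAndClose(socketserver.BaseRequestHandler):
    """Accept the connection and close it without speaking any protocol."""

    def handle(self):
        pass  # returning from handle() closes the client socket


class MockListener(socketserver.ThreadingTCPServer):
    allow_reuse_address = True  # allow quick restarts on the same port
    daemon_threads = True       # handler threads never block interpreter exit


def start_listener(host: str, port: int) -> MockListener:
    """Bind host:port and serve in a background thread."""
    server = MockListener((host, port), AcceptAndClose)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Local demo: port 0 asks the kernel for a free ephemeral port.
server = start_listener("127.0.0.1", 0)
demo_port = server.server_address[1]
with socket.create_connection(("127.0.0.1", demo_port), timeout=2):
    print(f"mock listener accepted a connection on port {demo_port}")
server.shutdown()
server.server_close()
```

On a lab node the same helper would be called with `0.0.0.0` and each of the four MON/OSD ports, then kept alive by the service unit.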

Why build it on OpenFactory

The ISO is the spec. ceph.conf, CRUSH map, MON list, and the keyring placeholder are baked into every node ISO. No "did I run ceph-deploy on the right host" ambiguity at deploy.
Scenario assertions ride along. The build group fails closed if HEALTH_OK isn't reported on every MON, if the monmap doesn't list three MONs, or if the OSD count drops below three up/in.
Sane defaults baked in. osd_pool_default_size = 3, min_size = 2, public + cluster networks pre-set, host-spread CRUSH rule, the configuration choices people forget until 2am.
Client wired from day one. The pve-client VM proves the topology is reachable from outside the MON quorum before you trust real VM disks to the pool.
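The three fail-closed conditions map directly onto fields in the mock cluster/ceph/status payload. A sketch of the same checks in Python, assuming the JSON shape the lab's mock API returns; the `check_status` helper is hypothetical and is not how OpenFactory evaluates assertions internally:

```python
import json

# Payload shape taken from the lab's mock /api2/json/cluster/ceph/status response.
SAMPLE_STATUS = json.loads("""
{"data": {"health": {"status": "HEALTH_OK"},
          "fsid": "00000000-0000-0000-0000-c0de1ab12026",
          "monmap": {"mons": [{"name": "pve-mon-1"},
                              {"name": "pve-mon-2"},
                              {"name": "pve-mon-3"}]},
          "osdmap": {"num_osds": 3, "num_up_osds": 3, "num_in_osds": 3}}}
""")


def check_status(payload: dict, expected_mons: int = 3, expected_osds: int = 3) -> list[str]:
    """Return the list of failed checks; an empty list means the scenario passes."""
    data = payload.get("data", {})
    failures = []
    if data.get("health", {}).get("status") != "HEALTH_OK":
        failures.append("health is not HEALTH_OK")
    if len(data.get("monmap", {}).get("mons", [])) != expected_mons:
        failures.append(f"monmap does not list {expected_mons} MONs")
    osdmap = data.get("osdmap", {})
    if (osdmap.get("num_up_osds", 0) < expected_osds
            or osdmap.get("num_in_osds", 0) < expected_osds):
        failures.append(f"fewer than {expected_osds} OSDs up/in")
    return failures


print(check_status(SAMPLE_STATUS) or "all checks passed")
```

Missing keys count as failures rather than raising, which keeps the check fail-closed when a node returns a partial response.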

Topology

Three MON+OSD hosts in a row, RBD client below. Lab subnet 10.82.0.0/24. MON traffic on :6789/:3300, OSD traffic on :6800/:6801, mgr exporter on :9283. In a real deployment, client-to-MON traffic (:6789/:3300) rides the public network while OSD replication and heartbeat traffic (the :6800-:7300 range) rides the cluster network, each on its own physical NIC. The diagram below shows the buildable lab shape, which collapses both onto one subnet; the dashed split is the production split you graduate to.

The lab collapses public + cluster onto one /24; production splits the dashed cluster-net replication traffic onto its own 10 GbE+ NIC.
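Whether a given ceph.conf still has the collapsed lab layout is easy to check mechanically. A sketch, assuming the `[global]` keys the lab writes; the helper names are hypothetical, and 10.82.1.0/24 is an invented cluster subnet used only to illustrate the split case:

```python
import configparser
import ipaddress

# Subset of the lab's ceph.conf that matters for the network layout.
LAB_CEPH_CONF = """\
[global]
fsid = 00000000-0000-0000-0000-c0de1ab12026
mon_host = 10.82.0.11,10.82.0.12,10.82.0.13
public_network = 10.82.0.0/24
cluster_network = 10.82.0.0/24
"""

# Hypothetical production variant: cluster traffic moved to its own subnet.
SPLIT_CEPH_CONF = LAB_CEPH_CONF.replace(
    "cluster_network = 10.82.0.0/24", "cluster_network = 10.82.1.0/24"
)


def load_global(conf_text: str) -> configparser.SectionProxy:
    """Parse ceph.conf text and return its [global] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser["global"]


def networks_are_split(conf_text: str) -> bool:
    """True when public_network and cluster_network do not overlap."""
    section = load_global(conf_text)
    public = ipaddress.ip_network(section["public_network"])
    cluster = ipaddress.ip_network(section["cluster_network"])
    return not public.overlaps(cluster)


def mons_in_public_network(conf_text: str) -> bool:
    """True when every mon_host address falls inside public_network."""
    section = load_global(conf_text)
    public = ipaddress.ip_network(section["public_network"])
    hosts = [h.strip() for h in section["mon_host"].split(",")]
    return all(ipaddress.ip_address(h) in public for h in hosts)


print("lab split:", networks_are_split(LAB_CEPH_CONF))
print("production-style split:", networks_are_split(SPLIT_CEPH_CONF))
print("MONs on public network:", mons_in_public_network(LAB_CEPH_CONF))
```

This only inspects the config text; it says nothing about whether the two subnets actually sit on separate NICs.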

The prompt

Choose Deploy to open this exact, registered prompt in Elster Terminal and start the build. Choose Copy prompt when you want to edit it first in build.openfactory.tech.

Build a compact multi-node lab named `proxmox-ceph-cluster`.

Output discipline: keep the plan small. Use one startup script per node, about 25 shell lines or less. Do not install `ceph-mon`, `ceph-osd`, `ceph-mgr`, `pve-manager`, the Ceph apt repos, or any kernel `rbd` / `cephfs` modules at build time. The Ceph + PVE cluster is bootstrapped at deploy time via `pveceph init` and `pveceph mon create`; this lab only stages the configs and exposes mock listeners on the right ports. Write deployment-time config examples and tiny Python stdlib or shell compatibility stubs only. The goal is a buildable preparation lab, not a production Proxmox install.

## Topology

Create 4 buildable `debian-trixie` nodes, all `x86_64`, SSH enabled, DHCP/default route intact with lab aliases, firewall disabled, DNS `1.1.1.1` and `8.8.8.8`, user `ops` password `ceph-ops` in `sudo`. Every recipe must set top-level `test_config` to `{ "enabled": false, "tests": [] }`.

- `pve-mon-1`: role `pve-mon-osd`, 6 GB RAM, 48 GB disk, alias `10.82.0.11/24`, x `110`, y `100`
- `pve-mon-2`: role `pve-mon-osd`, 6 GB RAM, 48 GB disk, alias `10.82.0.12/24`, x `350`, y `100`
- `pve-mon-3`: role `pve-mon-osd`, 6 GB RAM, 48 GB disk, alias `10.82.0.13/24`, x `590`, y `100`
- `pve-client`: role `rbd-client`, 2 GB RAM, 16 GB disk, alias `10.82.0.20/24`, x `350`, y `280`

Connections: Three `pve-mon-*` nodes to each other on `:6789` (MON v1) and `:3300` (MON v2), plus `:6800-:6803` for OSDs; `pve-client` to all three MONs on `:6789` and `:3300`.

## Common Recipe Requirements

All nodes: features `headless`, `ssh`; packages `openssh-server`, `python3`, `curl`, `jq`, `iproute2`, `netcat-openbsd`, `ca-certificates`. Each startup script adds the alias with `IFACE=$(ip route show default | awk '{print $5; exit}')`, `ip link set "$IFACE" up || true`, and `ip addr add <alias> dev "$IFACE" || true`. If `os.startup_scripts[].after` is present, it must be the string `"network-online.target"`, not an array. Do not install `pve-manager`, `proxmox-backup-server`, `ceph`, `truenas-scale`, or any related apt packages: they are source-ISO deploys handled at provisioning time, not at build time.

## Node Requirements

All three `pve-mon-1`, `pve-mon-2`, `pve-mon-3` share the same shape with different MON id. Each:

- Creates `/etc/ceph/` mode `0755`, `/var/lib/ceph/{mon,osd}/ceph-<id>` mode `0750 ops:ops`.
- Writes `/etc/ceph/ceph.conf` with `[global]\nfsid = 00000000-0000-0000-0000-c0de1ab12026\nmon_initial_members = pve-mon-1,pve-mon-2,pve-mon-3\nmon_host = 10.82.0.11,10.82.0.12,10.82.0.13\npublic_network = 10.82.0.0/24\ncluster_network = 10.82.0.0/24\nauth_cluster_required = cephx\nauth_service_required = cephx\nauth_client_required = cephx\nosd_pool_default_size = 3\nosd_pool_default_min_size = 2\nosd_pool_default_pg_num = 128`.
- Writes `/etc/ceph/crushmap.txt` describing three hosts (`pve-mon-1`, `pve-mon-2`, `pve-mon-3`), one OSD each, and a `replicated_rule` that picks one OSD per host.
- Writes `/etc/ceph/ceph.client.admin.keyring.example` with `[client.admin]\n  key = AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==\n  caps mon = "allow *"\n  caps osd = "allow *"\n  caps mgr = "allow *"` (placeholder; real keyring is generated at deploy).
- Adds a Python stdlib TCP listener on `0.0.0.0:6789` accepting connections (mock MON v1 binary protocol port).
- Adds a Python stdlib TCP listener on `0.0.0.0:3300` (MON v2).
- Adds Python stdlib TCP listeners on `0.0.0.0:6800` and `0.0.0.0:6801` (OSD bind range).
- Adds a Python stdlib HTTP service on `0.0.0.0:8006` exposing `GET /api2/json/cluster/ceph/status` -> `200 {"data":{"health":{"status":"HEALTH_OK"},"fsid":"00000000-0000-0000-0000-c0de1ab12026","monmap":{"mons":[{"name":"pve-mon-1"},{"name":"pve-mon-2"},{"name":"pve-mon-3"}]},"osdmap":{"num_osds":3,"num_up_osds":3,"num_in_osds":3},"pgmap":{"num_pgs":128,"pgs_by_state":[{"state_name":"active+clean","count":128}]}}}` and `GET /metrics` with `ceph_compat_up 1` plus `ceph_health_ok 1`.
- Registers `ceph-compat.service`.

`pve-client`: features `headless`, `ssh`. Write `/etc/ceph/ceph.conf` mirroring the MON list above. Write `/etc/ceph/rbd-vm-disks.example` with `[rbd-vm-disks]\n  pool rbd-vm-disks\n  size 3\n  min_size 2\n  application rbd`. Add a Python stdlib HTTP service on `0.0.0.0:9283` (the Ceph mgr Prometheus exporter port) exposing `GET /metrics` with `ceph_client_compat_up 1` and `ceph_mon_targets 3`. Register `ceph-client-compat.service`. Write `/root/rbd-mount-runbook.md` documenting `rbd create rbd-vm-disks/vm-100-disk-0 --size 10G` and the matching PVE storage config.

## Scenario

Emit exactly one group scenario named `proxmox-ceph-cluster-validation`. Put `custom_tests[].assertions[]` inside the scenario entry; leave `scenarios[].tests` empty. Every assertion needs `on_vm`. Use only `port_listening`, `command_output`, and `http_responds`; do not emit `vm_boots`, `network_reachable`, or `service_running`.

- `Cluster ports listen`: on each `pve-mon-*`, `port_listening` for `:6789`, `:3300`, `:6800`, `:6801`, `:8006`; on `pve-client`, `port_listening` for `:9283`.
- `Ceph health is OK on every MON`: on each `pve-mon-*`, `curl -fsS http://localhost:8006/api2/json/cluster/ceph/status | jq -e '.data.health.status == "HEALTH_OK"' >/dev/null && echo health-ok`.
- `Three MONs in the monmap`: on each `pve-mon-*`, `curl -fsS http://localhost:8006/api2/json/cluster/ceph/status | jq -e '.data.monmap.mons | length == 3' >/dev/null && echo monmap-3`.
- `Three OSDs up + in`: on `pve-mon-1`, `curl -fsS http://localhost:8006/api2/json/cluster/ceph/status | jq -e '.data.osdmap.num_up_osds == 3 and .data.osdmap.num_in_osds == 3' >/dev/null && echo osds-up`.
- `ceph.conf agreement across nodes`: on each `pve-mon-*` and on `pve-client`, `grep -q 'mon_host = 10.82.0.11,10.82.0.12,10.82.0.13' /etc/ceph/ceph.conf && echo conf-agreed`.
- `Client reaches all three MONs`: on `pve-client`, `for h in 10.82.0.11 10.82.0.12 10.82.0.13; do nc -z -w 5 $h 6789 || exit 1; nc -z -w 5 $h 3300 || exit 1; done && echo mons-reachable`.

Preserve warnings that real Proxmox VE installation on each node, `pveceph init --network 10.82.0.0/24`, `pveceph mon create` and `pveceph osd create /dev/<disk>` on each host, real Ceph keyring distribution, separate `public_network` vs `cluster_network` on dedicated NICs (10 GbE+), at-least-3-OSD-per-host placement, MGR + MDS + Prometheus exporter daemons, CRUSH map host/rack/dc layers, real `pg_num` autoscaling, snapshot / scrub schedules, and `10.82.0.0/24` lab aliasing are deployment-time concerns.

Running it

Choose Deploy. Elster Terminal opens with the registered prompt and immediately prepares the build plan or starts the requested workflow.
Review the streamed build plan. You'll see the topology, per-node recipes, and the scenario assertions that will run after boot. For build prompts, Elster confirms the generated preview automatically only after it matches the registered handoff.
Watch the group build. OpenFactory fans the confirmed plan out to per-node image builds, then boots the completed group on the runner network.
Exercise the stack. The scenario assertions run automatically against the live VMs. From the host you can also hit the service ports directly to confirm end-to-end behavior.
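Hitting the service ports directly amounts to a TCP connect per host and port. A sketch of such a probe, assuming the lab aliases are routable from wherever you run it; `LAB_PORTS`, `port_open`, and `probe` are illustrative names, and nothing is probed until you call `probe(LAB_PORTS)` yourself:

```python
import socket

# Hosts and ports from the lab topology.
MON_OSD_PORTS = (6789, 3300, 6800, 6801, 8006)
LAB_PORTS = {
    "10.82.0.11": MON_OSD_PORTS,
    "10.82.0.12": MON_OSD_PORTS,
    "10.82.0.13": MON_OSD_PORTS,
    "10.82.0.20": (9283,),
}


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(targets: dict, timeout: float = 5.0) -> dict:
    """Map every (host, port) pair in targets to its reachability."""
    return {
        (host, port): port_open(host, port, timeout)
        for host, ports in targets.items()
        for port in ports
    }


def report(results: dict) -> None:
    """Print one line per probed port."""
    for (host, port), ok in sorted(results.items()):
        print(f"{host}:{port} {'open' if ok else 'CLOSED'}")
```

Calling `report(probe(LAB_PORTS))` prints one line per port. A successful connect only proves the listener is up, which is all the mock services promise.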

Driving OpenFactory from an AI agent instead of the browser? The same flow is exposed through the OpenFactory MCP server: submit the prompt programmatically, get the build-plan preview back, and call create_build / start_vm on the resulting recipes. Single-image builds go straight through the openfactory CLI.

What's still your responsibility

The prompt produces a buildable preparation lab: the right topology, the right ports listening, deployment-time config templates dropped in the right places, and tiny compatibility services that prove the wiring works. A few things still sit outside the recipe and need operator attention before this carries real load:

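Real Proxmox VE + Ceph install. Run pveceph install on each PVE node and pveceph init --network 10.82.0.0/24 once on the first node (the resulting config is shared cluster-wide through /etc/pve), then pveceph mon create and pveceph osd create /dev/<disk> per host. The lab's ceph.conf shape lines up with the resulting layout.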
Separate public_network + cluster_network. The lab collapses both onto the same /24 for buildability. Proxmox recommends a minimum of 10 GbE dedicated to Ceph, and 25 GbE+ for the cluster (replication + heartbeat) network when you run NVMe OSDs: a single modern NVMe drive can saturate 10 Gbps on its own (PVE Ceph wiki). Keep corosync on its own low-latency 1 GbE link so Ceph's bursty traffic never starves quorum.
More OSDs, more memory. The lab models one OSD per host; production wants several so a single disk failure isn't a full failure-domain loss. Proxmox suggests a balanced starting point of ~12 OSDs across 3 nodes and budgeting 8 GiB of RAM per OSD (the daemon defaults to a 4 GiB osd_memory_target but performs better with headroom), plus enough free memory for Ceph to rebalance after a host drops.
MGR + MDS + Prometheus exporter daemons. pveceph mgr create (run at least two for an active/standby pair so the dashboard and balancer survive a node loss), pveceph mds create for CephFS, and ceph mgr module enable prometheus for the real :9283 metrics the lab mocks.
Keyring distribution. The lab ships a placeholder admin keyring; pveceph generates the real one and shares it through the cluster filesystem under /etc/pve/priv/, with per-storage keyrings in /etc/pve/priv/ceph/.
Snapshot + scrub schedules. Ceph will scrub by default; tune the schedule to your disk class. RBD snapshots are copy-on-write and cheap to take, so use them, but remember they consume pool space as the image diverges.
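The OSD memory guidance in the list above turns into a per-node budget with one multiplication. A sketch using the figures quoted there (~12 OSDs across 3 nodes, 8 GiB per OSD, 4 GiB default osd_memory_target); the function name is illustrative, and the result covers OSD daemons only, not the VMs, MONs, or the host OS:

```python
def osd_ram_budget_gib(osds_per_node: int, gib_per_osd: int = 8) -> int:
    """RAM to reserve for OSD daemons on one node, in GiB."""
    if osds_per_node < 0 or gib_per_osd < 0:
        raise ValueError("counts must be non-negative")
    return osds_per_node * gib_per_osd


total_osds, nodes = 12, 3
per_node = total_osds // nodes  # evenly distributed OSDs

print(f"OSDs per node: {per_node}")
print(f"budget at 8 GiB per OSD: {osd_ram_budget_gib(per_node)} GiB per node")
print(f"floor at the 4 GiB default target: {osd_ram_budget_gib(per_node, 4)} GiB per node")
```

Compare the result with the lab's 6 GB VMs to see why the lab models a single mock OSD per host rather than a production layout.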

Where to go next

Quick questions

Can I run Ceph on three nodes, or do I need more? Three is the supported floor: with size = 3 across three hosts you can lose one node and still hold two replicas (min_size = 2) and keep serving reads and writes. The catch is recovery. With only three hosts there's nowhere to re-replicate the lost copy until the node returns, so you run degraded. Four or five hosts let Ceph self-heal back to full redundancy while a node is down.

Why not size = 2 to save disk? Because min_size would have to drop to 1, and a single OSD blip during recovery can then lose the only remaining copy. Proxmox staff explicitly warn against 2/1 pools; the 3.0× cost is the price of not losing data.
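Both answers come down to counting replicas after a failure. A simplified model, assuming a host-spread CRUSH rule (one replica per host) and the worst case where every failed host held a copy; the `Pool` class and `after_host_loss` function are illustrative and ignore PG-level detail:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    hosts: int     # hosts carrying OSDs
    size: int      # replicas per object
    min_size: int  # replicas required to keep serving I/O


def after_host_loss(pool: Pool, hosts_down: int = 1) -> dict:
    """Worst-case state of a pool after losing hosts_down hosts."""
    hosts_up = pool.hosts - hosts_down
    replicas_left = max(pool.size - hosts_down, 0)
    return {
        "replicas_left": replicas_left,
        # I/O continues only while at least min_size replicas remain.
        "serves_io": replicas_left >= pool.min_size,
        # Re-replication needs as many surviving hosts as there are replicas.
        "can_self_heal": hosts_up >= pool.size,
        # Copies that can still be lost before the data is gone.
        "spare_copies": max(replicas_left - 1, 0),
    }


for label, pool in [
    ("3 hosts, 3/2", Pool(hosts=3, size=3, min_size=2)),
    ("5 hosts, 3/2", Pool(hosts=5, size=3, min_size=2)),
    ("3 hosts, 2/1", Pool(hosts=3, size=2, min_size=1)),
]:
    print(label, after_host_loss(pool))
```

The 3/2 pool on three hosts keeps serving but cannot self-heal, the five-host pool can, and the 2/1 pool keeps accepting writes with no spare copy left, which is the failure mode the warning is about.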

Ceph is the storage half of HA; the cluster control plane is the 3-node Proxmox cluster the MONs sit on top of, and the single-node Proxmox lab is the place to start if you're new to the PVE API shape. The other common storage pattern is Proxmox + TrueNAS-as-a-VM, if you already have a ZFS NAS and want PVE to mount its NFS export. VMs on a Ceph pool still want deduplicated off-host backups: replication is not a backup. For compliance-grade rollouts, the Enterprise & GxP page and pricing cover the managed path.
