
Karpathy's LLM Council pattern asks multiple models, anonymizes peer review, and synthesizes a final answer. Here's how to build your own council as a reproducible OpenFactory image.
May 30, 2026
The useful thing about an LLM council is not that it makes models sound more dramatic. It is that it turns a single answer into a structured process: independent answers, anonymous peer review, aggregate ranking, and final synthesis.
Andrej Karpathy's llm-council repo gave that pattern a clean, hackable form. Instead of asking one favorite model for an answer, the app sends the same prompt to several models through OpenRouter, has them review anonymized answers from the others, then asks a chairman model to produce the final response.
That is a good local app. But if you want to run an LLM council as a repeatable lab, a team tool, or an internal decision assistant, you eventually want more than a folder on your laptop. You want a pinned operating system, documented runtime secrets, services that come up after boot, validation checks, and a VM you can rebuild when the stack changes. That is exactly the shape OpenFactory is built for.
The reference implementation is intentionally small: a FastAPI backend, a React + Vite frontend, JSON conversation storage, and OpenRouter as the provider abstraction. The important part is the workflow.
The repo's technical notes call out two details that matter in practice: the system continues when one model fails, and the UI exposes the raw reviews and parsed rankings so the user can inspect how the final answer was produced.
A single LLM can be brilliant and still be overconfident, biased toward its own style, or weak on a specific task. A council gives you diversity of failure modes. One model might be better at mathematical rigor, another at product judgment, another at finding missing assumptions, and another at writing the final answer clearly.
This is not only a hobbyist instinct. The Language Model Council paper frames the same problem from an evaluation angle: using a single LLM judge can introduce intra-model bias, especially on subjective tasks, while a panel of models can produce rankings that are more robust and closer to human preferences in their case study.
The quality gains show up on hard benchmarks too. In the Mixture-of-Agents work, layering several open-source models so each one sees its peers' drafts reached 65.1% on AlpacaEval 2.0, ahead of GPT-4o's 57.5% — using only open weights. A council is a close cousin of that idea: independent drafts plus cross-model review beat any single member, even when no individual model is the strongest. The same authors are candid about the catch, though: the layered approach raises time-to-first-token, so you trade latency and tokens for quality.
The first version of an LLM council should be a weekend experiment. The second version should be reproducible. OpenFactory turns the council from an app install into an image build:
Paste this into console.openfactory.tech to build a first LLM council appliance. It keeps the app close to Karpathy's reference implementation while adding the operational pieces you need for a real VM.
Build a single-node lab image named llm-council-workstation.
Goal: create a reproducible local LLM Council appliance inspired by karpathy/llm-council. The image should boot into a developer-ready environment that runs the council backend, the React frontend, and validation checks. Treat API keys as runtime secrets, not build-time secrets.
Base image: ubuntu-24.04
Architecture: x86_64
Features: desktop, ssh, docker, nodejs, python, git, firewall
Packages: git, curl, jq, ca-certificates, build-essential, python3, python3-venv, python3-pip, nodejs, npm, nginx, supervisor, ufw
User: council, sudo-enabled, passwordless sudo for lab use
Runtime layout:
- Clone https://github.com/karpathy/llm-council into /opt/llm-council.
- Install uv for the council user with python3 -m pip install --user uv if an OS package is not available.
- Create /etc/llm-council/env.example with OPENROUTER_API_KEY=replace-me.
- Create /etc/llm-council/models.json with council models and a chairman model placeholder.
- Create /var/lib/llm-council/conversations for persisted conversation JSON.
- Do not bake any real API keys into the image.
Backend:
- Install Python dependencies with uv from the project root.
- Run the backend as user council on 127.0.0.1:8001.
- Add systemd unit llm-council-backend.service.
- The service must load EnvironmentFile=/etc/llm-council/env if present.
- If OPENROUTER_API_KEY is missing, the service should still start and expose a clear health/status response explaining that runtime configuration is needed.
Frontend:
- Install frontend dependencies with npm install under /opt/llm-council/frontend.
- Build the frontend if the repo supports a production build; otherwise run the Vite dev server bound to 127.0.0.1:5173.
- Add systemd unit llm-council-frontend.service.
- Configure nginx to expose the UI on http://0.0.0.0:8080 and proxy /api/ to 127.0.0.1:8001.
Security and operations:
- Enable SSH.
- Enable ufw with ports 22 and 8080 open.
- Add /root/llm-council-runbook.md explaining how to add OPENROUTER_API_KEY at /etc/llm-council/env, choose council models, restart services, and switch to local inference later.
- Add a shell helper /usr/local/bin/llm-council-status that prints service status, listening ports, and whether the API key is configured.
Validation:
- Confirm git clone exists at /opt/llm-council.
- Confirm llm-council-backend.service is enabled.
- Confirm llm-council-frontend.service is enabled.
- Confirm port 8080 listens after boot.
- Confirm curl -fsS http://localhost:8080 returns HTML.
- Confirm curl -fsS http://localhost:8001 or the backend health route returns either healthy or a missing-runtime-secret status, but not a crash.
Output:
- Produce one bootable ISO.
- Include the recipe, service files, nginx config, runbook, and validation results in the build artifacts.The point of a council is not to ask five nearly identical models to agree with one another. Pick models that disagree productively:
OpenRouter is the fastest way to get a council talking, but the reason to bake the appliance into a reproducible image is that you can later swap the provider for hardware you own — useful when the questions involve private data you would rather not send to a third party. Two serving stacks dominate in 2026: Ollama (a friendly wrapper over llama.cpp, ideal for a single workstation) and vLLM (a throughput-oriented server with PagedAttention for concurrent requests). A thin router like LiteLLM lets the council code keep speaking the same OpenAI-style API whether a member is local or hosted.
The hardware math is approachable. A useful rule of thumb is roughly 2 GB of VRAM per billion parameters at FP16, and 4-bit quantization (Q4_K_M) cuts that by about 75% with little quality loss. In practice an 8B-class model such as Llama 3.1 8B or Qwen3 8B fits comfortably in 8–12 GB and runs north of 40 tokens per second on a single consumer GPU, while a 70B model at Q4 needs around 40 GB. That makes a practical split easy: run one or two small, fast local members for privacy and cheap first opinions, and reserve a larger hosted or quantized 70B model for the chairman seat where final quality matters most.
Because the council is an OpenFactory image, “add local inference” becomes a recipe edit rather than a weekend of yak-shaving: add the GPU drivers and the serving runtime to the build, point models.json at the local endpoint, and rebuild. The validation step still proves the UI comes up and the backend reports a clear status if a key or endpoint is missing.
Do not use a council for every autocomplete, summary, or low-stakes chat. Use it where the extra latency and token spend are buying you something: architectural decisions, incident reviews, legal or policy drafts that still get human review, product strategy, model evaluation, and high-value writing where the final answer benefits from adversarial feedback.
The nice thing about building the council as an OpenFactory image is that the cost boundary is explicit. You can ship one small council for experimentation, one larger council for final review, and one local-only council for private data.
If you want the fastest path, open the OpenFactory console and paste the prompt above. If you want to adapt the image first, start with the custom Linux ISO builder guide or the GitHub-to-ISO workflow. The council pattern is young, but it already has the shape of a real operator tool: multiple opinions, visible disagreement, and a final answer you can audit.
An LLM council is a multi-model workflow where several language models answer the same prompt independently, review or rank each other's anonymized answers, and then a final chair model synthesizes the result.
Karpathy's llm-council repo made the pattern concrete as a local web app: Stage 1 collects first opinions, Stage 2 asks models to review anonymized responses, and Stage 3 has a chairman model compile the final answer.
OpenFactory turns the council into a reproducible bootable image with pinned packages, repeatable services, runtime secret placeholders, validation checks, and VM deployment instead of a one-off laptop setup.
No. Karpathy's reference app uses OpenRouter for easy access to multiple providers, but the same architecture can be adapted to local models through Ollama, vLLM, LiteLLM, or a private inference endpoint.
A rough rule of thumb is about 2 GB of VRAM per billion parameters at FP16, and 4-bit quantization cuts that by roughly 75%. An 8B model fits in 8-12 GB on a single consumer GPU and runs at 40+ tokens/second; a 70B model at Q4 needs around 40 GB. A common split is small fast local members for cheap first opinions plus a larger model in the chairman seat.
No. A council buys robustness and reduced single-model bias at the cost of extra latency and tokens, and research like Mixture-of-Agents notes the layered approach increases time-to-first-token. Use a council for high-stakes work like architecture decisions, incident reviews, and evaluation, not for autocomplete or low-stakes chat.
Build bootable Linux images from prompts, Git repositories, and reusable recipes.
Create repeatable Linux images for labs, fleets, and deployment workflows.
Turn a GitHub repo and its install docs into a bootable Linux image.
Connect OpenFactory to AI agents so they can build images, deploy VMs, and run tests.
Run untrusted agent and model workloads in disposable, hardware-isolated VMs that contain the blast radius.
OpenFactory's free flow is for browsing. Persistent VMs, SSH access, snapshots, your own ISO, and fleet deployment live on a paid plan.