
OpenFactory's Test Panel now organizes your GUI tests by project and app, re-runs a whole group with one click, and runs them in parallel on a per-app fleet of isolated tester VMs.
June 18, 2026
Your tests just grew up. The OpenFactory Test Panel now organizes every GUI test by project and app, re-runs a whole group with a single click, and runs the group in parallel on a fleet of isolated tester machines — so checking a release goes from a slow, manual chore to one button and a progress bar.
A test you never re-run is a screenshot of the past. The whole point of a test suite is to run it again — after a deploy, before a release, when a dependency bumps — and see what moved. But re-running a pile of end-to-end GUI tests has always been the painful part: they were a flat list with no structure, you triggered them one at a time, and they fought over a single machine. So people stopped re-running them, and the suite quietly rotted.
This update fixes all three problems at once. Tests now have a home, a one-click re-run at every level, and a pool of machines to run on. Here is what changed and how to use it.
The Test Panel mirrors how you already think about your work. At the top is a project — a product or initiative like “Knostra.ai.” Inside a project are the apps that make it up. And inside each app are the tests written for it, each with its own run history. Every test belongs to a project; there is no more “unassigned” limbo. If you create a test without picking a project, it is filed under a default project automatically so the tree always stays complete.
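To make the shape of that tree concrete, here is a minimal sketch of a project → app → tests structure with the default-project fallback. The class name `SuiteTree` and the fallback name `"Default"` are made up for illustration; the post does not say what OpenFactory calls its default project or how it stores the tree.

```python
from collections import defaultdict

DEFAULT_PROJECT = "Default"  # hypothetical name for the fallback project


class SuiteTree:
    """Project -> app -> list of test names (illustrative only)."""

    def __init__(self):
        self._tree = defaultdict(lambda: defaultdict(list))

    def add_test(self, app, name, project=None):
        # A test created without a project is filed under the default one,
        # so every test always has a place in the tree.
        self._tree[project or DEFAULT_PROJECT][app].append(name)

    def projects(self):
        return sorted(self._tree)

    def tests(self, project, app):
        # Use .get so a lookup never creates empty projects or apps.
        return list(self._tree.get(project, {}).get(app, []))
```

Adding a test with `project="Knostra.ai"` files it there; adding one with no project files it under the fallback, so nothing is ever unassigned.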
Each level of the tree has a Re-run button, and there is a global one at the top. Press the button on the knostra app and every test for that app replays. Press it on the Knostra.ai project and every app underneath it runs. Press “Re-run all” and your entire suite goes. You are never clicking into tests one by one to fire them off again.
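One way to picture the three buttons is as a filter over the tree. The sketch below is an assumption about the logic, not OpenFactory's code: only `scope="app"` with `app_name` appears in this post, so the `"all"` and `"project"` scope values and the `project` parameter are illustrative names.

```python
def select_tests(tree, scope, project=None, app_name=None):
    """Return (project, app, test) triples covered by a re-run scope.

    tree: dict mapping project -> dict mapping app -> list of test names.
    """
    if scope not in ("all", "project", "app"):
        raise ValueError(f"unknown scope: {scope!r}")
    picked = []
    for proj, apps in tree.items():
        if scope == "project" and proj != project:
            continue  # project scope: skip every other project
        for app, tests in apps.items():
            if scope == "app" and app != app_name:
                continue  # app scope: skip every other app
            picked.extend((proj, app, test) for test in tests)
    return picked
```

Each wider scope is a superset of the narrower one, which is why the same button can exist at every level.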

A re-run starts a background batch and hands control straight back to you. A live progress strip shows how many tests have finished, how many passed and failed, and exactly which machine each in-flight test is running on. You can close the tab and the batch keeps going; you can cancel it and the test in flight finishes cleanly while the rest are skipped. Tests that need a credential the platform doesn't hold are flagged and skipped rather than failing the whole batch.
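The cancel and skip behavior described above can be sketched in a few lines. This is a simplified, sequential model under stated assumptions: the `needs` field, the `held_credentials` set, and the `run_one` callback are hypothetical names, and the real batch runs tests in parallel.

```python
import threading
from dataclasses import dataclass


@dataclass
class BatchProgress:
    total: int
    passed: int = 0
    failed: int = 0
    skipped: int = 0

    @property
    def finished(self):
        return self.passed + self.failed + self.skipped


def run_batch(tests, run_one, held_credentials, cancel):
    """tests: dicts with 'name' and an optional 'needs' credential name.
    run_one(test) returns True on pass. cancel is a threading.Event."""
    progress = BatchProgress(total=len(tests))
    for test in tests:
        # Cancellation is only checked between tests, so a test that is
        # already running finishes cleanly and the rest are skipped.
        if cancel.is_set():
            progress.skipped += 1
            continue
        needed = test.get("needs")
        if needed and needed not in held_credentials:
            progress.skipped += 1  # flagged and skipped; the batch carries on
            continue
        if run_one(test):
            progress.passed += 1
        else:
            progress.failed += 1
    return progress
```

The point of the design is that neither a missing credential nor a cancel request ever turns into a failed batch; both simply show up in the skipped count.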

The reason re-running a suite used to crawl is that every test shared one machine and waited in line. Now each app gets its own pool of tester VMs. When a batch runs, tests for the same app spread across that app's pool, several at a time, and tests for different apps run on entirely separate machines. The wall-clock time for a big re-run drops from “the sum of every test” to roughly that sum divided by the pool size, and never less than the longest single test.
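To see the speedup in numbers, here is a small simulation of one app's pool. It assumes a simple greedy policy (each test starts on whichever VM frees up first) and made-up durations; the post does not describe OpenFactory's actual scheduler.

```python
import heapq


def wall_clock(durations, pool_size):
    """Estimate how long a batch takes on pool_size VMs, greedy assignment."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    free_at = [0.0] * pool_size  # the time at which each VM becomes free
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available VM
        heapq.heappush(free_at, start + duration)
    return max(free_at)


minutes = [10, 4, 6, 3, 7]       # hypothetical test durations
print(wall_clock(minutes, 1))    # 30.0, the sum of every test
print(wall_clock(minutes, 3))    # 13.0 with three VMs
print(wall_clock(minutes, 5))    # 10.0, bounded by the longest test
```

Past a certain pool size the longest single test becomes the floor, so adding machines stops helping.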
Each test runs under an exclusive lease on a tester VM, and that lease is also a lock. While a test holds a VM, no other test, yours or a teammate's, can touch it. Tester machines are scoped per user, so different people are never on the same box, and within your own pool the lease guarantees one test per machine at a time. If something crashes mid-test, the lease expires and the machine is reclaimed automatically, so a pool can never get permanently wedged.
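A lease that doubles as a lock and expires on its own can be sketched like this. The time-to-live mechanism and every name here (`VMLease`, `ttl_seconds`) are assumptions for illustration; the post only says that a lease expires after a crash, not how.

```python
import threading
import time


class VMLease:
    """Exclusive lease on one VM that lapses if the holder disappears."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.holder = None
        self.expires_at = 0.0
        self._guard = threading.Lock()

    def acquire(self, test_id):
        with self._guard:
            now = self.clock()
            if self.holder is not None and now < self.expires_at:
                return False  # another test holds a live lease
            # Free, or the previous holder's lease lapsed: reclaim the VM.
            self.holder = test_id
            self.expires_at = now + self.ttl
            return True

    def release(self, test_id):
        with self._guard:
            if self.holder == test_id:  # only the current holder can release
                self.holder = None
```

Because expiry is checked at acquire time, a crashed test never needs to clean up after itself; the next test to ask simply takes over once the deadline has passed.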
Sharing machines across tests creates a classic hazard: one test leaves a modal open, a form half-filled, or a stale page loaded, and the next test inherits the mess and fails for the wrong reason. OpenFactory closes that door with build-up and tear-down around every test. Each test opens in a fresh browser tab and that tab is closed when the test ends, so the next test never sees the previous one's leftovers. Flakiness from cross-test contamination simply goes away.
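The build-up and tear-down pattern maps naturally onto a context manager. In this sketch `FakeBrowser` is a stand-in that only tracks open tabs; it is not OpenFactory's browser driver, and the method names are invented for the example.

```python
from contextlib import contextmanager


class FakeBrowser:
    """Stand-in for a real browser driver; it only tracks open tabs."""

    def __init__(self):
        self.open_tabs = []
        self._count = 0

    def new_tab(self):
        self._count += 1
        tab = f"tab-{self._count}"
        self.open_tabs.append(tab)
        return tab

    def close_tab(self, tab):
        self.open_tabs.remove(tab)


@contextmanager
def fresh_tab(browser):
    tab = browser.new_tab()  # build-up: every test starts in a new tab
    try:
        yield tab
    finally:
        # Tear-down runs even if the test raises, so nothing leaks
        # into the next test on the same machine.
        browser.close_tab(tab)
```

A test body then runs inside `with fresh_tab(browser) as tab:` and the tab is gone by the time the next test starts, whether the test passed, failed, or crashed.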
The buttons live in the Test Panel of the OpenFactory console, but everything is also available through the OpenFactory MCP, so an agent can run your suite on a schedule, after a deploy, or because you asked it to in chat. The relevant tools are run_app_test_group, get_app_test_group, and cancel_app_test_group.
Re-run all of the app tests for my "knostra" app and tell me
which ones regressed.
Use run_app_test_group(scope="app", app_name="knostra"), then poll
get_app_test_group(batch_id) until it finishes and summarize the
pass/fail results with a link to each failing run's report.

Grouping does more than tidy the panel: it makes history legible. Every test keeps a strip of its recent runs right beside its name, so you can tell at a glance whether a test is solidly green, freshly broken, or flickering between the two. A flickering test is a signal in its own right: it usually means a real race or a genuinely flaky workflow, and seeing the pattern is the first step to fixing it.
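The three patterns in a run strip can be told apart by counting how often the result flips. The sketch below is one possible heuristic, not the panel's actual logic, and the label strings (including the extra "recovered" and "red" cases) are illustrative.

```python
def classify(history):
    """history: list of booleans, oldest first; True means the run passed."""
    if not history:
        return "no runs"
    if all(history):
        return "green"
    if not any(history):
        return "red"
    # Count how many times the result changed between consecutive runs.
    flips = sum(a != b for a, b in zip(history, history[1:]))
    if flips == 1:
        return "recovered" if history[-1] else "freshly broken"
    return "flickering"  # more than one flip: likely a race or flaky flow
```

A strip like pass, pass, pass, fail, fail has exactly one flip and reads as freshly broken, while pass, fail, pass, fail, pass flips four times and reads as flickering.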
Because a test's identity is stable — it is keyed by its app and its name — that history accumulates across commits, deploys, and weeks. The strip becomes a small benchmark of how each workflow has held up over time, not just a snapshot of the most recent attempt. When something breaks, you are not asking “did this ever work?” You can see the exact run where green turned red, open it, and watch what changed.
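Keying history by app and test name is what makes "where did green turn red?" a simple lookup. Here is a minimal sketch under that assumption; the `RunHistory` class and the run identifiers are hypothetical, and the real run records carry far more than a pass flag.

```python
from collections import defaultdict


class RunHistory:
    """Run results keyed by (app, test name), oldest run first."""

    def __init__(self):
        self._runs = defaultdict(list)

    def record(self, app, name, run_id, passed):
        self._runs[(app, name)].append((run_id, passed))

    def last_regression(self, app, name):
        """Return the id of the most recent run where a pass was followed
        by a failure, or None if that never happened."""
        runs = self._runs.get((app, name), [])
        found = None
        for (_, prev_ok), (run_id, ok) in zip(runs, runs[1:]):
            if prev_ok and not ok:
                found = run_id
        return found
```

Because the key survives commits and deploys, the same lookup works on a test that has been running for weeks.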
The pool model is what makes all of this safe for more than one person. Tester machines are scoped per user and handed out under an exclusive lease, so two engineers can both press “Re-run all” at the same moment without their tests landing on the same machine and corrupting each other's results. Within your own pool, the lease guarantees one test per machine at a time; across users, the per-user scoping means you are never sharing a desktop with a colleague mid-run.
When a pool is at capacity, additional tests simply queue and start the instant a machine frees up — no errors, no manual retries, no “resource busy” dead ends. And when you need more throughput, you raise the pool size: the same suite finishes sooner because more of it runs at once. The system is designed to grow from one developer checking a branch to a whole team running suites in parallel without changing how any of it is triggered.
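Queueing at capacity is the behavior you get for free from a bounded worker pool. The sketch below uses Python's standard `ThreadPoolExecutor` as an analogy, with one worker thread standing in for one tester VM; it is not how OpenFactory provisions machines.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_pool(tests, run_one, pool_size):
    """Run every test with at most pool_size in flight at once.

    Tests beyond the pool size wait in the executor's queue and start
    as soon as a worker frees up; no submission is rejected.
    """
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() returns results in the order the tests were submitted.
        return list(pool.map(run_one, tests))
```

Raising `pool_size` is the only change needed to run more of the suite at once; the call that triggers the batch stays the same.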
The three re-run scopes map cleanly onto how you actually work. Re-run a single app while you iterate on it — the tightest loop, fastest feedback. Re-run a whole project before you cut a release that spans several of its apps, so you catch a regression in one corner before it ships with the rest. And re-run everything on a cadence — nightly, or after a dependency bump — to keep the entire surface honest.
Because every batch runs in the background and reports a clean pass/fail rollup, any of these is cheap enough to hand to an agent. “Every morning, re-run the Knostra.ai project and message me only if something regressed” is a one-sentence instruction now, not a CI pipeline you have to build and babysit. The buttons are there for the moments you want to look; the automation is there for the moments you don't.
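An agent following an instruction like that would start a group, poll it, and report only failures. The sketch below shows that loop against a fake client. The tool names and the `scope` and `app_name` arguments come from this post; everything else, including the `batch_id`, `status`, `results`, `name`, and `passed` fields, is an assumed response shape, and "regressed" is simplified to "failed in this batch".

```python
import time


class FakeMCP:
    """Hypothetical stand-in for an MCP client; response fields are assumed."""

    def __init__(self, results, polls_until_done=2):
        self._results = results
        self._polls_left = polls_until_done

    def run_app_test_group(self, scope, **target):
        return {"batch_id": "batch-1"}

    def get_app_test_group(self, batch_id):
        self._polls_left -= 1
        if self._polls_left > 0:
            return {"status": "running", "results": []}
        return {"status": "finished", "results": self._results}


def failing_tests(client, scope, poll_seconds=0.0, max_polls=100, **target):
    """Start a group re-run, wait for it, and return the failing test names."""
    batch_id = client.run_app_test_group(scope=scope, **target)["batch_id"]
    for _ in range(max_polls):
        batch = client.get_app_test_group(batch_id)
        if batch["status"] == "finished":
            return [r["name"] for r in batch["results"] if not r["passed"]]
        time.sleep(poll_seconds)  # the batch runs in the background
    raise TimeoutError(f"batch {batch_id} did not finish in time")
```

With a real client in place of `FakeMCP`, a call such as `failing_tests(client, scope="app", app_name="knostra")` returns an empty list on a clean run, which is the "message me only if something regressed" condition.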
It is worth being clear about what the group re-run does and does not do. It does not reimplement testing — each test in a batch runs through the exact same engine described in our companion post on AI-in-the-loop visual testing: a fresh browser tab, the app driven by sight, assertions checked visually, and the whole run recorded. What the batch adds is orchestration — which tests run, on which machines, in what order, and how the results roll up. That separation is deliberate: the way a single test behaves is identical whether you launch it alone or as one of fifty, so a green batch means exactly what a green single run means.
Testing only pays off when it is cheap to repeat. By giving tests a structure that scales, a re-run that is one click at any level, a fleet of machines to run on in parallel, and clean isolation between every test, OpenFactory makes “run the whole suite again” something you do casually instead of dread. Catch the regression before your users do — and get back to building.
Next, see how each test actually runs — the AI drives your app by sight and records a replay you can scrub — and how to deploy an app to a live URL you can point these tests straight at.
Tests are organized in a three-level tree: Project, then App, then the individual tests. A project like “Knostra.ai” contains one or more apps, and each app holds the tests written for it. Every test belongs to a project, so nothing floats around unattached and the panel mirrors how you actually think about your products.
There are three re-run scopes. “Re-run all” replays every saved test you own. A project-level button replays every test across that project's apps. An app-level button replays just that app's tests. Each kicks off a background batch you can watch live and cancel.
Tests do not run one at a time. Each app has a pool of tester VMs, so an app's tests run several at a time, and different apps run fully in parallel on their own machines. A re-run that used to take an hour serially can finish in a fraction of the time.
It is safe for several people to run tests at once. Tester VMs are leased exclusively for the duration of a test and are scoped per user, so two people, or two batches, never drive the same machine at the same time. When a pool is busy, additional tests queue and start as soon as a VM frees up.
You can trigger a re-run from the Test Panel in the OpenFactory console with the re-run buttons, or programmatically through the OpenFactory MCP with run_app_test_group, get_app_test_group, and cancel_app_test_group. You can also just ask an agent in chat to re-run a project's tests.