App Tests, Organized: Group by Project, Re-run in One Click, Run in Parallel

OpenFactory's Test Panel now organizes your GUI tests by project and app, re-runs a whole group with one click, and runs them in parallel on a per-app fleet of isolated tester VMs.

June 18, 2026

Your tests just grew up. The OpenFactory Test Panel now organizes every GUI test by project and app, re-runs a whole group with a single click, and runs the group in parallel on a fleet of isolated tester machines — so checking a release goes from a slow, manual chore to one button and a progress bar.

A test you never re-run is a screenshot of the past. The whole point of a test suite is to run it again — after a deploy, before a release, when a dependency bumps — and see what moved. But re-running a pile of end-to-end GUI tests has always been the painful part: they were a flat list with no structure, you triggered them one at a time, and they fought over a single machine. So people stopped re-running them, and the suite quietly rotted.

This update fixes all three problems at once. Tests now have a home, a one-click re-run at every level, and a pool of machines to run on. Here is what changed and how to use it.

Tests live in a tree: Project → App → tests

The Test Panel mirrors how you already think about your work. At the top is a project — a product or initiative like “Knostra.ai.” Inside a project are the apps that make it up. And inside each app are the tests written for it, each with its own run history. Every test belongs to a project; there is no more “unassigned” limbo. If you create a test without picking a project, it is filed under a default project automatically so the tree always stays complete.

[Figure: Project to App to tests hierarchy. Project: Knostra.ai. Apps: web, booking. Tests: auth · email OTP, inbox triage, draft & approve, booking CTA flow, demo form smoke, public page loads.]
Every test rolls up to an app, and every app to a project. The panel is a tree, not a flat list — so a hundred tests stay navigable.
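
As a rough illustration of that tree, here is a minimal Python sketch. It is not OpenFactory's data model: the `AppTest` class, the `build_tree` helper, the fallback name `"Default"`, and the pairing of test names to apps are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

DEFAULT_PROJECT = "Default"  # hypothetical name for the fallback project


@dataclass(frozen=True)
class AppTest:
    name: str
    app: str
    project: str = DEFAULT_PROJECT  # tests created without a project land here


def build_tree(tests):
    """Group tests into {project: {app: [test names]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    for t in tests:
        tree[t.project][t.app].append(t.name)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {project: dict(apps) for project, apps in tree.items()}


tests = [
    AppTest("auth · email OTP", app="web", project="Knostra.ai"),
    AppTest("booking CTA flow", app="booking", project="Knostra.ai"),
    AppTest("public page loads", app="scratch"),  # no project given
]
tree = build_tree(tests)
```

The third test was created without a project, so it is filed under the default project and the tree stays complete.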

One click re-runs the whole group

Each level of the tree has a Re-run button, and there is a global one at the top. Press the button on the knostra app and every test for that app replays. Press it on the Knostra.ai project and every app underneath it runs. Press “Re-run all” and your entire suite goes. You are never clicking into tests one by one to fire them off again.

[Figure: Re-run scopes. Re-run all: every test you own (e.g. all projects). Re-run project: all apps in a project (e.g. Knostra.ai). Re-run app: one app's tests (e.g. knostra).]
Three scopes, three buttons. The widest is one click; the narrowest is one click. The work scales, your effort does not.
[Screenshot: The App Tests tab of the Test Panel, grouped under a project with a Re-run all button and per-scenario cards showing app, step count, and run count.]
The App Tests tab: scenarios grouped under their project, each card tagged with its app, and a single Re-run all at the top.
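
One way to picture the three scopes is as three filters over the same list of tests. The sketch below assumes each test is a plain dict with `project`, `app`, and `name` keys; the `select_tests` function is hypothetical and is not part of OpenFactory.

```python
def select_tests(tests, scope, project=None, app=None):
    """Return the tests a re-run at the given scope would include."""
    if scope == "all":
        return list(tests)
    if scope == "project":
        return [t for t in tests if t["project"] == project]
    if scope == "app":
        return [t for t in tests if t["app"] == app]
    raise ValueError(f"unknown scope: {scope!r}")


suite = [
    {"project": "Knostra.ai", "app": "knostra", "name": "inbox triage"},
    {"project": "Knostra.ai", "app": "booking", "name": "booking CTA flow"},
    {"project": "Default", "app": "scratch", "name": "public page loads"},
]

everything = select_tests(suite, "all")
one_project = select_tests(suite, "project", project="Knostra.ai")
one_app = select_tests(suite, "app", app="knostra")
```

The caller's effort is the same at every scope; only the size of the returned list changes.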

A re-run starts a background batch and hands control straight back to you. A live progress strip shows how many tests have finished, how many passed and failed, and exactly which machine each in-flight test is running on. You can close the tab and the batch keeps going; you can cancel it, and any tests already in flight finish cleanly while the rest are skipped. Tests that need a credential the platform doesn't hold are flagged and skipped rather than failing the whole batch.
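
The cancel and skip rules can be sketched as a small loop. This is a serial simplification under assumed names (`run_batch`, the `needs` key, the status strings); the real batch runs tests in parallel and its internals are not described here.

```python
import threading
from collections import Counter


def run_batch(tests, run_one, held_credentials, cancelled):
    """Run tests in order, skipping instead of failing where the rules say so."""
    results = {}
    for test in tests:
        if cancelled.is_set():
            # Cancel was pressed: remaining tests are skipped, not failed.
            results[test["name"]] = "skipped: cancelled"
        elif not set(test.get("needs", [])) <= held_credentials:
            # A missing credential flags this test without sinking the batch.
            results[test["name"]] = "skipped: missing credential"
        else:
            # A test that has started always runs to completion.
            results[test["name"]] = "passed" if run_one(test) else "failed"
    return results, Counter(results.values())


cancelled = threading.Event()
outcomes = {"login": True, "checkout": False}


def run_one(test):
    if test["name"] == "checkout":
        cancelled.set()  # simulate the user pressing Cancel mid-test
    return outcomes[test["name"]]


tests = [
    {"name": "login"},
    {"name": "billing", "needs": ["payment_key"]},  # credential not held
    {"name": "checkout"},
    {"name": "search"},
]
results, rollup = run_batch(tests, run_one, held_credentials=set(), cancelled=cancelled)
```

Here `checkout` was in flight when the cancel arrived, so it still reports its own result, and only `search` is skipped because of the cancel.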

[Screenshot: The App Tests panel showing the Recent Runs tree: an App Tests project and its harness and selftest apps, each with a Re-run button, and test rows with passed/failed/error chips and run-history strips.]
The real Test Panel: tests grouped by project and app, a Re-run button at every level, and a per-test status + run-history strip so you can spot a flaky test at a glance.

A fleet of tester VMs, not one bottleneck

The reason re-running a suite used to crawl is that every test shared one machine and waited in line. Now each app gets its own pool of tester VMs. When a batch runs, tests for the same app spread across that app's pool, several at a time, and tests for different apps run on entirely separate machines. The wall-clock time for a big re-run drops from the sum of every test's duration to roughly each app's total divided by its pool size, with the longest single test as the floor.

[Figure: Pool and queue. Pending tests t1 to t4 wait in a queue and lease VMs from the knostra pool. VM #1 is leased and running t1; VM #2 is leased and running t2. With the pool at capacity, t3 and t4 wait in the queue. When a test finishes, its VM is released and the next test leases it. Raise the pool size for more parallelism and more users.]
A test leases a free VM from its app's pool, runs, then releases it. When the pool is full, tests queue and start the instant a machine frees up.
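
The lease, run, release cycle can be sketched with a queue of free VM ids. This is an illustration using Python's `asyncio`, not OpenFactory's scheduler; `run_group`, the VM numbering, and the sleep that stands in for a real test are all assumptions.

```python
import asyncio


async def run_group(test_names, pool_size, test_seconds=0.01):
    """Run tests concurrently, never using more than pool_size VMs at once."""
    pool = asyncio.Queue()
    for vm_id in range(1, pool_size + 1):
        pool.put_nowait(vm_id)  # every VM starts out free

    in_flight = 0
    peak = 0
    placements = {}

    async def run_test(name):
        nonlocal in_flight, peak
        vm_id = await pool.get()  # lease: waits here while the pool is at capacity
        in_flight += 1
        peak = max(peak, in_flight)
        try:
            await asyncio.sleep(test_seconds)  # stand-in for the real test
            placements[name] = vm_id
        finally:
            in_flight -= 1
            pool.put_nowait(vm_id)  # release: the next queued test can lease it

    await asyncio.gather(*(run_test(n) for n in test_names))
    return placements, peak


placements, peak = asyncio.run(run_group(["t1", "t2", "t3", "t4"], pool_size=2))
```

With a pool of two, `t1` and `t2` start immediately while `t3` and `t4` wait on the queue, and `peak` never exceeds the pool size. Raising `pool_size` is the only change needed for more parallelism.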

That lease is also a lock. While a test holds a VM, no other test — yours or a teammate's — can touch it. Tester machines are scoped per user, so different people are never on the same box, and within your own pool the lease guarantees one test per machine at a time. If something crashes mid-test, the lease expires and the machine is reclaimed automatically, so a pool can never get permanently wedged.
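
A lease that expires on its own is what keeps a crashed test from holding a machine forever. The sketch below shows the idea with an explicit clock so it is easy to follow; the `VMLeases` class, the 60-second lifetime, and the absence of lease renewal are assumptions, not a description of how OpenFactory implements it.

```python
class VMLeases:
    """Exclusive, time-limited leases over a fixed set of VMs."""

    def __init__(self, vm_ids, ttl_seconds):
        self.ttl = ttl_seconds
        # vm id -> (holder, expires_at), or None when the VM is free
        self.leases = {vm: None for vm in vm_ids}

    def acquire(self, holder, now):
        for vm, lease in self.leases.items():
            # A VM is available if it is free or its lease has lapsed
            # (for example because the previous holder crashed).
            if lease is None or lease[1] <= now:
                self.leases[vm] = (holder, now + self.ttl)
                return vm
        return None  # pool at capacity: the caller should queue

    def release(self, vm, holder):
        lease = self.leases.get(vm)
        if lease is not None and lease[0] == holder:
            self.leases[vm] = None  # only the current holder may release


pool = VMLeases(["vm-1"], ttl_seconds=60)
first = pool.acquire("t1", now=0)      # t1 gets the only VM
blocked = pool.acquire("t2", now=30)   # still leased to t1
reclaimed = pool.acquire("t2", now=61) # t1 never released; the lease lapsed
```

A production version would presumably let a healthy long-running test renew its lease; that detail is omitted here.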

[Figure: Serial versus parallel test timing. One VM, serial: 6× longer. Three VMs, parallel: ~2 slots wide.]
The same six tests: serialized on one machine versus spread across a three-VM pool. Pooling turns a coffee break into a glance.
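
The arithmetic behind that comparison is easy to check. The sketch below assigns each test to whichever VM frees up first and reports when the last test finishes; the five-minute durations are invented for the example.

```python
import heapq


def wall_clock(durations, pool_size):
    """Finish time when each test takes the earliest-free VM, in order."""
    free_at = [0.0] * pool_size  # when each VM next becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)  # earliest-free VM
        end = start + d
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


six_tests = [5, 5, 5, 5, 5, 5]         # hypothetical minutes per test
serial = wall_clock(six_tests, 1)      # 30.0: one after another
parallel = wall_clock(six_tests, 3)    # 10.0: two rounds of three

uneven = wall_clock([10, 1, 1, 1], 2)  # 10.0: bounded by the longest test
```

The last line shows the floor: however large the pool, a batch cannot finish sooner than its longest single test.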

Every test starts on a clean slate

Sharing machines across tests creates a classic hazard: one test leaves a modal open, a form half-filled, or a stale page loaded, and the next test inherits the mess and fails for the wrong reason. OpenFactory closes that door with build-up and tear-down around every test. Each test opens in a fresh browser tab and that tab is closed when the test ends, so the next test never sees the previous one's leftovers. Flakiness from cross-test contamination simply goes away.

[Figure: Browser isolation between tests. Before (shared state bleeds): an old modal and a stale form remain, so Test 2 starts there and trips over Test 1. After (fresh tab per test): Test 2 starts on a clean about:blank tab with nothing carried over.]
A fresh tab before each test and a close after it. No leftovers, no phantom failures.
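
The build-up and tear-down pattern maps naturally onto a context manager. This sketch uses a stand-in `FakeBrowser` rather than a real browser driver, so every class and method name in it is hypothetical.

```python
from contextlib import contextmanager


class Tab:
    def __init__(self):
        self.url = "about:blank"
        self.leftovers = []  # modals, half-filled forms, and so on


class FakeBrowser:
    """Stand-in for a real browser; tracks which tabs are open."""

    def __init__(self):
        self.open_tabs = []

    def new_tab(self):
        tab = Tab()
        self.open_tabs.append(tab)
        return tab

    def close_tab(self, tab):
        self.open_tabs.remove(tab)


@contextmanager
def fresh_tab(browser):
    tab = browser.new_tab()     # build-up: a clean tab for this test only
    try:
        yield tab
    finally:
        browser.close_tab(tab)  # tear-down runs even if the test raises


browser = FakeBrowser()

try:
    with fresh_tab(browser) as tab:
        tab.leftovers.append("modal left open")
        raise AssertionError("test 1 failed halfway through")
except AssertionError:
    pass

with fresh_tab(browser) as tab:
    seen_by_test_2 = list(tab.leftovers)  # nothing inherited from test 1
    start_url = tab.url
```

Because the close happens in `finally`, a test that crashes halfway still leaves nothing behind for the next one.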

Trigger it from the console, MCP, or chat

The buttons live in the Test Panel of the OpenFactory console, but everything is also available through the OpenFactory MCP, so an agent can run your suite on a schedule, after a deploy, or because you asked it to in chat. The relevant tools are run_app_test_group, get_app_test_group, and cancel_app_test_group.

Re-run all of the app tests for my "knostra" app and tell me
which ones regressed.

Use run_app_test_group(scope="app", app_name="knostra"), then poll
get_app_test_group(batch_id) until it finishes and summarize the
pass/fail results with a link to each failing run's report.

[Figure: Live group re-run progress. Re-running app · knostra: 4/9 · 3 passed · 1 failed, with a Cancel button. usherpa-sandbox-connect-and-read is running on tester VM #1; smart-inbox-triage-action-vs-fyi is running on tester VM #2. Two tests run at once on this app's pool; the rest start as machines free up.]
A live group re-run: completed-of-total, pass and fail counts, a progress bar, and which machine each in-flight test is running on. Close the tab and it keeps going; cancel and the running tests finish cleanly.
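
For readers scripting this rather than asking an agent, the start-then-poll flow might look like the sketch below. Only the three tool names and the `scope`, `app_name`, and `batch_id` arguments come from this post. The `call_tool` callable, the response fields (`batch_id`, `status`, `results`), the status strings, and the argument to `cancel_app_test_group` are assumptions, and the fake server exists only so the example runs on its own.

```python
import time


def rerun_and_wait(call_tool, app_name, poll_seconds=5.0, timeout_seconds=3600.0):
    """Start an app-scoped re-run and poll until it is no longer running."""
    batch = call_tool("run_app_test_group", scope="app", app_name=app_name)
    batch_id = batch["batch_id"]  # assumed response field
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = call_tool("get_app_test_group", batch_id=batch_id)
        if state["status"] != "running":  # assumed status values
            return state
        if time.monotonic() >= deadline:
            call_tool("cancel_app_test_group", batch_id=batch_id)
            raise TimeoutError(f"batch {batch_id} did not finish in time")
        time.sleep(poll_seconds)


def make_fake_server(polls_until_done=2):
    """Hypothetical stand-in for an MCP client, for demonstration only."""
    remaining = {"polls": polls_until_done}

    def call_tool(name, **args):
        if name == "run_app_test_group":
            return {"batch_id": "batch-1"}
        if name == "get_app_test_group":
            if remaining["polls"] > 0:
                remaining["polls"] -= 1
                return {"status": "running"}
            return {
                "status": "finished",
                "results": {"inbox triage": "passed", "draft & approve": "failed"},
            }
        raise ValueError(f"unexpected tool: {name}")

    return call_tool


final = rerun_and_wait(make_fake_server(), "knostra", poll_seconds=0)
failing = [name for name, status in final["results"].items() if status == "failed"]
```

Check the MCP tool descriptions for the real argument and response shapes before relying on any field name used here.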

Run history you can actually read

Grouping does more than tidy the panel — it makes history legible. Every test keeps a strip of its recent runs right beside its name, so you can tell at a glance whether a test is solidly green, freshly broken, or flickering between the two. A flickering test is a signal in its own right: it usually means a real race or a genuinely flaky workflow, and seeing the pattern is the first step to fixing it.
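
What the eye does with a history strip can be approximated by counting how often the result flips. The sketch below is one simple heuristic, not the panel's logic; the labels "solid red" and "recently fixed" are additions for completeness.

```python
def classify(history):
    """Label a run history given as booleans, oldest first (True = passed)."""
    if not history:
        return "no runs"
    # Count transitions between consecutive runs.
    flips = sum(a != b for a, b in zip(history, history[1:]))
    if flips == 0:
        return "solid green" if history[-1] else "solid red"
    if flips == 1:
        return "recently fixed" if history[-1] else "freshly broken"
    return "flaky"  # flickering between pass and fail


steady = classify([True, True, True, True])
broken = classify([True, True, True, False])
flickering = classify([True, False, True, False, True])
```

A test that lands in the last bucket is worth investigating even when its most recent run is green.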

Because a test's identity is stable — it is keyed by its app and its name — that history accumulates across commits, deploys, and weeks. The strip becomes a small benchmark of how each workflow has held up over time, not just a snapshot of the most recent attempt. When something breaks, you are not asking “did this ever work?” You can see the exact run where green turned red, open it, and watch what changed.
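
A stable key is what lets history accumulate. The sketch below keys runs by an (app, name) pair, as described above, and finds the run where green most recently turned red; the run ids and helper names are invented for the example.

```python
from collections import defaultdict

# (app, test name) -> list of (run_id, passed), oldest first
history = defaultdict(list)


def record(app, name, run_id, passed):
    history[(app, name)].append((run_id, passed))


def last_green_to_red(runs):
    """Return the run_id where a pass was most recently followed by a fail."""
    pairs = list(zip(runs, runs[1:]))  # (previous run, next run)
    for (_, prev_ok), (run_id, ok) in reversed(pairs):
        if prev_ok and not ok:
            return run_id
    return None


for run_id, passed in [
    ("run-101", True),
    ("run-102", True),
    ("run-103", False),
    ("run-104", False),
]:
    record("knostra", "inbox triage", run_id, passed)

broke_at = last_green_to_red(history[("knostra", "inbox triage")])
```

Because the key does not change between commits or deploys, every new run appends to the same list instead of starting a new one.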

Built for teams, not just one laptop

The pool model is what makes all of this safe for more than one person. Tester machines are scoped per user and handed out under an exclusive lease, so two engineers can both press “Re-run all” at the same moment without their tests landing on the same machine and corrupting each other's results. Within your own pool, the lease guarantees one test per machine at a time; across users, the per-user scoping means you are never sharing a desktop with a colleague mid-run.

When a pool is at capacity, additional tests simply queue and start the instant a machine frees up — no errors, no manual retries, no “resource busy” dead ends. And when you need more throughput, you raise the pool size: the same suite finishes sooner because more of it runs at once. The system is designed to grow from one developer checking a branch to a whole team running suites in parallel without changing how any of it is triggered.

Choosing the right scope

The three re-run scopes map cleanly onto how you actually work. Re-run a single app while you iterate on it — the tightest loop, fastest feedback. Re-run a whole project before you cut a release that spans several of its apps, so you catch a regression in one corner before it ships with the rest. And re-run everything on a cadence — nightly, or after a dependency bump — to keep the entire surface honest.

Because every batch runs in the background and reports a clean pass/fail rollup, any of these is cheap enough to hand to an agent. “Every morning, re-run the Knostra.ai project and message me only if something regressed” is a one-sentence instruction now, not a CI pipeline you have to build and babysit. The buttons are there for the moments you want to look; the automation is there for the moments you don't.
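
The "message me only if something regressed" rule amounts to comparing two rollups. Here is a minimal sketch, assuming each rollup is a dict of test name to `"passed"` or `"failed"`; that shape is an assumption for illustration.

```python
def regressions(previous, current):
    """Tests that passed in the previous batch and fail in the current one."""
    return sorted(
        name
        for name, status in current.items()
        if status == "failed" and previous.get(name) == "passed"
    )


def morning_message(previous, current):
    """Return a message only when something regressed, otherwise None."""
    broken = regressions(previous, current)
    if not broken:
        return None
    return "Regressed: " + ", ".join(broken)


yesterday = {"inbox triage": "passed", "draft & approve": "passed", "auth": "failed"}
today = {"inbox triage": "passed", "draft & approve": "failed", "auth": "failed"}

message = morning_message(yesterday, today)
quiet = morning_message(today, today)
```

A test that was already failing yesterday is not reported again, which keeps the morning message limited to what actually moved.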

The same engine, end to end

It is worth being clear about what the group re-run does and does not do. It does not reimplement testing — each test in a batch runs through the exact same engine described in our companion post on AI-in-the-loop visual testing: a fresh browser tab, the app driven by sight, assertions checked visually, and the whole run recorded. What the batch adds is orchestration — which tests run, on which machines, in what order, and how the results roll up. That separation is deliberate: the way a single test behaves is identical whether you launch it alone or as one of fifty, so a green batch means exactly what a green single run means.

Why it matters

Testing only pays off when it is cheap to repeat. By giving tests a structure that scales, a re-run that is one click at any level, a fleet of machines to run on in parallel, and clean isolation between every test, OpenFactory makes “run the whole suite again” something you do casually instead of dread. Catch the regression before your users do — and get back to building.

Next, see how each test actually runs — the AI drives your app by sight and records a replay you can scrub — and how to deploy an app to a live URL you can point these tests straight at.

Frequently asked questions

How are app tests organized now?

In a three-level tree: Project, then App, then the individual tests. A project like 'Knostra.ai' contains one or more apps, and each app holds the tests written for it. Every test belongs to a project, so nothing floats around unattached and the panel mirrors how you actually think about your products.

What do the re-run buttons do?

There are three scopes. 'Re-run all' replays every saved test you own. A project-level button replays every test across that project's apps. An app-level button replays just that app's tests. Each kicks off a background batch you can watch live and cancel.

Do tests run one at a time?

No. Each app has a pool of tester VMs, so an app's tests run several at a time, and different apps run fully in parallel on their own machines. A re-run that used to take an hour serially can finish in a fraction of the time.

Can two people run tests at once without stepping on each other?

Yes. Tester VMs are leased exclusively for the duration of a test and are scoped per user, so two people — or two batches — never drive the same machine at the same time. When a pool is busy, additional tests queue and start as soon as a VM frees up.

How do I trigger a group re-run?

From the Test Panel in the OpenFactory console with the re-run buttons, or programmatically through the OpenFactory MCP with run_app_test_group, get_app_test_group, and cancel_app_test_group. You can also just ask an agent in chat to re-run a project's tests.
