Parallel execution
Run Maestro flows in parallel across simulators and devices to cut wall-clock time on large test suites.
Once your suite passes ~30 flows, running them serially turns a one-coffee CI step into a meeting-length wait. Parallelising Maestro flows is straightforward but has a couple of pitfalls. This guide covers sharding, multi-device execution, and the failure modes you will hit.
TL;DR
Use `--shards N` and `--shard-index I` to split a suite across N parallel CI jobs. Each shard boots its own simulator/emulator, runs its slice, and emits a JUnit report you can merge afterwards. Aim for shards of roughly equal duration, not equal count.
Local parallel execution
Maestro can drive multiple devices in a single invocation by giving each flow a target:
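A sketch of what such an invocation can look like; whether `--device` can be repeated this way depends on your Maestro version, so verify locally before relying on it:

```shell
# Fan one suite out across two booted simulators/emulators in a single run.
# Device names are examples; use whatever `maestro test --device` accepts locally.
maestro test \
  --device "iPhone 15" \
  --device "Pixel 7" \
  flows/
```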
In Maestro Deck, the device picker lets you select multiple devices and the runner fans out automatically. Each device gets its own log pane.
CI sharding
The most common pattern in CI is one shard per job, not one shard per device. Each job is a clean VM:
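A minimal sketch of one shard's job step, using the `--shards`/`--shard-index` flags described above; `$TOTAL_SHARDS` and `$SHARD_INDEX` are placeholders your CI matrix would supply:

```shell
# Each CI job runs exactly one shard on its own clean VM.
# $TOTAL_SHARDS and $SHARD_INDEX come from the CI matrix (0-based index assumed).
maestro test \
  --shards "$TOTAL_SHARDS" \
  --shard-index "$SHARD_INDEX" \
  --format junit \
  --output "report-$SHARD_INDEX.xml" \
  flows/
```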
After all shards complete, aggregate the reports:
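One low-tech way to do the merge, sketched under the assumption that each shard wrote a single root `<testsuite>` into its `report-*.xml`; a dedicated merger (for example `junitparser merge`) works just as well:

```shell
# Wrap every shard's <testsuite> element under one <testsuites> root.
# Assumes one root <testsuite> per report file (hypothetical layout).
{
  echo '<testsuites>'
  for f in report-*.xml; do
    sed '/<?xml/d' "$f"   # drop the XML declaration, keep the testsuite
  done
  echo '</testsuites>'
} > merged.xml
```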
Choosing a shard count
The right shard count is dictated by the slowest individual flow, not the suite total:
- If your slowest flow is 90 s and your total runtime is 12 minutes, the floor is ~90 s: adding shards beyond `total / slowest = 8` saves nothing.
- CI minutes are billed per-job, so over-sharding can cost more than it saves.
Start with 4 shards. If wall-clock time is still too long, profile (see below) before bumping it.
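The arithmetic above, as a quick sanity check (numbers taken from the example):

```shell
# 12-minute suite, slowest single flow 90 s.
total=$((12 * 60))          # 720 s of serial runtime
slowest=90                  # the floor for any single shard
echo $((total / slowest))   # upper bound on useful shard count: 8
```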
Profiling shard balance
Maestro emits per-flow timings in its JUnit output. A two-line script can dump them sorted:
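A sketch of that script, assuming the JUnit output puts `name` before `time` on each `<testcase>` element (adjust the pattern if your attribute order differs):

```shell
# Dump per-flow durations across all shard reports, slowest first.
grep -hoE '<testcase name="[^"]+" time="[0-9.]+"' report-*.xml \
  | sed -E 's/<testcase name="([^"]+)" time="([0-9.]+)"/\2 \1/' \
  | sort -rn
```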
If one shard is consistently 2× another, your distribution is uneven. Either reorder flows manually with tags or fix the slow flow.
Sharing data between shards
Don't. Each shard should be independent. Shared fixtures (a logged-in user, a seeded backend) are the #1 source of cross-shard flakes.
Acceptable shared state:
- Read-only assets (test images, sample inputs).
- A staging backend reset before the suite starts (not between shards).
Not acceptable:
- A "first" shard that creates an account that later shards rely on.
- Test data identified by integers (`user_1`, `user_2`): collisions guaranteed.
Generate per-flow unique IDs (UUIDs, timestamps) instead.
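A sketch of per-run unique test data handed to a flow as an environment variable; the `TEST_EMAIL` name and `flows/signup.yaml` path are hypothetical:

```shell
# Compose a unique suffix from wall clock, PID, and a random component.
uid="$(date +%s)-$$-$RANDOM"
TEST_EMAIL="user-$uid@example.com"
echo "$TEST_EMAIL"
# Hypothetical hand-off into a flow that reads ${TEST_EMAIL}:
# maestro test -e TEST_EMAIL="$TEST_EMAIL" flows/signup.yaml
```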
Multi-device on one machine
Driving 3+ simulators on a single laptop is possible but fragile. The bottleneck is usually the host's CPU and the simulator's animation thread. Symptoms:
- Tap latency goes from 50 ms to 800 ms.
- Flows that pass at `--device "iPhone 15"` time out at `--device "iPhone 15"` × 4.
Treat local multi-device as a developer convenience, not a CI strategy.
Reporting
Pass `--format junit --output report.xml` to every shard. The JUnit format is supported by every CI provider's test UI. For a richer view, use `--debug-output ./debug` to keep per-step screenshots and the view hierarchy on failure.
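Putting both reporting flags together for one shard; the `$SHARD_INDEX` suffix is a placeholder that keeps per-shard artifacts from overwriting each other:

```shell
# Per-shard JUnit report plus on-failure screenshots and view hierarchy.
maestro test \
  --format junit --output "report-$SHARD_INDEX.xml" \
  --debug-output "./debug-$SHARD_INDEX" \
  flows/
```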
Common pitfalls
- Sharding non-deterministically. If you sort flows by `Math.random()` or by mtime, two shards on different jobs can run the same flow. Always shard by stable index or filename hash.
- Shared device pool. If two CI jobs on the same self-hosted runner both grab "iPhone 15", they will fight. Tag runners or boot named simulators per job.
- Hidden serial dependencies. A flow that creates a user followed by one that logs in as that user looks fine until they land in different shards.
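One way to get a stable, collision-free split is to hash each flow's filename and take it modulo the shard count. This is a sketch of the idea, independent of Maestro's built-in sharding:

```shell
# Deterministic shard assignment: the same file always lands in the same shard,
# and every file lands in exactly one shard.
SHARDS=4
SHARD_INDEX=1   # this job's slice, 0-based (placeholder; set per CI job)
for f in flows/*.yaml; do
  h=$(printf '%s' "$f" | cksum | cut -d' ' -f1)   # stable hash of the filename
  if [ $((h % SHARDS)) -eq "$SHARD_INDEX" ]; then
    echo "$f"   # this shard runs these flows
  fi
done
```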