Part 2 — How to review Cursor-generated test automation code before it hits master
Then a senior engineer pinged me: “Cursor rewrote the failing tests and added self-healing so the issue won’t happen again. Can we merge?”
I opened the PR. The diff was… enormous. Fuzzy matching. Locator fallbacks. An auto-retry loop that “stabilized” the flow. Fancy! But the real risk here isn’t flakiness anymore. It is tests that can pass while quietly rerouting the truth.
So picking up from Part 1, here’s how my QE code review changed, and how I review AI-generated test code and self-healing behavior so speed doesn’t turn into a blindfold.
Core problem: speed and self-heal both move the “truth boundary”
AI-generated automation code can absolutely accelerate delivery—especially when Cursor has enough repo context to stitch together page objects, fixtures, and stable locator patterns. Cursor also now advertises integrated review workflows (e.g., GitHub-connected review helpers) that can catch issues early.
Self-healing (locator fallback, semantic matching, auto-updating steps) reduces churn when UI changes. But the trade is subtle:
- Generation risk: AI can produce valid-looking automation that asserts the wrong thing, at the wrong layer, with the wrong waits.
- Self-heal risk: a tool can “find something clickable” and keep the test moving—masking regressions, role-based UI issues, and environment drift.
If Part 1 was “how to get value fast,” this Part 2 is the counterweight:
Treat AI-generated changes and healed changes as production code changes that require review-grade auditability.
What got easier in code review (honestly)
1) PRs arrive with structure, not fragments
Instead of 5 half-finished test files, you often get a cohesive set: fixtures, page helpers, and a first draft of selectors and assertions. That’s real velocity.
2) Consistency is easier to enforce
If your repo has a clear testing style (naming, locator strategy, assertion rules), Cursor tends to follow it after a few examples and guardrails. You can “shape” output faster than you can author from scratch.
3) Self-healing removes the most boring maintenance… sometimes
Locator-only healing can save hours when IDs or classes get renamed, but only while your test intent stays stable and the DOM shape doesn’t become ambiguous.
What got underestimated
1) The diff got bigger, and the risk got quieter
Cursor tends to refactor multiple files at once, even when it doesn’t have to. In practice, you often end up reviewing a small testing system rather than a single test change.
2) “Stability” features can become bug-masking features
Common patterns:
- Retry loops that ignore state correctness
- Fallback selectors that accept the wrong element
- Broad text matching (“closest match”)
- Self-heal auto-updates that normalize unintended UI behavior
3) Failures stop being obvious
A broken test is noisy but honest.
A healed test can be green and wrong.
So your review has to shift from “does it run?” to “does it prove the thing we think it proves?”
Self-Healing Strategies Compared
| Strategy | Typical Tools | What It Fixes Well | Where It Breaks | Risk Level | Required Guardrails |
|---|---|---|---|---|---|
| Locator fallback | Framework plugins | renamed IDs/classes | duplicate elements | Medium | page-state assertions |
| Semantic matching | NLP/vision | text/style changes | wrong “closest match” | High | change audit + approvals |
| Auto-update steps | SaaS platforms | minor UI churn | workflow changes | High | human review, diff logs |

(Self-healing commonly spans more than locators. Some approaches aim to “heal” interaction or flow changes too, which raises the governance stakes.)
The code review mindset shift: you’re auditing intent, not syntax
When I review Cursor/AI-generated automation code (especially with self-heal enabled), I assume three things until proven otherwise:
- The test may be asserting the wrong contract.
- A self-heal may have “fixed” the wrong element.
- The code may be robust in the wrong direction (stable but meaningless).
Code Review focus: Human-authored vs Cursor/AI-generated tests
| Review Dimension | Traditional Human Code | Cursor-Generated Code | What to do differently |
|---|---|---|---|
| Selector quality | Usually intentional | Often “works-first” | Enforce selector policy + ban brittle fallbacks |
| Assertions | Typically minimal but relevant | Can be verbose or misplaced | Require “assert the contract” rules |
| Waits/retries | Known anti-patterns | AI loves “stabilizers” | Reject silent retries without state checks |
| Test boundaries | Clear-ish (sometimes) | Can mix UI/API/fixtures in one step | Require layered structure & helper boundaries |
| Maintainability | Consistent with team norms | Can introduce new patterns fast | Block new patterns unless added to standards |
| Observability | Relies on CI output | May add logs but not signals | Require artifacts: screenshots, network logs, heal logs |
Mini case study: web app checkout flow + self-healing that “lies”
Scenario
A B2C web app checkout: Cart → Address → Payment → Confirm.
A UI change happens:
- “Place Order” button text changes to “Submit Order”
- A new “Submit” button appears in a newsletter widget on the same page
- Role-based users see different button order
What Cursor generates (typical failure mode)
Cursor updates the test to:
- Search for `Submit` text
- Add a fallback locator that chooses the first match
- Wrap click in a retry loop
- Add self-heal update to “remember” the new locator
Result: tests pass for admin users in staging, fail intermittently for standard users, and quietly click the newsletter widget in production-like data.
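To make the failure mode concrete, here is a minimal Playwright/TypeScript sketch of what this kind of “fixed” test tends to look like. The URL, selectors, and helper name are hypothetical, not taken from the actual PR.

```ts
import { test, Page } from '@playwright/test';

// Hypothetical reconstruction of the healed test: it keeps the flow moving,
// but never proves it clicked the checkout button or reached confirmation.
async function placeOrder(page: Page) {
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      // Fuzzy text + "first match wins": on this page, that can be the
      // newsletter widget's "Submit" instead of "Submit Order".
      await page.getByText(/submit/i).first().click({ timeout: 2_000 });
      return; // no check that the checkout step actually advanced
    } catch {
      await page.waitForTimeout(1_000); // sleep-based "stabilization"
    }
  }
}

test('checkout completes (healed, but unverified)', async ({ page }) => {
  await page.goto('https://shop.example.test/checkout'); // hypothetical URL
  await placeOrder(page);
  // Missing: confirmation banner, order ID format, backend success signal.
});
```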
The QE review intervention
Review focuses:
1) Locator policy enforcement
- Prefer user-facing, stable selectors (role/label/testid) over text fuzzy matches when the page is busy.
- If a fallback exists, it must prove uniqueness (count = 1) and validate context.
2) Intent-anchored assertions before and after critical actions
- Before clicking “Place Order”: assert you’re in the payment confirmation section (not just “button visible”).
- After clicking: assert order confirmation banner + order ID format + backend call succeeded (if allowed).
3) Self-heal governance
- The “healed” locator update is not a convenience; it’s a behavior change.
- Treat the healed diff like any other code change: reviewed, approved, and traceable.
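Applied to the same checkout flow, the reviewed version might look like the sketch below (Playwright/TypeScript). The test IDs, button name, and order-ID format are assumptions for illustration, not the real app’s contract.

```ts
import { test, expect } from '@playwright/test';

test('checkout places an order for a standard user', async ({ page }) => {
  await page.goto('https://shop.example.test/checkout'); // hypothetical URL

  // Intent anchor BEFORE the action: we must be in the payment section,
  // not just "some Submit button is visible somewhere".
  const payment = page.getByTestId('payment-confirmation'); // assumed test id
  await expect(payment).toBeVisible();

  // Role + accessible name, scoped to the payment section; prove uniqueness
  // before acting so a newsletter "Submit" can never satisfy this locator.
  const submitOrder = payment.getByRole('button', { name: 'Submit Order' });
  await expect(submitOrder).toHaveCount(1);
  await submitOrder.click();

  // Intent anchor AFTER the action: assert the contract, not the click.
  await expect(page.getByTestId('order-confirmation')).toBeVisible();
  await expect(page.getByTestId('order-id')).toHaveText(/^ORD-\d{8}$/); // assumed format
});
```

The point is not the specific selectors; it’s that every critical action is bracketed by assertions that pin down where we are and what just happened.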

Where self-healing helps (the safe win)
In the same flow, a CSS class rename breaks a secondary selector inside a stable container (the data-testid anchor remains). Locator fallback can safely recover here, provided the test validates page state and uniqueness.
Dividing line: self-heal is safe when it’s constrained to non-ambiguous recovery and proven by assertions.
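A constrained recovery helper makes that dividing line explicit: accept a fallback only when it is unique inside a known-stable, test-id-anchored container, and fail loudly otherwise. A minimal sketch (the helper and its signature are hypothetical):

```ts
import { Page, Locator, expect } from '@playwright/test';

// Hypothetical fallback helper: recover from a renamed class, but only when
// the candidate is unique inside a stable, test-id-anchored container and
// the page state still matches expectations. Otherwise, fail loudly.
export async function resolveWithFallback(
  page: Page,
  primary: Locator,
  fallback: Locator,
  containerTestId: string
): Promise<Locator> {
  if ((await primary.count()) === 1) return primary;

  const container = page.getByTestId(containerTestId);
  await expect(container).toBeVisible(); // page-state check, not just "exists"

  const scoped = container.locator(fallback);
  await expect(scoped).toHaveCount(1); // ambiguity means no self-heal
  return scoped;
}
```

In review, the question to ask is whether the helper refuses ambiguous matches rather than quietly picking one.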
Practical code review checklist for Cursor/AI-generated UI automation
Gate 1 — “What contract does this test prove?”
Require a one-paragraph PR description answering:
- What user behavior is validated?
- What critical states are asserted?
- What’s explicitly not asserted?
Gate 2 — Selector integrity (ban the silent footguns)
Reject if you see:
- `nth-child`, layout-based selectors, or brittle XPaths as the primary path
- “First match wins” behavior without a uniqueness check
- Text fuzzy matching on pages with repeated labels
- Locators that bypass accessibility roles where available
Accept if you see:
- `getByRole`, `getByLabel`, `getByTestId` patterns (or your equivalent)
- Strict uniqueness validation before action
- Page-object boundaries that keep locators centralized
(Playwright explicitly encourages resilient locator strategies and discourages brittle selectors.)
Gate 3 — Anti-mask rules (the big one)
Hard “no” patterns:
- Retries that don’t assert state change
- Catch-and-continue blocks around clicks/navigation
- Broad “waitForTimeout” stabilization sprinkled everywhere
- Screenshot-on-failure disabled “because noisy”
Hard “yes” patterns:
- Retries only around known transient infrastructure issues (and logged)
- Every retry accompanied by state assertions (“we are on step X”)
- Failure artifacts captured consistently
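For the “hard yes” side, a retry wrapper can be made honest by scoping it to known transient errors and forcing a state assertion on every attempt. A hedged sketch in TypeScript (the transient-error check and step assertions are assumptions about your environment):

```ts
// Hypothetical retry wrapper: retries only known transient failures,
// logs every attempt, and always re-asserts the expected page state.
export async function retryOnTransient<T>(
  action: () => Promise<T>,
  assertState: () => Promise<void>, // "we are on step X", not just "no exception"
  isTransient: (err: unknown) => boolean,
  maxAttempts = 2
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      const result = await action();
      await assertState();
      return result;
    } catch (err) {
      if (attempt >= maxAttempts || !isTransient(err)) throw err;
      console.warn(`transient failure, retrying (attempt ${attempt})`, err);
    }
  }
}

// Usage sketch (names hypothetical):
// await retryOnTransient(
//   () => paymentPage.submitOrder(),
//   async () => { await expect(page.getByTestId('order-confirmation')).toBeVisible(); },
//   (err) => String(err).includes('ECONNRESET') // known infra flake only
// );
```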
Gate 4 — Self-heal changes must be reviewable
If your tool is doing any auto-heal / auto-update, require:
- A diff or artifact showing what changed (locator before/after)
- A confidence score or rationale (if available)
- A “why it’s safe” statement tied to uniqueness + page-state assertions
- A flag if the healed element appears in multiple regions/components
Rule of thumb:
If a healed change would scare you as a manual PR, it should scare you as a healed change.
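One way to enforce this is to require every heal to ship a structured record the reviewer can read, and to auto-flag anything ambiguous. A possible shape, sketched in TypeScript (field names are assumptions, not any specific tool’s schema):

```ts
// Hypothetical heal-audit record a self-heal run would attach to the PR.
// The review gate below refuses anything ambiguous or cross-region.
interface HealRecord {
  testId: string;                 // e.g. "checkout.place-order"
  locatorBefore: string;          // "text=Place Order"
  locatorAfter: string;           // "getByRole('button', { name: 'Submit Order' })"
  confidence?: number;            // 0..1, if the tool reports one
  rationale: string;              // why the tool believes this is the same element
  matchCount: number;             // how many elements matched locatorAfter
  regions: string[];              // components/regions where the element appears
  pageStateAssertions: string[];  // assertions proving we were on the right step
}

function requiresHumanApproval(heal: HealRecord): boolean {
  return (
    heal.matchCount !== 1 ||                 // ambiguity
    heal.regions.length > 1 ||               // appears in multiple components
    heal.pageStateAssertions.length === 0 || // no "why it's safe" evidence
    (heal.confidence !== undefined && heal.confidence < 0.9)
  );
}
```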
A step-by-step adoption playbook (with guardrails)
Step 1 — Define “reviewable output” before you scale usage
Deliverables:
- Locator policy (preferred → fallback → forbidden)
- Assertion policy (critical state checks required)
- Self-heal policy (what can be healed automatically vs blocked)
- PR template for AI-generated changes
Guardrail: Block merges that don’t use the template.
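These policies work best as data in the repo that lint rules and the heal tooling can read, not as wiki prose. A minimal sketch of what that file could contain (structure and values are assumptions):

```ts
// Hypothetical policy file (e.g. test-policy.ts) consumed by custom lint
// rules and by the self-heal tool's configuration.
export const locatorPolicy = {
  preferred: ['getByRole', 'getByLabel', 'getByTestId'],
  fallback: ['css scoped to a test-id container, uniqueness asserted'],
  forbidden: ['nth-child', 'layout-based selectors', 'unscoped text match'],
} as const;

export const selfHealPolicy = {
  autoHeal: 'off',            // phase 1: propose only, never auto-update
  proposalArtifact: 'heal-proposals.jsonl',
  requiresApproval: true,
} as const;
```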
Step 2 — Start with “AI-assisted authoring,” not “AI autonomy”
Use Cursor to draft tests, but require:
- Human-owned final assertions
- Human-approved locator strategy
- No self-heal auto-update to main branch in the first phase
Guardrail: Self-heal runs can propose changes, but can’t auto-merge.
Step 3 — Introduce self-healing in “observe-only mode”
Let self-heal:
- Suggest alternative locators
- Produce heal logs
- Provide diffs
But it does not update baselines automatically.
Guardrail: Review heals weekly like flaky triage—same rigor.
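In observe-only mode, the heal hook records a proposal instead of touching the test. A minimal sketch, assuming your tool exposes a suggestion callback (the hook name and file layout are hypothetical):

```ts
import { appendFileSync } from 'node:fs';

// Hypothetical observe-only handler: the heal engine calls this with its
// suggestion; we record it for weekly triage and keep the test failing.
export function onHealSuggestion(suggestion: {
  testId: string;
  locatorBefore: string;
  locatorAfter: string;
  reason: string;
}): void {
  appendFileSync(
    'heal-proposals.jsonl',
    JSON.stringify({ ...suggestion, suggestedAt: new Date().toISOString() }) + '\n'
  );
  // Deliberately no baseline update and no retry: the failure stays visible.
}
```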
Step 4 — Expand to controlled auto-heal with strict boundaries
Allow auto-heal only for:
- Non-ambiguous elements (unique, anchored containers)
- Approved pages/components
- Low-risk flows
Guardrail: Any semantic/vision matching requires explicit approval.
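Those boundaries can be encoded as a simple eligibility gate the pipeline runs before accepting an auto-heal. The component allow-list, flow tags, and field names below are assumptions:

```ts
// Hypothetical auto-heal gate for Step 4: unique match, approved component,
// low-risk flow, and no semantic/vision matching without explicit approval.
const approvedComponents = new Set(['cart-summary', 'address-form']); // assumed allow-list
const lowRiskFlows = new Set(['smoke', 'navigation']);

function canAutoHeal(heal: {
  matchCount: number;
  regions: string[];
  flowTag: string;
  strategy: 'locator-fallback' | 'semantic' | 'vision';
}): boolean {
  if (heal.strategy !== 'locator-fallback') return false; // semantic/vision => manual approval
  if (heal.matchCount !== 1) return false;                // ambiguity => manual review
  if (!heal.regions.every((r) => approvedComponents.has(r))) return false;
  return lowRiskFlows.has(heal.flowTag);
}
```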

Step 5 — Add metrics that expose “quiet reroutes”
Track:
- Healed rate by suite/module
- “Healed then failed” rate
- Escaped defects correlated with healed flows
- Blocked heals (good signal!) vs accepted heals
Guardrail: A rising healed rate without a falling escape rate is a governance smell.
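All of these signals fall out of the heal records once they are joined with later CI results and defect data. A sketch of the rollup (field names follow the hypothetical records above):

```ts
// Hypothetical metric rollup over heal records joined with later CI runs
// and escaped-defect data for the same flows.
interface HealOutcome {
  suite: string;
  accepted: boolean;        // heal approved and merged vs blocked in review
  failedAfterHeal: boolean; // same test failed again within N runs of the heal
  escapedDefect: boolean;   // a production defect was traced back to this flow
}

function healMetrics(outcomes: HealOutcome[], suite: string) {
  const inSuite = outcomes.filter((o) => o.suite === suite);
  const accepted = inSuite.filter((o) => o.accepted);
  return {
    totalHeals: inSuite.length,
    blockedHeals: inSuite.length - accepted.length, // a healthy, visible number
    healedThenFailedRate:
      accepted.length === 0
        ? 0
        : accepted.filter((o) => o.failedAfterHeal).length / accepted.length,
    escapedDefectsInHealedFlows: accepted.filter((o) => o.escapedDefect).length,
  };
}
```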
Final Notes
Most teams fail here because they treat self-healing as a local test framework feature, not a governance problem.
The platform angle we push at Omniit.ai is simple and strict:
- Healed changes must be reviewed like human-made code changes (same approval workflow).
- Auto-flag suspicious heals (ambiguity, multiple matches, role drift, environment mismatch).
- Track healed rate vs escaped defects as first-class signals.
- Auto-create PRs for heal-generated updates (instead of silently accepting them).
- Require diff logs and audit trails so you can answer: “When did the test truth change, and who approved it?”





