QA Engineering4 min read

I Replaced 500 Flaky Selectors With AI Self-Healing Tests — Here's What Happened

S

Suneet Malhotra

Mar 3, 2026

1 views
I Replaced 500 Flaky Selectors With AI Self-Healing Tests — Here's What Happened - QA Engineering blog post
🔧Playwright🔧AI Testing🔧Self-Healing Tests🦙Ollama📘TypeScript🔧MCP

The Night I Almost Quit Automation

It was 11 PM on a Thursday, and I was staring at 47 failed tests in our CI pipeline. Not because anything was actually broken — a frontend dev had renamed a few CSS classes during a refactor. Sound familiar?

After 20 years in QA engineering, I've watched this cycle repeat endlessly: write precise selectors, UI changes, selectors break, spend hours fixing them, repeat. In 2026, with AI agents building entire workflows autonomously, why are we still hardcoding data-testid="submit-btn-v3" and praying nobody touches it?

I decided to fix this once and for all. Here's exactly how I did it — and the surprising results.

The Problem: Brittle Selectors Are a Tax on Velocity

Every QA team knows the pain. You maintain hundreds (sometimes thousands) of CSS selectors, XPaths, and test IDs across your automation suite. One innocent UI refactor, and suddenly your pipeline is red.

At my current role at Motorola Solutions, I tracked the data: roughly 30% of our test failures weren't real bugs — they were broken selectors. That's not a testing problem. That's a maintenance tax that slows down every sprint.

The Solution: AI-Powered Semantic Element Discovery

Instead of hardcoding selectors, I built a system where an AI agent describes what it's looking for in natural language, and a local LLM figures out how to find it on the current page.

Here's the stack:

  • Playwright for browser automation
  • Model Context Protocol (MCP) as the communication layer
  • Ollama running a local LLM (gpt-oss:20b) for reasoning
  • TypeScript gluing it all together

The flow works like this:

  1. A test says: "Find the primary submit button in the checkout form"
  2. The agent takes a page snapshot (accessibility tree + DOM)
  3. The LLM analyzes the snapshot and generates a locator strategy
  4. If the locator fails, the agent automatically retries — taking a fresh snapshot and re-analyzing

No hardcoded selectors. No brittle XPaths. The AI adapts to whatever the UI looks like right now.

The Migration: 500 Selectors in 2 Weeks

I started with our most flaky test suite — the e-commerce checkout flow. Here's what the migration looked like:

Before (brittle):

await page.click('[data-testid="checkout-submit-btn"]');
await page.fill('#shipping-address-line-1', address);

After (self-healing):

await agent.action("Click the primary submit button on the checkout page");
await agent.action("Fill the first shipping address line with: " + address);

Over two weeks, I migrated 500+ selectors across 120 test files. The LLM handles the element discovery, and if the UI changes, it simply re-discovers the element on the next run.

The Results: 70% Fewer False Failures

After running the self-healing suite alongside our traditional tests for 30 days:

  • False failures dropped 70% — from ~15 per week to ~4
  • Maintenance hours fell by 60% — engineers stopped playing "fix the selector" every sprint
  • Test authoring got faster — writing natural language intent is quicker than crafting precise selectors
  • Slight speed tradeoff — AI-powered runs were ~20% slower due to LLM inference, but running Ollama locally kept latency manageable

The biggest win? Developer confidence in the test suite went up. When tests fail now, people actually investigate because they trust it's a real bug, not a stale selector.

What Google's Opal Announcement Means for QA

Last week, Google announced an AI agent for building automated workflows in Opal, powered by Gemini 3 Flash. It lets non-technical users build workflows with text prompts. This is the same direction QA is heading: natural language as the interface for automation.

The convergence is clear. In 2026, the best QA engineers aren't the ones writing the most precise XPaths — they're the ones designing systems where AI handles the fragile parts and humans focus on test strategy.

How to Start Self-Healing Your Own Suite

If you want to experiment, here's my recommendation:

  1. Start small — pick your flakiest 20 tests and migrate those first
  2. Use local LLMsOllama gives you privacy and zero API costs
  3. Keep fallbacks — if the AI can't find an element after 3 retries, fall back to a traditional selector
  4. Measure everything — track false failure rates before and after

I've open-sourced my Playwright Agent + MCP + Ollama setup if you want to see the full implementation.

The Bottom Line

Flaky selectors are a solved problem in 2026 — we just need to stop solving it the 2015 way. AI self-healing tests aren't science fiction. They're running in my CI pipeline right now, catching real bugs instead of crying wolf about renamed CSS classes.

The future of QA automation isn't writing better selectors. It's not writing selectors at all.

Fight On! ✌️

Share this post

You Might Also Like

Stay in the Loop

Get weekly insights on AI-driven QA, engineering leadership, and automation strategies.

No spam, ever. Unsubscribe anytime.