I Replaced 500 Flaky Selectors With AI Self-Healing Tests...

The Night I Almost Quit Automation

It was 11 PM on a Thursday, and I was staring at 47 failed tests in our CI pipeline. Not because anything was actually broken — a frontend dev had renamed a few CSS classes during a refactor. Sound familiar?

After 20 years in QA engineering, I've watched this cycle repeat endlessly: write precise selectors, UI changes, selectors break, spend hours fixing them, repeat. In 2026, with AI agents building entire workflows autonomously, why are we still hardcoding data-testid="submit-btn-v3" and praying nobody touches it?

I decided to fix this once and for all. Here's exactly how I did it — and the surprising results.

The Problem: Brittle Selectors Are a Tax on Velocity

Every QA team knows the pain. You maintain hundreds (sometimes thousands) of CSS selectors, XPaths, and test IDs across your automation suite. One innocent UI refactor, and suddenly your pipeline is red.

At my current role at Motorola Solutions, I tracked the data: roughly 30% of our test failures weren't real bugs — they were broken selectors. That's not a testing problem. That's a maintenance tax that slows down every sprint.

The Solution: AI-Powered Semantic Element Discovery

Instead of hardcoding selectors, I built a system where an AI agent describes what it's looking for in natural language, and a local LLM figures out how to find it on the current page.

Here's the stack:

Playwright for browser automation
Model Context Protocol (MCP) as the communication layer
Ollama running a local LLM (gpt-oss:20b) for reasoning
TypeScript gluing it all together

The flow works like this:

A test says: "Find the primary submit button in the checkout form"
The agent takes a page snapshot (accessibility tree + DOM)
The LLM analyzes the snapshot and generates a locator strategy
If the locator fails, the agent automatically retries — taking a fresh snapshot and re-analyzing

No hardcoded selectors. No brittle XPaths. The AI adapts to whatever the UI looks like right now.

The Migration: 500 Selectors in 2 Weeks

I started with our most flaky test suite — the e-commerce checkout flow. Here's what the migration looked like:

Before (brittle):

await page.click('[data-testid="checkout-submit-btn"]');
await page.fill('#shipping-address-line-1', address);

After (self-healing):

await agent.action("Click the primary submit button on the checkout page");
await agent.action("Fill the first shipping address line with: " + address);

Over two weeks, I migrated 500+ selectors across 120 test files. The LLM handles the element discovery, and if the UI changes, it simply re-discovers the element on the next run.

The Results: 70% Fewer False Failures

After running the self-healing suite alongside our traditional tests for 30 days:

False failures dropped 70% — from ~15 per week to ~4
Maintenance hours fell by 60% — engineers stopped playing "fix the selector" every sprint
Test authoring got faster — writing natural language intent is quicker than crafting precise selectors
Slight speed tradeoff — AI-powered runs were ~20% slower due to LLM inference, but running Ollama locally kept latency manageable

The biggest win? Developer confidence in the test suite went up. When tests fail now, people actually investigate because they trust it's a real bug, not a stale selector.

What Google's Opal Announcement Means for QA

Last week, Google announced an AI agent for building automated workflows in Opal, powered by Gemini 3 Flash. It lets non-technical users build workflows with text prompts. This is the same direction QA is heading: natural language as the interface for automation.

The convergence is clear. In 2026, the best QA engineers aren't the ones writing the most precise XPaths — they're the ones designing systems where AI handles the fragile parts and humans focus on test strategy.

How to Start Self-Healing Your Own Suite

If you want to experiment, here's my recommendation:

Start small — pick your flakiest 20 tests and migrate those first
Use local LLMs — Ollama gives you privacy and zero API costs
Keep fallbacks — if the AI can't find an element after 3 retries, fall back to a traditional selector
Measure everything — track false failure rates before and after

I've open-sourced my Playwright Agent + MCP + Ollama setup if you want to see the full implementation.

The Bottom Line

Flaky selectors are a solved problem in 2026 — we just need to stop solving it the 2015 way. AI self-healing tests aren't science fiction. They're running in my CI pipeline right now, catching real bugs instead of crying wolf about renamed CSS classes.

The future of QA automation isn't writing better selectors. It's not writing selectors at all.

Fight On! ✌️

I Replaced 500 Flaky Selectors With AI Self-Healing Tests — Here's What Happened

The Night I Almost Quit Automation

The Problem: Brittle Selectors Are a Tax on Velocity

The Solution: AI-Powered Semantic Element Discovery

The Migration: 500 Selectors in 2 Weeks

The Results: 70% Fewer False Failures

What Google's Opal Announcement Means for QA

How to Start Self-Healing Your Own Suite

The Bottom Line

You Might Also Like

I Replaced Half My QA Workflow with Playwright AI Agents — Here's What Actually Happened

I Replaced My Entire Playwright Test Maintenance Workflow With AI — And Saved 8 Hours a Week

The Ninety Minutes My Engine Sits Out

The Numbers I Used to Ask You to Trust

Latest Blog Posts

The Ninety Minutes My Engine Sits Out

The Numbers I Used to Ask You to Trust

Five Up, Three Down, Even Money

Related Tools & Demos

Multi-Model LLM Harness

Automated Trading System

Personal Health Analytics

Stay in the Loop