QA Engineering · 6 min read

I Replaced My Entire Playwright Test Maintenance Workflow With AI — And Saved 8 Hours a Week

Suneet Malhotra

Mar 24, 2026

Tags: Playwright, TypeScript, AI Testing, Self-Healing Tests, Test Automation, QA Engineering

Let me be honest about something: for years, Tuesdays were the worst day of my week. Not because of stand-ups, not because of sprint demos. Because of Playwright test failures.

Every sprint cycle, the UI team shipped changes. New class names, restructured components, updated aria labels. And every Monday night, our end-to-end suite would quietly break across a dozen tests. By Tuesday morning, my calendar was a graveyard of maintenance tickets. It was reactive, soul-crushing work that did nothing to actually improve quality — it just kept the suite from decaying.

Six months ago, I decided to fix it permanently. Here is the full story of how I rebuilt our test maintenance workflow around AI — and what I would do differently if I were starting from scratch today.

The Problem With Traditional Playwright Maintenance

Playwright is genuinely excellent. The auto-waiting, the locator API, the trace viewer — it is the best end-to-end testing tool I have used in twenty years of QA. But no framework protects you from the fundamental reality of web development: UIs change constantly, and tests written against a snapshot of the DOM become liabilities the moment a designer refactors a component.

The classic failure modes are familiar to every automation engineer:

  • Brittle selectors — tests written against data-testid attributes that the frontend team removed in a refactor
  • Timing assumptions — waits baked in for a loading state that no longer exists
  • Assertion drift — expected text that changed with copy updates or localization

The standard fix is discipline: always use semantic locators, always use getByRole, write tests that survive refactors. Great advice. Also completely insufficient when you are inheriting a 600-test suite built over four years by six different engineers.
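Before changing anything in an inherited suite, it helps to know how many locators are actually at risk. The sketch below is one rough way to take that inventory, not a Playwright feature: `auditLocators` and its three categories are hypothetical names, and it assumes locators appear as ordinary `page.getBy...(` or `page.locator(` calls in the test source. It is a regex scan, so it will also count calls that appear inside comments or strings.

```typescript
// Hypothetical audit helper: classify Playwright locator calls found in test source.
type LocatorKind = "semantic" | "test-id" | "raw-selector";

interface LocatorFinding {
  line: number; // 1-based line number in the scanned source
  call: string; // matched method name, e.g. "getByRole"
  kind: LocatorKind;
}

// Playwright's user-facing locator methods. Anything else is treated as riskier.
const SEMANTIC_METHODS = [
  "getByRole", "getByLabel", "getByText",
  "getByPlaceholder", "getByAltText", "getByTitle",
];

function classifyCall(method: string): LocatorKind {
  if (SEMANTIC_METHODS.indexOf(method) !== -1) return "semantic";
  if (method === "getByTestId") return "test-id"; // breaks if the attribute is removed
  return "raw-selector"; // locator("css or xpath") and unrecognized getBy* helpers
}

function auditLocators(source: string): LocatorFinding[] {
  const findings: LocatorFinding[] = [];
  source.split("\n").forEach((text, index) => {
    const pattern = /\.(getBy[A-Za-z]+|locator)\(/g; // fresh regex per line
    let match = pattern.exec(text);
    while (match !== null) {
      findings.push({ line: index + 1, call: match[1], kind: classifyCall(match[1]) });
      match = pattern.exec(text);
    }
  });
  return findings;
}
```

Counting findings per `kind` across the suite gives a crude picture of where the maintenance risk is concentrated before any AI tooling gets involved.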

Step 1: AI-Powered Failure Triage

The first thing I changed was how we diagnose failures. Instead of an engineer manually reading Playwright error output and tracing through the DOM, I built a pipeline that feeds failure data directly into an LLM.

When a test fails in CI, a script captures three things: the Playwright error message, the aria snapshot of the page at the point of failure, and the current source of the test file. It pipes all of that to Claude via the API with a single prompt:

"This Playwright test failed. Here is the error, the current page state, and the test source. Identify the root cause and suggest a corrected selector or assertion."

The output is a structured JSON object with a diagnosis and a proposed fix. Nine times out of ten, it is correct. We review it, accept or reject the change, and the test is updated. What used to take fifteen minutes of manual tracing now takes ninety seconds.
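The part of this step that deserves defensive code is reading the model's reply. Here is a minimal sketch of a parser for that JSON object. The field names `diagnosis` and `proposedFix` are my assumption for illustration (the text above only says the object carries a diagnosis and a proposed fix), and `parseTriageResult` is a hypothetical helper, not part of any SDK.

```typescript
// Assumed response shape; the field names are illustrative.
interface TriageResult {
  diagnosis: string;
  proposedFix: string;
}

// Returns null instead of throwing so the CI job can fall back to manual triage.
function parseTriageResult(raw: string): TriageResult | null {
  // Models sometimes wrap JSON in prose or a code fence, so take the outermost {...}.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  // Both fields must be present and non-empty strings.
  const fields = parsed as Record<string, unknown>;
  const diagnosis = fields["diagnosis"];
  const proposedFix = fields["proposedFix"];
  if (typeof diagnosis !== "string" || diagnosis.trim() === "") return null;
  if (typeof proposedFix !== "string" || proposedFix.trim() === "") return null;
  return { diagnosis: diagnosis, proposedFix: proposedFix };
}
```

Treating a malformed reply as "no suggestion" rather than an error keeps the triage step from ever blocking the pipeline.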

Step 2: Self-Healing Locator Generation

Diagnosis is one thing. Preventing the failure from recurring is another.

The second piece of the system is proactive locator analysis. Every time a PR merges to main, a lightweight GitHub Action runs our Playwright locator inventory against the new build. For any locator that has changed or is now ambiguous, it flags the affected tests and automatically proposes updated selectors using the Playwright accessibility tree.
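The core of that check can be sketched in a few lines. This version assumes the locator inventory is a list of role and accessible-name pairs, and that the new build's accessibility tree has been reduced to a simple nested structure. `AxNode`, `InventoryEntry`, and `checkInventory` are hypothetical names for illustration, not Playwright types.

```typescript
// Simplified stand-in for one node of an accessibility tree.
interface AxNode {
  role: string;
  name: string;
  children?: AxNode[];
}

// One locator from the (hypothetical) inventory, tied to the test that uses it.
interface InventoryEntry {
  testFile: string;
  role: string;
  name: string;
}

type LocatorStatus = "ok" | "missing" | "ambiguous";

interface InventoryResult {
  entry: InventoryEntry;
  matches: number;
  status: LocatorStatus;
}

// Count nodes in the tree with exactly this role and accessible name.
function countMatches(node: AxNode, role: string, name: string): number {
  let total = node.role === role && node.name === name ? 1 : 0;
  for (const child of node.children || []) {
    total += countMatches(child, role, name);
  }
  return total;
}

// One match is healthy, zero means the locator changed, more than one is ambiguous.
function checkInventory(root: AxNode, inventory: InventoryEntry[]): InventoryResult[] {
  return inventory.map((entry) => {
    const matches = countMatches(root, entry.role, entry.name);
    const status: LocatorStatus =
      matches === 1 ? "ok" : matches === 0 ? "missing" : "ambiguous";
    return { entry: entry, matches: matches, status: status };
  });
}
```

Everything that comes back as `missing` or `ambiguous` is a candidate for a proposed selector update, and the `testFile` field tells you which tests to flag.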

This is the self-healing pattern in practice. The system does not blindly update tests — that would be dangerous. Instead it opens a draft PR with proposed changes, clearly labeled with confidence scores. High-confidence fixes (exact aria-label match found, one candidate) get auto-merged after a brief delay. Low-confidence ones require human review.
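The routing rule is simple enough to write down. The sketch below encodes the rule described above (one candidate with an exact aria-label match counts as high confidence), but the numeric scores and the 0.9 threshold are placeholder assumptions, and `scoreFix` and `routeFix` are hypothetical names.

```typescript
// One possible replacement selector for a broken locator.
interface FixCandidate {
  selector: string;
  exactAriaLabelMatch: boolean;
}

interface ProposedFix {
  testFile: string;
  candidates: FixCandidate[];
}

type FixRoute = "auto-merge-after-delay" | "human-review";

// Placeholder scores: only the single-candidate, exact-match case is treated as high confidence.
function scoreFix(fix: ProposedFix): number {
  if (fix.candidates.length === 0) return 0; // nothing to propose
  if (fix.candidates.length > 1) return 0.3; // ambiguous: several plausible targets
  return fix.candidates[0].exactAriaLabelMatch ? 0.95 : 0.6;
}

// Anything below the threshold stays in the draft PR for a person to review.
function routeFix(fix: ProposedFix, threshold: number = 0.9): FixRoute {
  return scoreFix(fix) >= threshold ? "auto-merge-after-delay" : "human-review";
}
```

Keeping the score and the routing decision in separate functions means the score can be printed on the draft PR even when the fix is not eligible for auto-merge.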

Since deploying this, our Tuesday-morning failure count dropped from an average of fourteen broken tests per sprint to under two. The ones that still break are almost always genuine regressions (actual bugs) rather than test maintenance issues. That is exactly what you want.

Step 3: AI-Generated Test Cases From User Stories

The third layer is the one that surprised me most with its impact: using AI to write new tests directly from Jira tickets.

Our workflow now looks like this. When a user story is marked "QA Ready," a Jira automation triggers a webhook that sends the story description and acceptance criteria to an agent. The agent reads the criteria, looks up existing tests for the affected component, and generates a Playwright test file covering the happy path and two to three edge cases.
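The hand-off from story to agent might look something like the sketch below. It assumes the webhook payload has already been mapped into a flat `QaReadyStory` object; that shape, the `component` field, and `buildGenerationRequest` are hypothetical, and a real Jira payload would need its own mapping (acceptance criteria often live in a custom field).

```typescript
// Hypothetical, already-normalized view of a "QA Ready" story.
interface QaReadyStory {
  key: string;
  summary: string;
  component: string;
  acceptanceCriteria: string[];
}

// Minimal index of the existing suite, keyed by the component each test covers.
interface ExistingTest {
  file: string;
  component: string;
}

interface GenerationRequest {
  storyKey: string;
  relatedFiles: string[];
  prompt: string;
}

function buildGenerationRequest(story: QaReadyStory, existing: ExistingTest[]): GenerationRequest {
  // Look up existing tests for the affected component so the agent can match their style.
  const relatedFiles = existing
    .filter((t) => t.component === story.component)
    .map((t) => t.file);

  // Number the criteria so generated tests can refer back to them.
  const criteria = story.acceptanceCriteria.map((text, i) => (i + 1) + ". " + text);

  const prompt = [
    "Write a Playwright test file in TypeScript for the story below.",
    "Cover the happy path and two to three edge cases.",
    "Story " + story.key + ": " + story.summary,
    "Acceptance criteria:",
    criteria.join("\n"),
    "Existing tests for this component: " +
      (relatedFiles.length > 0 ? relatedFiles.join(", ") : "none"),
  ].join("\n");

  return { storyKey: story.key, relatedFiles: relatedFiles, prompt: prompt };
}
```

Passing the related test files along is what lets the generated tests reuse existing fixtures and naming rather than inventing a parallel style.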

An engineer reviews the generated tests before they merge. In practice, the review takes five to ten minutes rather than forty-five. We are adding roughly 30% more test coverage per sprint with the same headcount.

What I Would Do Differently

If I were rebuilding this from scratch, I would start with the failure triage layer. It delivers value immediately, requires almost no infrastructure, and builds team trust in AI-assisted testing before you introduce more autonomous components.

I would also invest earlier in good prompt design. The quality of AI-generated fixes is directly proportional to how precisely you describe the context. Sending raw error messages gets mediocre results. Sending structured context — error, DOM state, test intent — gets excellent ones.
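As a rough illustration of what "structured context" can mean, here is a minimal sketch of a prompt builder with labeled sections. The `FailureContext` shape, the section markers, the character limits, and the requested JSON field names are all assumptions for the example. In a real pipeline the snapshot could come from something like Playwright's `locator.ariaSnapshot()`.

```typescript
// Hypothetical shape of what a CI script might capture for one failed test.
interface FailureContext {
  testFile: string;     // path of the failing spec
  testIntent: string;   // one sentence on what the test is meant to verify
  errorMessage: string; // Playwright error output
  ariaSnapshot: string; // page state at the point of failure
  testSource: string;   // current source of the test file
}

// Bound each section so one oversized snapshot cannot crowd out the rest.
function clipText(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars) + "\n[truncated]";
}

// Assemble a single prompt with clearly labeled sections.
// The character limits and the JSON field names are assumptions.
function buildTriagePrompt(ctx: FailureContext): string {
  return [
    "This Playwright test failed. Here is the error, the current page state, and the test source.",
    "Identify the root cause and suggest a corrected selector or assertion.",
    'Reply with JSON only, shaped as {"diagnosis": string, "proposedFix": string}.',
    "",
    "=== TEST FILE ===",
    ctx.testFile,
    "=== TEST INTENT ===",
    ctx.testIntent,
    "=== ERROR ===",
    clipText(ctx.errorMessage, 4000),
    "=== ARIA SNAPSHOT ===",
    clipText(ctx.ariaSnapshot, 20000),
    "=== TEST SOURCE ===",
    clipText(ctx.testSource, 20000),
  ].join("\n");
}
```

The intent line is the piece most often missing from raw error output, and it is what lets the model tell a renamed button from a removed feature.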

Finally: do not skip the human review gate. Self-healing tests that update themselves without any oversight will eventually heal around a genuine regression and keep passing when they should fail. The goal is to make the right thing easy for engineers, not to remove engineers from the loop entirely.

The Bottom Line

Test maintenance is not going away. UIs will keep changing, and suites will keep drifting. But the amount of human time that work requires is now genuinely negotiable — and in my experience, the answer is about eight hours less per week than it used to be.

If you are running Playwright and spending meaningful engineering time on maintenance, I would strongly encourage you to start with failure triage automation. It is the highest-leverage change I have made to my QA workflow in years, and it took less than a day to build the first version.

Fight On — and may your locators always find their targets. ✌️

Suneet Malhotra is a Sr. Manager of Test Engineering at Motorola Solutions and an AI-driven QA automation specialist with 20+ years of experience. Follow him for weekly insights on QA engineering, AI testing, and building better software.
