I Rewrote Our Entire Playwright Test Suite With AI in One Week — Here's What Actually Happened

Suneet Malhotra

Mar 17, 2026

I have been building test automation frameworks for over 20 years. I have seen every silver bullet from record-and-playback tools to codeless testing platforms. Most of them overpromise and underdeliver when they hit real-world complexity. So when I decided to let an AI agent loose on our Playwright test suite for a full week, I went in with my skeptic hat firmly on.

Seven days later, flakiness dropped 60%. Coverage increased by 35%. Three hours of manual test maintenance per sprint disappeared entirely.

But I also deleted more AI-generated code than I kept, caught two subtle logical errors that would have caused false positives, and had to re-explain our domain model four times. It was the most productive and most humbling week of my QA career.

Here is what I actually learned.

The Setup

Our Playwright suite had grown organically over 18 months — around 340 tests covering a SaaS product with a complex multi-tenant checkout and user management flow. Classic QA debt: good coverage on happy paths, brittle locators scattered throughout, and a few hundred lines of duplicated page object code that nobody wanted to touch.

I gave the AI agent access to the repository, described our testing philosophy, and set three goals for the week: reduce flakiness, improve page object consistency, and add coverage for the five user flows we knew were under-tested.

Days 1-2: The Locator Audit

The first task was identifying every brittle locator and wait: anything relying on CSS class names that could change, positional XPaths such as //div[3], or hardcoded timeouts. The agent produced a report in about 12 minutes that would have taken me half a day.

It flagged 87 locators as high-risk and proposed replacements built on the locators Playwright recommends: accessible role-based locators (getByRole) and data-testid attributes (getByTestId). About 70% of the replacements were drop-in correct. The other 30% required context the AI did not have, such as components that render inside iframes or elements whose accessible names are set dynamically at runtime.

This taught me the first rule of AI-assisted QA refactoring: the AI is excellent at pattern recognition and terrible at implicit domain knowledge. The moment a locator depends on business logic that lives in someone's head rather than in the codebase, the AI will guess wrong.
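To make the pattern-recognition half of that rule concrete, here is a minimal sketch of the kind of scan a locator audit might run. It is not the agent's actual implementation. The rule labels, the regular expressions, and the sample lines are all my own assumptions, and the patterns are heuristics that will miss some cases and flag some false positives.

```typescript
// Hypothetical audit rules; the patterns are heuristics, not a full parser.
interface AuditRule {
  label: string;
  pattern: RegExp;
}

interface AuditFinding {
  line: number; // 1-based line number within the scanned source
  label: string; // which rule matched
  text: string; // the offending line, trimmed
}

const BRITTLE_RULES: AuditRule[] = [
  // CSS class selector passed to locator(), e.g. locator('.btn-primary')
  { label: "css-class", pattern: /locator\(\s*['"`][^'"`]*\.[A-Za-z_-]/ },
  // Positional XPath, e.g. '//div[3]/span[1]'
  { label: "positional-xpath", pattern: /['"`](?:xpath=)?\/\/[^'"`]*\[\d+\]/ },
  // Fixed sleep, e.g. page.waitForTimeout(5000)
  { label: "hardcoded-timeout", pattern: /waitForTimeout\(\s*\d+\s*\)/ },
];

// Scan test source line by line and report every rule that matches.
function auditLocators(source: string): AuditFinding[] {
  const findings: AuditFinding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const rule of BRITTLE_RULES) {
      if (rule.pattern.test(text)) {
        findings.push({ line: index + 1, label: rule.label, text: text.trim() });
      }
    }
  });
  return findings;
}

// Illustrative input: three brittle lines and one role-based locator.
const sampleSource = [
  "await page.locator('.btn-primary').click();",
  "await page.locator('//div[3]/span[1]').click();",
  "await page.waitForTimeout(5000);",
  "await page.getByRole('button', { name: 'Pay' }).click();",
].join("\n");

const sampleFindings = auditLocators(sampleSource);
```

A scan like this finds the syntactic offenders quickly. What it cannot tell you is whether the replacement locator is right for a component inside an iframe or one with a runtime-generated accessible name, which is exactly where the human review went.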

Day 3: Page Object Consolidation

We had three different page objects for variations of the same login flow — one from the original engineer, one from a contractor, one from me during a late-night hotfix sprint. The AI merged them into a single clean LoginPage class, identified all the places each was imported, and updated the references.

It also did something I did not ask for: it noticed that our AuthHelper utility was being used inconsistently across tests (some tests calling it directly, others relying on beforeEach hooks) and flagged this as a maintenance risk. That kind of unprompted architectural observation is where AI starts feeling genuinely useful rather than just fast.
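For readers who have not consolidated page objects before, a single LoginPage might look roughly like the sketch below. This is not our actual class. The /login route, the "Email" and "Password" labels, and the "Sign in" button name are hypothetical, and the LoginPageDriver interface is a narrow stand-in I wrote so the example runs without Playwright installed. In a real suite you would type the constructor argument as Playwright's Page, which should satisfy this interface structurally.

```typescript
// Narrow stand-ins for the parts of Playwright's Locator and Page used here.
interface LoginField {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}

interface LoginPageDriver {
  goto(url: string): Promise<unknown>;
  getByLabel(text: string): LoginField;
  getByRole(role: "button", options: { name: string }): LoginField;
}

// One page object for every variation of the login flow.
class LoginPage {
  private readonly page: LoginPageDriver;

  constructor(page: LoginPageDriver) {
    this.page = page;
  }

  // Navigate to the login screen (hypothetical route).
  async open(): Promise<void> {
    await this.page.goto("/login");
  }

  // Fill in credentials and submit, using label and role locators only.
  async logIn(email: string, password: string): Promise<void> {
    await this.page.getByLabel("Email").fill(email);
    await this.page.getByLabel("Password").fill(password);
    await this.page.getByRole("button", { name: "Sign in" }).click();
  }
}
```

Keeping every locator inside the class is the point of the exercise: when the login form changes, there is one file to update instead of three.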

Days 4-5: The New Test Cases

This is where things got interesting and where I had to be most careful.

I asked the agent to write tests for five under-covered flows: multi-user permission escalation, concurrent session handling, payment retry logic, API rate-limit error states, and multi-locale date formatting. It produced first drafts for all five within an hour.

Three of them were excellent — clean structure, proper assertions, realistic data setup. Two of them had a subtle problem: they tested the UI behavior without actually verifying the underlying state. For example, the payment retry test confirmed that the success message appeared, but never asserted that the database record was actually updated. The test would pass even if the retry logic silently failed.

This is the second rule: AI writes tests that look right. You still have to verify they test the right thing. A green test that does not actually protect your system is worse than no test at all.
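Here is a minimal sketch of the difference, assuming a hypothetical payment record with a status and an attempts counter. None of these names come from our codebase. The idea is simply that the check takes two pieces of evidence, what the UI showed and what the backend stored, and fails if either one is wrong.

```typescript
// Hypothetical shape of the stored payment; field names are assumptions.
interface PaymentRecord {
  status: string;
  attempts: number;
}

// Evidence gathered by the test after triggering a retry.
interface RetryEvidence {
  successMessageVisible: boolean; // what the UI showed
  record: PaymentRecord | null; // what the backend actually stored
}

// Fails unless both the UI and the underlying state confirm the retry.
function assertRetrySucceeded(evidence: RetryEvidence): void {
  if (!evidence.successMessageVisible) {
    throw new Error("UI: success message was not shown");
  }
  if (evidence.record === null) {
    throw new Error("State: no payment record found");
  }
  if (evidence.record.status !== "paid") {
    throw new Error("State: record status is " + evidence.record.status);
  }
  if (evidence.record.attempts < 2) {
    throw new Error("State: no retry attempt was recorded");
  }
}
```

In a Playwright test, the first field could come from a locator's isVisible() call and the second from an API request or a direct database query made after the UI step. The draft that the AI produced was the equivalent of checking only the first field.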

Day 6: The Self-Healing Layer

The most forward-looking thing we added was a lightweight self-healing hook. When a locator fails to find an element, instead of immediately throwing, the test logs the failure context and attempts a fallback using an AI-assisted locator suggestion based on the page's current DOM structure.

This is not magic — it does not fix broken tests automatically. But it dramatically reduces the noise of flaky tests caused by minor UI changes, and it surfaces actionable information when something genuinely breaks rather than a cryptic "element not found" error.

Setting this up took about three hours, and it has already spared us several false-alarm pages since deployment.
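The core of a hook like this can be sketched in a few lines. This is a simplified illustration under my own assumptions, not our production code: the names are hypothetical, the fallback candidates are passed in by the caller rather than generated by an AI from the DOM, and CountableLocator is a stand-in for the one Playwright Locator method the sketch needs, count().

```typescript
// Stand-in for the single Locator method this sketch relies on.
interface CountableLocator {
  count(): Promise<number>;
}

// A locator to try, plus a human-readable description for the log.
interface LocatorCandidate {
  description: string;
  locate: () => CountableLocator;
}

interface HealingResult {
  locator: CountableLocator;
  description: string;
  usedFallback: boolean;
}

// Try the primary locator; on a miss, log context and try each fallback in order.
async function locateWithFallback(
  primary: LocatorCandidate,
  fallbacks: LocatorCandidate[],
  log: (message: string) => void,
): Promise<HealingResult> {
  const primaryLocator = primary.locate();
  if ((await primaryLocator.count()) > 0) {
    return { locator: primaryLocator, description: primary.description, usedFallback: false };
  }
  log("primary locator found nothing: " + primary.description);

  for (const candidate of fallbacks) {
    const locator = candidate.locate();
    if ((await locator.count()) > 0) {
      log("matched fallback: " + candidate.description);
      return { locator, description: candidate.description, usedFallback: true };
    }
  }

  // Nothing matched: fail with everything that was tried, not a bare "not found".
  const tried = [primary, ...fallbacks].map((c) => c.description).join(", ");
  throw new Error("No locator matched. Tried: " + tried);
}
```

Two design choices matter here. Every fallback hit is logged, so a healed test still leaves a trail that tells you which locator needs a permanent fix. And when nothing matches, the error lists every candidate that was tried instead of hiding the failure.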

What I Would Do Differently

Do not give the AI write access to your test files on day one. Use it in read-only advisory mode first (let it audit, suggest, and explain) before you start merging generated code. If you are not careful, reviewing bad AI output costs more time than the generated code saves.

Also, write a clear brief before each task. Vague instructions produce vague code. "Improve our login tests" is useless. "Refactor the LoginPage class to consolidate three separate implementations into one, preserving all existing test behavior" is what actually works.

The Bottom Line

AI-assisted QA refactoring is real and it is powerful — but it is a force multiplier for a skilled QA engineer, not a replacement for one. The engineer's job shifts from writing boilerplate to making judgment calls: deciding what to trust, what to verify, and what the AI fundamentally cannot know about your system.

If you have a Playwright suite with accumulated debt, this approach is worth a week of your time. Go in with clear goals, review everything critically, and be ready to delete code you did not write. The results will surprise you.

Suneet Malhotra is a Sr. Manager of Test Engineering with 20+ years in QA automation and AI-driven testing. Read more at suneetmalhotra.com.
