I Rewrote Our Entire Playwright Test Suite With AI in One Week — Here's What Actually Happened
Suneet Malhotra
Mar 17, 2026
I have been building test automation frameworks for over 20 years. I have seen every silver bullet from record-and-playback tools to codeless testing platforms. Most of them overpromise and underdeliver when they hit real-world complexity. So when I decided to let an AI agent loose on our Playwright test suite for a full week, I went in with my skeptic hat firmly on.
Seven days later, flakiness dropped 60%. Coverage increased by 35%. Three hours of manual test maintenance per sprint disappeared entirely.
But I also deleted more AI-generated code than I kept, caught two subtle logical errors that would have caused false positives, and had to re-explain our domain model four times. It was the most productive and most humbling week of my QA career.
Here is what I actually learned.
The Setup
Our Playwright suite had grown organically over 18 months — around 340 tests covering a SaaS product with a complex multi-tenant checkout and user management flow. Classic QA debt: good coverage on happy paths, brittle locators scattered throughout, and a few hundred lines of duplicated page object code that nobody wanted to touch.
I gave the AI agent access to the repository, described our testing philosophy, and set three goals for the week: reduce flakiness, improve page object consistency, and add coverage for the five user flows we knew were under-tested.
Days 1-2: The Locator Audit
The first task was identifying every brittle locator and wait: anything relying on CSS class names that could change, index-based XPaths, or hardcoded timeouts. The agent produced a report in about 12 minutes that would have taken me half a day.
It flagged 87 locators as high-risk and proposed replacements using the locator styles Playwright's documentation recommends: accessible role-based locators (getByRole) and data-testid attributes (getByTestId). About 70% of the replacements were drop-in correct. The other 30% required context the AI did not have, such as components that render inside iframes or elements whose accessible names are set dynamically at runtime.
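To make the audit concrete, here is a minimal sketch of the kind of scan involved. It is not the agent's actual output: the rule names, the regular expressions, and the `auditSource` helper are all assumptions for illustration, and a line-based regex scan will miss cases a real parser would catch.

```typescript
// Hypothetical audit helper: flags source lines that match the brittle
// patterns described above. Rule names and regexes are assumptions.
type Risk = "css-class" | "positional-xpath" | "hardcoded-timeout";

interface Finding {
  line: number; // 1-based line number in the scanned source
  risk: Risk;
  snippet: string;
}

const RULES: { risk: Risk; pattern: RegExp }[] = [
  // e.g. page.locator('.btn-primary') or page.locator("div.card")
  { risk: "css-class", pattern: /locator\(\s*['"`][^'"`]*\.[A-Za-z_-][\w-]*/ },
  // e.g. an XPath with a numeric index such as //div[3]/span[1]
  { risk: "positional-xpath", pattern: /['"`](?:xpath=)?\/\/[^'"`]*\[\d+\]/ },
  // e.g. page.waitForTimeout(2000)
  { risk: "hardcoded-timeout", pattern: /waitForTimeout\(\s*\d+\s*\)/ },
];

function auditSource(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { risk, pattern } of RULES) {
      if (pattern.test(text)) {
        findings.push({ line: index + 1, risk, snippet: text.trim() });
      }
    }
  });
  return findings;
}

// Three brittle lines and one role-based locator that should pass the audit.
const sample = [
  "await page.locator('.btn-primary').click();",
  "await page.locator('//div[3]/span[1]').click();",
  "await page.waitForTimeout(2000);",
  "await page.getByRole('button', { name: 'Pay now' }).click();",
].join("\n");

const report = auditSource(sample);
```

A scan like this only finds candidates. Deciding the replacement for each one is where the iframe and dynamic-name cases above still need a human.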
This taught me the first rule of AI-assisted QA refactoring: the AI is excellent at pattern recognition and terrible at implicit domain knowledge. The moment a locator depends on business logic that lives in someone's head rather than in the codebase, the AI will guess wrong.
Day 3: Page Object Consolidation
We had three different page objects for variations of the same login flow — one from the original engineer, one from a contractor, one from me during a late-night hotfix sprint. The AI merged them into a single clean LoginPage class, identified all the places each was imported, and updated the references.
It also did something I did not ask for: it noticed that our AuthHelper utility was being used inconsistently across tests (some tests calling it directly, others relying on beforeEach hooks) and flagged this as a maintenance risk. That kind of unprompted architectural observation is where AI starts feeling genuinely useful rather than just fast.
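For readers who have not done this kind of consolidation, a merged page object might look roughly like the sketch below. The field labels, the button name, and the `/login` path are hypothetical placeholders, not our real ones. `PageLike` and `LocatorLike` are small stand-ins so the sketch runs on its own; in a real suite you would import `Page` and `Locator` from `@playwright/test` instead.

```typescript
// Minimal structural stand-ins for the parts of Playwright's Page and
// Locator that this sketch uses (assumption: kept small on purpose).
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}

interface PageLike {
  goto(url: string): Promise<unknown>;
  getByLabel(text: string): LocatorLike;
  getByRole(role: string, options?: { name?: string }): LocatorLike;
}

// One page object replacing several login variants. Locators are built
// once in the constructor so every test shares the same definitions.
class LoginPage {
  private readonly page: PageLike;
  private readonly email: LocatorLike;
  private readonly password: LocatorLike;
  private readonly submit: LocatorLike;

  constructor(page: PageLike) {
    this.page = page;
    this.email = page.getByLabel("Email"); // hypothetical label
    this.password = page.getByLabel("Password"); // hypothetical label
    this.submit = page.getByRole("button", { name: "Sign in" }); // hypothetical name
  }

  async open(): Promise<void> {
    await this.page.goto("/login"); // hypothetical path
  }

  async login(email: string, password: string): Promise<void> {
    await this.email.fill(email);
    await this.password.fill(password);
    await this.submit.click();
  }
}
```

Keeping every locator in one constructor means the next UI change touches one file rather than three.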
Days 4-5: The New Test Cases
This is where things got interesting and where I had to be most careful.
I asked the agent to write tests for five under-covered flows: multi-user permission escalation, concurrent session handling, payment retry logic, API rate-limit error states, and multi-locale date formatting. It produced first drafts for all five within an hour.
Three of them were excellent — clean structure, proper assertions, realistic data setup. Two of them had a subtle problem: they tested the UI behavior without actually verifying the underlying state. For example, the payment retry test confirmed that the success message appeared, but never asserted that the database record was actually updated. The test would pass even if the retry logic silently failed.
This is the second rule: AI writes tests that look right. You still have to verify they test the right thing. A green test that does not actually protect your system is worse than no test at all.
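The gap in the payment retry test can be shown in a few lines. This is a sketch under assumptions, not our production code: the `PaymentRecord` fields, the success message text, and the `verifyRetry` helper are all hypothetical. The point is that the check takes two inputs, what the page showed and what was actually persisted, and fails if either one is wrong.

```typescript
// Hypothetical shape of a payment record fetched from the backend,
// independently of the UI. Field names are assumptions.
interface PaymentRecord {
  id: string;
  status: "pending" | "failed" | "succeeded";
  attempts: number;
}

interface RetryObservation {
  uiMessage: string; // text the test read from the page
  record: PaymentRecord | null; // persisted state, or null if none was found
}

// Returns a list of problems. An empty list means the retry was verified
// at both the UI layer and the persisted-state layer.
function verifyRetry(obs: RetryObservation, minAttempts = 2): string[] {
  const problems: string[] = [];
  if (!/payment successful/i.test(obs.uiMessage)) {
    problems.push("UI did not show the success message");
  }
  const record = obs.record;
  if (record === null) {
    problems.push("no payment record found");
    return problems;
  }
  if (record.status !== "succeeded") {
    problems.push(`record status is "${record.status}", expected "succeeded"`);
  }
  if (record.attempts < minAttempts) {
    problems.push(`record shows ${record.attempts} attempt(s), expected at least ${minAttempts}`);
  }
  return problems;
}

// The silent-failure case: the UI claims success but the record says otherwise.
const silentFailure = verifyRetry({
  uiMessage: "Payment successful",
  record: { id: "pay_123", status: "failed", attempts: 2 },
});
// silentFailure holds exactly one problem: the record status mismatch.
```

In a real Playwright test, the `uiMessage` would come from the page and the `record` from a separate backend call, for example through Playwright's `request` fixture or a direct database query. A UI-only assertion would have let the `silentFailure` case pass.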
Day 6: The Self-Healing Layer
The most forward-looking thing we added was a lightweight self-healing hook. When a locator fails to find an element, instead of immediately throwing, the test logs the failure context and attempts a fallback using an AI-assisted locator suggestion based on the page's current DOM structure.
This is not magic — it does not fix broken tests automatically. But it dramatically reduces the noise of flaky tests caused by minor UI changes, and it surfaces actionable information when something genuinely breaks rather than a cryptic "element not found" error.
Setting this up took about three hours, and it has already spared us several false-alarm pages since deployment.
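The shape of the hook is simple enough to sketch. What follows is a simplified illustration, not our actual implementation: `resolveLocator`, the `suggest` callback (standing in for the AI-assisted suggestion step), and the log format are all hypothetical. `CountableLocator` mirrors the `count()` method on Playwright's `Locator` so the sketch runs without Playwright installed.

```typescript
// Structural stand-in for the one Locator method this sketch relies on.
interface CountableLocator {
  count(): Promise<number>;
}

interface HealingOptions<L extends CountableLocator> {
  name: string; // human-readable element name, used in logs
  primary: L; // locator the test normally uses
  suggest: () => Promise<L[]>; // hypothetical source of fallback locators
  log?: (message: string) => void;
}

// Tries the primary locator first. On a miss, logs the failure context and
// tries each suggested fallback in order. Throws a descriptive error if
// nothing matches, so real breakage is still reported.
async function resolveLocator<L extends CountableLocator>(
  opts: HealingOptions<L>
): Promise<L> {
  const log = opts.log ?? ((message: string) => console.warn(message));

  if ((await opts.primary.count()) > 0) {
    return opts.primary;
  }

  log(`[self-heal] primary locator for "${opts.name}" matched nothing; requesting fallbacks`);
  const candidates = await opts.suggest();

  let attempt = 0;
  for (const candidate of candidates) {
    attempt += 1;
    if ((await candidate.count()) > 0) {
      log(`[self-heal] "${opts.name}" resolved via fallback #${attempt}; update the primary locator`);
      return candidate;
    }
  }

  throw new Error(
    `"${opts.name}" not found: primary and ${candidates.length} fallback(s) matched nothing`
  );
}
```

Two design notes. The fallback always logs, so a healed test still leaves a trail telling you which locator to fix. And because Playwright's `count()` does not auto-wait, a real hook would probably wait briefly before declaring the primary locator a miss.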
What I Would Do Differently
Do not give the AI write access to your test files on day one. Use it in read-only advisory mode first — let it audit, suggest, and explain — before you start merging generated code. The review cost of bad AI output is higher than the time savings if you are not careful.
Also, write a clear brief before each task. Vague instructions produce vague code. "Improve our login tests" is useless. "Refactor the LoginPage class to consolidate three separate implementations into one, preserving all existing test behavior" is what actually works.
The Bottom Line
AI-assisted QA refactoring is real and it is powerful — but it is a force multiplier for a skilled QA engineer, not a replacement for one. The engineer's job shifts from writing boilerplate to making judgment calls: deciding what to trust, what to verify, and what the AI fundamentally cannot know about your system.
If you have a Playwright suite with accumulated debt, this approach is worth a week of your time. Go in with clear goals, review everything critically, and be ready to delete code you did not write. The results will surprise you.
Suneet Malhotra is a Sr. Manager of Test Engineering with 20+ years in QA automation and AI-driven testing. Read more at suneetmalhotra.com.