Test automation · April 28, 2026 · 5 min

What automated accessibility tests actually cover — and what they miss

Automated accessibility tools catch structural violations well. What they miss is everything that depends on context, dynamic state, and actual user interaction — and that gap is larger than most teams expect.

There is a pattern that appears in projects with solid CI coverage: all Playwright tests pass, axe finds no violations, Lighthouse gives an accessibility score above ninety. Then a real user with a screen reader opens the page and cannot complete a basic task.

This is not a failure of the tools. It is a misunderstanding of what they were built to do.

What automated scanning is reliably good at

Axe-core and Lighthouse catch structural violations well. Missing alt attributes, forms without labels, invalid ARIA roles, colour contrast failures on static text: these are detectable by reading the DOM and comparing it against known rules. The checks are worth running on every CI build, and their findings are worth taking seriously when they fire.
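To make the rule-based nature of these checks concrete, here is a minimal sketch of the colour contrast calculation defined by WCAG 2.x, the kind of rule a scanner can apply mechanically. The function names are illustrative and are not axe-core's internal API. The sketch handles only six-digit hex colours and ignores opacity, gradients, and background images.

```typescript
// Sketch of the WCAG 2.x contrast-ratio formula. Names are illustrative only.

type Rgb = [number, number, number];

// Parse a six-digit hex colour such as "#767676" into 0-255 channels.
function parseHex(hex: string): Rgb {
  const cleaned = hex.replace(/^#/, '');
  if (!/^[0-9a-f]{6}$/i.test(cleaned)) {
    throw new Error(`Unsupported colour: ${hex}`);
  }
  const value = parseInt(cleaned, 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// Relative luminance: each sRGB channel is linearised, then weighted.
function relativeLuminance(rgb: Rgb): number {
  const linear = (channel: number): number => {
    const c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * linear(rgb[0]) + 0.7152 * linear(rgb[1]) + 0.0722 * linear(rgb[2]);
}

// Contrast ratio runs from 1 (identical colours) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(parseHex(foreground));
  const b = relativeLuminance(parseHex(background));
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// WCAG AA asks for at least 4.5:1 for normal-size text.
function passesAaNormalText(foreground: string, background: string): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}
```

With these definitions, black on white comes out at roughly 21:1, the maximum, and #767676 on white sits just above the 4.5:1 threshold. The calculation needs nothing but two colour values, which is why it automates well for static text.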

The real value is regression prevention. An aria-label quietly removed during a refactor, a heading order broken by a component update, a landmark that disappears after a layout change — automated scanning catches these silent regressions before they reach production. That is not a small thing.

But research consistently puts the detection ceiling somewhere between thirty and fifty-seven percent of real WCAG violations. Deque, the company that builds axe, estimates that its tool catches around 57% of issues on real-world audited pages. Counted by success criteria rather than by individual issues, the share is smaller still: W3C places only 20–30% of WCAG success criteria as fully automatable. This is not pessimism. It is a precise description of what DOM-level scanning can evaluate and what it structurally cannot.

The gaps that matter most in practice

The gaps are not random. They cluster around three areas: context-dependent meaning, dynamic state, and focus management.

Context and meaning. A link that reads "Read more" passes every structural check. A screen reader user navigating by links hears "Read more" with no surrounding context. Axe validates presence and format — not whether the text is meaningful when read in isolation. The same applies to alt text. alt="image" is detectable as generic. alt="person smiling" passes without question, even if it describes nothing useful about what the image contributes to the page.
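Judging whether link text is meaningful needs a human, but the most common offenders can at least be flagged. The sketch below is a heuristic under stated assumptions: the phrase list is hypothetical and English-only, and the input is assumed to be accessible names that a test has already collected from the page's links. It is not an axe rule and will miss anything not on the list.

```typescript
// Hypothetical list of link phrases that carry no meaning out of context.
const GENERIC_LINK_PHRASES = new Set<string>([
  'read more',
  'learn more',
  'click here',
  'here',
  'more',
  'details',
]);

// Lower-case the text, replace punctuation with spaces, and collapse whitespace.
function normaliseLinkText(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// True when the whole accessible name is one of the generic phrases.
function isGenericLinkText(accessibleName: string): boolean {
  return GENERIC_LINK_PHRASES.has(normaliseLinkText(accessibleName));
}

// Return the names that would be ambiguous in a screen reader's links list.
function findGenericLinks(linkNames: string[]): string[] {
  return linkNames.filter(isGenericLinkText);
}
```

A check like this only narrows the manual review. A link named "Read more about focus traps" passes it, and whether that name is actually accurate for its destination still has to be judged by a person.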

Dynamic state. A theme toggle labeled "Switch to dark mode" passes checks. But if the page is already in dark mode, the label is factually wrong. Automated scanning evaluates a snapshot of the DOM at a single moment. It cannot verify that ARIA labels stay accurate as state changes.
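One way to keep such a label from drifting is to derive it from the current state instead of storing it separately. A minimal sketch, assuming an application with exactly two themes; the `Theme` type and both function names are hypothetical:

```typescript
// Assumption: the application supports exactly these two themes.
type Theme = 'light' | 'dark';

// The theme the toggle would switch to from the current one.
function nextTheme(current: Theme): Theme {
  return current === 'dark' ? 'light' : 'dark';
}

// The label always names the target theme, so it cannot contradict the state.
function themeToggleLabel(current: Theme): string {
  return `Switch to ${nextTheme(current)} mode`;
}
```

Because the label is computed from the state, a page already in dark mode can only offer "Switch to light mode". A snapshot scan still cannot confirm this, since it sees one state at a time.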

Focus management. Modals are a classic failure point. Static DOM scanning confirms that role="dialog" is present. It cannot test whether focus moves into the modal when it opens, whether Tab is trapped inside, or whether focus returns to the trigger when the modal closes. From a structural perspective, everything looks fine. From a keyboard user's perspective, the modal may be completely unreachable.
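The wrap-around behaviour a focus trap is expected to provide can be modelled in a few lines. This is a simplified sketch and not a production focus trap: it assumes the dialog's focusable elements are already known and indexed from zero, and `nextTrappedIndex` and `simulateTabs` are hypothetical names.

```typescript
// Index that should receive focus after Tab (or Shift+Tab) inside a trap.
function nextTrappedIndex(
  current: number,
  focusableCount: number,
  shiftKey: boolean
): number {
  if (!Number.isInteger(focusableCount) || focusableCount <= 0) {
    throw new Error('A dialog needs at least one focusable element');
  }
  if (!Number.isInteger(current) || current < 0 || current >= focusableCount) {
    throw new Error('Current index is outside the dialog');
  }
  const step = shiftKey ? -1 : 1;
  // Adding focusableCount keeps the value non-negative before the modulo.
  return (current + step + focusableCount) % focusableCount;
}

// Press Tab repeatedly and record where focus lands each time.
function simulateTabs(start: number, focusableCount: number, presses: number): number[] {
  const visited: number[] = [];
  let current = start;
  for (let i = 0; i < presses; i++) {
    current = nextTrappedIndex(current, focusableCount, false);
    visited.push(current);
  }
  return visited;
}
```

Every index returned stays within the dialog's own range, which is the property a keyboard user relies on. Nothing in the static DOM says whether the page implements it.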

What Playwright can test that axe can't

The boundary between axe and Playwright is not just about axe's limits — it is about what Playwright enables that axe can never provide: interaction. Running axe after triggering specific states closes a significant part of the gap.

Hover and focus state contrast:

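import {expect} from '@playwright/test'
import AxeBuilder from '@axe-core/playwright'
// Inside a test: scan while the button is hovered, so axe evaluates the hover colours
await page.getByRole('button', {name: 'Subscribe'}).hover()
const results = await new AxeBuilder({page}).analyze()
expect(results.violations).toEqual([])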

Dark mode scanning, which is a completely separate colour context from the default scan:

// Set the colour scheme before navigating so the page loads in dark mode
await page.emulateMedia({colorScheme: 'dark'})
await page.goto('/')
const results = await new AxeBuilder({page}).analyze()
expect(results.violations).toEqual([])

Dynamic ARIA labels that change with state:

// The label must describe the action available in the current state
const toggle = page.getByRole('button', {name: /switch to/i})
await expect(toggle).toHaveAttribute('aria-label', 'Switch to dark mode')
await toggle.click()
await expect(toggle).toHaveAttribute('aria-label', 'Switch to light mode')

Modal focus trapping and Escape key behaviour:

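const trigger = page.getByRole('button', {name: 'Open dialog'})
await trigger.click()
// Tab repeatedly; focus must never leave the dialog
for (let i = 0; i < 10; i++) {
	await page.keyboard.press('Tab')
	const isInsideDialog = await page.evaluate(
		// "!= null" also rejects undefined, which "!== null" would let through
		() => document.activeElement?.closest('[role="dialog"], dialog') != null
	)
	expect(isInsideDialog).toBe(true)
}
// Escape should close the dialog and return focus to the button that opened it
await page.keyboard.press('Escape')
await expect(page.getByRole('dialog')).not.toBeVisible()
await expect(trigger).toBeFocused()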

Skip link — whether it appears as the first focusable element and whether it actually works:

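// On a freshly loaded page, the first Tab should land on the skip link
await page.keyboard.press('Tab')
const focused = page.locator(':focus')
await expect(focused).toHaveAttribute('href', '#main-content')
// Activating the link should navigate to the main content anchor
await page.keyboard.press('Enter')
await expect(page).toHaveURL(/#main-content$/)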

None of these are edge cases. Theme toggles, modals, skip links, and colour schemes are standard parts of most production applications. If tests do not cover these interaction states, the accessibility coverage has real gaps regardless of what the axe report shows.

What automation cannot replace

Some categories have no automated solution. Form error messages that say "Invalid input" without specifying what is wrong. Required fields indicated only by a red asterisk with no text label. Long forms where error announcements appear at the top but focus does not move there, leaving keyboard users to search.

None of these fail structural checks. All of them create genuine barriers for real users.

The practical approach is to treat automated scanning as a floor, not a ceiling. It provides regression coverage and evaluates every page on every CI run — two things that manual testing cannot do at that scale. But the screen reader walk-through, the keyboard-only navigation session, the evaluation of whether error messages are actually helpful — these remain necessary regardless of what the automated report shows.

Summary

  • Run automated scanning on every CI build: it reliably catches structural regressions
  • Extend beyond the default scan — trigger hover, focus, dark mode, and interactive states before running axe
  • Test dynamic ARIA labels, modal focus trapping, and skip link behaviour directly in Playwright
  • Context-dependent meaning, form usability, and screen reader experience require human judgement

A clean axe report is a starting point. Whether the product is actually accessible to real users is a different question, and one that requires more than DOM analysis to answer.
