Acceptance Test-Driven Development: Find the Bug

Category: Craftsmanship Disciplines — drive development from executable acceptance criteria agreed with the business.

Twelve broken acceptance tests, scenarios, and step definitions. Spot the bug, then read the fix and the lesson. These are the failures that turn an acceptance suite from an asset into a liability.

Table of Contents

Bug 1: Imperative, UI-Coupled Scenario
Bug 2: Step That Asserts Nothing
Bug 3: Scenario Tests Implementation, Not Behavior
Bug 4: Multiple Whens in One Scenario
Bug 5: Assertion in the Given
Bug 6: Order-Dependent Scenarios
Bug 7: sleep() Instead of Wait-for-Condition
Bug 8: Step Hits a Stub, Not the Real Service
Bug 9: Acceptance Test Written After the Code
Bug 10: Missing Three-Amigos Clarity (Ambiguous Scenario)
Bug 11: Everything Tested at the UI Layer
Bug 12: Time/Random Nondeterminism
Practice Tips

Bug 1: Imperative, UI-Coupled Scenario

Scenario: Login
  Given I open "/login"
  When I type "ada@example.com" into "#email"
  And I type "secret" into "#password"
  And I click "#login-btn"
  Then "#welcome-banner" should contain "Hello, Ada"

Symptoms: Breaks on every CSS/layout change. The banner id is renamed in a redesign and 40 scenarios go red even though login works perfectly.

Find the bug

The scenario describes *clicks and selectors*, not *behavior*. It couples the spec to incidental DOM structure, so cosmetic changes break it and a non-developer can't read it.

Fix

Scenario: Registered user logs in
  Given a registered user "ada@example.com"
  When she logs in with the correct password
  Then she sees her dashboard

Move all the typing/clicking into step definitions (and ideally drive the service layer).
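
As a rough sketch of where the mechanism could live, the three declarative steps can delegate to a service layer. `AuthService` and the function names below are hypothetical stand-ins, and the steps are written as plain functions so the sketch runs without a BDD framework; in a real suite they would carry the framework's `@given` / `@when` / `@then` decorators.

```python
from types import SimpleNamespace

class AuthService:
    """Hypothetical service layer; a real suite would call the application's own."""
    def __init__(self):
        self._passwords = {}

    def register(self, email, password):
        self._passwords[email] = password

    def log_in(self, email, password):
        ok = self._passwords.get(email) == password
        return "dashboard" if ok else "login"

# Given a registered user "ada@example.com"
def given_registered_user(context, email):
    context.email, context.password = email, "secret"
    context.auth.register(email, context.password)

# When she logs in with the correct password
def when_logs_in_with_correct_password(context):
    context.page = context.auth.log_in(context.email, context.password)

# Then she sees her dashboard
def then_sees_dashboard(context):
    assert context.page == "dashboard", f"landed on {context.page}"

context = SimpleNamespace(auth=AuthService())
given_registered_user(context, "ada@example.com")
when_logs_in_with_correct_password(context)
then_sees_dashboard(context)
```

No selector appears anywhere in the scenario text, so a redesign only touches whichever layer the steps drive.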

Lesson

Scenarios state what the user achieves, not how they click. Mechanism belongs in step definitions; the .feature file stays declarative and business-readable.

Bug 2: Step That Asserts Nothing

@then('the order is confirmed')
def step_order_confirmed(context):
    pass    # BUG: placeholder never filled in — can never fail

Symptoms: The scenario is green from the day it was written. The feature is subtly broken in production; the test never noticed.

Find the bug

The `Then` step contains no assertion. A scenario whose `Then` does nothing always passes — it's a lie that reports "verified."

Fix

@then('the order is confirmed')
def step_order_confirmed(context):
    assert context.order.status == "CONFIRMED", \
        f"expected CONFIRMED, got {context.order.status}"

Lesson

Every Then must assert an observable outcome. A Then with no assertion is a guard that doesn't guard. Add a CI check that Then steps assert, and always watch a new scenario fail first.
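
One way such a CI check could look, as a minimal sketch: parse the step files with Python's standard `ast` module and report every `@then` function that contains no `assert` statement. The function name `then_steps_without_assert` is hypothetical, and the check only recognizes plain `assert` statements (not assertion-helper libraries), so treat it as a starting point rather than a complete linter.

```python
import ast

def then_steps_without_assert(source: str) -> list[str]:
    """Return names of @then-decorated functions that contain no assert statement."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        # Match decorators written as @then('...')
        is_then = any(
            isinstance(d, ast.Call)
            and isinstance(d.func, ast.Name)
            and d.func.id == "then"
            for d in node.decorator_list
        )
        has_assert = any(isinstance(n, ast.Assert) for n in ast.walk(node))
        if is_then and not has_assert:
            offenders.append(node.name)
    return offenders

sample = '''
@then('the order is confirmed')
def step_order_confirmed(context):
    pass

@then('the total is shown')
def step_total_shown(context):
    assert context.total is not None
'''
print(then_steps_without_assert(sample))  # → ['step_order_confirmed']
```

A CI job could run this over every file in the steps directory and fail the build when the returned list is non-empty.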

Bug 3: Scenario Tests Implementation, Not Behavior

Scenario: Place an order
  Given a cart with one item
  When I check out
  Then a row is inserted into the "orders" table with status = 2
  And a record exists in "order_events" with type = "CREATED"

Symptoms: A storage refactor (renaming the column, changing the status enum to a string) breaks the scenario though the feature behaves identically. The business can't read it.

Find the bug

The `Then` asserts on the database schema and a magic number (`status = 2`) — implementation details. It couples the spec to storage and is unreadable as a requirement.

Fix

Scenario: Place an order
  Given a cart with one item
  When I check out
  Then the order is confirmed

Whatever the service reads internally, the step definition asserts on its public behavior:

@then('the order is confirmed')
def step(context):
    assert context.order_service.status_of(context.order_id) == OrderStatus.CONFIRMED

Lesson

Test through behavior, not internals. Couple to the stable public contract (a service method), never to schema columns or magic numbers. Asserting on the schema is "testing through the wrong layer."
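
To illustrate the idea, here is a small sketch of a service that keeps the stored status code private and exposes only a named status. `OrderService`, its constructor, and the code-to-status mapping (including `1` for pending) are assumptions made up for the example; only `status_of` and `OrderStatus.CONFIRMED` appear in the fix above.

```python
from enum import Enum

class OrderStatus(Enum):
    PENDING = "PENDING"
    CONFIRMED = "CONFIRMED"

class OrderService:
    """Hypothetical service that hides how status is stored."""
    # Storage detail: the table keeps integer codes. Only this class knows that.
    _CODES = {1: OrderStatus.PENDING, 2: OrderStatus.CONFIRMED}

    def __init__(self, rows):
        self._rows = rows  # order_id -> stored status code

    def status_of(self, order_id) -> OrderStatus:
        return self._CODES[self._rows[order_id]]

service = OrderService(rows={"o-1": 2})
assert service.status_of("o-1") == OrderStatus.CONFIRMED
```

If storage later switches from integer codes to strings, only the mapping inside the service changes; the scenario and its step definition stay as they are.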

Bug 4: Multiple Whens in One Scenario

Scenario: Account lifecycle
  Given a new user
  When they register
  When they verify their email
  When they place an order
  When they request a refund
  Then they have a refunded order

Symptoms: When it fails, you can't tell which action broke — register? verify? order? refund? Debugging is a bisect.

Find the bug

Four `When`s mean four behaviors crammed into one scenario. A failure gives no signal about which behavior is broken, and the scenario is testing a whole lifecycle instead of one rule.

Fix

Split into focused scenarios, each with one When:

Scenario: A verified user can place an order
  Given a verified user
  When they place an order
  Then the order is created

Scenario: A user can refund a recent order
  Given a verified user with a recent order
  When they request a refund
  Then the order is refunded

Lesson

One When, one behavior per scenario. Set up prior state in Given (not by replaying earlier actions). A focused scenario fails with a clear message about which behavior broke.

Bug 5: Assertion in the Given

@given('a user with a verified email')
def step(context):
    context.user = db.find_user("ada")
    assert context.user.verified    # BUG: assertion in setup

Symptoms: When the user isn't verified, the test fails in the Given with a confusing message that looks like the feature failed, when really the setup is wrong.

Find the bug

The `Given` asserts instead of establishing state. Preconditions should *create* the world the scenario needs, not check a pre-existing one. A failing assertion in setup masquerades as a behavioral failure.

Fix

@given('a user with a verified email')
def step(context):
    context.user = make_user("ada", verified=True)   # establish the precondition

Lesson

Given sets up, Then checks. Build the required state in the Given rather than asserting that it already exists. That keeps failures attributable to behavior rather than to whatever data happened to be left in the test fixtures.
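
The fix relies on a `make_user` helper that is not shown. A minimal sketch of such a factory might look like the following; the `User` dataclass, the `InMemoryUsers` store, and the extra `users` parameter are all assumptions for illustration, and a real suite would write to its own test database instead.

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    verified: bool = False

class InMemoryUsers:
    """Hypothetical stand-in for the test database."""
    def __init__(self):
        self._users = {}

    def add(self, user: User) -> None:
        self._users[user.name] = user

    def find(self, name: str) -> User:
        return self._users[name]

def make_user(users: InMemoryUsers, name: str, verified: bool = False) -> User:
    """Create and store a user in exactly the state the scenario needs."""
    user = User(name=name, verified=verified)
    users.add(user)
    return user

users = InMemoryUsers()
ada = make_user(users, "ada", verified=True)
```

Because the Given creates the verified user itself, it cannot fail on stale data, and any later failure points at the behavior under test.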

Bug 6: Order-Dependent Scenarios

# scenario A creates the user...
@given('the standard test user')
def step(context):
    db.insert_user("ada")          # BUG: leaks into the shared DB

# scenario B (later) assumes it exists
@given('the existing user "ada"')
def step(context):
    context.user = db.find_user("ada")   # only works if A ran first

Symptoms: Passes when run in file order; fails intermittently once CI runs scenarios in parallel or shuffles them. Classic "works on my machine."

Find the bug

Scenario B depends on data scenario A created in a shared database. The scenarios aren't independent, so any change to order (parallelism, sharding) breaks B.

Fix

Each scenario owns its world, with per-scenario isolation:

# hooks run around every scenario (in behave they live in environment.py)
def before_scenario(context, scenario):
    context.tx = db.begin()

def after_scenario(context, scenario):
    context.tx.rollback()          # nothing leaks between scenarios

@given('the existing user "ada"')
def step(context):
    context.user = make_user("ada")   # B creates its own data

Lesson

Scenarios must be independent: own setup, own teardown (transactional rollback or fresh data), no reliance on order. Independence is the prerequisite for parallel CI and the cure for the most common flakiness.
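
The rollback idea can be tried out with nothing but the standard library. The sketch below uses an in-memory SQLite database; the helper names `begin_scenario`, `rollback_scenario`, and `count_users` are hypothetical, and it assumes the code under test shares the same connection as the hooks (otherwise the rollback would not cover its writes).

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode,
# so transactions start and end only where we say so
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")

def begin_scenario(conn):
    conn.execute("BEGIN")

def rollback_scenario(conn):
    conn.execute("ROLLBACK")

def count_users(conn):
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# Scenario A inserts "ada" inside its own transaction
begin_scenario(conn)
conn.execute("INSERT INTO users VALUES ('ada')")
inside = count_users(conn)   # 1: visible while the scenario runs
rollback_scenario(conn)

after = count_users(conn)    # 0: nothing leaked to the next scenario
```

With this in place scenario B cannot see A's row, so B has to create its own data and the run order stops mattering.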

Bug 7: sleep() Instead of Wait-for-Condition

@when('she submits the form')
def step(context):
    context.browser.click("#submit")
    time.sleep(2)                   # BUG: fixed sleep
    context.result = context.browser.text("#status")

Symptoms: Flaky. On a slow CI runner the response takes 2.5s and the test reads the old status (fail); on a fast run it wastes 2s every time.

Find the bug

A fixed `sleep` races the system: too short → flaky failure, too long → slow suite. It guesses at timing instead of waiting for the actual condition.

Fix

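@when('she submits the form')
def step(context):
    previous = context.browser.text("#status")   # remember the old status
    context.browser.click("#submit")
    # wait for the status to CHANGE; a non-empty check would accept the stale value
    wait_until(lambda: context.browser.text("#status") != previous, timeout=10)
    context.result = context.browser.text("#status")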

Lesson

Never sleep for a fixed interval; wait for a condition (element present, status changed, network idle) with a timeout. Fixed sleeps are a leading source of UI-test flakiness, which is another reason to prefer the synchronous service layer.
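
The fix calls a `wait_until` helper without defining it. A minimal polling sketch, assuming the signature used in the fix plus a hypothetical `interval` parameter, could be:

```python
import time

def wait_until(condition, timeout=10.0, interval=0.1):
    """Poll condition() until it returns a truthy value or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)  # short polling pause, not a guess at total duration

# Usage: returns as soon as the condition holds, well before the timeout
started = time.monotonic()
wait_until(lambda: time.monotonic() - started > 0.2, timeout=2, interval=0.05)
```

The short `sleep` inside the loop only sets the polling rate. The step finishes as soon as the condition holds and fails with a clear `TimeoutError` when it never does. Browser automation libraries usually ship their own explicit-wait helpers, which are generally preferable to a hand-rolled loop.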

Bug 8: Step Hits a Stub, Not the Real Service

@when('the discount is applied')
def step(context):
    context.total = 90.0           # BUG: hard-codes the answer instead of calling the system

Symptoms: The scenario is permanently green regardless of whether the discount logic works. A bug in Checkout ships undetected.

Find the bug

The step fakes the result instead of exercising the real system. The "acceptance test" tests nothing about the actual code — it asserts a constant against a constant.

Fix

@when('the discount is applied')
def step(context):
    context.total = Checkout().total_for(context.cart, context.member)  # real service

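@then('the total is ${expected:f}')
def step(context, expected):
    # compare with a tolerance: a computed float rarely equals a parsed literal exactly
    assert abs(context.total - expected) < 0.005, \
        f"expected {expected}, got {context.total}"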

Lesson

Acceptance steps must drive the real collaborators (the service layer, real domain logic). Stubbing the system-under-test out of its own test makes the green meaningless. Stub only external dependencies (email, payment), never the code you're verifying.
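
The sketch below shows where the stubbing boundary might sit: the payment provider is faked, while the pricing logic runs for real. `FakePaymentGateway`, the `gateway` constructor argument, the `pay` method, and the 10% member discount are assumptions invented for the example; only `Checkout` and `total_for` come from the fix above.

```python
from decimal import Decimal

class FakePaymentGateway:
    """Stub for the EXTERNAL dependency only; records charges instead of calling a provider."""
    def __init__(self):
        self.charges = []

    def charge(self, amount: Decimal) -> str:
        self.charges.append(amount)
        return "APPROVED"

class Checkout:
    """Stand-in for the real system under test; the 10% member discount is illustrative."""
    def __init__(self, gateway):
        self._gateway = gateway

    def total_for(self, cart, member: bool) -> Decimal:
        subtotal = sum(cart, Decimal("0"))
        return subtotal * Decimal("0.90") if member else subtotal

    def pay(self, cart, member: bool) -> str:
        return self._gateway.charge(self.total_for(cart, member))

gateway = FakePaymentGateway()
checkout = Checkout(gateway)
status = checkout.pay([Decimal("60.00"), Decimal("40.00")], member=True)
```

The fake records what the real logic asked it to charge, so a broken discount rule still turns the scenario red. Using `Decimal` for money also avoids float comparison issues in the assertion.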

Bug 9: Acceptance Test Written After the Code

Workflow observed in the PR:
  1. Developer builds the feature.
  2. Developer writes a Gherkin scenario that matches what they built.
  3. Scenario passes on first run. ✅

Symptoms: The scenario codifies whatever the code happens to do — including the misunderstanding. It passes immediately, so it never caught "built the wrong thing," and nobody saw it red.

Find the bug

The acceptance test was written *after* the code, so it can only confirm the existing behavior, not challenge whether that behavior is what the business wanted. It's a regression test masquerading as a design tool, and it was never seen failing.

Fix

Reverse the order: Three Amigos / Example Mapping → write the scenario → see it red → build outside-in until green. The conversation and the red-first step are where ATDD's value lives.

Lesson

ATDD is test-first at the feature level. A scenario written after the code (and never seen red) proves regression-safety at best and, at worst, locks in the wrong behavior. Order matters.

Bug 10: Missing Three-Amigos Clarity (Ambiguous Scenario)

Scenario: Apply the right discount
  Given a customer with an order
  When they check out
  Then they get an appropriate discount

Symptoms: "Appropriate" means different things to product, dev, and QA. The scenario can't be automated (what's the expected number?) and three people will implement three rules.

Find the bug

The scenario is abstract prose, not a concrete example. "An order," "appropriate discount" — nothing is pinned down. This is exactly the ambiguity a Three Amigos conversation exists to remove, skipped.

Fix

Run Example Mapping to extract concrete examples, then capture them in a Scenario Outline:

Scenario Outline: Tier discount
  Given a <tier> customer with a $<subtotal> order
  When they check out
  Then they are charged $<total>

  Examples:
    | tier   | subtotal | total |
    | bronze | 100      | 100   |
    | silver | 100      | 95    |
    | gold   | 100      | 90    |
    | gold   | 100.00   | 90.00 |

Lesson

Specify with concrete examples, not adjectives. If a scenario can't be turned into a number, the requirement isn't understood yet. In Example Mapping that is a red (question) card for the Three Amigos to resolve before coding starts.
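
Once the examples are concrete, they translate almost mechanically into an executable check. The sketch below reads the rates off the Examples table (bronze 0%, silver 5%, gold 10%); the function name `charged_total` and the choice of `Decimal` with two-place rounding are assumptions for illustration.

```python
from decimal import Decimal

# Rates implied by the Examples table: bronze 0%, silver 5%, gold 10%
TIER_DISCOUNT = {
    "bronze": Decimal("0"),
    "silver": Decimal("0.05"),
    "gold": Decimal("0.10"),
}

def charged_total(tier: str, subtotal: str) -> Decimal:
    """Apply the tier discount and round to cents."""
    amount = Decimal(subtotal)
    return (amount * (1 - TIER_DISCOUNT[tier])).quantize(Decimal("0.01"))

# The same rows the business agreed on, now executable
EXAMPLES = [
    ("bronze", "100", "100"),
    ("silver", "100", "95"),
    ("gold", "100", "90"),
]

for tier, subtotal, total in EXAMPLES:
    assert charged_total(tier, subtotal) == Decimal(total), (tier, subtotal)
```

"An appropriate discount" could not have been written this way, which is exactly the signal that the requirement needed another conversation.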

Bug 11: Everything Tested at the UI Layer

Suite composition:
  • 240 Selenium scenarios (every business rule, every edge case)
  • 8 unit tests
  • CI: 38 minutes, ~12% flaky → developers merge through red

Symptoms: Slow build, chronic flakiness, vague failures, fear of refactoring. A real regression ships behind the flaky noise.

Find the bug

This is the **ice-cream-cone** anti-pattern: an inverted pyramid with business rules and edge cases verified through the slow, brittle UI instead of fast service-layer and unit tests.

Fix

Rebalance toward the pyramid:

  • Move business RULES from UI → service-layer scenarios (fast, stable)
  • Keep ~15 UI SMOKE tests: prove screens are wired to services
  • Build out UNIT tests for edge cases / pure logic
  → target build < 5 min, flakiness < 1%

Lesson

Push every test as far down the pyramid as it can go. UI tests verify wiring, not business rules. The cone is one of the most common pathologies in large suites and among the most valuable to fix.

Bug 12: Time/Random Nondeterminism

@then('the refund is a full cash refund')
def step(context):
    days = (datetime.now() - context.order.delivered_at).days   # BUG: real clock
    assert days <= 30
    assert context.refund.type == "CASH"

Symptoms: A scenario set up with "delivered 30 days ago" passes today and fails tomorrow as real time advances past the boundary. Intermittent failures with no code change.

Find the bug

The step reads the real wall-clock (`datetime.now()`), so the result depends on *when* the test runs. Time-dependent and random behavior must be controlled, or scenarios near boundaries flake as the world moves.

Fix

Inject a fixed clock (and seed any RNG):

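from datetime import timedelta

def before_scenario(context, scenario):
    context.clock = FixedClock("2026-06-10")        # deterministic "now"
    context.refunds = RefundService(clock=context.clock)

@given('an order delivered {n:d} days ago')
def step(context, n):
    context.order = make_order(delivered_at=context.clock.now() - timedelta(days=n))

@then('the refund is a full cash refund')
def step(context):
    assert context.refund.type == "CASH"   # no datetime.now(): the service applied the 30-day rule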

Lesson

Control time and randomness: inject a clock, seed any RNG, and compare unordered results as sets rather than relying on their order. Nondeterminism near boundaries is a classic source of flakiness; a fixed clock makes "30 days ago" mean the same thing on every run.
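
`FixedClock` and `RefundService` are used in the fix but not defined there. A minimal sketch of how they might fit together follows; the class bodies, the `refund_type` method, and the `"STORE_CREDIT"` fallback are assumptions for illustration, while the 30-day window and `"CASH"` come from the scenario above.

```python
from datetime import datetime, timedelta

class FixedClock:
    """A clock that always reports the same instant."""
    def __init__(self, iso_date: str):
        self._now = datetime.fromisoformat(iso_date)

    def now(self) -> datetime:
        return self._now

class RefundService:
    """Hypothetical service: cash refund within 30 days of delivery, store credit after."""
    WINDOW = timedelta(days=30)

    def __init__(self, clock):
        self._clock = clock

    def refund_type(self, delivered_at: datetime) -> str:
        age = self._clock.now() - delivered_at
        return "CASH" if age <= self.WINDOW else "STORE_CREDIT"

clock = FixedClock("2026-06-10")
refunds = RefundService(clock=clock)
on_boundary = refunds.refund_type(clock.now() - timedelta(days=30))    # "CASH"
past_boundary = refunds.refund_type(clock.now() - timedelta(days=31))  # "STORE_CREDIT"
```

Both the order's delivery date and the service's notion of "now" derive from the same injected clock, so the boundary case gives the same answer whenever the suite runs.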

Practice Tips

Read every scenario as a non-developer — if it has ids/clicks/SQL, it's coupled to the wrong layer.
Check that every Then asserts an observable outcome: hunt for steps whose body is `pass` or that only prove "didn't throw".
Count the Whens — more than one means split the scenario.
Verify Given sets up, doesn't assert.
Look for shared/leftover data — can scenarios run in parallel and any order?
Grep for `sleep(`, `datetime.now()`, and `random(`: each is a likely source of nondeterminism and flakiness.
Confirm steps drive the real system, not stubbed constants — and that the scenario was seen red first.
Check the pyramid ratio — is business logic being verified through the UI?
