Recently, we released the Cypress Real World App, a modern web application with a full set of E2E and API tests that shows the recommended best practices for writing tests. From the start, we had Linux continuous integration running the tests on every commit. After the release, we added a Windows continuous integration workflow in #421, and a few tests started failing on it. In this blog post, I will show how we debugged the failure and updated our tests so they are no longer flaky.
The tests were passing locally, yet failing on CI. Cypress records a video of the test run and automatically takes a screenshot on failure, so we always store these test artifacts on CI and upload them to our Dashboard. The Dashboard makes debugging faster, since it presents the run, the tests, and the error details in one place.
The Dashboard shows the relevant error information: the error message, the test name, the test source code with all executed hooks, the screenshot, and the video. In this case, the failing test command was:
cy.getBySelLike("notification-list-item")
.first()
.should("contain", "Edgar")
The high-resolution screenshot shows that the name "Edgar" is present in the displayed list of notifications.
Where did the test go wrong? Why is the test looking for "Edgar" in the first notification?
To debug a flaky test further, we recommend comparing the Command Log of the failed test with the Command Log of a passing run. We can either run the test locally or find a passing run in the Dashboard. For example, when running this test locally, we can pause the test right before it gets the first notification element:
cy.getBySelLike("notification-list-item")
.pause()
.first()
.should("contain", ctx.userA?.firstName)
When the test pauses, we can already see a slight difference from the CI screenshot. Notice that the previous command, cy.getBySelLike("notification-list-item"), returned 9 elements locally but only 8 elements on CI. Since new notifications appear at the top of the list, this means the "Edgar" notification is missing on CI for some reason.
First, let's make this assumption explicit. To make the test more robust, we assert that there are exactly 9 notifications before getting the first one:
cy.getBySelLike("notification-list-item")
.should('have.length', 9)
.pause()
.first()
.should("contain", ctx.userA?.firstName)
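Why does the length assertion help? Cypress retries the query until the attached assertion passes or the command times out, so .first() only runs once all 9 items exist. The Node sketch below imitates that retry loop outside Cypress. It is not Cypress's implementation, and retryUntil, renderNotifications, firstNotification, and the notification texts are hypothetical names chosen for illustration.

```javascript
// Minimal sketch, NOT Cypress internals: re-run a query until a predicate
// holds, similar in spirit to `.should("have.length", 9)` re-querying the DOM.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryUntil(query, predicate, timeoutMs = 500, intervalMs = 10) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = query();
    if (predicate(result)) return result;
    if (Date.now() >= deadline) {
      throw new Error(`timed out, last result had ${result.length} items`);
    }
    await sleep(intervalMs);
  }
}

// Hypothetical list: 8 older notifications render immediately, and the newest
// one is added to the top after `delayMs` (or never, when delayMs is null).
function renderNotifications(delayMs) {
  const items = Array.from({ length: 8 }, (_, i) => `older ${i + 1}`);
  if (delayMs !== null) {
    setTimeout(() => items.unshift("Edgar liked a transaction"), delayMs);
  }
  return items;
}

// Wait until all 9 notifications exist, then take the first one.
async function firstNotification(delayMs) {
  const items = renderNotifications(delayMs);
  const list = await retryUntil(() => items.slice(), (l) => l.length === 9);
  return list[0];
}
```

Without the wait, renderNotifications(50)[0] is still "older 1". With it, firstNotification(50) resolves to the Edgar notification. If the ninth item never arrives, the retry times out with an error that reports the length it actually saw, which is more informative than a text mismatch on the wrong item.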
The test still fails on CI, but at least we know more about the error: it is NOT a problem with the order of notifications, but a missing notification. Let's trace back through the test to see where this notification is created.
The test has the command cy.getBySelLike("like-button").click(), which appears at positions 11-12 in the Command Log of the successful run.
We can let the test finish and then hover over the "get" and "click" commands to see what they did.
We can see the XHR call POST /likes/... right after the click but before the "LOGOUT by XSTATE" command. The web application makes this XHR call to the server to create the notification. Let's compare this with the screenshot from the failed CI run.
Notice the "Click" command (partially obscured by the sticky header): the test does click the "Thumbs up" button. But look at what Cypress shows immediately after it: the POST /likes/... call fails with status 401 (Unauthorized). This happens because our test code has a race condition:
cy.getBySelLike("like-button").click();
cy.switchUser(ctx.userB.username);
The test clicks the "Like" button and immediately removes the user token or cookie. In some situations (like running locally), the application sends the POST /likes/... XHR call with the right credentials in time, but on some CI runs the test is faster and the call goes out without a valid user token.
Our test has to wait for the call to finish before logging out the current user. We can wait for the XHR itself, or for a DOM change. In this case, when the user clicks the Like button, the button becomes disabled and the likes count increments; both happen after the POST call completes. So let's change our test to wait for the action to finish before logging out:
const likesCountSelector = "[data-test*=transaction-like-count]";
cy.contains(likesCountSelector, 0);
cy.getBySelLike("like-button").click();
// a successful "like" should disable the button
// and increment the number of likes
cy.getBySelLike("like-button").should("be.disabled");
cy.contains(likesCountSelector, 1);
// now the test can safely log out
cy.switchUser(ctx.userB.username);
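To see the race and the fix side by side, here is a hypothetical Node simulation, not code from the Real World App. createApp, clickLike, logout, and waitFor are illustrative stand-ins for the app, the Like button, the "LOGOUT by XSTATE" step, and a retrying assertion. The delays are arbitrary values picked to make each ordering deterministic.

```javascript
// Hypothetical simulation of the race; NOT code from the Real World App.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Tiny model of the app: a session token, a like counter, and a Like button.
function createApp({ sendDelayMs }) {
  const state = { token: "token-userA", likes: 0, likeDisabled: false };
  return {
    state,
    // Clicking "Like" sends POST /likes after `sendDelayMs` (standing in for
    // event handling in the app); the server needs a valid token.
    async clickLike() {
      await sleep(sendDelayMs);
      if (!state.token) return 401; // request went out after logout
      state.likes += 1;
      state.likeDisabled = true; // UI updates only after the POST succeeds
      return 201;
    },
    // Stand-in for the "LOGOUT by XSTATE" step: the token is gone at once.
    logout() {
      state.token = null;
    },
  };
}

// Poll a condition, similar in spirit to a retrying Cypress assertion.
async function waitFor(predicate, timeoutMs = 1000, intervalMs = 5) {
  const deadline = Date.now() + timeoutMs;
  while (!predicate()) {
    if (Date.now() >= deadline) throw new Error("condition not met in time");
    await sleep(intervalMs);
  }
}

// Racy test: click, then log out after `logoutDelayMs` without waiting.
async function racyTest(sendDelayMs, logoutDelayMs) {
  const app = createApp({ sendDelayMs });
  const request = app.clickLike();
  await sleep(logoutDelayMs); // time the test takes to reach the user switch
  app.logout();
  return request; // resolves to the HTTP status the server returned
}

// Fixed test: wait for the disabled button and the incremented count first.
async function fixedTest(sendDelayMs) {
  const app = createApp({ sendDelayMs });
  const request = app.clickLike();
  await waitFor(() => app.state.likeDisabled && app.state.likes === 1);
  app.logout();
  return request;
}
```

racyTest(5, 30) returns 201 because the request goes out before the logout, while racyTest(30, 5) returns 401. fixedTest returns 201 for either delay because it waits for the visible result of the like before logging out, just like the cy.contains(likesCountSelector, 1) assertion.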
We went through all our specs to ensure the same error does not happen in other places. Now the tests pass on every CI without a hitch.
Related blog posts: When Can The Test Start?, When Can The Test Stop?, When Can The Test Click?