At Yext we strive for comprehensive test coverage of all our customer-facing interfaces to protect against regressions. The way we test has evolved greatly, from the days when essentially one team produced UIs at Yext to our current setup, where more than 15 teams collaborate on a customer-facing interface containing a few hundred pages. Many incremental changes were made over time, and below I will cover some of the things that have worked for us.

Introduction

In the spring of 2013, I was the lone Software Engineer in Test and built out a suite of acceptance tests using Selenium in Ruby, to add to our many existing unit tests. As we grew, we decided we wanted teams to write their own test suites. Since we develop primarily in Java, I converted the suites to Java using JUnit. We set them up to run on Hudson and set up a nightly build.

Responsibility shift + Accessibility

We have a microservices architecture, with more than 250 different processes today. As our team grew, we needed a better tool to manage every team’s deploy process. We decided on TeamCity to manage our build, test, and deployment pipeline. Every team that has a front-end component has an Acceptance Tests step that runs in the pipeline for each deploy, and each team is responsible for its own tests.

We have an Engineering wiki page on Selenium. This document has information on running tests, debugging failures, writing tests, a checklist for adding tests to TeamCity, and information on the underlying configuration.

Test Speed

As the test suites rapidly expanded, it became important to keep the total run time short. To address this, we employed a number of solutions.

Parallelization

Initially all our tests ran sequentially, and as we added more tests the total run time became unacceptable. To speed things up, we decided to run our tests in parallel instead. We used an extension of the JUnit runner that executes all @Test methods concurrently, and added it to our Selenium driver controller class, which manages things like initializing and quitting the driver, retry logic, and window dimensions:

  • @RunWith(ParallelRunner.class)

This required all tests within a test class to be independent of each other, so I changed the code to dynamically create a new account for each test. As part of the effort to improve test speed, we also broke up the longer, rambling test scenarios that covered many cases into shorter, more focused test cases.
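
As a rough illustration, a parallel-friendly test class looks something like the sketch below (the class, base class, and helper names are illustrative, not our actual code; ParallelRunner is the concurrent runner mentioned above):

    @RunWith(ParallelRunner.class)
    public class LocationEditTest extends BaseSeleniumTest {

        @Test
        public void editAddressSaves() {
            // Each test provisions its own account, so tests can run concurrently
            // without sharing any state.
            TestAccount account = createTestAccount();
            loginAs(account);
            // ... exercise a single, focused scenario against this account
        }

        @Test
        public void editPhoneNumberSaves() {
            TestAccount account = createTestAccount();  // independent of the test above
            loginAs(account);
            // ... another short, focused scenario
        }
    }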

Removing unnecessary waiting in tests

When writing our tests, we’ve often had to insert pauses to ensure that an action fully completes (for example, a field going from “edit” to “display” mode after a save button is clicked) before the next step happens, so that browser interactions don’t interfere with one another. In the process of translating the tests into Java, we also replaced almost all of our sleeps of the form

  • Uninterruptibles.sleepUninterruptibly(3, TimeUnit.SECONDS)

with a fluentWait() function that only waited however long it took to locate the element, polling the page every 200 milliseconds. Our built-in click(), enterText(), and clear() methods were all updated to call fluentWait() before attempting the web element interaction, to ensure that the element is ready to be interacted with; a simplified wrapper is sketched after the wait code below.

    // Poll every 200ms until the element is present, visible, and enabled,
    // timing out after timeoutInSeconds. Exceptions thrown while polling
    // (e.g. stale element references) are ignored and the check is retried.
    FluentWait<WebDriver> wait = new FluentWait<WebDriver>(driver)
        .withTimeout(timeoutInSeconds, TimeUnit.SECONDS)
        .pollingEvery(200, TimeUnit.MILLISECONDS)
        .ignoring(Exception.class);

    Boolean elementExists = wait.until(new Function<WebDriver, Boolean>() {
        @Override
        public Boolean apply(WebDriver driver) {
            WebElement element = getElement(selector, index);
            return element != null && element.isDisplayed() && element.isEnabled();
        }
    });

    return elementExists;
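
For example, the click() wrapper can be reduced to something like the following sketch (DEFAULT_TIMEOUT_SECONDS and the exact signatures are illustrative; fluentWaitForElement() is assumed to wrap the fluent wait shown above):

    public void click(String selector, int index) {
        // Wait until the element is present, visible, and enabled before interacting.
        fluentWaitForElement(selector, index, DEFAULT_TIMEOUT_SECONDS);
        getElement(selector, index).click();
    }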

Helpers to directly create test data

Previously, the creation of entities such as accounts, users, and permissions for the tests was done through the product web interface using Selenium. We have since moved the creation of all these entities server side and restricted the scope of the Selenium tests to true user actions. These changes brought down our overall run time and reduced the number of spurious errors.
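
In rough terms, a test now provisions its data through a backend client and only then switches to Selenium (the accountClient calls and URL helper below are illustrative, not our actual API):

    // Illustrative sketch: create test data server side instead of through the UI.
    TestAccount account = accountClient.createAccount("selenium-" + UUID.randomUUID());
    accountClient.addUser(account.getId(), "test-user@example.com", Role.ADMIN);
    accountClient.grantPermission(account.getId(), Permission.MANAGE_LOCATIONS);

    // Selenium only takes over for the true user actions under test.
    driver.get(locationEditUrl(account));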

Spurious errors

We have spent time learning how to deal with spurious errors in the Selenium tests. Various factors can contribute to these, ranging from brittleness of the tests to environment issues. One solution we introduced early on was to allow tests to automatically retry (up to a limit) if they failed unexpectedly. This was first done via JUnit as a parameter in our test runner file, and now we can set it directly in TeamCity.
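
Conceptually, the JUnit-level retry works like the TestRule sketch below (simplified; our original version was wired in through the test runner, and the retry limit now lives in TeamCity configuration):

    public class RetryRule implements TestRule {
        private final int maxAttempts;

        public RetryRule(int maxAttempts) {
            this.maxAttempts = maxAttempts;
        }

        @Override
        public Statement apply(final Statement base, final Description description) {
            return new Statement() {
                @Override
                public void evaluate() throws Throwable {
                    Throwable lastFailure = null;
                    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                        try {
                            base.evaluate();  // run the test
                            return;           // it passed, stop retrying
                        } catch (Throwable t) {
                            lastFailure = t;
                            System.out.println(description.getDisplayName()
                                + " failed on attempt " + attempt + " of " + maxAttempts);
                        }
                    }
                    throw lastFailure;  // retries exhausted, report the last failure
                }
            };
        }
    }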

Increasing the timeouts of the fluent waits, as well as introducing new fluent waits in brittle areas, helped reduce the failures caused by not giving an element enough time to become “interactable”. We also found that waiting for an element to be present was sometimes the wrong condition (for example, we should instead have been waiting for a certain dialog box to close), so we added functions for that, as sketched below.
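
The disappearance check follows the same fluent-wait pattern as above, just inverted (a sketch using the same driver and getElement() helper; the method name is illustrative):

    public boolean fluentWaitForElementToDisappear(final String selector, long timeoutInSeconds) {
        FluentWait<WebDriver> wait = new FluentWait<WebDriver>(driver)
            .withTimeout(timeoutInSeconds, TimeUnit.SECONDS)
            .pollingEvery(200, TimeUnit.MILLISECONDS)
            .ignoring(Exception.class);

        // Succeeds once the element (e.g. a dialog box) is gone or hidden.
        return wait.until(new Function<WebDriver, Boolean>() {
            @Override
            public Boolean apply(WebDriver driver) {
                WebElement element = getElement(selector, 0);
                return element == null || !element.isDisplayed();
            }
        });
    }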

In other brittle places, executing JavaScript was necessary. When certain elements are not consistently “interactable”, some of the solutions below may be employed:

  • executeScript("jQuery('" + descSelector + "').get(0).scrollIntoView();");
  • executeScript("$('" + selector + "').trigger('mousedown');");
  • executeScript("$('" + selector + "').get(0).focus();");

To click into a field that is only visible with a hover:

  • executeScript("jQuery('.gallery-edit-caption-container .js-show-embeddable-fields').show();");

Selecting from dropdowns, when it is not important to do this “natively”:

  • executeScript("jQuery('select[name=countryCode]').val('" + countryCode + "');");
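
When the native behavior does matter (for example, when a change handler has to fire), Selenium’s own org.openqa.selenium.support.ui.Select helper can be used instead, along the lines of:

    Select country = new Select(driver.findElement(By.cssSelector("select[name=countryCode]")));
    country.selectByValue(countryCode);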

Sometimes, an expected state would not be present on the page because of backend environment slowness. In these cases we implemented other solutions, like adding retries around specific functions. For example, when we add special fields on the backend, we refresh and retry on the Location Edit page, in case the special fields show up with a delay:

    // Keep refreshing the page until the expected element appears,
    // giving up once maxWaitSeconds have elapsed.
    Stopwatch stopwatch = Stopwatch.createStarted();
    while (stopwatch.elapsed(TimeUnit.SECONDS) < maxWaitSeconds) {
        fluentWaitForElement("body", 4);  // wait for the page to finish reloading
        Uninterruptibles.sleepUninterruptibly(1, TimeUnit.SECONDS);
        if (isElementPresent(selector)) {
            System.out.println("found it! continuing...");
            break;
        }
        System.out.println("Could not find element, refreshing page...");
        refresh();
    }

Reduced maintenance

Page Object Model

We wanted the tests to enforce a strict discipline on code factoring, so we switched to using the Selenium Page Object Model. Instead of converting older specs to this model, we write all of our new tests using it. Whenever we need to update older tests, we create page object models for them, or reuse existing ones.

As part of this, we pulled out methods to make the tests more readable. We had also already swapped out most of the select-by-XPath locators for more maintainable select-by-CSS ones during the Ruby-to-Java conversion, which made the tests much easier to maintain.
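
As a rough sketch of the pattern (the page and selectors are illustrative, not our actual code), a page object owns the selectors and interactions for one page, and the test reads as a sequence of user-level steps:

    public class LocationEditPage {
        private final WebDriver driver;

        public LocationEditPage(WebDriver driver) {
            this.driver = driver;
        }

        public LocationEditPage enterAddress(String address) {
            driver.findElement(By.cssSelector(".js-address-input")).sendKeys(address);
            return this;
        }

        public LocationEditPage save() {
            driver.findElement(By.cssSelector(".js-save-button")).click();
            return this;
        }
    }

    // In a test:
    new LocationEditPage(driver)
        .enterAddress("1 Madison Ave")
        .save();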

Troubleshooting

When we made the switch to TeamCity, we improved failure diagnosis by capturing relevant information for each failed test. This includes a screenshot, a link to the URL it failed on, and, when an error occurred, a Sentry link with the stack trace.
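
The screenshot half of that is roughly the JUnit TestWatcher sketch below (simplified and illustrative; the URL link and Sentry reporting are omitted):

    public class ScreenshotOnFailure extends TestWatcher {
        private final WebDriver driver;

        public ScreenshotOnFailure(WebDriver driver) {
            this.driver = driver;
        }

        @Override
        protected void failed(Throwable e, Description description) {
            // Capture the browser state at the moment of failure.
            File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
            // Publish the file as a build artifact, along with driver.getCurrentUrl()
            // for the page the test failed on.
        }
    }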

Future Improvements and conclusions

We strive to keep our tests independent and to separate out as many stand-alone components as we can. A good example of this is the current project of extracting the Enhanced Content Lists (ECLs) code into its own project so that it can be deployed and tested separately. This feature was initially included in the main Knowledge Manager build and deploy because of its small footprint. As new features were added, ECLs became a large project with its own large suite of test cases. Taking such components out will make dealing with test failures in the core build more manageable.

We also plan to set up a system in which people can push their changes and run tests against the current state of the build independently of the main TeamCity project. The idea is that TeamCity would create branches of the most recent binary, engineers could verify their changes against that, and then push to the main branch.

Today we use a version of Selenium that is a few years old, which Firefox stopped supporting as of Firefox 42. Unfortunately, simply upgrading the JAR resulted in many test failures. I’ll be working through those in the coming weeks so that developers can upgrade their Firefox and take advantage of all the new features.

Certain things were tried and abandoned, such as Huxley, a test suite that compares screenshots and complains when there are large differences between them, possibly indicating HTML or CSS problems. We found that breakages were reliably caught by our existing Selenium tests, and that the visual regression suite had too many false positives.

We have come a long way from how we initially handled regression testing, and we continue to adapt as our teams expand and add more tests. We have learned that we must keep trying new strategies to refine our process.