Extending Selenium With Image Recognition

How a developer team was able to test complex UI features using some custom-made Selenium plugins.

What Do We Have?

Selenium is the de facto tool for functional web testing, yet it has limitations. The standard API only allows interaction with the browser, which makes it hard to test image-based applications with Selenium alone.

In most cases, Selenium's capabilities meet the requirements of functional web application testing. Standard tests perform operations inside the browser:

  • Locate elements by selector
  • Retrieve their state
  • Perform actions on the UI

But how do you test an application that uses Canvas, Flash, or a complex DOM tree? Examples include Pixlr, the Ace editor, any maps application, and many other rich web applications. In our case, the application used a browser-based VNC client.

Requirements

Testing a web application that uses a VNC client was not our only requirement. We had two more:

  • Use Selenium to write the parts of a test that do not involve the VNC client

  • Be able to execute tests in many threads on different machines to speed up execution

It is clear that Selenium's default features give us no way to test visual content on the page. Yet the features Selenium does provide are extremely important to our requirements. Our team uses Selenium Grid to distribute test runs across machines, and we wanted to reuse the same environment for our future tests.

It's Important to Understand How Tests Are Executed Using Selenium Grid

When a test wants to run on a Selenium Grid, it passes its configuration requirements for a node to the hub. The hub knows the configuration of every available node. Using this information, it selects an appropriate node from the list and opens a session. After the session is opened, the test issues Selenium commands and sends them to the hub. The hub then passes the commands to the node assigned to that test. The node runs the browser and executes the commands within that browser against the application under test.
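The node selection step can be pictured as a capability match. The sketch below is a simplified illustration and not Selenium's actual matcher: the class `NodeSelector`, its methods, and the node names are hypothetical, and it assumes a node qualifies when every requested capability equals the value that node advertises.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, simplified model of how a hub could pick a node for a test. */
final class NodeSelector {

    /** A node matches when it advertises every capability the test requests. */
    static boolean matches(Map<String, Object> requested, Map<String, Object> advertised) {
        for (Map.Entry<String, Object> entry : requested.entrySet()) {
            if (!entry.getValue().equals(advertised.get(entry.getKey()))) {
                return false;
            }
        }
        return true;
    }

    /** Returns the name of the first registered node that satisfies the request, if any. */
    static Optional<String> select(Map<String, Object> requested,
                                   Map<String, Map<String, Object>> nodes) {
        return nodes.entrySet().stream()
                .filter(node -> matches(requested, node.getValue()))
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        // Registration order is preserved so that selection is deterministic.
        Map<String, Map<String, Object>> nodes = new LinkedHashMap<>();
        nodes.put("node-1", Map.of("browserName", "chrome"));
        nodes.put("node-2", Map.of("browserName", "firefox", "sikuliExtension", true));

        Map<String, Object> requested =
                Map.of("browserName", "firefox", "sikuliExtension", true);
        System.out.println(select(requested, nodes));
    }
}
```

A custom capability such as `sikuliExtension` fits this model naturally: only nodes that advertise it are candidates, so tests that need image recognition never land on a node without the plugin.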

It is possible to extend Selenium Hub and Node with custom plugins. This can be done by creating a custom servlet and registering it inside the configuration file.

Sikuli as a Missing Part

Sikuli is a good fit for our problem because it can automate anything you see on the screen. Using image recognition, we can locate GUI elements inside a VNC session. Sikuli also has a Java API that provides mouse and keyboard access, which is enough to implement the required GUI element interaction.

The idea was to extend each node with a custom plugin that uses the Sikuli Java API. Tests send remote commands to the hub, which redirects them to the node where the client resides. This implementation required us to develop custom extensions for both the hub and the node, along with a client for the tests to use.

Sikuli Limitations

Sikuli requires a real screen or a virtual frame buffer. Without one, it can't perform any image recognition. It also means that overlapping windows on the screen make an image search fail.

This conflicted with the default Selenium Node configuration we had been using, in which a browser is brought to the front on every action. With many open browsers, windows constantly overlap.

To solve this problem, we configured each Selenium Node with a max session count of one.

Max session is the parameter that sets how many browser instances can run in parallel on a Selenium Node.

Additionally, we assigned a dedicated display to each Selenium Node process in the VM. As a result, each virtual machine runs many Selenium Node processes, and each process serves a single browser session on its own dedicated screen.

Sikuli needs image resources to be present on the file system where script execution happens. This required us to develop a file uploading extension. It accepts a compressed file archive, extracts it to a random directory, and returns the path. The path is later used as a prefix to locate images for Sikuli commands.
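The extraction step on the receiving side might look roughly like the sketch below. It is an illustration only, not the code of the actual extension: the class `ResourceBundleExtractor` and its method are hypothetical, the archive is assumed to be a ZIP file, and the "random directory" is assumed to be a temporary directory created with the standard library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch of the receiving side of a resource bundle upload. */
final class ResourceBundleExtractor {

    /**
     * Unpacks a ZIP stream into a freshly created, randomly named directory
     * and returns that directory, to be used as a prefix for image lookups.
     */
    static Path extractToRandomDirectory(InputStream zipStream) throws IOException {
        Path targetDir = Files.createTempDirectory("sikuli-bundle-")
                .toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path target = targetDir.resolve(entry.getName()).normalize();
                // Reject entries such as "../x.png" that would escape the directory.
                if (!target.startsWith(targetDir)) {
                    throw new IOException("Illegal archive entry: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    Files.copy(zip, target); // reads up to the end of the current entry
                }
            }
        }
        return targetDir;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny in-memory archive that stands in for an uploaded image bundle.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("js_body.png"));
            zip.write(new byte[] {1, 2, 3});
            zip.closeEntry();
        }
        Path dir = extractToRandomDirectory(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println("Extracted to " + dir);
    }
}
```

Creating a new directory per upload keeps bundles from different tests apart, which matters when many tests share the same node over time.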

An additional problem was related to Selenium sessions. Each time a Selenium command passes through the hub, the hub updates the Selenium session. Requests to our extension bypassed this default behavior. This was easy to fix by touching the session object each time our custom hub extension redirects a request to the node.
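To see why the touch matters, consider a toy model of a session that is considered stale after a period of inactivity. This is a sketch under stated assumptions rather than Selenium Grid's implementation: the class `TrackedSession` is hypothetical, and it assumes that staleness is decided purely from the time of the last recorded activity.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical model of a grid session that goes stale after a period of inactivity. */
final class TrackedSession {
    private final Duration timeout;
    private Instant lastActivity;

    TrackedSession(Duration timeout, Instant createdAt) {
        this.timeout = timeout;
        this.lastActivity = createdAt;
    }

    /** Records activity; a hub extension would call this whenever it forwards a request. */
    void touch(Instant now) {
        lastActivity = now;
    }

    /** A session is expired when more than the timeout has passed since the last activity. */
    boolean isExpired(Instant now) {
        return Duration.between(lastActivity, now).compareTo(timeout) > 0;
    }

    public static void main(String[] args) {
        Instant start = Instant.now();
        TrackedSession session = new TrackedSession(Duration.ofSeconds(30), start);

        // Image recognition requests that skip touch() leave lastActivity unchanged.
        System.out.println("Without touch: " + session.isExpired(start.plusSeconds(45)));

        // Touching on every forwarded request keeps the session alive.
        session.touch(start.plusSeconds(20));
        System.out.println("With touch: " + session.isExpired(start.plusSeconds(45)));
    }
}
```

In this model, a test that spends a long time on image recognition commands alone would appear idle to the hub unless the extension records each forwarded request as activity.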

Example of Selenium Test With Image Recognition

Here is an example of a test using Selenium with the Sikuli extension. The code uses a few custom wrapper classes that provide an abstraction of a UI element and implicit retries during lookup, for cases when the UI is slower than our test.

As a first step, define the capabilities for the browser. Note that the custom capability sikuliExtension is set to true. With this setting, the test asks the hub for a node that has the Sikuli extension installed.

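import org.openqa.selenium.Platform;
import org.openqa.selenium.remote.DesiredCapabilities;

// Requests a Firefox node on any platform that advertises the custom Sikuli capability.
private DesiredCapabilities desiredCapabilitiesForSeleniumNode() {
    DesiredCapabilities desiredCapabilities = new DesiredCapabilities();
    desiredCapabilities.setBrowserName("firefox");
    desiredCapabilities.setPlatform(Platform.ANY);
    desiredCapabilities.setCapability("sikuliExtension", true);
    return desiredCapabilities;
}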

The next part is to construct a Sikuli extension client and upload the images bundle.

SikuliExtensionClient sikuliExtensionClient = new SikuliExtensionClient(
        GridSettings.HOST, GridSettings.PORT, remoteWebDriverSessionId);

sikuliExtensionClient.uploadResourceBundle(ACE_IMAGES_BUNDLE);

SikuliHelper sikuliHelper = new SikuliHelper(sikuliExtensionClient);

The last part is element lookup and interaction. In this case, the image name acts as an element selector.

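// Locate the editor on screen by its reference image.
TextBox editor = sikuliHelper.findTextBox("js_body.png");

// Focus the editor and clear its current content.
editor.click();
editor.deleteAllText();

// Type the new script, with a line break before and after it.
editor.press(KeyEvent.VK_ENTER);
editor.write(SOME_JS_FUNC);

editor.press(KeyEvent.VK_ENTER);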
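
The implicit retries mentioned at the start of this example could be implemented along the following lines. This is a minimal sketch and not the code of the wrapper classes used above: the class `RetryingLookup`, its method, and its parameters are hypothetical, and the lookup is modeled as a function that returns an empty `Optional` while the element is not yet visible.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper that repeats a lookup until it succeeds or attempts run out. */
final class RetryingLookup {

    /**
     * Calls the lookup up to maxAttempts times, pausing between attempts,
     * and returns the first result that is present.
     */
    static <T> T findWithRetry(Supplier<Optional<T>> lookup, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result.get();
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop waiting.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for element", e);
                }
            }
        }
        throw new IllegalStateException("Element not found after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulates a UI element that only becomes visible on the third lookup.
        String found = RetryingLookup.<String>findWithRetry(
                () -> ++calls[0] < 3 ? Optional.<String>empty() : Optional.of("js_body.png"),
                5, 100L);
        System.out.println(found + " found after " + calls[0] + " lookups");
    }
}
```

Keeping the retry inside the lookup means the test body stays free of explicit waits, which is what makes a slow UI tolerable without cluttering every step.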

Conclusion

By extending Selenium with an image recognition feature, we were able to create tests for complex UI within our application.

There was no need to rewrite any existing test step written with Selenium. It also saved us from setting up new infrastructure for a different tool, as we could reuse Selenium Grid, which reduces future maintenance effort.

You can find these extensions at sterodium.io.
