Extending Selenium With Image Recognition

How a developer team was able to test complex UI features using some custom-made Selenium plugins.

Alexey Nikolaenko

Jun. 20, 16 · Tutorial


What Do We Have?

Selenium is the de facto tool for functional web tests, but it has limitations. Its standard API interacts only with the browser, which makes image-based applications hard to test.

In most cases, Selenium's capabilities meet the requirements of functional web application testing. Standard tests perform operations inside the browser:

Locate elements by the selector
Retrieve their state
Perform actions on UI

But how do you test an application that uses Canvas, Flash, or a complex DOM tree? Examples include Pixlr, the Ace editor, any maps application, and many other rich web applications. In our case, the application used a browser-based VNC client.

Requirements

Providing a way to test a web application that uses a VNC client was not the only requirement. We had two more:

Use Selenium to write the parts of a test that don't involve the VNC client
Be able to execute tests in many threads on different machines to speed up execution

It is clear that Selenium, with its default features, can't give us a way to test visual content on the page. Yet the features Selenium does provide are essential to our requirements. Our team already used Selenium Grid to distribute test runs, and we wanted to reuse the same environment for the new tests.

How Tests Are Executed Using Selenium Grid

When a test wants to run on a Selenium Grid, it passes the configuration it requires of a node to the hub. The hub knows the configuration of every available node. Using this information, it selects an appropriate node from the list and opens a session. Once the session is open, the test sends its Selenium commands to the hub, which passes them to the node assigned to that test. The node runs the browser and executes the commands within it against the application under test.
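The node selection step can be pictured with a small model. The sketch below is not Selenium Grid's real matching code; the class NodeMatcher and its methods are hypothetical, and it assumes that a node matches when it offers every capability the test requests.

```java
import java.util.Map;
import java.util.Optional;

// Simplified model of capability matching; not Selenium Grid's real code.
class NodeMatcher {

    // A node matches when it offers every capability the test asks for.
    static boolean matches(Map<String, Object> offered, Map<String, Object> requested) {
        for (Map.Entry<String, Object> wanted : requested.entrySet()) {
            if (!wanted.getValue().equals(offered.get(wanted.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // Returns the name of the first registered node that satisfies the request.
    static Optional<String> selectNode(Map<String, Map<String, Object>> nodes,
                                       Map<String, Object> requested) {
        return nodes.entrySet().stream()
                .filter(node -> matches(node.getValue(), requested))
                .map(Map.Entry::getKey)
                .findFirst();
    }
}
```

In this model, a request that includes a custom capability is only routed to nodes that advertise the same capability.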

It is possible to extend the Selenium hub and nodes with custom plugins. This is done by creating a custom servlet and registering it in the configuration file.

Sikuli as a Missing Part

Sikuli is a good fit for our problem because it can automate anything you see on the screen. Using image recognition, we can locate GUI elements inside a VNC session. Sikuli also has a Java API that provides mouse and keyboard access. This is enough to implement the GUI element interaction we need.

The idea was to extend each node with a custom plugin that uses the Sikuli Java API. Tests send remote commands to the hub, which redirects them to the node where the client resides. This implementation required us to develop:
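To illustrate what locating an element by image means, here is a deliberately naive sketch that searches a screen capture for an exact copy of a reference image. It is not how Sikuli works internally (Sikuli's matching tolerates small visual differences); the class TemplateSearch is hypothetical and uses only the JDK.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.util.Optional;

// Naive exact-match template search; Sikuli itself uses fuzzy matching.
class TemplateSearch {

    // Scans the screen left to right, top to bottom, for the first exact match.
    static Optional<Point> find(BufferedImage screen, BufferedImage template) {
        for (int y = 0; y + template.getHeight() <= screen.getHeight(); y++) {
            for (int x = 0; x + template.getWidth() <= screen.getWidth(); x++) {
                if (matchesAt(screen, template, x, y)) {
                    return Optional.of(new Point(x, y));
                }
            }
        }
        return Optional.empty();
    }

    // Compares every template pixel with the screen pixel underneath it.
    private static boolean matchesAt(BufferedImage screen, BufferedImage template,
                                     int left, int top) {
        for (int ty = 0; ty < template.getHeight(); ty++) {
            for (int tx = 0; tx < template.getWidth(); tx++) {
                if (screen.getRGB(left + tx, top + ty) != template.getRGB(tx, ty)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

The returned point is the top-left corner of the match, which is enough to compute where a mouse click should land.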

RPC protocol for Sikuli commands, to make remote execution possible
Hub proxying servlet, to redirect commands to a proper node
Sikuli extension servlet, to receive Sikuli commands and execute them
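The three pieces above can be pictured with a small model. The sketch below is not the actual extension code: HubRouter, CommandDispatcher, and the URL path used in the usage note are hypothetical names, and real servlets would wrap this logic in HTTP handling.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical hub-side routing table: session id -> base URL of its node.
class HubRouter {
    private final Map<String, String> nodeBySession = new HashMap<>();

    void bind(String sessionId, String nodeBaseUrl) {
        nodeBySession.put(sessionId, nodeBaseUrl);
    }

    // Resolves the node URL a proxied command should be forwarded to.
    String targetFor(String sessionId, String path) {
        String node = nodeBySession.get(sessionId);
        if (node == null) {
            throw new IllegalStateException("No node for session " + sessionId);
        }
        return node + path;
    }
}

// Hypothetical node-side dispatcher: command name -> handler.
class CommandDispatcher {
    private final Map<String, Function<List<String>, String>> handlers = new HashMap<>();

    void register(String name, Function<List<String>, String> handler) {
        handlers.put(name, handler);
    }

    // Looks up the handler for the command name and runs it with the arguments.
    String dispatch(String name, List<String> args) {
        Function<List<String>, String> handler = handlers.get(name);
        if (handler == null) {
            throw new IllegalArgumentException("Unknown command: " + name);
        }
        return handler.apply(args);
    }
}
```

For example, after bind("session-1", "http://node-1:5555"), a call to targetFor("session-1", "/sikuli/click") yields the node URL the hub would forward to, and the node would then dispatch the "click" command to its registered handler.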

Sikuli Limitations

Sikuli requires a real screen or a virtual frame buffer; without one, it can't perform any image recognition. It also means that overlapping windows on the screen make the image search fail.

This conflicted with the default Selenium node configuration we had been using, in which a browser is brought to the front on every action. With many browsers open, windows overlap constantly.

To solve this problem, we configured each Selenium node with a max session count of one.

Max session is a parameter that sets how many browser instances can run in parallel on a Selenium node.

Additionally, we assigned a dedicated display to each Selenium node process in the VM. As a result, each virtual machine runs many Selenium node processes, and each process serves a single browser session on its own display.
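One way to picture this setup is a launcher that prepares one node process per display. The sketch below is an assumption about how it could be done, not the team's actual tooling: NodeLauncher is a hypothetical class, the jar name is a placeholder, the flags follow the Selenium Grid 2 era standalone server and should be checked against your version, and it assumes an X display (for example a virtual frame buffer) is already running under the given number.

```java
// Hypothetical launcher; the jar name and display numbering are assumptions.
class NodeLauncher {

    // Builds (but does not start) a node process bound to one X display.
    static ProcessBuilder nodeProcess(int displayNumber, int port, String hubUrl) {
        ProcessBuilder builder = new ProcessBuilder(
                "java", "-jar", "selenium-server-standalone.jar",
                "-role", "node",
                "-hub", hubUrl,
                "-port", String.valueOf(port),
                "-maxSession", "1"); // one browser session per node process
        // Every window this process opens lands on its own display.
        builder.environment().put("DISPLAY", ":" + displayNumber);
        return builder;
    }
}
```

Calling start() on each returned builder, with a different display number and port per process, would give the layout described above.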

Sikuli needs its image resources to be present on the file system where the script executes. This required us to develop a file upload extension. It accepts a compressed archive, extracts it to a random directory, and returns the path. The path is later used as a prefix to locate images for Sikuli commands.
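The extraction step could look roughly like the sketch below. It assumes the archive is a zip file and uses only the JDK; ResourceBundleExtractor is a hypothetical name, not the class used in the real extension.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: unpacks an uploaded zip archive on the node.
class ResourceBundleExtractor {

    // Unpacks the zip stream into a fresh temp directory and returns that directory.
    static Path extract(InputStream archive) throws IOException {
        Path target = Files.createTempDirectory("sikuli-bundle-").toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path resolved = target.resolve(entry.getName()).normalize();
                // Reject entries that would escape the target directory.
                if (!resolved.startsWith(target)) {
                    throw new IOException("Illegal entry: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(resolved);
                } else {
                    Files.createDirectories(resolved.getParent());
                    Files.copy(zip, resolved); // copies the current entry only
                }
            }
        }
        return target;
    }
}
```

The returned path plays the role of the prefix: an image named js_body.png in the archive is then found at that path plus the image name.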

Another problem was related to Selenium sessions. Each time a Selenium command passes through the hub, the hub updates the Selenium session. Our Sikuli requests bypassed this default behavior, so a session could look idle while a test was still using it. The fix was easy: our custom hub extension touches the session object each time it redirects a request to the node.
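The effect of touching a session can be shown with a small model of idle-session expiry. This is not Selenium Grid's session class; SessionRegistry is a hypothetical name, and the sketch assumes a session counts as expired once it has been idle for longer than a fixed timeout.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of idle-session expiry; not Selenium Grid's real session class.
class SessionRegistry {
    private final long timeoutMillis;
    private final Map<String, Long> lastActivity = new ConcurrentHashMap<>();

    SessionRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Records activity; a proxying extension would call this on every forwarded request.
    void touch(String sessionId, long nowMillis) {
        lastActivity.put(sessionId, nowMillis);
    }

    // A session is expired when it is unknown or idle for longer than the timeout.
    boolean isExpired(String sessionId, long nowMillis) {
        Long last = lastActivity.get(sessionId);
        return last == null || nowMillis - last > timeoutMillis;
    }
}
```

Without the touch call on forwarded requests, a long run of image-based commands would leave the last-activity time unchanged, and the session would eventually be treated as idle.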

Example of Selenium Test With Image Recognition

Here is an example of a test that uses Selenium with the Sikuli extension. The code relies on a few custom wrapper classes that provide an abstraction of a UI element and implicit retries during lookup, for cases when the UI is slower than the test.

As a first step, define the capabilities for the browser. Note that the custom capability sikuliExtension is set to true. With it, the test asks the hub for a node that has the Sikuli extension installed.

private DesiredCapabilities desiredCapabilitiesForSeleniumNode() {
    DesiredCapabilities desiredCapabilities = new DesiredCapabilities();
    desiredCapabilities.setBrowserName("firefox");
    desiredCapabilities.setPlatform(Platform.ANY);
    // Custom capability: only nodes with the Sikuli extension will match.
    desiredCapabilities.setCapability("sikuliExtension", true);
    return desiredCapabilities;
}

The next step is to construct a Sikuli extension client and upload the image bundle.

SikuliExtensionClient sikuliExtensionClient = new SikuliExtensionClient(
        GridSettings.HOST, GridSettings.PORT, remoteWebDriverSessionId);

sikuliExtensionClient.uploadResourceBundle(ACE_IMAGES_BUNDLE);

SikuliHelper sikuliHelper = new SikuliHelper(sikuliExtensionClient);

The last step is element lookup and interaction. Here, the image name acts as the element selector.

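// Locate the editor by its reference image.
TextBox editor = sikuliHelper.findTextBox("js_body.png");

// Focus the editor and clear its current content.
editor.click();
editor.deleteAllText();

// Type the function, with an Enter key press before and after it.
editor.press(KeyEvent.VK_ENTER);
editor.write(SOME_JS_FUNC);

editor.press(KeyEvent.VK_ENTER);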
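The implicit retries that the wrapper classes perform during lookup are not shown in the article. A minimal sketch of the idea, assuming a lookup that returns an empty Optional until the element appears, might look like this; Retry is a hypothetical helper, not part of the extension.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical retry helper for lookups that may fail while the UI catches up.
class Retry {

    // Calls the lookup until it yields a value or the attempts run out.
    static <T> Optional<T> until(Supplier<Optional<T>> lookup, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis); // give the UI time before the next try
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

A wrapper such as findTextBox could pass its image search as the lookup, so the test itself never has to sleep or poll.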

Conclusion

By extending Selenium with an image recognition feature, we were able to create tests for the complex UI within our application.

There was no need to rewrite any existing test steps written with Selenium. It also saved us from setting up new infrastructure for a different tool: we could reuse Selenium Grid, which reduced future maintenance effort.

You can find these extensions at sterodium.io.
