Extending Selenium With Image Recognition

How a developer team was able to test complex UI features using some custom-made Selenium plugins.

By Alexey Nikolaenko · Jun. 20, 16 · Tutorial

What Do We Have?

Selenium is the de facto tool for functional web testing, yet it has limitations. Its standard API can interact only with the browser, which makes image-based applications hard to test.

In most cases, Selenium's capabilities meet the requirements of functional web application testing. Standard tests perform operations inside the browser:

  • Locate elements by selector
  • Retrieve their state
  • Perform actions on the UI

But how do you test an application that uses Canvas, Flash, or a complex DOM tree? Examples include Pixlr, the Ace editor, any maps application, and many other rich web applications. In our case, the application used a browser-based VNC client.

Requirements

Testing a web application that uses a VNC client was not our only requirement. We had two more:

  • Use Selenium for the parts of a test that don't involve the VNC client

  • Be able to execute tests in many threads on different machines to speed up execution

Selenium's default features clearly can't test visual content on the page. Yet the features it does provide were essential to our requirements. Our team already used Selenium Grid to distribute test runs, and we wanted to reuse the same environment for our future tests.

It's Important to Understand How Tests Are Executed Using Selenium Grid

To run on Selenium Grid, a test passes the hub its configuration requirements for a node. The hub knows the configuration of every available node. Using this information, it selects an appropriate node from the list and opens a session. Once the session is open, the test issues Selenium commands and sends them to the hub, which passes them to the node assigned to that test. The node runs the browser and executes the commands within that browser against the application under test.
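The node selection step can be illustrated with a minimal sketch. It is a simplified model, not Selenium Grid's actual matcher: the class `NodeSelectionSketch` and its `Node` type are hypothetical, and the assumption is that a node qualifies when it advertises every requested key with an equal value.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

/** Simplified model of hub-side node selection; not Selenium Grid's real matcher. */
final class NodeSelectionSketch {

    /** A node as the hub might see it: a name plus the capabilities it advertises. */
    static final class Node {
        final String name;
        final Map<String, Object> capabilities;

        Node(String name, Map<String, Object> capabilities) {
            this.name = name;
            this.capabilities = capabilities;
        }
    }

    /** Assumption: a node matches when it advertises every requested key with an equal value. */
    static boolean matches(Map<String, Object> requested, Node node) {
        for (Map.Entry<String, Object> wanted : requested.entrySet()) {
            if (!Objects.equals(wanted.getValue(), node.capabilities.get(wanted.getKey()))) {
                return false;
            }
        }
        return true;
    }

    /** Returns the first registered node that satisfies the request, if any. */
    static Optional<Node> select(Map<String, Object> requested, List<Node> nodes) {
        return nodes.stream().filter(node -> matches(requested, node)).findFirst();
    }
}
```

In this model, a request that includes a custom key is only ever routed to nodes that declare the same key, which is the property a custom node extension relies on.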

It is possible to extend Selenium Hub and Node with custom plugins. This can be done by creating a custom servlet and registering it inside the configuration file.

Sikuli as a Missing Part

Sikuli is a good fit for our problem because it can automate anything you see on the screen. Using image recognition, we can locate GUI elements inside a VNC session. Sikuli also has a Java API that provides mouse and keyboard access. This is enough to implement the required GUI element interaction.

The idea was to extend each node with a custom plugin that uses the Sikuli Java API. Tests would send remote commands to the hub, which would redirect them to the node where the client resides. This implementation required us to develop:

  • RPC protocol for Sikuli commands, to make remote execution possible

  • Hub proxying servlet, to redirect commands to a proper node

  • Sikuli extension servlet, to receive Sikuli commands and execute them
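As a rough sketch of how the first two pieces could fit together, the code below encodes a command as a single line of text and lets the hub choose the node that owns the session. Everything here is an assumption made for illustration: the tab-separated wire format, the `/sikuli` path, and the names `SikuliRpcSketch`, `Command`, and `HubRouter` are hypothetical and not taken from the actual extension.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical wire format and hub-side routing for remote Sikuli commands. */
final class SikuliRpcSketch {

    /** One remote command: the owning Selenium session, an action name, and its arguments. */
    static final class Command {
        final String sessionId;
        final String action;
        final List<String> args;

        Command(String sessionId, String action, List<String> args) {
            this.sessionId = sessionId;
            this.action = action;
            this.args = new ArrayList<>(args);
        }
    }

    /** Encodes a command as one tab-separated line (assumes fields hold no tabs or newlines). */
    static String encode(Command command) {
        List<String> fields = new ArrayList<>();
        fields.add(command.sessionId);
        fields.add(command.action);
        fields.addAll(command.args);
        return String.join("\t", fields);
    }

    /** Reverses encode(); rejects lines that lack a session id or an action. */
    static Command decode(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length < 2) {
            throw new IllegalArgumentException("malformed command: " + line);
        }
        return new Command(fields[0], fields[1], Arrays.asList(fields).subList(2, fields.length));
    }

    /** Hub side: remembers which node hosts each session and picks the forwarding target. */
    static final class HubRouter {
        private final Map<String, String> nodeBySession = new ConcurrentHashMap<>();

        void register(String sessionId, String nodeUrl) {
            nodeBySession.put(sessionId, nodeUrl);
        }

        String targetFor(Command command) {
            String nodeUrl = nodeBySession.get(command.sessionId);
            if (nodeUrl == null) {
                throw new IllegalStateException("no node for session " + command.sessionId);
            }
            return nodeUrl + "/sikuli"; // hypothetical path of the node-side extension
        }
    }
}
```

Keying the routing table by Selenium session id means the test never needs to know which node it landed on; it only talks to the hub.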

Sikuli Limitations

Sikuli requires a real screen or a virtual frame buffer. Without one, it can't perform any image recognition. It also means that overlapping windows on the screen make the image search fail.

This conflicted with the default Selenium Node configuration we had been using, in which a browser is brought to the front on every action. With many browsers open, windows overlap constantly.

To solve this problem, we configured each Selenium Node with a max session count of one.

Max session is a parameter that sets how many browser instances can run in parallel on a Selenium Node.

Additionally, we assigned a dedicated display to each Selenium Node process in the VM. As a result, each virtual machine hosts many Selenium Node processes, and each process runs a single browser session on its own dedicated screen.

Sikuli needs image resources to be present on the file system where the script executes. This required us to develop a file uploading extension. It accepts a compressed file archive, extracts it to a random directory, and returns the path. The path is later used as a prefix to locate images for Sikuli commands.
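The extraction step might look roughly like the sketch below. It assumes the archive is a zip file; the class name `ResourceBundleExtractor` and the directory prefix are hypothetical. The check against entries that escape the target directory is an addition of this sketch, not something the article describes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical node-side helper: unpacks an uploaded zip into a fresh directory. */
final class ResourceBundleExtractor {

    /** Extracts the zip stream into a new temporary directory and returns that directory. */
    static Path extract(InputStream zipped) throws IOException {
        Path root = Files.createTempDirectory("sikuli-bundle-").toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside the bundle directory.
                if (!target.startsWith(root)) {
                    throw new IOException("entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    Files.copy(zip, target); // reads up to the end of the current entry
                }
            }
        }
        return root;
    }
}
```

The returned path plays the role of the prefix mentioned above: an image name such as `js_body.png` would be resolved against it before being handed to Sikuli.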

An additional problem was related to Selenium sessions. Each time a Selenium command passes through the hub, it updates the Selenium session. Our requests bypassed this default behavior, so a session that only received Sikuli commands looked idle to the hub. This was easy to fix by touching the session object each time our custom hub extension redirects a request to the node.
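To show why touching the session matters, here is a small model of idle-session tracking. It is not the hub's real session object: `SessionActivityTracker` is a hypothetical class, and the assumption is that a session counts as expired once no activity has been recorded for longer than a timeout.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of idle-session tracking; the real hub keeps its own session state. */
final class SessionActivityTracker {
    private final Map<String, Long> lastActivityMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    SessionActivityTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records activity for a session; the proxying extension would call this on every redirect. */
    void touch(String sessionId, long nowMillis) {
        lastActivityMillis.put(sessionId, nowMillis);
    }

    /** A session is expired if it is unknown or has been idle for longer than the timeout. */
    boolean isExpired(String sessionId, long nowMillis) {
        Long last = lastActivityMillis.get(sessionId);
        return last == null || nowMillis - last > timeoutMillis;
    }
}
```

In this model, a test that spends a long stretch sending only image-based commands stays alive as long as each redirected command calls `touch`.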

Example of Selenium Test With Image Recognition

Here is an example of a test using Selenium with the Sikuli extension. The code uses a few custom wrapper classes that provide an abstraction of a UI element and retry lookups implicitly, for cases when the UI is slower than our test.

As a first step, define capabilities for the browser. Note that the custom capability sikuliExtension is set to true. With this statement, the test asks the hub for a node that has the Sikuli extension installed.
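The wrapper classes themselves are not shown in this article. As a minimal sketch of what an implicit retry during lookup could look like, assuming a lookup that reports "not found yet" as an empty `Optional`, consider the hypothetical helper below; `ImplicitRetry` and its parameters are illustrative names only.

```java
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical retry helper; the article's real wrapper classes may work differently. */
final class ImplicitRetry {

    /** Repeats the lookup until it yields a value or the attempts run out. */
    static <T> T find(Supplier<Optional<T>> lookup, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> found = lookup.get();
            if (found.isPresent()) {
                return found.get();
            }
            if (attempt < maxAttempts) {
                pause(pauseMillis); // give the UI time to catch up before the next try
            }
        }
        throw new NoSuchElementException("not found after " + maxAttempts + " attempts");
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

A bounded number of attempts keeps a missing image from hanging the test: the lookup either succeeds once the screen has rendered or fails with a clear exception.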

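import org.openqa.selenium.Platform;
import org.openqa.selenium.remote.DesiredCapabilities;

private DesiredCapabilities desiredCapabilitiesForSeleniumNode() {
    DesiredCapabilities desiredCapabilities = new DesiredCapabilities();
    desiredCapabilities.setBrowserName("firefox");
    desiredCapabilities.setPlatform(Platform.ANY);
    // Custom capability: request a node that has the Sikuli extension installed.
    desiredCapabilities.setCapability("sikuliExtension", true);
    return desiredCapabilities;
}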

The next part is to construct a Sikuli extension client and upload the images bundle.

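// Create the client for the grid host and the current RemoteWebDriver session.
SikuliExtensionClient sikuliExtensionClient = new SikuliExtensionClient(
        GridSettings.HOST, GridSettings.PORT, remoteWebDriverSessionId);

// Upload the bundle of reference images used by the Sikuli commands.
sikuliExtensionClient.uploadResourceBundle(ACE_IMAGES_BUNDLE);

SikuliHelper sikuliHelper = new SikuliHelper(sikuliExtensionClient);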

The last part is element lookup and interaction. In this case, the image name acts as an element selector.

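// The image name acts as the selector for the editor area.
TextBox editor = sikuliHelper.findTextBox("js_body.png");

// Focus the editor and clear its current content.
editor.click();
editor.deleteAllText();

editor.press(KeyEvent.VK_ENTER);
editor.write(SOME_JS_FUNC);

editor.press(KeyEvent.VK_ENTER);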

Conclusion

By extending Selenium with an image recognition feature, we were able to create tests for complex UI within our application.

There was no need to rewrite any existing test step written using Selenium. It also saved us from setting up new infrastructure for a different tool: we could reuse Selenium Grid, which spares us future maintenance effort.

You can find these extensions at sterodium.io.
