Thu 22 Dec 2011
Hackathon: Upgrading our browser tests to use WebDriver API
Currently, our browser tests are written in Ruby and RSpec, using Selenium to interact with our site through the browser. A while back, Selenium 2.0 was released; its primary new feature is integration with the WebDriver API. According to their documentation:
This addresses a number of limitations along with providing an alternative, and simpler, programming interface. The goal is to develop an object-oriented API that provides additional support for a larger number of browsers along with improved support for modern advanced web-app testing problems.
Although we’ve since upgraded the Selenium server, our client-side testing code still uses the Selenium 1.x Client API. While this has been working for us, I wanted to spend my hackathon moving us to the newer WebDriver API to take advantage of its improvements. A great starting point has been Selenium’s migration documentation. I’m currently knee-deep in the migration process, and below are some of the issues I’ve had to solve.
WebDriverBackedSelenium
To ease the burden of migrating an entire codebase to the new WebDriver API, the Selenium team has included a WebDriverBackedSelenium class, which looks like the old Selenium 1.x object and internally adapts those 1.x commands to the 2.x WebDriver commands. This means that you can pass this class to all your tests and all their calls will still work while you migrate your tests one by one. While this approach is great and is what I want to do, it’s unfortunately only available in their Java client download, and we are using the Ruby selenium-webdriver gem, which does not have this handy class.
The approach I’ve taken is to create a Ruby version of this class for the commands we use, except for specific cases where it makes sense to simply migrate the client code on the spot. Some of the translation is very straightforward, like migrating from the old API call of
selenium.open(url)
to the WebDriver API call of
webdriver.navigate.to(url)
However, some of the work is not as easy, or there is no exact mapping between the 1.x and 2.x APIs. In these cases, it has been helpful to look at how the Java WebDriverBackedSelenium client translates these commands. All of the command mappings can be found in the WebDriverCommandProcessor class.
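As a rough illustration, a Ruby adapter of this kind might start out like the sketch below. The class name simply mirrors the Java one, and the two methods shown (`open` and `get_title`) are illustrative picks, not a list of what our class actually covers. The wrapped driver is assumed to respond to the selenium-webdriver calls `navigate.to` and `title`.

```ruby
# Hypothetical Ruby stand-in for Java's WebDriverBackedSelenium.
# It exposes Selenium 1.x-style method names and forwards each one
# to the equivalent WebDriver call on the wrapped driver.
class WebDriverBackedSelenium
  def initialize(driver)
    @driver = driver # e.g. a Selenium::WebDriver driver instance
  end

  # 1.x: selenium.open(url)
  def open(url)
    @driver.navigate.to(url)
  end

  # 1.x: selenium.get_title (assumed to be one of the calls in use)
  def get_title
    @driver.title
  end

  # Escape hatch for tests that have already been migrated
  # and want to talk to WebDriver directly.
  def webdriver
    @driver
  end
end
```

Tests that still use the old calls receive this object in place of the 1.x client, and each new 1.x command gets a method here as it turns up.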
Element locators have changed
In Selenium 1.x, you could locate elements with a few different strategies: by id, name, identifier (id or name), link text, DOM, XPath, and CSS. You indicate which strategy you are using by prefixing your locator string with the strategy name (e.g. “id=element_id”). If you don’t specify the prefix, it guesses which one to use based on a few simple rules.
In the WebDriver API, the prefix and locator are passed as separate arguments named ‘how’ and ‘what’ (e.g. {:id => “element_id”}). A simple translation layer that accepts both syntaxes has let me migrate the client code one call at a time. One case that has proven difficult is when the prefix is not specified and the string could be either an id or a name (the “identifier” strategy). Unfortunately, we’ve done this with both ids and names, which makes it hard to know which one is correct without running the test and looking for a pass or fail.
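A translation layer along these lines could look roughly like the following. The method name `translate_locator` and the prefix table are assumptions made for this sketch, and it deliberately raises on unprefixed "identifier" locators instead of guessing between id and name.

```ruby
# 1.x prefixes that map one-to-one onto a WebDriver "how".
PREFIX_TO_HOW = {
  "id"    => :id,
  "name"  => :name,
  "link"  => :link_text,
  "xpath" => :xpath,
  "css"   => :css
}.freeze

# Accepts either a WebDriver-style hash or a Selenium 1.x locator
# string and always returns a WebDriver-style hash.
def translate_locator(locator)
  return locator if locator.is_a?(Hash) # already migrated

  prefix, what = locator.split("=", 2)
  how = what && PREFIX_TO_HOW[prefix]
  return { how => what } if how

  # 1.x treats an unprefixed string starting with "//" as XPath.
  return { :xpath => locator } if locator.start_with?("//")

  # Unprefixed "identifier" locators could be an id or a name,
  # so fail loudly rather than guess.
  raise ArgumentError, "ambiguous or unsupported locator: #{locator.inspect}"
end
```

For example, `translate_locator("id=element_id")` returns `{:id => "element_id"}`, while a hash passed in comes back unchanged.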
CSS3 selectors no longer use the Sizzle library
In Selenium 1.x, the CSS selectors were handled via the Sizzle library, which implements a superset of the CSS3 spec. The WebDriver API, however, delegates CSS selecting to the browser’s native engine. Of course, we’re using some of the Sizzle features which are not part of the CSS3 spec. The primary offender has been searching for an element that contains a specific piece of text. Here’s an example of such a locator:
"css=div.simple-picker-menu div.simple-picker-menu-item:contains('#{value}')"
There is no equivalent CSS3 selector for finding this element. The approach I’ve taken is to add an extra entry to the locator that specifies the text, which looks like this:
{:css => "div.simple-picker-menu div.simple-picker-menu-item", :text => value}
Under the covers, this uses the first hash entry to find all matching elements, ignoring the text. It then selects the first of those elements that contains the given text.
But what about crazier examples like this Selenium 1.x locator:
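The lookup described here might be sketched as follows. The helper name `find_with_text` is made up for illustration; the only WebDriver calls it relies on are `find_elements(how, what)` on the driver and `text` on each element.

```ruby
# Finds the first element matching the locator whose text includes the
# optional :text value. Returns nil when nothing matches.
def find_with_text(driver, locator)
  locator = locator.dup          # leave the caller's hash untouched
  text = locator.delete(:text)   # nil when no text filter was given
  how, what = locator.first      # e.g. [:css, "div.simple-picker-menu ..."]

  candidates = driver.find_elements(how, what)
  return candidates.first if text.nil?

  candidates.find { |element| element.text.include?(text) }
end
```

One design note: filtering happens in Ruby after the browser returns every candidate, so this costs one `text` round trip per candidate, which may matter for selectors that match many elements.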
"css=.rally-dropdown-component:contains('Type') input.x4-form-field[value='#{type.name}']"
In this example, the check for text is in the middle of the selector, not at the end. These situations generally have two possible solutions:
- Rewrite the locator using XPath instead of CSS, which allows for selecting elements based on their textual contents.
- Break the locator in two. The first finds the element (or elements) with the matching text and the second finds the matching downstream element.
Both are typically harder to implement because these locator strings are often used far from where they are defined. The second option in particular usually requires some refactoring.
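To make the first option concrete, here is one way the locator above could be expressed in XPath. The helper names (`xpath_literal`, `class_predicate`, `dropdown_input_xpath`) are invented for this sketch; the class names come from the example locator. `contains(., ...)` tests the element's full text content, which is close to what Sizzle's `:contains` does.

```ruby
# Quotes a Ruby string as an XPath 1.0 string literal. XPath has no
# escape character, so a value holding both quote types needs concat().
def xpath_literal(value)
  return "'#{value}'" unless value.include?("'")
  return "\"#{value}\"" unless value.include?('"')

  parts = value.split("'", -1).map { |part| "'#{part}'" }
  "concat(#{parts.join(%q{, "'", })})"
end

# XPath test for "has this CSS class", tolerant of multiple classes.
def class_predicate(name)
  "contains(concat(' ', normalize-space(@class), ' '), ' #{name} ')"
end

# XPath version of:
#   .rally-dropdown-component:contains(label) input.x4-form-field[value=...]
def dropdown_input_xpath(label, value)
  "//*[#{class_predicate('rally-dropdown-component')}]" \
    "[contains(., #{xpath_literal(label)})]" \
    "//input[#{class_predicate('x4-form-field')}][@value=#{xpath_literal(value)}]"
end
```

The result can be passed to WebDriver as `{:xpath => dropdown_input_xpath('Type', type.name)}`.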
Filling out form fields
In Selenium 1.x, you can put text into an input field like so:
selenium.type(locator, value)
The WebDriver approach is to use a command like this:
webdriver.find_element(locator).send_keys(value)
However, there is a slight behavioral difference. The 1.x command will replace the existing text with the new value whereas the WebDriver command will simply have the browser type the given keys, adding to any existing text. In these situations, I’ve needed to call element.clear before element.send_keys.
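A small helper can keep the old replace-the-text behavior in one place. The name `type_into` is hypothetical; it relies only on WebDriver's `find_element`, `clear`, and `send_keys`.

```ruby
# Mimics Selenium 1.x `type`: replaces the field's contents instead of
# appending to them. `locator` is a WebDriver-style hash such as
# {:id => "element_id"}.
def type_into(driver, locator, value)
  how, what = locator.first
  element = driver.find_element(how, what)
  element.clear            # drop whatever text is already there
  element.send_keys(value) # then type the new value
  element
end
```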
Interacting with elements that are not visible on the page
Related to the previous topic, Selenium 1.x lets you perform actions on elements that are not visible on the page. This means that I could call
selenium.type(locator, value)
which would populate, for instance, a hidden input field. Changing this code to use the new send_keys method won’t work because WebDriver will complain that it can’t interact with elements that are not visible. The solution I’ve taken for hidden input fields is to execute JavaScript that sets the element’s value attribute.
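One way to do this, sketched with a hypothetical helper name, is to find the element normally and hand it to `execute_script`, which exposes extra arguments to the script as `arguments[0]`, `arguments[1]`, and so on. Note that this sketch assigns the DOM `value` property, which is an assumption about what "setting the value" should mean here.

```ruby
# Sets the value of an input that WebDriver refuses to type into
# because it is hidden. `locator` is a WebDriver-style hash.
def set_hidden_value(driver, locator, value)
  how, what = locator.first
  element = driver.find_element(how, what)
  # The element and the new value arrive in the script as
  # arguments[0] and arguments[1].
  driver.execute_script("arguments[0].value = arguments[1];", element, value)
  element
end
```

Because no keyboard events are fired, any page logic that listens for key or change events will not run, which is usually fine for hidden fields.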
Some other instances of this issue that I’ve encountered are:
- Shared behaviors that click on a button that may or may not be visible, where it doesn’t matter if it is not. In 1.x, the click would still happen. I now need to add conditional logic to click the button only if it is visible.
- We have some dropdown menus with links that only show up when the user hovers over a button. Our tests were written to simply click on the menu item link, which worked fine with Selenium 1.x. Now, we have to hover over the button first.
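Both cases can be handled with small helpers along these lines. The method names are invented for this sketch; the WebDriver calls used are `displayed?` and `click` on elements and `driver.action.move_to(element).perform` for the hover.

```ruby
# Clicks only when the element is displayed.
# Returns true if it clicked, false if it skipped.
def click_if_visible(element)
  return false unless element.displayed?

  element.click
  true
end

# Hovers over the trigger so the menu renders, then clicks the item.
# Both locators are WebDriver-style hashes, e.g. {:id => "menu"}.
def click_menu_item(driver, trigger_locator, item_locator)
  trigger = driver.find_element(*trigger_locator.first)
  driver.action.move_to(trigger).perform
  driver.find_element(*item_locator.first).click
end
```

If the menu animates in, the second lookup may also need a wait before the click; that is left out here to keep the sketch short.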
Wrapping up
I haven’t quite finished migrating all the tests yet (we have over 2,500 of them). One of the last big hurdles remaining is migrating the tests that drag and drop elements on the page. Like some of the issues above, it initially seemed trivial to migrate to the WebDriver drag and drop API, but the first attempts didn’t work properly.
Have you had to migrate from Selenium 1.x to 2.0? Is there any particular aspect that you found challenging?
I also wanted to investigate how we could run our test suite against multiple browsers. My goal is to run all of our tests against all of our supported browsers on every commit (or at least once per day). But how can we run the tests against multiple browsers when the testing framework typically specifies which browser to test against? Have you solved this problem?

If you like the Sizzle locators (in particular, “:contains”), check out http://nuget.org/packages/SizSelCsZzz, which lets you add them into WebDriver.
And you should seriously consider the RemoteWebDriver driver instead of the FirefoxDriver (or IEDriver, or …), because you can choose a browser like you used to in Selenium RC.
I agree that RemoteWebDriver is the way to go; we’re having good success using it with a Selenium Grid2 to spawn multiple different browsers to run our tests.
“Cucumber Sauce” is supposedly able to do this pretty easily (at least with cucumber tests). Whatever approach it’s using might be extendable to pure rspec.
I was more interested in the easily-run-tests-in-parallel bit, but I immediately slammed into ruby dependency hell and gave up.
Thanks for the comments!
@Ross @Ed
We are using the RemoteWebDriver which attaches to our existing Selenium Grid. I agree that coding to a specific WebDriver platform is not the right approach.
@Stefan
Great tip, I’ll have to check out Cucumber Sauce. It looks to achieve exactly what I would like to do with regard to running a test suite against multiple browsers in parallel. I’m interested to see how they are accomplishing it. It wouldn’t be too hard to set up multiple Jenkins CI jobs, one for each browser. But it would be nicer to have one job run the tests against all browsers in one go.
We did look into using Cucumber during the “Great YUNO Browser Test Refactoring” era but decided that we didn’t need the extra overhead/work at that time. The Cucumber DSL looks awesome though.
The lack of :contains in CSS3 is something we hit a while back too. IIRC, Matt Farrar coded support to get around it (probably similar to what you did). In general we tried to not use this.
Instead we have utility code in Java, JSP, JavaScript and Ruby that injects/removes/searches for special IDs on HTML elements (bt-id). These IDs are only generated when running the app in “test” mode. That way we can create test hooks to grab elements when running GUI-based tests.
@mparrish
A caution on using XPath: IME, Selenium & XPath locators didn’t play nicely at all with any version of IE available ~2yrs ago, including IEHTA mode. I had to convert a metric tonne of XPaths in one test suite when it was clear that executing XPath in IE added an insane amount of runtime. I remember seeing it take upwards of a _second_ between commands, as in clickElement, waitForElement, refresh… It was awful. I hope that WebDriver is different but haven’t researched that yet.
Even if performance isn’t a concern, I believe that we’d be better served restructuring the codebase to allow us to use locators that don’t require the use of the :contains concept. Getting better at making use of the utility code @Steve brought up is vital, though ideally we write what we need from the start.
@Steve
I think that the problem was cases where we had dropped out of Selenium’s Sizzle library into other Ruby libraries. E.g., Selenium captured the HTML body, then we’d try to parse that capture with Hpricot, expecting it to respect the :contains pseudoclass. I think we just got smarter about how that stuff worked. But I’ll certainly take the credit.
I have a similar issue with locator prefixes. At my company we were using Selenium 1.0 (RC) with a wrapper class, and we have to migrate the whole test suite to WebDriver 2.0 (not WebDriverBackedSelenium), which does not support locator prefixes. For example:
RC: String textbox = “id=someID”
WD: we cannot use the locator prefix “id=”
If I have to update every locator class, that will be a lot of work. Please share any solution you can think of for skipping the “id=” prefix when the driver uses the locator. Thanks in advance.
What made this problem easier for us was that all of our selenium calls go through our own object, where we can intercept them and modify them to better suit our needs. In this case we would translate all the locators in this object before forwarding them on to selenium/webdriver. Depending on the design of your testing code, this may or may not be a viable option for you.
Even though your end goal is to completely migrate to WebDriver, using the WebDriverBackedSelenium could still be a useful tool. We’ve finally removed it ourselves, but it made it easier to migrate tests to pure WebDriver in small groups instead of having to change locators everywhere all at once.