In order for us to gain confidence in our tests, they needed to be faster. Faster at succeeding and faster at failing. What we call our “continuous tests” (i.e. unit & integration) took less than ten minutes; we should be able to make the browser tests run in something closer to that, rather than the 9 hour mark they were loitering at. Above all, if these tests were ever going to be made a build requirement (in that we cannot deploy a build with a failing test), they would have to run in less time than what can be considered an excellent night’s sleep. Through various discussions with the team, we decided that a 30 minute runtime was a good target for making the build dependent on these tests. Obviously we would try to continuously improve on the runtime, and it would likely vary as tests were added, but it was a good starting point.
Step 1: Cheat | 9hrs => 4.5hrs
The first large gain that we made is almost cheating, in that it would actually be cheating but for our fortuitous circumstances at the time which made it ok. We were in a rather privileged position in that we had deprecated a mode of our application, Use Case, and so we were actively removing a large section of our code base (Ryan touches on this topic in this post). This allowed us to also remove a huge swath of tests that were no longer needed – effectively cutting the runtime in half with one fell swoop of the delete key.* A doubleplusgood start, but not something that could be repeated.
* Ok, there were rather more keys and swoops involved but please allow me some literary licence.
Step 2: Parallel Parking | 4.5hrs => 45mins
We decided to follow the path we had taken with our continuous tests and set about making them run concurrently, by creating a number of parallel processes that would each work through a fraction of the test suite. This would allow us to shorten our runtime dramatically, and also gave us the fringe benefit of showing us instances of test bleed we needed to clean up. We created a rakefile that would spin up a given number of test processes, divide our test suite into that many portions and hand those portions to the Selenium server for execution on the same number of browser instances. This gave us a runtime of ~1hr locally, ~45mins on the build machines. The overhead of running both the application and Selenium servers plus many instances of the browser taxed our machines. We had learned that a 1:1 ratio of processes to cores gave us the best overall results in terms of execution time and machine stability, so we configured the rakefile to read the execution machine’s environment variables, allowing us to set at runtime the number of processes started.
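The splitting described above can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual rakefile: the `TEST_PROCESSES` variable name, the `rspec` command and the round-robin split are all hypothetical stand-ins.

```ruby
require "etc"

# Hypothetical environment variable name; the real one is not given in this post.
PROCESS_COUNT_VAR = "TEST_PROCESSES"

# Number of worker processes: the environment variable if it is set,
# otherwise one process per core (the 1:1 ratio mentioned above).
def process_count(env = ENV)
  Integer(env.fetch(PROCESS_COUNT_VAR, Etc.nprocessors))
end

# Deal the spec files round-robin into `count` portions of near-equal size.
def partition(files, count)
  portions = Array.new(count) { [] }
  files.each_with_index { |file, index| portions[index % count] << file }
  portions
end

# Start one child process per non-empty portion and wait for all of them.
# Returns true only if every child exited successfully.
# The default command is an assumption; substitute whatever runs your tests.
def run_in_parallel(files, count = process_count, command: ["rspec"])
  pids = partition(files, count).reject(&:empty?).map do |portion|
    Process.spawn(*command, *portion)
  end
  pids.map { |pid| Process.wait2(pid).last.success? }.all?
end
```

A rake task could then call something like `run_in_parallel(Dir["spec/**/*_spec.rb"])` (the glob is made up for the example). Splitting by file count alone says nothing about how long each portion takes, which is exactly the unevenness that shows up later in this post.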
Step 3: Deal With Icky Sleeps | 45mins => ?
We found that previous test authors had struggled with some synchronization issues, and had liberally used the sleep method to compensate. Adding a sleep is only a good strategy when it’s being used to ascertain whether there’s a timing issue that needs to be addressed, so that a specific wait can be implemented. It is not a fix in itself and actually adds waste. We set about converting all the sleeps into specific waits and gained some time and robustness to boot – some of the sleeps weren’t long enough if the test machine was under heavy load. I wish I’d captured some metrics about how much wasted time we shaved off the suite by that particular refactoring…
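To make the difference between a sleep and a specific wait concrete, here is a minimal polling helper. It is a sketch, not the code we used: `wait_until`, `WaitTimeout` and the default timings are names and values invented for this example.

```ruby
# Raised when the condition does not become truthy in time (hypothetical name).
class WaitTimeout < StandardError; end

# Poll the block until it returns a truthy value, then return that value.
# Gives up with WaitTimeout once `timeout` seconds have passed.
def wait_until(timeout: 10, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeout, "condition not met within #{timeout}s"
    end
    sleep interval
  end
end
```

Where a test used to `sleep 5` and hope, it can instead wait on the thing it actually cares about, for example `wait_until { page_shows_saved_message? }` (a made-up predicate). The wait returns as soon as the condition holds, so a fast machine loses no time, and a loaded machine gets the full timeout rather than a guess. Selenium's Ruby bindings ship a comparable helper, `Selenium::WebDriver::Wait`, if you would rather not roll your own.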
Step 4: Arrange, Act, Assert | 45mins => 40mins
It’s the common pattern for any test. Arrange the preconditions, perform the action, assert the outcome. We were doing all of these steps by driving the browser to the relevant pages. This is an expensive operation, and is unnecessary when we have access to our full application stack. Why not do the arrange and (at least certain parts of) the assert behind the scenes where it’s not so costly? So we refactored the tests to use our WSAPI endpoints. During this refactoring, we learned more about how to write a better browser test and where we needed to enhance the WSAPI interface, and each test became markedly quicker to execute. Some tests saw an order of magnitude gain in execution time. The overall gain from this refactoring was mainly seen in earlier completion times for the individual processes; the overall suite execution time didn’t lessen all that much. One of the prime reasons for not seeing a large benefit in overall time was that our longest running tests were still reliant on front end data creation, so we added stories to the backlog to get certain CRUD operations added to the WSAPI.
Step 5: Oooooh! Shiny New JRuby | 40mins => 35mins
Ultimately we wanted to move away from using many processes to using many threads in a single process, to even out the execution times. The threading support we needed was lacking in Ruby 1.8.7. We decided to start using JRuby in order to use its native threading, plus we knew from earlier experiments we’d have some issues to overcome getting the suite running fluidly within Ruby 1.9.2 (which also has better threading support). We began using RVM for our Ruby version management, so getting JRuby in place was easy and we didn’t have too many problems running our test suite on the new platform – which was nice. We also gained a noticeable speed increase from making that switch, down to ~35mins - also nice.
Step 6: Buy Some Speed | 35mins => 30mins
I wavered on whether or not to put this in, but it happened, so it seems only right. Somewhere around this time, the company (big thanks, guys!) bought us some beastly new build machines as we were starting to grow out of our old ones. On those build machines, we were suddenly under 30 minutes for all processes to complete. Our initial target was met, but we knew that we could do better by threading the tests, so we didn’t stop there.
Step 7: Threading the Needle | 30mins => 22mins
As mentioned previously, there was plenty of slack time in the suite execution. A couple of the processes would complete fairly quickly (~10min), and usually we would only have one or two processes that would run the full duration. So we needed a mechanism to enable threads to pull tests from a queue so they didn’t sit idle once they’d worked through their initial allocation. I wasn’t really involved in this somewhat meaty endeavor so another post will detail the action taken to get RSpec and Selenium threadsafe and pulling tests from a single queue. That effort really paid off though, making a pretty big contribution to our suite runtime.
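The idea of threads pulling from one queue, rather than each working through a fixed allocation, looks something like this in miniature. It is only a sketch of the general technique: `run_from_queue` is a made-up name, the "tests" are plain callables, and none of the work needed to make RSpec and Selenium threadsafe is shown.

```ruby
# Run each item in `tests` (anything that responds to #call) on a pool of
# threads that all pull from one shared queue. Returns the collected results
# in completion order, which is not necessarily the order of `tests`.
def run_from_queue(tests, thread_count)
  queue = Queue.new
  tests.each { |test| queue << test }
  queue.close # no more work will be added

  results = Queue.new
  workers = Array.new(thread_count) do
    Thread.new do
      # pop returns nil once the closed queue is empty, which ends the loop
      while (test = queue.pop)
        results << test.call
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end
```

Because every thread keeps taking the next test until the queue is empty, a thread that draws a run of quick tests simply does more of them, and nobody sits idle while one unlucky worker grinds through the slow ones.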
Step uhhh… n for oNgoing: Where It’s At | 22mins => 19mins
At time of writing, our last browser test run took 1144.53 seconds. That’s a _whopping_ 19 minutes, or approximately 3.5% of the original execution time. We still want to go faster and have some long running tests on deck for speed refactoring. We’ll always be working on keeping the test time down to a reasonable level while we add new tests, and we’re implementing some new tooling to help us trend that runtime data more effectively. But that’s another post, for another time.
