I gave a presentation today to one of our large customers. It was a whirlwind tour of how we do testing, both automated and manual, at Rally. One of the points that got a fair bit of attention was the treatment of feature toggles as a testing phase. In that presentation, I gratuitously included links to posts on the engineering blog. A curious observer asked whether we have any posts related to feature toggles. To the best of my knowledge, we don’t. Let’s rectify that.
If you’re reading this and were also in my presentation today, I’m sorry. There’s going to be some overlap between what I said there and what’s here. Hopefully you’ll still learn something. I promise I’ll blog some more about it if you ask questions in the comments that require more detail.
First, what is a feature toggle? The basic idea of a feature toggle is that it allows you to turn code on or off for some subset of the people who use your software.
Let me be very clear in saying that feature toggle is NOT a term or concept that we invented. Martin Fowler and Jay Fields are notable bloggers who’ve written about them before. Other companies call them other things. Very recently, the Gmail team blogged about them, calling them “conditional features”.
Next, let’s talk about why or when you’d use feature toggles. These reasons are in the order that they popped into my head. That does not mean they are in order of importance.
#1 => Less branching
Feature toggles allow you to keep code in your production codebase for a long time before users ever access it. This is important because it saves you from merge hell. Long-running feature branches are the enemy of continuous integration.
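To make that concrete, here is a minimal sketch in Python of unfinished code living on the main line behind a toggle that defaults to off. This is not Rally’s implementation; the toggle store, the toggle name `new_export`, and both export functions are made up for illustration.

```python
# Hypothetical in-memory toggle store. A toggle that is missing or
# False means the new code path stays dark.
TOGGLES = {"new_export": False}


def is_enabled(name):
    """Return True only if the named toggle is explicitly switched on."""
    return TOGGLES.get(name, False)


def legacy_export(rows):
    # The code path users see today.
    return ",".join(rows)


def new_export(rows):
    # Work in progress: merged and deployed, but unreachable while
    # the toggle is off.
    return "\t".join(rows)


def export(rows):
    # The single branch point lives in code instead of in source control.
    if is_enabled("new_export"):
        return new_export(rows)
    return legacy_export(rows)
```

Both paths are merged and built together on every commit, so there is no long-lived branch to reconcile later.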
#2 => Phased rollout
A good feature toggle implementation allows you to roll code out to progressively larger groups of users. At Rally, we have three levels at which we can toggle on features: for individual users, for whole subscriptions, or globally. For clarification, a subscription in Rally vernacular is all of the users for a particular customer. We dogfood our app, so we have our own subscription just like any customer, and that’s usually the first subscription that gets toggled on.
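The three levels could be modeled roughly like this. It is a sketch under my own assumptions, not our production code: the `Toggle` class, its field names, and the example IDs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Toggle:
    """One feature toggle with three rollout levels (illustrative only)."""
    enabled_globally: bool = False
    subscriptions: set = field(default_factory=set)  # whole customers
    users: set = field(default_factory=set)          # individual users

    def is_enabled(self, user_id, subscription_id):
        # The widest level that says "on" wins.
        return (
            self.enabled_globally
            or subscription_id in self.subscriptions
            or user_id in self.users
        )


# A phased rollout: one user first, then our own subscription.
board = Toggle()
board.users.add("alice")
board.subscriptions.add("rally")
```

With this state, `alice` sees the feature in any subscription, everyone in the `rally` subscription sees it, and nobody else does until `enabled_globally` is set to `True`.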
#3 => Safe to fail
The reason I equated toggles to testing in my presentation is that toggles allow you to turn stuff off without releasing code to do it. Despite the best efforts of automated and manual testing, defects find their way into production systems. By turning on a feature for all users who work at Rally in advance of turning it on for others, we turn everyone here into additional manual testers. Turning a feature off after we turn it on isn’t a common practice, but feature toggles provide us that safety net.
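Turning something off without a release only works if toggle state lives outside the deployed code and is read at runtime. Here is one way that might look, sketched in Python. I’m assuming a JSON file as the store purely to keep the example self-contained; the class name `FileToggleStore` and the file layout are hypothetical, and a real system would more likely use a database or a configuration service.

```python
import json
from pathlib import Path


class FileToggleStore:
    """Reads toggle state from a JSON file on every check (illustrative)."""

    def __init__(self, path):
        self.path = Path(path)

    def is_enabled(self, name):
        # Re-read on each call so that an edit to the file takes effect
        # immediately, with no deploy or restart.
        try:
            state = json.loads(self.path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            # Fail safe: if state can't be read, treat the feature as off.
            return False
        if not isinstance(state, dict):
            return False
        return bool(state.get(name, False))
```

Flipping a value in the file from `true` to `false` hides the feature on the very next check. Defaulting to off when the state can’t be read is a design choice that favors the old, known-good code path.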
This post is not comprehensive. It’s intended as something to prime the pump for more thorough discussion of toggles. I’ve got some thoughts on where this series of posts could go, but I’d really love some feedback. If any of the following topics are more interesting than another, please let me know in the comments. Also, feel free to suggest a new direction for the series.
- Feature toggles require some level of branching in code. Why is that better than branching in source control?
- There are testing hurdles for toggles. Specifically, global toggles can be hard to test, but some toggles have to be global. Why is that, and how do you fight it? Also, having many toggles can present a combinatorial explosion of testing scenarios.
- How do you implement a toggle? Where in your code should you check a toggle?
- Once the code behind the toggle is fully rolled out, what do you do with the toggled code and the old code?
- Who toggles features on? The end user or some admin?
- I’m developing a new application and really need to get it out the door. Do I need to do this first? When do I know I need to start doing this?
- How do you test that the toggles themselves don’t introduce problems, for example by failing to turn the code off correctly? Have you ever had a case where toggle logic failed and exposed code that shouldn’t have been exposed?
- How, specifically, have you implemented the ability to toggle in your own code?
- Your topic could go here…