Managing Feature Flags at Scale - Retire Your Flags
A large number of active flags in development isn't necessarily a good thing. Learn how to create a plan to retire your feature flags.
Teams working with feature flags usually come to the conclusion that a large number of active flags isn't necessarily a good thing. While each active feature flag in your system delivers some benefit (I hope!), each flag also comes with a cost. I'm going to explain those costs and how to avoid them.
Flags Ain't Free
Every flag under management adds cognitive load, increasing the set of flags you have to reason about when you're working with your flagging system. In addition, every active flag is by definition a flag which could be either on or off for a user, which means you need to maintain test coverage for both scenarios. Perhaps the biggest cost from active flags comes in increased complexity within your codebase in the form of conditional statements or polymorphic behavior. This "carrying cost" for feature flagging is very real - flagged code is harder to understand, and harder to modify.
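To make that carrying cost concrete, here is a minimal sketch of a single flagged code path. The function, the flag name "free-shipping-over-50", and the prices are all hypothetical, invented purely for illustration. Even in this tiny case there are two behaviors to read and two sets of tests to maintain while the flag is active.

```python
from typing import Dict


def shipping_cost(order_total: float, flags: Dict[str, bool]) -> float:
    """Hypothetical flagged code path: one function, two behaviors."""
    if flags.get("free-shipping-over-50", False):
        # New behavior, only reached when the flag is on.
        return 0.0 if order_total >= 50 else 5.0
    # Old behavior, still reachable for as long as the flag is active.
    return 5.0


def test_flag_on() -> None:
    assert shipping_cost(60.0, {"free-shipping-over-50": True}) == 0.0
    assert shipping_cost(20.0, {"free-shipping-over-50": True}) == 5.0


def test_flag_off() -> None:
    assert shipping_cost(60.0, {"free-shipping-over-50": False}) == 5.0
```

Once the flag is retired, the conditional and one of the two tests can be deleted, which is exactly the cost that disappears.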
The Case of the Zombie Flag
In a previous post, we saw the benefits of categorizing flags as either long or short-lived. However, even flags that have been explicitly identified as short-lived can still end up outstaying their welcome. A flag that's no longer in active use might still remain in the system, with its implementation still muddying the codebase. What causes these zombie flags to stay hanging around?
Sometimes a flag that was intended to be short-lived is simply forgotten, perhaps lost amongst a large number of other flags. This is, in and of itself, another reason to keep the number of flags in your system low - it prevents a broken windows culture where actively managing your flag count doesn't seem worth the investment, creating a vicious cycle.
It's also possible that a team is aware that a flag is past its expiration date but can't quite prioritize getting rid of the flag. Retiring the flag always seems to be near the top of the task backlog for the next sprint, never the current one. This is a variant of the general challenge that many delivery teams face in balancing urgent work vs. important work: building a high-visibility feature vs. paying down tech debt.
Plan for Retirement
The key to ensuring that short-lived flags live a short - but hopefully productive - life is being intentional about retiring these flags, and having established processes that help everyone stick to those intentions.
The first step is to explicitly identify when a flag should be short-lived. As we've discussed, placing flags into defined categories can help, but isn't the only solution. Simply instituting a rule that every new flag have a stated expiration date can get you a lot of the way there. The key is making that expiration date a requirement for every new flag. Of course, there also needs to be some mechanism to mark a flag that's intended to be long-lived, with no expiration date. An example of this would be a flag controlling access to a paid-only feature. The ability to control access to that feature will always be required, so the flag should never expire.
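One way to make the expiration date a hard requirement is to validate it when a flag is defined. The sketch below assumes a hypothetical FlagDefinition record. The field names, flag names, and dates are illustrative and don't come from any particular flagging system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FlagDefinition:
    """Hypothetical flag record; the field names are illustrative."""
    name: str
    expires_on: Optional[date] = None  # required for short-lived flags
    long_lived: bool = False           # explicit opt-out, e.g. a paid-only feature

    def __post_init__(self) -> None:
        if self.long_lived and self.expires_on is not None:
            raise ValueError(f"{self.name}: a long-lived flag must not have an expiration date")
        if not self.long_lived and self.expires_on is None:
            raise ValueError(f"{self.name}: a short-lived flag needs an expiration date")


# A short-lived flag must state when it expires.
new_checkout = FlagDefinition("new-checkout-flow", expires_on=date(2030, 1, 31))

# A long-lived flag is marked explicitly and carries no expiration date.
premium_reports = FlagDefinition("premium-reports", long_lived=True)
```

With this in place, FlagDefinition("forgotten-flag") raises a ValueError, so nobody can create a flag without deciding which kind it is.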
Once you have the concept of an expiration date in place, the next step is to enforce that expiration.
A technique which I would consider a bare minimum is to proactively place a flag retirement task on the team's backlog whenever a new short-lived flag is created. This doesn't entirely solve the issue - those tasks tend to be serially deprioritized - but it's a good start.
A rather extreme technique - and one that I'm rather fond of - is to attach a time bomb to every short-lived flag. The expiration date for such flags is included in the flagging system's configuration, and a process simply refuses to launch if a flag that it uses has expired. Slightly less extreme variants of this approach would be to have a process alert loudly if it is using an expired flag, to have the flagging system itself send out alerts when an active flag expires, or to draw attention to expired flags in a flag management UI. I'm a fan of the time bomb though.
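A minimal sketch of the time bomb as a startup check might look like the following. It assumes the flag configuration has been loaded into a plain dictionary mapping flag names to expiration dates, with None meaning the flag never expires. The function names and the ExpiredFlagError type are hypothetical, and a flag missing from the configuration is simply ignored here, which a real system might prefer to treat as an error.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional


class ExpiredFlagError(RuntimeError):
    """Raised at startup when the process depends on an expired flag."""


def expired_flags(
    expirations: Dict[str, Optional[date]],
    flags_in_use: Iterable[str],
    today: date,
) -> List[str]:
    """Return the in-use flags whose expiration date is before `today`."""
    expired = []
    for name in flags_in_use:
        expires_on = expirations.get(name)  # None: long-lived or unknown flag
        if expires_on is not None and expires_on < today:
            expired.append(name)
    return sorted(expired)


def check_time_bombs(
    expirations: Dict[str, Optional[date]],
    flags_in_use: Iterable[str],
    today: Optional[date] = None,
    strict: bool = True,
) -> List[str]:
    """Refuse to start (strict) or warn loudly (non-strict) on expired flags."""
    expired = expired_flags(expirations, flags_in_use, today or date.today())
    if expired and strict:
        raise ExpiredFlagError("expired flags still in use: " + ", ".join(expired))
    for name in expired:
        print(f"WARNING: flag '{name}' is past its expiration date")
    return expired
```

Calling check_time_bombs at process startup gives the full time bomb. Passing strict=False gives the gentler "alert loudly" variant, which may be a safer first step for a team that already has expired flags in production.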
There's a concept from Lean Manufacturing which can be applied to feature flag management. When running a manufacturing production line it's beneficial to reduce the amount of stuff piling up between lines by enforcing a "Work in Progress limit" or WIP limit. The same technique can be applied with feature flags. A team declares that they will only allow themselves to have 4 (or 6, or 20) short-lived feature flags active at any one time. If a team has reached that WIP limit for flags and a product manager or tech lead wants to add a new flag, they must first identify which flag the team is going to retire in order to "make room" for the new flag. This can be a very effective technique, mostly because it aligns incentives. The person who wants to add a flag is incentivized to also make sure that it will subsequently be removed - so that they can keep adding flags! For the same reason, WIP limits are best applied within the boundaries of a team - there's nothing more frustrating than a limitation that you don't have the power to fix.
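A WIP limit can be enforced with very little machinery. The sketch below assumes a hypothetical per-team TeamFlagBudget object that tracks active short-lived flags. The class and method names are invented for illustration, and a real flagging system would persist this state rather than hold it in memory.

```python
from typing import Set


class FlagLimitReached(Exception):
    """Raised when a team tries to exceed its short-lived flag limit."""


class TeamFlagBudget:
    """Tracks one team's active short-lived flags against a WIP limit."""

    def __init__(self, team: str, wip_limit: int) -> None:
        self.team = team
        self.wip_limit = wip_limit
        self.active: Set[str] = set()

    def add_flag(self, name: str) -> None:
        """Register a new short-lived flag, or fail if the team is at its limit."""
        if name in self.active:
            return  # already counted against the limit
        if len(self.active) >= self.wip_limit:
            raise FlagLimitReached(
                f"{self.team} already has {len(self.active)} active flags; "
                f"retire one before adding '{name}'"
            )
        self.active.add(name)

    def retire_flag(self, name: str) -> None:
        """Free up a slot once a flag has been retired."""
        self.active.discard(name)
```

Scoping the budget to a single team mirrors the point above: the people who hit the limit are the same people who can retire a flag to make room.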
Finish the Job
A flag should never be considered retired until the code which implements the flag is also removed. The most direct cost of the flag is on the codebase itself. Removing a flag from the flagging system also removes visibility of the cost of that flag, increasing the risk that a team will pay that carrying cost within their codebase for longer.
A good hard-and-fast rule to prevent this from happening is to only allow the configuration for a flag to be removed once there are no references to that flag within a codebase.
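That rule can be checked automatically before a flag's configuration is deleted. The following sketch assumes flags are referenced in code by their literal name and does a plain substring search over source files. The function names and the default ".py" suffix are assumptions, and a search this simple will miss flag names that are built dynamically.

```python
from pathlib import Path
from typing import Iterable, List


def find_flag_references(
    flag_name: str,
    root: Path,
    suffixes: Iterable[str] = (".py",),
) -> List[Path]:
    """Return the source files under `root` that still mention `flag_name`."""
    wanted = tuple(suffixes)
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in wanted:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if flag_name in text:
                hits.append(path)
    return hits


def can_remove_flag_config(
    flag_name: str,
    root: Path,
    suffixes: Iterable[str] = (".py",),
) -> bool:
    """A flag's configuration may be removed only when no code references it."""
    return not find_flag_references(flag_name, root, suffixes)
```

A check like this could run in CI or inside the flag management tooling, so that a flag's configuration can't quietly disappear while its conditionals live on in the codebase.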
The key to keeping the number of active feature flags in your system under control is intention, coupled with some good practices. Explicitly identifying short-lived flags and then applying the techniques discussed in this post will help your engineering org to succeed with feature flags at scale.
For more ideas on how to succeed with feature flags, check out Chapter 3 of the O'Reilly eBook produced by Split titled Managing Feature Flags.
Published at DZone with permission of Pete Hodgson, DZone MVB. See the original article here.