Originally written by Marten Terpstra at the Plexxi blog.
Throughout the development cycle of new features and functions for any network platform (or probably most other products not targeted at the mass market consumer) this one question will always come up: should we protect the user of our product from doing this? And “this” is always something that would allow the user of the product to really mess things up if not done right. As a product management organization you almost have to take a philosophical stand when it comes to these questions.
Protect the user
Sure enough, the question came up last week as part of the development of one our features. When putting the finishing touches on a feature that allows very direct control over some of the fundamental portions of what creates a Plexxi fabric, our QA team (very appropriately) raised the concern: if the user does this, bad things can happen, should we not allow the user to change this portion of the feature?
This balancing act is part of what as made networking as complex as it has become. As an industry we have been extremely flexible in what we have exposed to our users. We have given access to portions of our products that 99.9% of customers will never need, but unfortunately because of that 0.1% every networking product has tons of these little tweaks and knobs that could wreak havoc if used the wrong way.
We take a lot of pride in creating a network solution that is simple to use, simple to interact with, but extremely powerful under the hood. Direct access to all that power will lead to not only giving the customer a powerful weapon, but also the ammunition to use it. And like handling any weapon, you can really hurt yourself if you are not careful. Which comes back to the question at hand, how many safety valves do you put in place to make sure the user cannot hurt themselves?
The reason why
Some of these controls are buried fairly deep inside our products. They are meant for true power users and for the support teams of the vendors. And even beyond the support teams, there are tools and tricks inside our products that only the engineering teams know about, hidden even beyond the knowledge of support teams. Several years ago (in a previous job), we had a customer with a complex problem. Traffic was inconsistently forwarded and the belief was that there were communication problems between line cards and the main CPU card that would create inconsistant tables (the biggest challenge for any chassis based system).
Of course our development teams had tools embedded in the code to carefully examine and manipulate the tables and communications between these cards. Not exposed to a regular user, because they were potentially dangerous. And we proved that they were. During the execution of the command by one of my developers, he made a small typo in one of the arguments and boom went the switch. Crash and reboot. Customer very upset (for good reason, this was a production network), executive management very upset (also for good reason) and worse, the problem disappeared without us collecting the information we needed to attempt to fix it.
Different Answers for Different tools
There is a difference between debug tools that allow engineers to look deep inside the switch versus common features that may have significant service consequences if not in expert hands. No matter how hard we try, the first category will continue to exist. As vendors we will bring portions of these tools to the user or support visible spectrum, but at the same time we will create new ones buried deep.
The latter category though is one where I favor a less protective approach. There are many ways by which you can completely disrupt your network service. Most of the services your network provide have been created with your own hands through provisioning and configuration and can therefore be disrupted by those same actions. When we create features and functions that are potentially dangerous, it is on us the vendor to make sure it is properly documented and explained. This way when you do make that mistake (and it will happen) we can refer to that 4 letter “read the documentation” response.
Off come the training wheels
When it comes to user configurable features and functions, every single one of them has the potential to disrupt service when used the wrong way. We as vendors should not shy away from giving you all the tools you need to create (and destroy) the service you need. And I do not believe anyone wants to step through one “Are you sure (Y/N)?” after another. Of course we need to make creating services easier for you. If you are a frequent reader of our blogs you know that is what we stand for. But we should not take away features because we are afraid you can shoot yourself in the foot. Any time in the past where we opted to give you a gun but keep the bullets behind a locked door, we have found someone that legitimately explained that he or she needed the bullets to solve their specific problem. And we unlocked the door.
There are ways to teach someone how to ride a bike without providing permanent training wheels. Documentation (for those few that read it), workflow based provisioning and configuration and solid default behaviors with predictable results can steer you clear of the dangers we have provided. And when you do fall off the bike and hurt your knee or elbow, well, you are less likely to try that maneuver again next time. That is how most of us learn. Including those developers that crash a customer production switch during a debug session. For every one of those “oops” moments there will be many where those hidden gems may have saved your network from disaster. Just like there is one customer for whom having the bullets makes the difference between a working service and one that just limps along.