This article was written by Marten Terpstra at the Plexxi blog.
This week I read a perfectly reasonable article by Tom Hollingsworth, which then deteriorated in the comments section into a “you don’t know how a switch works” exchange.
Both parties miss the boat in several of their comments, but the far more interesting question is why we all of a sudden need to understand the ins and outs of how a switch is constructed, how exactly packets are moved, the impact of VOQs, queue thresholds, buffer allocation schemes, hash distribution, jitter, you name it. As consumers, why do we feel the need to understand all this?
I believe this is because of us. Us, the switch and router vendors. We have been creating these switches and routers for 25 years now, and we have created a few bugs and problems along the way. In resolving those bugs and issues, we have had to explain the gory details of the internals of the device. And we have exposed extremely detailed control knobs that give you access to portions of the hardware functionality the vast majority of folks never should have had access to.
Why do most of you know what Head of Line blocking is? Unless you are interested in queueing theory, there is really no need to know. But we all know about it because at some point in the now distant past, we (yes, us vendors) created switches that exhibited Head of Line blocking, and in resolving it we explained it to everyone, implemented Virtual Output Queueing as the fix, and then explained in excruciating detail why that solves this specific issue, while giving you all sorts of tools to influence how it works.
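For the curious (and only the curious), the difference can be shown in a few lines. This is a toy model, nothing like how real silicon schedules packets: one shared input FIFO versus one virtual queue per egress port, with one egress port stalled.

```python
from collections import deque

def run_fifo(packets, busy_port, slots=10):
    """Single shared input FIFO: if the head packet's egress port is
    busy, every packet behind it waits too (head-of-line blocking)."""
    q = deque(packets)
    delivered = []
    for _ in range(slots):
        if q and q[0] != busy_port:
            delivered.append(q.popleft())
        # else: the head is blocked, and nothing behind it can move
    return delivered

def run_voq(packets, busy_port, slots=10):
    """Virtual Output Queueing: one queue per egress port, so a busy
    port only stalls the packets actually destined for it."""
    voqs = {}
    for p in packets:
        voqs.setdefault(p, deque()).append(p)
    delivered = []
    for _ in range(slots):
        for port, q in voqs.items():
            if q and port != busy_port:
                delivered.append(q.popleft())
    return delivered

packets = [2, 1, 3, 1, 3]   # egress port of each arriving packet
print(run_fifo(packets, busy_port=2))  # → [] — port 2 blocks the head, nothing moves
print(run_voq(packets, busy_port=2))   # → [1, 3, 1, 3] — ports 1 and 3 still drain
```

The point is not the code; it is that you needed a simulation at all to see the behavior.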
Then we had to explain how many queues were created, and tell you there are enough queues for multiple priorities on all the possible egress ports. Only to go on to explain where the buffer memory sits and give you tools to carve it into segments just for your specific application. With this example specifically, we have now created the illusion that bigger buffers are better and more queues are better. There are switches out there with gigabytes of buffer memory. Can someone explain to me why you, the majority user, should a) care and b) assume that this means you get a better end-to-end solution?
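To see how quickly those queue counts balloon, consider a hypothetical 64-port switch with 8 traffic classes — illustrative numbers, not any vendor's datasheet:

```python
def voq_count(ports: int, priorities: int) -> int:
    """Total virtual output queues in a fully meshed design: every
    ingress port keeps one queue per (egress port, priority) pair."""
    return ports * ports * priorities

# Hypothetical 64-port switch, 8 priorities:
print(voq_count(64, 8))  # → 32768 queues to size, carve, and tune
```

Tens of thousands of knobs, and we handed you tools for every one of them.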
Not only have we explained bits and pieces of our internals in painful detail, we have then turned them into marketing materials. More queues! Bigger buffers! Lower latency! How are you supposed to analyze that and conclude you have bought a superior solution?
I am totally guilty. I have spent many hours on the phone and in front of a large telco customer explaining exactly how the memory allocation and garbage collection worked in a Broadband Remote Access Switch. I have explained to them how packets received on PPPoE connections were distributed across multiple CPUs and how the queueing and prioritization was done between packets and CPUs. All because we had bugs and I had to convince the customer that we actually had found and fixed the bug, and provided them with tools to monitor that portion of the system in excruciating detail to make sure that problem was really gone.
Don’t get me wrong, I don’t mind explaining the guts of a switch or router to anyone. But for you, the customer, this should be purely a matter of technological curiosity, not an essential part of your design and purchasing process. Because I will bet you that applying this detailed knowledge to your application is really hard at best and more likely completely impossible. 99% of you (and us) only understand your traffic patterns at very high aggregate levels. And even when you do know some reasonable detail, deriving the behavior of a switch on that traffic is near impossible without some very serious traffic modeling.
For that same reason there is a good chance that a switch that spreads traffic across many more links may well outperform the switch with gigabytes' worth of buffer memory. There are most definitely characteristics of a switching platform you should care about if you know that certain aspects of your network may push the scaling limitations of the hardware. Talk to your vendor about how their end-to-end solution provides the best possible transport for the application you need to network. And if the discussion gets to queues, buffers, latency and jitter, there is a good chance you are wasting your time unless you are in that very small group of a few percent for whom this is truly relevant. Networking should be much simpler than that.