Topology-aware Software Switches
The following article contrasts two models for network virtualization:
● In OpenFlow, a central controller “renders” a virtual/logical network topology by pre-computing the flows needed at each of the flow-based switches. The switches may be hypervisor-based software switches or hardware switches. In the latter case, hardware-based flow switches may be located only at the physical network edge or they may compose the entire physical fabric.
● The model I call “Topology Aware Edge,” TAE for short, pushes the logical network topology itself to hypervisor-based software switches (MidoNet, PLUMgrid, OpenContrail) or to Top-of-Rack network operating systems (Pluribus Networks). In both cases, the SDN switches are located at the edge of the physical topology.
Note that the distinction between reactive and proactive models is orthogonal to this discussion: both TAE and OpenFlow can, and should, be implemented so that switches forward packets and flows based on local knowledge, without needing to query an off-box agent or controller.
Also note that in this article we use the term “switch” loosely, without implying L2 or L3 semantics.
In TAE, the intelligent switches understand “high-level” models: virtual L2 domains or devices, L3 switches/routers, NAT, load balancers, and so on. They may also understand models that are not expressed as wires and devices, such as Endpoint Groups and Policies.
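To make “high-level models” concrete, here is a minimal Python sketch of two such objects as an edge switch might hold them. The first is a bridge that learns source MACs and forwards on destination MAC. The second is a router that does longest-prefix match over its routes. The class and method names (`VirtualBridge`, `VirtualRouter`, `forward`, `route`) are hypothetical. They are not the actual data model of MidoNet, PLUMgrid, or OpenContrail.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VirtualBridge:
    """Hypothetical logical L2 device: MAC learning plus destination-MAC forwarding."""
    name: str
    ports: List[str]
    mac_table: Dict[str, str] = field(default_factory=dict)

    def forward(self, in_port: str, src_mac: str, dst_mac: str) -> List[str]:
        self.mac_table[src_mac] = in_port          # learn where the sender lives
        out = self.mac_table.get(dst_mac)
        if out is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            return []                              # destination is behind the ingress port
        return [out]


@dataclass
class VirtualRouter:
    """Hypothetical logical L3 device: longest-prefix match over static routes (IPv4 only)."""
    name: str
    routes: List[Tuple[ipaddress.IPv4Network, str]] = field(default_factory=list)

    def add_route(self, cidr: str, out_port: str) -> None:
        self.routes.append((ipaddress.ip_network(cidr), out_port))

    def route(self, dst_ip: str) -> Optional[str]:
        addr = ipaddress.ip_address(dst_ip)
        matches = [(net, port) for net, port in self.routes if addr in net]
        if not matches:
            return None                            # no route: a real router would drop the packet
        return max(matches, key=lambda m: m[0].prefixlen)[1]   # longest prefix wins
```

The point of the sketch is that the switch evaluates device semantics (learning, flooding, route lookup). It does not merely match flow entries that a controller computed in advance.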
TAE implementations may perform fast-path packet forwarding in different ways: 1) using “OpenFlow-like” flow tables in kernel or user space (MidoNet can use Open vSwitch’s kernel module or the DPDK equivalent); 2) leveraging a VRF-like routing stack (OpenContrail’s vRouter); or 3) pushing the models themselves into the kernel (PLUMgrid with eBPF). What they share is a philosophy of making the edge of the physical network topology-aware, and therefore “more intelligent.” In MidoNet’s case, each hypervisor-based switch is aware of a subgraph of the logical topology: the subgraph traversed by flows emitted and/or received by local VMs.
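One way to picture “the subgraph traversed by flows emitted and/or received by local VMs” is as a graph walk that starts from the exterior ports bound on the host. The sketch below assumes the topology is given as a device-to-ports map and a port-to-peer map. `local_subgraph` is a hypothetical helper. Plain reachability over-approximates the subgraph, because filters or routes may stop some reachable devices from ever being traversed. The article does not say how MidoNet itself computes this set.

```python
from collections import deque
from typing import Dict, Iterable, Set


def local_subgraph(device_ports: Dict[str, Set[str]],
                   links: Dict[str, str],
                   local_ports: Iterable[str]) -> Set[str]:
    """Return the logical devices reachable from this host's bound exterior ports.

    device_ports: device name -> names of its ports
    links:        interior port -> peer port (both directions must be present)
    local_ports:  exterior ports bound to local VM interfaces (e.g. tap1)
    """
    owner = {port: dev for dev, ports in device_ports.items() for port in ports}
    seen: Set[str] = set()
    queue = deque(owner[p] for p in local_ports)
    while queue:                         # breadth-first walk across port links
        dev = queue.popleft()
        if dev in seen:
            continue
        seen.add(dev)
        for port in device_ports[dev]:
            peer = links.get(port)
            if peer is not None:         # interior port linked to another device
                queue.append(owner[peer])
    return seen
```

With the topology from the example that follows, a host whose only local port is port1 loads bridge1, router1, and bridge2. It never loads an unrelated bridge3.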
Let’s review how the MidoNet Agent (MN Agent) forwards packets and flows. A VM’s vNIC is plugged into the Open vSwitch (OVS) kernel datapath. The MN Agent controls this datapath in the same way that vswitchd controls it in a standard Open vSwitch installation. When a packet misses in the OVS datapath, the MN Agent receives a message containing the packet and its pre-parsed header fields. The Agent first matches the flow signature against its own user-space flow tables. (Like vswitchd, the MN Agent must keep a copy of the datapath flows in user space to deal with race conditions and the flow life-cycle, because the datapath itself has no flow filtering or expiration capabilities.) If no match is found, the flow is passed to a simulation layer, which simulates how the packet would traverse the virtual device topology.
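The miss path just described can be sketched as a two-level cache in front of a simulator. This is an illustration under stated assumptions, not MN Agent code. The kernel datapath is modeled as a dict, `simulate` is any callable that returns a list of actions, and all class and method names are hypothetical. Idle expiry takes an explicit `now` timestamp, which makes visible the agent-side expiration that the datapath cannot do itself.

```python
from typing import Callable, Dict, Hashable, List, Tuple

FlowKey = Tuple[Hashable, ...]   # pre-parsed header fields, e.g. (in_port, src_mac, dst_mac)
Actions = List[str]


class AgentFlowPipeline:
    """Hypothetical miss path: kernel datapath -> user-space flow table -> simulation."""

    def __init__(self, simulate: Callable[[FlowKey], Actions], idle_timeout: float):
        self.simulate = simulate
        self.idle_timeout = idle_timeout
        self.datapath: Dict[FlowKey, Actions] = {}     # stands in for the kernel flow table
        self.user_flows: Dict[FlowKey, Actions] = {}   # agent's own copy of installed flows
        self.last_used: Dict[FlowKey, float] = {}      # a real agent would read datapath stats

    def on_packet(self, key: FlowKey, now: float) -> Tuple[Actions, str]:
        if key in self.datapath:                       # fast path: no upcall needed
            self.last_used[key] = now
            return self.datapath[key], "datapath-hit"
        # Datapath miss: the packet and its parsed key are handed to user space.
        actions = self.user_flows.get(key)
        source = "userspace-hit"                       # e.g. a packet that raced the install
        if actions is None:
            actions = self.simulate(key)               # walk the logical topology
            self.user_flows[key] = actions
            source = "simulated"
        self.datapath[key] = actions                   # (re)install the datapath flow
        self.last_used[key] = now
        return actions, source

    def expire(self, now: float) -> List[FlowKey]:
        """The datapath cannot expire flows itself, so the agent removes idle ones."""
        stale = [k for k, t in self.last_used.items() if now - t >= self.idle_timeout]
        for k in stale:
            self.datapath.pop(k, None)
            self.user_flows.pop(k, None)
            del self.last_used[k]
        return stale
```

Only the first packet of a flow pays for a simulation. Later packets are handled either by the datapath entry or, if they missed in the kernel before that entry existed, by the user-space copy.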
Here’s an example to help visualize this. A flow from VM1 on Host1/tap1 ingresses port1 of bridge1 and is forwarded to port2, which is linked to port3 of router1. Next, router1 load-balances the flow to one of a set of back-end IPs reachable from port4, which is linked to port5 of bridge2. Finally, bridge2 forwards the flow to port6, an exterior port (exterior ports are the edge of the logical topology). The simulation of the flow’s path through the logical topology is now complete, and the Agent maps the exterior port to the physical topology. In this example, port6 of bridge2 is bound to tap2 on Host2. The Agent on Host1 then installs a datapath flow that rewrites the flow’s headers (here, both MAC addresses as well as the destination IP and port have changed). The flow also prepends encapsulation headers (VXLAN+UDP or GRE) and outer tunnel headers (Ethernet+IP) so that the packets can be tunneled to Host2.
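The walk through this example can be written out as a small simulation. Everything below is hypothetical: the MAC and IP addresses, the VIP and its back ends, the CRC32-based back-end choice, the hard-coded route to port4, and the action strings. The sketch uses VXLAN, though the text allows GRE as well. It models only this one path and returns two things: a trace of each logical hop and the resulting datapath actions.

```python
import zlib
from typing import Any, Dict, List, Tuple

# Hypothetical state for the example topology (all values invented).
ROUTER_PORT_MAC = {"port3": "02:00:00:00:00:03", "port4": "02:00:00:00:00:04"}
VIP = ("10.0.0.100", 80)                                   # load-balanced service address
BACKENDS = [("10.0.1.11", 8080), ("10.0.1.12", 8080)]      # reachable behind port4
ARP = {"10.0.1.11": "02:00:00:00:01:11", "10.0.1.12": "02:00:00:00:01:12"}
BRIDGE1_MACS = {"02:00:00:00:00:03": "port2"}              # router1's port3 MAC is behind port2
BRIDGE2_MACS = {"02:00:00:00:01:11": "port6", "02:00:00:00:01:12": "port6"}
LINKS = {"port2": "port3", "port4": "port5"}               # interior port links
BINDINGS = {"port6": ("host2", "tap2")}                    # exterior port -> (host, interface)
HOST_UNDERLAY_IP = {"host2": "192.168.10.2"}


def simulate(pkt: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Walk one packet through bridge1 -> router1 -> bridge2; return (trace, actions)."""
    orig, pkt = pkt, dict(pkt)                             # never mutate the caller's packet
    trace = ["bridge1: ingress port1"]
    out = BRIDGE1_MACS[pkt["dst_mac"]]
    trace.append(f"bridge1: dst_mac {pkt['dst_mac']} -> {out}")
    trace.append(f"link {out} -> {LINKS[out]} (router1)")
    if (pkt["dst_ip"], pkt["dst_port"]) == VIP:
        # Deterministic per-flow choice so every packet of the flow picks the same back end.
        flow_id = f"{pkt['src_ip']}:{pkt['src_port']}".encode()
        ip, port = BACKENDS[zlib.crc32(flow_id) % len(BACKENDS)]
        pkt["dst_ip"], pkt["dst_port"] = ip, port
        trace.append(f"router1: load-balance VIP -> {ip}:{port}")
    trace.append("router1: route -> port4")                # route lookup hard-coded for brevity
    pkt["src_mac"] = ROUTER_PORT_MAC["port4"]               # router rewrites both MACs
    pkt["dst_mac"] = ARP[pkt["dst_ip"]]
    trace.append(f"link port4 -> {LINKS['port4']} (bridge2)")
    out = BRIDGE2_MACS[pkt["dst_mac"]]
    trace.append(f"bridge2: dst_mac {pkt['dst_mac']} -> {out} (exterior)")
    host, iface = BINDINGS[out]                             # map logical edge to physical host
    trace.append(f"{out} bound to {iface} on {host}")
    # Emit set-field actions only for headers that changed, then tunnel to the remote host.
    actions = [f"set_{k}={pkt[k]}" for k in sorted(pkt) if pkt[k] != orig[k]]
    actions.append(f"encap_vxlan(outer_dst={HOST_UNDERLAY_IP[host]})")
    return trace, actions
```

Returning the trace alongside the actions is one way to get the per-flow explanation discussed in the debugging point of the list that follows. The record that produced the flow can be logged verbatim.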
So, what’s the advantage of the intelligent edge? Why is it better for the hypervisor-based agent/switch to understand the logical network topology instead of just dealing with flows?
● Distributes the resource cost of flow computation (mostly CPU) across the edge.
● Scales better. As you add hypervisors, the OpenFlow model must compute and propagate flow rules for more hypervisors, so the central controller must scale with them. In TAE, the central component only needs to store state; flow computation happens on each hypervisor.
● Debugging is easier. Each time the MN Agent computes a new flow, it can log/report exactly what happened in the logical topology simulation (reflecting both the Agent’s knowledge of the topology and the simulation logic itself) AND the resulting flow rule in the datapath. Contrast this with the OpenFlow model, where you can see only the flows that were matched in a sequence of OpenFlow tables and the flow that was finally installed in the datapath. The difference in troubleshooting is similar to the difference between debugging Java source and debugging bytecode. We refer to this property of MidoNet as just-in-time datapath flow computation.
● State synchronization is easier and the consistency model is simpler. Instead of receiving batches of OpenFlow rule updates, the MN Agent receives transactional updates to the topology, e.g., to the virtual devices (ports, internal MAC and ARP tables, filters, and NAT rules).
● Enables local decision-making. For example, an agent can balance flows among several gateway nodes while taking into account locality (such as rack affinity) and other metrics.
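The transactional updates mentioned in the list above can be sketched as an agent-side topology replica. The replica applies each update all-or-nothing, then drops cached flows whose simulation touched a changed device. Several details are assumptions for illustration: the sequence-number check, the `(device, attribute, value)` change format with `None` meaning "delete", and the idea of tagging each flow with the devices it traversed. The article states only that updates are transactional.

```python
import copy
from typing import Any, Dict, List, Set, Tuple

Change = Tuple[str, str, Any]     # hypothetical (device, attribute, value); None deletes


class TopologyReplica:
    """Hypothetical agent-side copy of the logical topology."""

    def __init__(self) -> None:
        self.version = 0
        self.devices: Dict[str, Dict[str, Any]] = {}
        # Cached flows, each tagged with the devices its simulation traversed.
        self.flows: Dict[Tuple[Any, ...], Set[str]] = {}

    def apply(self, txn_version: int, changes: List[Change]) -> List[Tuple[Any, ...]]:
        """Apply one transaction atomically; return the flows it invalidated."""
        if txn_version != self.version + 1:
            raise ValueError(f"out-of-order update {txn_version}, have {self.version}")
        staged = copy.deepcopy(self.devices)          # work on a private copy
        for device, attr, value in changes:
            dev = staged.setdefault(device, {})
            if value is None:
                del dev[attr]                         # KeyError aborts the whole transaction
            else:
                dev[attr] = value
        # Commit in one step: readers never see a half-applied transaction.
        self.devices, self.version = staged, txn_version
        touched = {device for device, _, _ in changes}
        stale = [k for k, devs in self.flows.items() if devs & touched]
        for k in stale:
            del self.flows[k]                         # re-simulated on the next miss
        return stale
```

Because a failed transaction leaves both the replica and its version untouched, the agent can simply retry or resynchronize. It never has to reason about a partially applied batch of rules.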
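Local decision-making can be sketched as a gateway chooser that runs entirely on the hypervisor. It prefers healthy, non-overloaded gateways in the local rack and falls back to any usable gateway. It hashes the flow’s 5-tuple so that every packet of a flow picks the same node. The `Gateway` fields, the `max_load` threshold, and the CRC32 hash are hypothetical choices, not a description of MidoNet’s algorithm.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional, Tuple

FiveTuple = Tuple[str, int, str, int, str]   # (src_ip, src_port, dst_ip, dst_port, proto)


@dataclass(frozen=True)
class Gateway:
    name: str
    rack: str
    healthy: bool = True
    load: float = 0.0        # hypothetical metric in [0, 1], e.g. uplink utilization


def pick_gateway(gateways: List[Gateway], local_rack: str,
                 flow: FiveTuple, max_load: float = 0.8) -> Optional[Gateway]:
    """Choose a gateway for a flow using only state known to the local agent."""
    usable = [g for g in gateways if g.healthy and g.load < max_load]
    if not usable:
        return None
    same_rack = [g for g in usable if g.rack == local_rack]
    # Prefer rack-local gateways; otherwise spread across all usable ones.
    candidates = sorted(same_rack or usable, key=lambda g: g.name)
    # A stable hash keeps all packets of one flow on the same gateway.
    return candidates[zlib.crc32(repr(flow).encode()) % len(candidates)]
```

Since no controller round trip is involved, the choice can react immediately to locally observed health or load. It still spreads different flows across the candidate set.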
MidoNet is Open Source. To find out more, please visit http://midonet.org/
About the Author
Pino de Candia joined Midokura as a Software Engineer in late 2010. He helped build early versions of MidoNet and in 2011 became the manager of the Network Controller team, based in Barcelona.
Pino is responsible for the design and architecture of MidoNet.
Prior to Midokura, Pino was at Amazon.com, where he spent his first 2 years building Dynamo, a NoSQL data store, and his last 2.5 years managing an internal infrastructure software team focused on caching tools and systems.
Opinions expressed by DZone contributors are their own.