As mentioned previously, we make some customizations to Neutron to support our layer 3 networking model. There are no tenant networks, tunnels, or overlays.
We believe in keeping things as simple as possible and letting the network do what it is good at. An SDN layer on top of the network adds complexity, is difficult to troubleshoot, and provides no value for our customers.
Background on our Network Architecture
Our network is a folded Clos design (also called spine-and-leaf). In this model, the distribution layer (the spine) is scaled out horizontally and connected to the access layer (the leaves) in a full mesh.
Our leaf switches are typically top-of-rack switches that support 44 or 92 servers each, depending on the density and architecture of the datacenter.
Everything above the leaf (access) layer is layer 3 only. Layer 2 scope is confined to each leaf pair, and therefore to each rack. There is a layer 2 VLAN with a locally bound subnet created for each rack, and all servers or VMs in that rack sit on that VLAN. Each server or VM gets a primary IP address from the local subnet. Any additional IP addresses are layer 3 routed to that primary IP address.
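To make the layer 2 boundary concrete, here is a minimal sketch of the per-rack addressing model, assuming one /24 per rack. The rack names, subnets, and helper functions are hypothetical; the point is only that two hosts share a layer 2 segment exactly when they sit on the same rack VLAN, and everything else goes through the layer 3 fabric.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical racks and subnets: each rack has one L2 VLAN with a
# locally bound subnet, and nothing at layer 2 extends past the leaf pair.
RACK_SUBNETS: Dict[str, ipaddress.IPv4Network] = {
    "rack-a01": ipaddress.IPv4Network("10.10.1.0/24"),
    "rack-a02": ipaddress.IPv4Network("10.10.2.0/24"),
}


def rack_for_ip(ip: str) -> Optional[str]:
    """Return the rack whose local subnet contains ip, or None."""
    addr = ipaddress.IPv4Address(ip)
    for rack, subnet in RACK_SUBNETS.items():
        if addr in subnet:
            return rack
    return None


def same_l2_domain(a: str, b: str) -> bool:
    """True only when both addresses sit on the same rack VLAN; any other
    pair of hosts must communicate through the layer 3 fabric."""
    rack_a = rack_for_ip(a)
    return rack_a is not None and rack_a == rack_for_ip(b)
```

For example, `same_l2_domain("10.10.1.5", "10.10.1.9")` is true, while `10.10.1.5` and `10.10.2.9` are in different racks and can only reach each other through routing.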
This layer 3 model scales better because it reduces the size and scope of the ARP and CAM tables, which are now confined to the leaf switches. It also limits the broadcast domain and prevents the flooding problems that occur when layer 2 traffic spans the whole architecture.
Our network is partitioned into security zones, which enforce macro security policy for network traffic. Each zone is a logical network spine (as above) with a separate VRF (L3 domain).
We serve several security zones with OpenStack, but each hypervisor is dedicated to a single zone. Per security policy requirements, there is never any “mixing of the streams” within a single hypervisor.
Neutron Network Setup
Because our L2 scope is limited to the access switch, and Neutron networks are tied to the L2 domain, we create a separate Neutron provider network for each rack of hypervisor servers. There is no L2 connectivity across networks or racks (we don’t do any overlays or tunneling). Our users are familiar with this constraint, as it also applies to physical servers on our network.
As stated above, we don’t use private tenant networks. VMs get a port directly on the provider network, and all communication happens there.
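A minimal sketch of what "one provider network per rack" looks like as Neutron API request bodies, using the standard attributes of the provider network extension (`provider:network_type`, `provider:physical_network`, `provider:segmentation_id`). The rack inventory format, the physical network names, and the `provider-<rack>` naming scheme are hypothetical; which network type fits depends on how the hypervisor bridges are wired, so it is left as an input.

```python
from typing import Dict, List


def provider_network_bodies(racks: Dict[str, dict]) -> List[dict]:
    """Build one Neutron 'create network' request body per rack.

    racks maps a (hypothetical) rack name to a dict with 'network_type',
    'physnet' and, for VLAN networks, 'vlan'.
    """
    bodies = []
    for rack, info in sorted(racks.items()):
        network = {
            "name": f"provider-{rack}",  # hypothetical naming scheme
            "provider:network_type": info["network_type"],
            "provider:physical_network": info["physnet"],
        }
        # Only VLAN-type networks carry a segmentation (VLAN) ID.
        if info["network_type"] == "vlan":
            network["provider:segmentation_id"] = info["vlan"]
        bodies.append({"network": network})
    return bodies
```

Each body would be POSTed to the Networking API's networks endpoint, followed by a subnet for the rack's local CIDR once the network ID is known.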
Floating IPs are implemented by injecting host routes into the network, whereby the floating IP is routed directly to the fixed IP of the VM. The floating IP is bound directly on an interface in the VM (there is no virtual router or NAT).
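The two halves of a floating IP in this model can be sketched as data: a /32 host route in the fabric whose next hop is the VM's fixed IP, and the address bound inside the guest so the VM accepts traffic sent to it. This is a minimal illustration under stated assumptions: the `HostRoute` type and helper names are hypothetical, IPv4 is assumed, and the guest device name `eth0` is a placeholder.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HostRoute:
    prefix: ipaddress.IPv4Network    # the floating IP as a /32
    next_hop: ipaddress.IPv4Address  # the VM's fixed IP on its rack subnet


def floating_ip_route(floating_ip: str, fixed_ip: str) -> HostRoute:
    """Route the floating IP straight to the VM's fixed IP (no NAT)."""
    fip = ipaddress.IPv4Address(floating_ip)
    return HostRoute(
        prefix=ipaddress.IPv4Network(f"{fip}/32"),
        next_hop=ipaddress.IPv4Address(fixed_ip),
    )


def guest_bind_command(floating_ip: str, device: str = "eth0") -> List[str]:
    """iproute2 command that binds the floating IP inside the guest.
    The device name is an assumption and will vary by image."""
    return ["ip", "addr", "add", f"{floating_ip}/32", "dev", device]
```

Because there is no NAT, the guest sees the floating IP as one of its own addresses, and return traffic leaves the VM with the floating IP as its source.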
We’ve modified the standard L3 extension to call an internal static route injection API, which is how the routes get into the network. (We chose this over BGP because other Go Daddy products already use that static route API.)
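The shape of such a hook can be sketched as follows. This is a hypothetical stand-in, not the actual extension code: the internal route API is unknown, so `StaticRouteClient` with `inject`/`withdraw` methods is an assumed interface. The logic shows what an associate/disassociate path needs to do: withdraw any stale route before injecting the new one, and withdraw the route when the floating IP is released.

```python
from typing import Dict, Protocol


class StaticRouteClient(Protocol):
    # Hypothetical interface to an internal static route injection API.
    def inject(self, prefix: str, next_hop: str) -> None: ...
    def withdraw(self, prefix: str, next_hop: str) -> None: ...


class RoutedFloatingIPs:
    """Hypothetical stand-in for a modified L3 extension: instead of
    programming a virtual router with NAT, it asks the network to
    route each floating IP straight to the VM's fixed IP."""

    def __init__(self, client: StaticRouteClient) -> None:
        self._client = client
        self._bindings: Dict[str, str] = {}  # floating IP -> fixed IP

    def associate(self, floating_ip: str, fixed_ip: str) -> None:
        old = self._bindings.get(floating_ip)
        if old == fixed_ip:
            return  # already routed to this VM; nothing to do
        if old is not None:
            # Moving the floating IP: remove the route to the old VM first.
            self._client.withdraw(f"{floating_ip}/32", old)
        self._client.inject(f"{floating_ip}/32", fixed_ip)
        self._bindings[floating_ip] = fixed_ip

    def disassociate(self, floating_ip: str) -> None:
        fixed_ip = self._bindings.pop(floating_ip, None)
        if fixed_ip is not None:
            self._client.withdraw(f"{floating_ip}/32", fixed_ip)
```

Keeping the route API behind a small interface also makes the hook easy to test with a fake client that just records calls.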
Customizations to Abstract Away Layer 2
Neutron has no notion of higher-level network objects, so we are stuck with tens to hundreds of Neutron networks (one per rack). This is troublesome for end users, because they would need to know which physical rack their VM will be placed in to choose the right network for it.
Instead, we customized the Nova scheduler to choose the appropriate network based on where Nova places the VM. In other words, the user does not need to specify a network for the VM; the scheduler supplies that information inline during the nova boot request.
We cover this in more detail in our Nova architecture post, but the bottom line is that networking is largely abstracted away from users and everything is worked out on the backend.
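A minimal sketch of the idea, assuming the scheduler has already picked a host and that host-to-rack and rack-to-network mappings are available. The inventory dicts, host names, and network IDs are made up, and this is not the actual scheduler change; it only shows how the chosen host determines the `networks` entry of a server create request.

```python
from typing import Dict, List

# Hypothetical inventory data; host, rack, and network names are made up.
HOST_TO_RACK: Dict[str, str] = {
    "hv-a01-07": "rack-a01",
    "hv-a02-03": "rack-a02",
}
RACK_TO_NETWORK: Dict[str, str] = {
    "rack-a01": "net-rack-a01",
    "rack-a02": "net-rack-a02",
}


def networks_for_host(selected_host: str) -> List[Dict[str, str]]:
    """Return the provider network(s) for the rack of the chosen host."""
    rack = HOST_TO_RACK.get(selected_host)
    if rack is None:
        raise LookupError(f"unknown hypervisor {selected_host!r}")
    return [{"uuid": RACK_TO_NETWORK[rack]}]


def fill_boot_request(request: Dict, selected_host: str) -> Dict:
    """Return a copy of a server create request with 'networks' filled in
    from the placement decision, so the user never has to supply it."""
    server = dict(request["server"])
    server["networks"] = networks_for_host(selected_host)
    return {"server": server}
```

For example, a request placed on `hv-a02-03` ends up attached to `net-rack-a02` without the user ever naming a network.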
Hypervisor and Agent Configuration
We use the ML2 plugin with the Open vSwitch mechanism driver on hypervisors. The wiring inside OVS is a little complicated: we create a bridge for each VLAN so that we don’t need to dedicate separate physical NICs to VM traffic. VLAN tagging is done in OVS rather than in Linux because we ran into NIC driver problems with tagging in Linux.
The Open vSwitch agent runs on every hypervisor, of course. We run only two DHCP agents per rack, colocated on two of the hypervisors. Most VMs get their network configuration via config drive, so DHCP is really just a backup service for us.
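With one bridge per VLAN, each physical network has to be mapped to its own bridge in the Open vSwitch agent's `bridge_mappings` option, a comma-separated list of `physnet:bridge` pairs. The sketch below builds that value; the `br-vlan<ID>` bridge names and physnet names are hypothetical, and it does not cover how the per-VLAN bridges are patched to the NIC.

```python
from typing import Dict


def bridge_mappings(physnet_to_vlan: Dict[str, int]) -> str:
    """Build the value of the OVS agent's bridge_mappings option, mapping
    each physical network to its own (hypothetically named) per-VLAN bridge."""
    pairs = []
    for physnet, vlan in sorted(physnet_to_vlan.items()):
        # Linux limits interface names to 15 characters; "br-vlan4094"
        # is 11, so this naming scheme stays within the limit.
        bridge = f"br-vlan{vlan}"
        pairs.append(f"{physnet}:{bridge}")
    return ",".join(pairs)
```

The resulting string would go under the `[ovs]` section of the agent configuration as `bridge_mappings = ...`.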
One change we’re looking at is switching to the Linux Bridge mechanism driver, now that it supports ARP spoofing prevention. This would remove the complexity of Open vSwitch and give us a more “standard” network stack with Linux bridging.
IP Usages Extension
To support network scheduling in Nova, we gather statistics about IP address usage in the Neutron networks. The stock Neutron API doesn’t have anything like this, so we built an IP usages extension which provides it.
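The statistic the scheduler needs is essentially "how many addresses are left on this rack's network". A minimal sketch, assuming a single IPv4 subnet per network and a list of addresses already held by Neutron ports; the `IPUsage` type and field names are hypothetical rather than the extension's actual response format, and a real implementation would count against the subnet's allocation pools instead of the whole CIDR.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass
class IPUsage:
    network_id: str
    total_ips: int
    used_ips: int

    @property
    def free_ips(self) -> int:
        return self.total_ips - self.used_ips


def ip_usage(network_id: str, cidr: str, allocated: Iterable[str]) -> IPUsage:
    """Summarize address usage for one network with one IPv4 subnet.

    allocated: addresses held by ports on the network (VMs, DHCP ports...).
    """
    net = ipaddress.IPv4Network(cidr)
    # Exclude the network and broadcast addresses from the usable total.
    total = max(net.num_addresses - 2, 0)
    # De-duplicate and ignore addresses that are not in this subnet.
    used = sum(1 for ip in set(allocated) if ipaddress.IPv4Address(ip) in net)
    return IPUsage(network_id, total, used)
```

A scheduler could then avoid racks whose network is nearly full, for example by requiring `free_ips` to stay above some threshold before placing a VM there.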
The Future of Neutron
One thing we’re really pushing for in Neutron is the notion of routed/segmented/layer 3 networks. We worked with the Large Deployments Team in Vancouver and wrote up an RFE bug for this type of functionality. There have been a couple of specs floating around, but the current one for the Mitaka cycle is here.
We hope agreement can be reached on how to move this forward. Neutron operators should have more deployment choices available so they can more closely match their network models.