After our first two posts detailing OpenStack architecture at Go Daddy, this third one in the series focuses on Nova.

Our Nova deployment is pretty standard, with a few caveats.  We run Nova Cells (v1) to enable greater scalability, do some scheduler customizations to support our Neutron deployment, and have some additional business logic built on top of the API.

Nova with v1 Cells

Deployment

Early on we recognized the need to scale Nova beyond the limits of a traditional monolithic deployment.  Cells v1 is still an experimental feature in Nova, but several large operators were already using it with decent results, so about a year ago we converted our deployment to use cells.  The earlier cells post provides much more detail on the feature itself.

The top level Nova services (the API cell) are colocated with the other OpenStack API services on the control plane servers.  The compute cell services (which are essentially a full Nova deployment except for Nova API), as well as the RabbitMQ cluster for the compute cell, run on dedicated Cell API servers.

The diagram below illustrates the separation and how we split the services during the conversion:

Cells Migration

Challenges

One big challenge with running Cells v1 is that many resources in Nova are not “cell-aware”, which basically means they don’t work in a cells deployment.  These include host aggregates, availability zones, flavors, and server groups, just to name a few.  As a result, we’ve had to carry several patches to keep the functionality we need.

It turns out this is a common theme among operators using Cells v1.  The Large Deployments Team is gathering a list of common Cells v1 patches and working toward getting them merged into upstream Nova.

Even though the current development effort in Nova is focused on Cells v2, we still see value in merging the v1 patches.  It will be another cycle or two before there is a clean migration path from v1 to v2, so we will have to continue maintaining the patches for some time.

Scheduler Customization

As described in the Neutron post, we make some scheduling customizations in Nova to support our unique Neutron deployment.  We modify the scheduler to automatically place VMs on appropriate networks based on which compute node they are scheduled to.

A host aggregate is created for each pod (or rack) of hypervisor servers.  The aggregate corresponds to all the hosts that serve the Neutron network for that rack, and we add a metadata field listing which network(s) are tied to that aggregate.  This is how hypervisor servers are tied to particular networks.
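For illustration, here is a minimal sketch of how such an aggregate might be set up with python-novaclient.  This is not our actual tooling; the credentials, host names, network names, and the 'networks' metadata key are hypothetical placeholders.

```python
# A minimal sketch (not our actual tooling) of the per-rack mapping, using
# python-novaclient.  All names, credentials, and the 'networks' metadata key
# are hypothetical placeholders.
from keystoneauth1 import loading, session
from novaclient import client as nova_client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(auth_url='http://keystone.example.com:5000/v3',
                                username='admin', password='secret',
                                project_name='admin',
                                user_domain_name='Default',
                                project_domain_name='Default')
nova = nova_client.Client('2', session=session.Session(auth=auth))

# One host aggregate per pod/rack of hypervisors (no availability zone).
agg = nova.aggregates.create('rack-a01', None)

# Metadata field listing the Neutron network(s) tied to this rack.
nova.aggregates.set_metadata(agg, {'networks': 'net-a01a,net-a01b'})

# Every hypervisor in the rack joins the aggregate.
for host in ('compute-a01-01', 'compute-a01-02'):
    nova.aggregates.add_host(agg, host)
```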

Users do not have to (and generally should not) choose a Neutron network when creating a VM.  Our Nova scheduler patch selects one for them before requesting port creation in Neutron.  It determines which networks are available on the hypervisor selected for the VM (using the host aggregate) and chooses the one with the most available IP addresses.
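The selection logic boils down to something like the following simplified sketch.  It is illustrative rather than our actual scheduler patch: get_free_ip_count is a hypothetical stand-in for a Neutron query, and the aggregate objects simply mirror what Nova's host aggregates expose.

```python
# A simplified sketch of the selection logic (not the actual scheduler patch).
# get_free_ip_count is a hypothetical callable standing in for a Neutron query,
# and the aggregate objects mirror what Nova's host aggregates expose.
def pick_network(host, aggregates, get_free_ip_count):
    """Return the network to use for a VM landing on `host`.

    aggregates: iterable of objects with `.hosts` (list of host names) and
                `.metadata` (dict), like Nova host aggregates.
    get_free_ip_count: callable mapping a network name to its free IP count.
    """
    candidates = set()
    for agg in aggregates:
        if host in agg.hosts and 'networks' in agg.metadata:
            candidates.update(agg.metadata['networks'].split(','))
    if not candidates:
        raise ValueError('host %s is not mapped to any network' % host)
    # Prefer the network with the most available IP addresses.
    return max(candidates, key=get_free_ip_count)
```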

Aside from having to re-port the scheduler patch for every Nova release, this solution has worked well for us.  We hope to convert it to a Waffle (see below) to ease the patching burden in the future.

Waffles for Business and Policy Logic

We have some business and policy requirements that we must enforce in the Nova API.  These are mainly around server naming conventions and metadata fields that are needed for all VMs.

Instead of patching Nova to do this enforcement, we use the Wafflehaus tool to inject the logic via the paste pipeline.  This is much cleaner and easier to maintain, since no Nova code is modified directly.
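As a rough illustration of the approach (and not Wafflehaus's actual API), a filter of this kind is just a small WSGI middleware dropped into the Nova API paste pipeline.  The naming rule and required metadata keys below are hypothetical examples of the sort of policy we enforce.

```python
# A rough illustration of the approach (not Wafflehaus's actual API): a small
# WSGI filter in the Nova API paste pipeline that rejects server-create
# requests violating our conventions.  The naming regex and required metadata
# keys are hypothetical examples.
import json
import re

import webob.dec
import webob.exc

NAME_RE = re.compile(r'^[a-z0-9][a-z0-9-]{2,62}$')   # hypothetical naming policy
REQUIRED_METADATA = ('owning_group', 'purpose')       # hypothetical required keys


class ServerPolicyFilter(object):
    def __init__(self, app):
        self.app = app

    @webob.dec.wsgify
    def __call__(self, req):
        if req.method == 'POST' and req.path_info.endswith('/servers'):
            server = json.loads(req.body or b'{}').get('server', {})
            if not NAME_RE.match(server.get('name', '')):
                return webob.exc.HTTPBadRequest(
                    explanation='server name violates the naming convention')
            missing = [k for k in REQUIRED_METADATA
                       if k not in server.get('metadata', {})]
            if missing:
                return webob.exc.HTTPBadRequest(
                    explanation='missing required metadata: ' + ', '.join(missing))
        # Hand the request on to the rest of the pipeline.
        return self.app


def filter_factory(global_conf, **local_conf):
    """Paste filter_factory entry point referenced from the paste config."""
    def _factory(app):
        return ServerPolicyFilter(app)
    return _factory
```

Because the paste pipeline references the filter through its factory, enabling or adjusting a policy like this is a configuration change rather than a change to Nova code.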

We’re encouraged by this initial work we’ve done using Wafflehaus and plan to convert more of our patches and customizations to use it in the future.