We have given a few “show and tell” talks at the last summits and operators’ meetups about the OpenStack architecture we’re running at Go Daddy.

We thought we’d write some more in-depth posts to dive into it further.

(Because our Nova and Neutron architectures are quite non-standard, we’ll focus on those services in later posts.)

History

Our OpenStack initiative began with the Havana release, just before the Hong Kong summit.  As a company, we recognized the need for a more efficient, self-service way to provision compute resources in order to keep up with growth and enable our development teams to be more agile.

The initial deployment included Keystone, Nova, Glance, and Neutron.  We chose not to deploy Swift or Cinder on day one, but did bring in the ancillary services Heat and Ceilometer.

The decision to go with Neutron instead of nova-network from the get-go turned out to be a really good one!  (Even though nova-network more closely aligned with our network architecture.  More on this later.)

While our primary charter is to provide private cloud services to internal Go Daddy teams, our hosting department has developed a customer-facing Cloud Servers product that utilizes OpenStack on the backend.

Global Deployment

Go Daddy maintains data center presence at four global locations:

[Figure: map of our four global cloud regions]

In each geographic region we deploy two OpenStack instances (a complete set of independent OpenStack services):  a “private” instance for internal Go Daddy applications, and a “public” instance for shared and dedicated hosting.

Stats

Our approximate capacity under management by OpenStack across all regions:

  • 350 compute nodes (hypervisors)
  • 8000 CPU cores
  • 36 TB RAM
  • 3800 instances

We expect these numbers to grow by 100% – 200% over the next year as we expand capacity in our international data centers and take on more production workloads.

Physical Layout

At a high level, we run a three-tier architecture:

  • Control plane servers (green) provide the top-level APIs and services, including the Nova API cell.
  • Cell API and Image servers (blue) run the core Nova services and Glance for the Nova compute cells.
  • Hypervisor (aka compute) servers (black) provide the compute capacity and are grouped into pods and availability zones (we call them security zones.)

All server types (except compute) are deployed in groups of two or three (or more) redundant machines.  The end user APIs are behind A10 load balancers and all services are fully active-active.

Keystone

Like many enterprises, we are heavily integrated with Active Directory for authentication and user management.  We use the Keystone LDAP backend for identity and group membership, and use the MySQL backend for role assignment.  This is a common setup so we won’t go into all the details here.  We run with a read-only AD backend, which really just means we don’t create users and groups through Keystone directly.
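
For reference, the relevant Keystone settings look something like this (a minimal sketch; hostnames and DNs are placeholders, and the exact driver strings vary by release):

    [identity]
    driver = ldap

    [assignment]
    driver = sql

    [ldap]
    url = ldap://ad.example.com
    suffix = dc=example,dc=com
    user_tree_dn = ou=Users,dc=example,dc=com
    group_tree_dn = ou=Groups,dc=example,dc=com
    # read-only: Keystone never writes back to AD
    user_allow_create = false
    user_allow_update = false
    user_allow_delete = false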

At this time we don’t federate Keystone across regions: users must authenticate to each region separately and must have projects created in each.

OpenStack VMs are also tightly coupled to AD, although that’s a bit outside the scope of this post.  Craig Jellick and I (Mike Dorman) gave a talk at the Atlanta summit with some more details about our other AD integrations (slides, video.)

We use policy.json extensively to restrict some operations to only certain roles (project creation, security group rules, and availability zone/security zone selection.)  This is mainly to conform to enterprise policy and/or security requirements.
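
Each service has its own policy.json.  As an illustration, an entry like this in Keystone’s file limits project creation to a dedicated role (the role name here is hypothetical, not our actual one):

    {
        "identity:create_project": "role:project_creator",
        "identity:delete_project": "role:project_creator"
    }

Similar entries in the Nova and Neutron policy files gate the security group rule and zone selection operations.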

Networking and Neutron

We use Neutron with the ML2 plugin and the Open vSwitch mechanism driver, but not with the traditional “layer 2 everywhere” model (we do no overlays or tunneling.)  A Neutron network is created for every L2 scope on the physical network, which most of the time means a network per rack.  We have some customizations in the Nova scheduler to abstract these details away from the user.
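
In practice, each rack’s L2 segment shows up in Neutron as a provider network, created with something like the following (names and VLAN IDs are made up):

    neutron net-create rack-a01 --shared \
        --provider:network_type vlan \
        --provider:physical_network rack-a01 \
        --provider:segmentation_id 201
    neutron subnet-create rack-a01 10.20.1.0/24 --name rack-a01-subnet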

For complete details on our Neutron deployment, see our next post OpenStack Architecture at Go Daddy, Part 2: Neutron.

Nova

We were an early user of Nova Cells, along with Rackspace, NeCTAR, and CERN.  This has been a challenge given that Cells v1 is still considered experimental and doesn’t get a lot of development attention.  The migration path to v2 is unclear, and v2 itself isn’t coming until at least Mitaka.

However, cells has enabled us to scale faster and further than we could have with a single monolithic Nova deployment.

See OpenStack Architecture at Go Daddy, Part 3: Nova for a full analysis of how we’ve deployed Nova.

Compute Configuration

We use the standard KVM hypervisor on CentOS 7.  The CPU allocation ratio is the default 16.0, but the RAM and disk ratios are tweaked slightly.  We never oversubscribe memory, so the RAM allocation ratio is 1.0.  The disk allocation ratio varies between 1.0 and 1.3, depending on the hardware and environment.
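
In nova.conf terms, that works out to something like this on each compute node (the disk ratio shown is just one point in the 1.0–1.3 range):

    [DEFAULT]
    cpu_allocation_ratio = 16.0
    ram_allocation_ratio = 1.0
    disk_allocation_ratio = 1.2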

All VM root disks are ephemeral and stored on locally attached disk in the hypervisor servers.  The backend storage is all RAID-5 and is pretty well protected, so it’s rare for us to lose volumes.  We don’t offer any storage services or additional volumes today, but will be able to once Cinder is up and running.

Hypervisor hosts are grouped into “pods”, which are the hypervisors attached to the same pair of access switches (normally top-of-rack.)  We create a host aggregate for each pod, which is used for scheduling VMs to networks.  (More on this in the Neutron and Nova posts.)
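
Setting up a pod is plain host-aggregate management; a quick sketch with made-up pod and host names:

    nova aggregate-create pod-a01
    nova aggregate-set-metadata pod-a01 pod=a01
    nova aggregate-add-host pod-a01 compute-a01-01
    nova aggregate-add-host pod-a01 compute-a01-02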

Another notable feature we deploy on the hypervisors is traffic shaping and rate limiting using the Linux Traffic Control system.  This helps protect against DDoS and other nefarious activity.  It’s mainly a concern for the public cloud; however, we plan to deploy it to our private cloud as well.  (Jim Gorz is giving a #vBrownBag talk on this topic in Tokyo.)
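
To give a flavor of the kind of shaping involved, here’s a simplified example on a VM’s tap device (the device name and numbers are illustrative, not our actual policy):

    # shape traffic toward the VM (host -> guest) with a token bucket filter
    tc qdisc add dev tap0 root tbf rate 100mbit burst 10mb latency 50ms
    # police traffic coming from the VM (guest -> host)
    tc qdisc add dev tap0 handle ffff: ingress
    tc filter add dev tap0 parent ffff: protocol ip u32 \
        match u32 0 0 police rate 100mbit burst 10mb drop flowid :1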

Glance

There are dedicated Glance servers in each cell in an effort to keep the image transfer traffic closer to the hypervisors.  nova-compute is configured to directly hit the Glance servers in its local cell.
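
Concretely, that just means each cell’s nova.conf points at its local Glance servers instead of the load-balanced endpoint; roughly (hostnames invented):

    [glance]
    api_servers = http://glance-cell1-01:9292,http://glance-cell1-02:9292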

End user Glance API calls come in through the top-level control plane servers via the main API load balancer.  SSL termination for the Glance API is done at that layer with HAProxy.  From there, the API calls are proxied out to the real Glance servers in the compute cells.  This setup eliminates the SSL overhead when transferring images down to the hypervisors.
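
A stripped-down version of that HAProxy frontend looks like this (certificate path and backend addresses are placeholders):

    frontend glance_api
        bind *:9292 ssl crt /etc/haproxy/certs/api.pem
        mode http
        default_backend glance_cells

    backend glance_cells
        mode http
        balance roundrobin
        server glance-cell1-01 10.0.1.10:9292 check
        server glance-cell1-02 10.0.1.11:9292 check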

[Figure: Glance architecture]

Backend storage is provided by the Ceph RADOS Gateway S3 API.  We originally used the RADOS Swift API, but ran into this bug which made deleting larger images take a very long time.  The storage infrastructure is managed by a different group within Go Daddy, so at this level we are just a consumer of the storage.
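
On the Glance side this is just the S3 store driver pointed at the RADOS Gateway endpoint; a minimal sketch (endpoint, credentials, and bucket name are placeholders):

    [glance_store]
    stores = s3
    default_store = s3
    s3_store_host = radosgw.example.com
    s3_store_access_key = <access-key>
    s3_store_secret_key = <secret-key>
    s3_store_bucket = glance
    s3_store_create_bucket_on_put = true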

Ceilometer (but not for metrics)

We originally deployed Ceilometer because we expected to use the metrics collection along with Heat to enable autoscaling and other orchestration features.  Unfortunately, like so many others, we had trouble with MongoDB, so we’ve largely abandoned this effort.

However, we have utilized the notification event publishing features of the Ceilometer notification agent to provide an event bus for more advanced users.  Events from the notifications.info queue are published to a Kafka broker.  Users subscribe to these events in Kafka to get notifications about OpenStack resource state changes.  This is much more efficient than polling the APIs at small intervals to determine when states change.
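
For a consumer, subscribing is a few lines with any Kafka client.  A minimal Python sketch using kafka-python (the broker address and topic name are assumptions, not our actual values):

    import json
    from kafka import KafkaConsumer

    # subscribe to OpenStack notification events published into Kafka
    consumer = KafkaConsumer(
        'openstack-events',                    # hypothetical topic name
        bootstrap_servers=['kafka01:9092'],    # hypothetical broker
        value_deserializer=lambda m: json.loads(m.decode('utf-8')))

    for message in consumer:
        event = message.value
        # react to instance state changes instead of polling the Nova API
        if event.get('event_type', '').startswith('compute.instance.'):
            print(event['event_type'],
                  event.get('payload', {}).get('instance_id'))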

Heat

Like Ceilometer, Heat was deployed early on with the intention of enabling application orchestration and autoscaling.  Because heat stacks create users in Keystone, this was a non-starter for us due to our read-only AD identity backend in Keystone.

Recently we’ve been able to make more progress with Heat now that it supports domain isolated users for in-instance credentials and Keystone can do domain-specific configuration.  This is exciting as it’ll help enable higher-level services like Magnum (see below.)
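
The heat.conf side of that is small; a sketch (the domain and user names follow the upstream examples, and the password is obviously a placeholder):

    [DEFAULT]
    # put in-instance stack users in their own Keystone domain
    stack_user_domain_name = heat_user_domain
    stack_domain_admin = heat_domain_admin
    stack_domain_admin_password = <password>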

Customizations

Like many larger operators, we carry several local customizations to enable OpenStack to work in our environment.

First, our users do not interact with Horizon: it is an admin-only interface for us.  We have a custom web UI that covers all our technology platform services within Go Daddy (Hadoop, Cassandra, DNS, etc.) and thus users also access OpenStack via that UI.

Second, we’ve created a small project creation API external to Keystone to handle access control for who is able to create projects.  It does some additional RBAC checking and allows some delegation of permissions.  This provides a mechanism for users to self-create projects via our UI, without actually granting them that higher level access within OpenStack.
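
To give a flavor of the shape of it, here’s a hypothetical sketch (not our actual code; the allow-list, credentials, and endpoints are all invented):

    from flask import Flask, abort, jsonify, request
    from keystoneclient.v3 import client as ks_client

    app = Flask(__name__)

    # hypothetical allow-list; ours does real RBAC checks backed by AD groups
    DELEGATED_CREATORS = {'alice', 'bob'}

    @app.route('/projects', methods=['POST'])
    def create_project():
        body = request.get_json()
        if body.get('requester') not in DELEGATED_CREATORS:
            abort(403)
        # the service holds the elevated Keystone credentials, so end
        # users never need that level of access themselves
        keystone = ks_client.Client(
            token='<service-admin-token>',        # placeholder credentials
            endpoint='http://keystone:35357/v3')  # placeholder endpoint
        project = keystone.projects.create(
            name=body['name'], domain='default',
            description=body.get('description', ''))
        return jsonify({'id': project.id}), 201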

We also carry some significant patches against Nova and Neutron to enable proper VM and network scheduling within our L3 network model.  These are detailed in the later architecture posts about those services.

What’s Next?

We’re excited to complete the build out of our baseline platform in all our global data centers so we can focus on adding new features.  Here are some major items on our roadmap for 2016:

LBaaS (aka Load Balancing without Tickets)

By far the biggest ask from our users is for LBaaS.  Getting a load balancer set up at Go Daddy today involves tickets to a few different groups and 2 – 4 weeks of waiting.  We want this to be 100% self-service and automated.

And we’re getting close.  After a bunch of work on the A10 LBaaS plugin for Neutron, it’s almost ready.  We are coordinating with our network automation team and A10 engineers on a few last fixes and should be able to deploy this to early adopter users soon for final testing.

Magnum for Containers

The number two ask from users is containers.  So we’ve also dug into Magnum a lot during the last month.  While the Kilo version is somewhat lacking, we’re encouraged by the amount of work going into this project and have already been pulling in the Liberty bits from master.

There’s a long road ahead for us to get Magnum into production.  But we’re excited about the possibilities here.

Cinder Block Storage

This quarter our storage team has begun deploying Ceph infrastructure to all our data centers, and that work should be wrapped up in Q2 2016.  We’re beginning lab work now to get Cinder configured in our dev environments and expect to begin deploying it to some of our regions early in Q1.

Cinder storage for additional volumes will unblock many of our teams that have requirements for more persistent, protected storage.

Bare Metal Provisioning with Ironic

This is purely a man-hours issue.  We’ve wanted to put Ironic in place for a long time to get a better handle on our bare metal provisioning.  We expect to focus on this once most of the baseline Magnum and Cinder work is done.

Bare metal servers through a self-service API will be another huge win for our company.  Building physical servers today, while mostly automated, is still a complicated and painful process for most teams.

A Big Thanks to the Community

Our OpenStack engineering team at Go Daddy is just 6 people.  We could never have accomplished all this in such a short time without the support of the OpenStack community.

Those who have helped us along the way are too many to list, but we want to specifically thank a few organizations that have really gone the extra mile for us:

  • NeCTAR
  • CERN
  • Rackspace
  • Yahoo!
  • iWeb
  • Time Warner Cable

We so appreciate your willingness to help us fellow operators out!  See you in Tokyo!