Converting to OpenStack Nova Cells Without Destroying the World

Note: I did a talk at the OpenStack Liberty Summit in Vancouver based on this post.  The slides are available on Slideshare, and a video recording of the session is available as well.

(See the Exploring OpenStack Nova Cells post for an introduction and background information about Nova cells.)

If you already have a Nova instance configured with VMs running, how do you convert to cells without wrecking everything in the process?  It gets a little tricky, but it is possible.  With some prior planning, it can also be done with minimal service interruption.

Basic Plan

We’ll take the existing Nova instance and convert it into a single compute cell in our Nova cells setup.  The servers running Nova will be converted to cell app servers running only the Nova services for the compute cell.

Next, a new Nova instance is created, which is the top-level API cell.  The servers running this are api app servers.

Here’s a visual that illustrates what we’re doing:

[Figure: Cells Migration]

Technically there’s no reason you can’t run all this together on a single server.  (I believe this is what devstack does.)  However, for our purposes the services are split out among several physical servers.  I’ll focus on that scenario here, but this information should also apply to a monolithic configuration.

Environment Prep

The details of this are mostly specific to your environment, but for us this is everything we needed to get ready for the conversion:

  • Provision a new set of api app servers, to run the Nova API cell
  • Create a new database for the Nova API cell
  • Migrate non-Nova services off the previous Nova app servers to the new Nova API app servers
    • This is not strictly necessary, but we wanted only the Nova cell services running on the cell app servers.

Split the RMQ Cluster

You could set up a new RMQ cluster for the new cell, but we wanted to convert with as little interruption as possible.  So instead we took the existing RMQ cluster and split it in two.  This way, the running services can still use RMQ while the conversion is going on.

Expand the cluster with new app servers
  1. Configure and start rabbitmq-server on the new Nova API app servers.
  2. Add those servers to the existing RMQ cluster.
  3. Reconfigure all non-Nova services to use only the new Nova API app servers for RMQ.
    • Don’t forget neutron and ceilometer agents!
  4. Reconfigure Nova services (including network, metadata, and compute agents) to use only the cell app servers for RMQ.
    • You probably don’t actually have to change anything here, as these are the RMQ hosts the original Nova components were configured with in the first place.  The exception is if you are running the RMQ cluster behind a load balancer VIP.
Break cluster communication
  1. Use iptables to break the RMQ cluster communication between the cell app servers (the original ones) and the api app servers (the new ones).
    • Block TCP port 25672 with iptables:
      • On API app server(s):  iptables -I INPUT -s cell_app_server_ip -p tcp --dport 25672 -j REJECT
      • On cell app server(s):  iptables -I INPUT -s api_app_server_ip -p tcp --dport 25672 -j REJECT
    • Repeat each command for every app server that you have.
  2. You now have a split brain cluster with the API app servers in one group, and the cell app servers in the other.
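
Since the rules have to be repeated for every pair of servers, it can help to generate them rather than type them.  The following is a minimal sketch that only prints the commands for review; the IP addresses and the block_rules helper are placeholders made up for illustration, while port 25672 and the rule itself come from the steps above.

```shell
#!/bin/sh
# Print (do not run) the iptables rules that cut RMQ clustering traffic
# between the two groups.  All IP addresses below are placeholders.
API_APP_SERVERS="192.0.2.11 192.0.2.12"
CELL_APP_SERVERS="192.0.2.21 192.0.2.22"

# block_rules ACTION "PEER_IPS": emit one rule per peer in the opposite group.
# ACTION is -I to insert the rule, or -D to delete it again later.
block_rules() {
    action="$1"
    for peer in $2; do
        echo "iptables $action INPUT -s $peer -p tcp --dport 25672 -j REJECT"
    done
}

echo "# Run on every api app server:"
block_rules -I "$CELL_APP_SERVERS"
echo "# Run on every cell app server:"
block_rules -I "$API_APP_SERVERS"
```

Calling block_rules with -D instead of -I prints the matching cleanup commands for when the split is finished.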
Remove extra nodes to create two separate clusters
  1. Use the rabbitmqctl forget_cluster_node command to remove the opposite servers from each group.
  2. Now you have two independent RMQ clusters, one for the Nova compute cell, and one for everything else.
  3. Remove the iptables rules (use the commands above, changing -I to -D).
  4. Reconfigure the rabbitmq.config file on all machines to have the proper list of cluster nodes.
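
As a rough sketch of step 1, the forget_cluster_node commands can be generated the same way.  The node names here are hypothetical (rabbitmqctl cluster_status shows the real ones), and the comments assume that running the command on one node per group is enough because cluster membership is shared within each group; verify that against your own cluster before relying on it.

```shell
#!/bin/sh
# Placeholder node names; list the real ones with `rabbitmqctl cluster_status`.
API_NODES="rabbit@api01 rabbit@api02"
CELL_NODES="rabbit@cell01 rabbit@cell02"

# forget_commands "NODES": print one forget_cluster_node command per node.
forget_commands() {
    for node in $1; do
        echo "rabbitmqctl forget_cluster_node $node"
    done
}

echo "# Run on one api app server (drops the cell nodes from that cluster):"
forget_commands "$CELL_NODES"
echo "# Run on one cell app server (drops the api nodes from that cluster):"
forget_commands "$API_NODES"
```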

Create the Child (Compute) Cell

We’ll now get the child/compute cell configured, and make it aware of the parent cell.

Set Up the Database Record

First, create the cell record using nova-manage on one of the cell app servers:

nova-manage cell create --name=api --cell_type=parent --username=rmq_user --password=rmq_password --hostname=rmq_host --virtual_host=rmq_vhost

Or directly in SQL (in the original Nova database):

insert into cells ( created_at, weight_offset, weight_scale, name, is_parent, transport_url, deleted ) values ( now(), 1, 1, 'api', 1, 'rabbit://rmq_user:rmq_password@rmq_host:rmq_port/rmq_vhost/', 0);

Note that the deleted field must be set to ‘0’ and not the default value of NULL!  A NULL value is interpreted as “true”, which means Nova thinks this cell is deleted and won’t use it.

You can also skip the database altogether and configure the cells information in a json file.  See “Optional cell configuration” in the Configuration Reference.

Enable the nova-cells Service in the Child Cell

Add the following configuration to your nova.conf on the cell app servers:
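
A minimal sketch of such a fragment, assuming the cells v1 option names (enable, name, cell_type) in the [cells] section as listed in the Configuration Reference for Icehouse/Juno.  The compute_cell_conf helper is purely illustrative, and cell_01 is simply the name used later in this post when the child cell is registered in the API cell.

```shell
#!/bin/sh
# Print a [cells] fragment for a compute cell.  Review the output, then
# merge it into nova.conf on the cell app servers by hand.
# Assumption: option names follow the cells v1 Configuration Reference.
compute_cell_conf() {
    cell_name="$1"
    cat <<EOF
[cells]
enable = True
name = ${cell_name}
cell_type = compute
EOF
}

compute_cell_conf cell_01
```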





You may set any name you want, but you must use a cell type of “compute.”

You’ll also need to install the openstack-nova-cells package for your distribution, if it’s not already installed.  Start up the nova-cells service, and the compute cell should be ready to go.

At this point, all it’ll do is connect to the local cell RMQ endpoint, and that’s about it.  You can tail /var/log/nova/nova-cells.log to watch for errors.  If you think it’s not working, enable debug in nova.conf, and you’ll get a lot more logging detail.

Disable Quota Enforcement in Compute Cell

You will also want to disable quota enforcement in the compute cell, as all quota operations will now be handled by the API cell:
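
One way to do this, sketched under the assumption that your release supports the quota_driver option in [DEFAULT] and ships the nova.quota.NoopQuotaDriver class (both appear in the cells documentation of the Icehouse/Juno era).  The helper name is made up for illustration.

```shell
#!/bin/sh
# Print the nova.conf fragment that swaps in the no-op quota driver.
# Assumption: quota_driver lives in [DEFAULT] in your release.
compute_cell_quota_conf() {
    cat <<'EOF'
[DEFAULT]
quota_driver = nova.quota.NoopQuotaDriver
EOF
}

compute_cell_quota_conf
```

Apply this only in the compute cell; the API cell keeps the normal quota driver so that it can enforce quotas.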


Update VIF Plugging Options on nova-compute

To get around the problem of Neutron notifications no longer working under cells, add this configuration to your nova-compute nodes (and restart that service):

vif_plugging_is_fatal = false

vif_plugging_timeout = 5

The Exploring OpenStack Nova Cells post explains the details of this in more depth.

Bootstrap New Nova Instance

Now we need to create the new Nova instance for the API cell.  This looks just like setting up Nova normally, but be sure to configure it for the new database and RMQ cluster created on the api app servers.

This process is well-documented elsewhere, but the basic steps are:

  1. Install the Nova packages.  nova-api, nova-cells, nova-consoleauth, and nova-spicehtml5proxy (or whatever you use for console access) are all that’s needed in the API cell.
  2. Configure nova.conf appropriately for the new database and RMQ cluster.
  3. Run nova-manage db sync to create the database schema.
    • If you get failures on this, you may need to upgrade the sqlalchemy-migrate Python module.  There are some buggy versions of it.  I used 0.9.2 without any problems.
  4. Configure the cells options:
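
A hedged sketch of the cells options for the API cell, again assuming the cells v1 option names from the Configuration Reference.  The name api matches the parent cell name registered in the child cell earlier; the api_cell_conf helper is illustrative only.  Some releases' documentation additionally sets compute_api_class = nova.compute.cells_api.ComputeCellsAPI in [DEFAULT], so check the reference for your version.

```shell
#!/bin/sh
# Print a [cells] fragment for the top-level API cell.
# Assumption: option names follow the cells v1 Configuration Reference.
api_cell_conf() {
    cell_name="${1:-api}"
    cat <<EOF
[cells]
enable = True
name = ${cell_name}
cell_type = api
EOF
}

api_cell_conf api
```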





Set Up the Child Cell in the Database

This looks just like what we did in the child cell, but with the child cell’s information:

nova-manage cell create --name=cell_01 --cell_type=child --username=rmq_user --password=rmq_password --hostname=rmq_host --virtual_host=rmq_vhost

Obviously you can also do this directly in SQL or a json file as necessary.

Import Flavors from Child Cell

As discussed in the Exploring OpenStack Nova Cells post, flavors (among other objects) have to be manually synced between cells.  Flavors are usually pretty static, so we can safely import those from the child cell ahead of time.

You must copy these tables exactly from the original Nova database to the new one (for the API cell):

  • instance_types
  • instance_type_extra_specs
  • instance_type_projects*

* This table was empty for us.  I think it’s used for private flavors, which are only visible to a particular project.

If you have a single box with access to both databases, you can easily do the import with a MySQL pipeline like this:

mysqldump nova_orig_db instance_types | mysql nova_new_db

Fill in the proper credentials and database names for your environment; the same pattern works for each table.
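
mysqldump accepts several table names after the database name, so all three flavor tables can go through one pipeline.  This sketch only prints the command; the database names are the placeholders from the example above, and credentials are assumed to come from your own client configuration.  Note that mysqldump emits DROP TABLE and CREATE TABLE statements by default, so the destination tables are replaced rather than merged.

```shell
#!/bin/sh
ORIG_DB="nova_orig_db"   # placeholder: original Nova database
NEW_DB="nova_new_db"     # placeholder: new API cell database
FLAVOR_TABLES="instance_types instance_type_extra_specs instance_type_projects"

# Print the one-shot import command for review before running it.
flavor_import_cmd() {
    echo "mysqldump $ORIG_DB $FLAVOR_TABLES | mysql $NEW_DB"
}

flavor_import_cmd
```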

Import Other State Data

Now we need to import all the information about instances, quotas, volumes, snapshots, etc., that existed originally.  This is the key to migrating to cells in a way that’s invisible to your users.

You’ll want to turn off all the Nova APIs at this point, so there’s no chance of instances being created or destroyed while you’re importing the data.

Here is the list of tables you need to import to the new database.  Do the import in roughly this order so you don’t run into trouble with foreign key constraints:

  • instances
  • instance_info_caches
  • block_device_mapping
  • instance_system_metadata
  • instance_groups
  • instance_group_member
  • instance_group_metadata
  • instance_group_policy
  • key_pairs
  • quota_classes
  • quota_usages
  • quotas
  • snapshots
  • snapshot_id_mappings
  • virtual_interfaces
  • volumes

These other tables I did not have to import, although you may want to.  I don’t think it will hurt anything:

  • instance_actions
  • instance_faults
  • instance_id_mappings
  • reservations
  • volume_id_mappings

If you are going for ultra-low downtime for the Nova APIs, you might want to script this up so you can import these tables as fast as possible.
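
A minimal sketch of such a script, assuming both databases are reachable from one box and that credentials come from your MySQL client configuration.  The database names, the DRY_RUN switch, and the function names are all illustrative.  By default it only prints the commands, in the order listed above.

```shell
#!/bin/sh
ORIG_DB="${ORIG_DB:-nova_orig_db}"   # placeholder: original Nova database
NEW_DB="${NEW_DB:-nova_new_db}"      # placeholder: new API cell database
DRY_RUN="${DRY_RUN:-1}"              # 1 = print only; set to 0 to execute

# Tables in the import order given above.
STATE_TABLES="instances instance_info_caches block_device_mapping
instance_system_metadata instance_groups instance_group_member
instance_group_metadata instance_group_policy key_pairs quota_classes
quota_usages quotas snapshots snapshot_id_mappings virtual_interfaces volumes"

# import_table TABLE: copy one table from the old database to the new one.
# Only the exit status of the mysql side of the pipe is checked here.
import_table() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "mysqldump $ORIG_DB $1 | mysql $NEW_DB"
    else
        mysqldump "$ORIG_DB" "$1" | mysql "$NEW_DB"
    fi
}

# import_all: walk the list in order and stop at the first failure.
import_all() {
    for table in $STATE_TABLES; do
        import_table "$table" || return 1
    done
}

import_all
```

Running it once in dry-run mode against a staging copy is a cheap way to check the table list before the real maintenance window.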

Start Up the API Cell Services

Now you have all the proper state data that the API cell needs to deal with everything that existed before.  Start up nova-cells, nova-api, nova-consoleauth, and nova-spicehtml5proxy (or your console proxy service of choice) on the api app servers.  These services should remain off on the cell app servers.

Over time the API cell will receive updates from the compute cell about the state of each instance.  (You’ll see this get logged in nova-cells.log and the instances.cell_name field in the API cell Nova database is populated.)  Depending on the cells/instance_updated_at_threshold setting (which defaults to 3600s), it may take a while for this to happen.  I think the child nova-cells service does this automatically when it starts up, so to force the issue you can just restart that service.
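
To watch this progress, you can count the instances in the API cell database that have no cell assigned yet.  This is a sketch only: it assumes that live rows in the instances table have deleted = 0, and the database name and helper are placeholders.  It prints the command rather than running it.

```shell
#!/bin/sh
NEW_DB="nova_new_db"   # placeholder: API cell database

# Query for live instances whose cell_name has not been populated yet.
# Assumption: deleted = 0 marks rows that are not soft-deleted.
pending_query() {
    echo "SELECT COUNT(*) FROM instances WHERE deleted = 0 AND cell_name IS NULL;"
}

echo "mysql $NEW_DB -e \"$(pending_query)\""
```

When the count reaches zero, every live instance has been claimed by a cell.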

Update:  I have heard there’s a way for all instance data to be updated in the API cell, not only the instances.cell_name field.  I have not seen this work, however, and it’s not clear what config changes are needed to do it.

Now you’ll be able to see and manage all the instances you had originally running!

Stuff I Did Not Test

We do not use cinder volumes or snapshots in our cloud (yet), so I have no experience with preserving any of those.  Your mileage may vary on that piece.

The same caveat applies to nova-network.  (As mentioned before, we use Neutron for networking.)

Obviously we use RabbitMQ for messaging and MySQL/MariaDB for the database.  You may be using others, in which case these instructions may or may not apply.

I’ve only actually done this conversion in our dev/test/staging environments, and not in production yet.  The proof will be in the pudding when we tackle that, and I’ll post updates here with any new lessons we learn.

You Might Want to Wait for Cells v2

Cells v2 is being worked on as part of the Kilo cycle, and it should be easier to deal with than the current cells implementation.  See the end of the Exploring OpenStack Nova Cells post for more details.

(tl;dr: unless you’re in a hurry to get to cells under Icehouse or Juno, you’re probably better off waiting for Kilo.)