Genesis, A New Data Center Automation Tool By Tumblr

Tumblr Releases a New Data Center Automation Tool, “Genesis”

Tumblr has released Genesis, a data center automation tool that streamlines the process of discovering new machines and reporting their hardware details to Collins, a part of Tumblr’s inventory management system. The tool is also convenient for hardware configuration, such as altering BIOS settings and configuring RAID cards, before an operating system is provisioned onto the host.

Genesis was developed by the Site Reliability Engineering and Data Center teams at Tumblr. It is now open source under the Apache License and is available on GitHub.

The tool includes a stripped-down Linux image suitable for booting over PXE and a Ruby-based domain-specific language (DSL) for describing tasks to be executed on the host. With the Genesis DSL, a task declares which packages are installed and which commands are executed. Examples of tasks are the TimedBurnin task, which performs a stress test on the system to rule out hardware errors before putting it into production, and BiosConfigrR720, which sets up the BIOS on Dell R720s to Tumblr’s preferred settings.
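To make the idea of a task description concrete, here is a minimal sketch in Python rather than in the Genesis Ruby DSL itself. The `Task` class, its fields, and the package and command shown are all assumptions made for illustration; they are not taken from the Genesis source and do not reflect what the real TimedBurnin task does.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Hypothetical task: packages to install, then commands to run."""
    name: str
    description: str = ""
    packages: List[str] = field(default_factory=list)
    commands: List[str] = field(default_factory=list)

    def plan(self) -> List[str]:
        """Return the ordered steps a runner would perform for this task."""
        steps = [f"install {pkg}" for pkg in self.packages]
        steps += [f"run {cmd}" for cmd in self.commands]
        return steps


# Illustrative placeholders only, not the contents of Tumblr's real task.
timed_burnin = Task(
    name="TimedBurnin",
    description="Stress test the host before it goes into production",
    packages=["stress"],
    commands=["stress --cpu 4 --timeout 60"],
)

for step in timed_burnin.plan():
    print(step)
```

The point of the sketch is the shape of the description: a task is data (what to install, what to run), and a separate runner decides how to carry out each step.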

A few systems apart from Genesis need to be in place for a successful deployment. These are:

  • a DHCP server,
  • a TFTP server,
  • an HTTP server.

The INSTALL.md file in the Genesis GitHub project provides further instructions, including the necessary server configuration options.

When a machine boots, the DHCP server tells the PXE firmware to chain-boot into iPXE. iPXE then presents a list of menu choices fetched from a remote server. Once the user has made a choice, the Genesis kernel and initrd are loaded from the file server, along with parameters on the kernel command line. Once the Genesis OS has loaded, the genesis-bootloader fetches and executes a Ruby script describing a second stage, which installs gems and a few base RPMs and fetches the tasks from a remote server. Finally, the relevant tasks are executed.
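Passing parameters on the kernel command line is a common way to hand settings to an early-boot program. The sketch below shows, in Python, how such a line could be split into key/value pairs. The parameter names `mode` and `tasks_url` and the example URL are invented for this illustration and are not Genesis’s actual parameters.

```python
import shlex
from typing import Dict


def parse_cmdline(cmdline: str) -> Dict[str, str]:
    """Parse a kernel command line into a dict.

    Tokens of the form key=value are split on the first '='.
    Bare flags (no '=') map to an empty string.
    """
    params: Dict[str, str] = {}
    for token in shlex.split(cmdline):
        key, _, value = token.partition("=")
        params[key] = value
    return params


# Hypothetical command line; on a running Linux system the real one
# can be read from /proc/cmdline.
example = "quiet mode=discovery tasks_url=http://files.example/tasks"
params = parse_cmdline(example)
print(params["mode"])
```

Splitting on the first `=` only means values that themselves contain `=` (for instance, URLs with query strings) are preserved intact.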

As an example, consider a brand new server that boots up. It makes a DHCP request and loads the iPXE menu. In this case, its MAC address has not been seen before, so it must be a new machine. Genesis is then booted into discovery mode, where the tasks it runs are written to fetch all the required hardware information and report it back to Collins. In Tumblr’s setup this includes information such as hard drives and their capacity and the number of CPUs, as well as more detailed information such as service tags, the memory banks in use, and the names of the switch ports. This is followed by 48 hours of hardware stress testing using the TimedBurnin task.
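The discovery decision described above can be sketched roughly as follows. This is an illustration in Python under stated assumptions: the in-memory `inventory` dictionary stands in for Collins, and `normalize_mac`, `discover`, and `fake_probe` are hypothetical helpers, not part of the Genesis or Collins APIs.

```python
from typing import Callable, Dict


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC address and use ':' as the separator."""
    return mac.strip().lower().replace("-", ":")


def discover(mac: str,
             inventory: Dict[str, dict],
             probe: Callable[[], dict]) -> bool:
    """Record hardware details for a machine not seen before.

    Returns True if the machine was new and has been recorded,
    False if its MAC address was already in the inventory.
    """
    key = normalize_mac(mac)
    if key in inventory:
        return False
    inventory[key] = probe()
    return True


def fake_probe() -> dict:
    """Stand-in for real hardware detection; the values are made up."""
    return {"cpus": 16, "disks": [{"name": "sda", "capacity_gb": 500}]}


# Stand-in for the inventory system, with one machine already known.
inventory: Dict[str, dict] = {"aa:bb:cc:dd:ee:01": {"cpus": 8, "disks": []}}

print(discover("AA-BB-CC-DD-EE-02", inventory, fake_probe))  # new machine
print(discover("AA:BB:CC:DD:EE:01", inventory, fake_probe))  # already known
```

Normalizing the MAC address before the lookup avoids treating the same interface as two machines when it is reported in different notations.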

The Collins GitHub project page states that the application is vital, as it provides a source of truth and knowledge for Tumblr’s entire infrastructure. All the data related to Tumblr’s production environments is stored and encoded in Collins, and this data is used to drive all of Tumblr’s data center automation.

Collins was created as a system to look after all the physical servers, switches and racks in Tumblr’s production environments, and has evolved to also support inventory of hardware, IP addresses and software. Tumblr has found that the Collins API and data are an excellent mechanism for driving automation processes. Collins now provides push-button cluster deployment, drives configuration generation when hardware cluster topologies change, drives infrastructure updates when software configuration changes, and helps to manage software deploys.

“Genesis is still in the early stages of development and while we’ve met many of the goals we set out to achieve, there’s still much to be done,” says the Tumblr blog. Tumblr further added, “If you find a bug or have a cool idea, let us know and get involved by contributing code and documentation or participating with questions and suggestions.”
