When you first think about scaling an on-premises Hadoop cluster, your mind jumps to the process and the teams involved in building the servers, the time needed to configure them, and the stability required while bringing them into the cluster. Here at Target that process used to be measured in months. The story below outlines our journey around scaling our Hadoop cluster, taking those months down to hours and adding hundreds of servers in a couple of weeks.
The Need
Early 2013 taught us the lesson that manually managing Hadoop clusters, no matter how small, is a time-consuming and very repetitive task. Our next cluster build in 2014 drove the adoption of Chef, Artifactory and Jenkins to help with cluster operations. We stood up those components and created new role cookbooks to manage everything on the OS (configurations, storage, Kerberos, MySQL, etc.). While this was a step in the right direction, it still left us with a manual process to create the initial base server build and then add the server to the cluster after configuring it with Chef.
Build Foundation
Closing the gap in our automation meant finding a way to deliver true end-to-end builds, from an initial bootstrap to running jobs in the cluster. OpenStack’s Ironic project was the first piece of the puzzle. Ironic gives us the ability to provision bare metal servers, similar to how OpenStack automated the VM build process. With Ironic as the foundation, we leveraged the Nova client to manage our instance builds. The nova python client interacts with the Compute service’s API, giving us an easy way to specify our build parameters and spin up an instance on one of our physical servers. The other key piece with the nova client is the ability to send boot information to the server using user data. The bash script sent as user data executes several commands to install the Chef client, set up public keys and run the initial knife bootstrap to set the run list for the build.
Example nova boot command:
nova boot --image $image_name --key-name $key_name --flavor $flavor_name --nic net-id=$network_id --user-data baremetal_bootstrap.sh $instance_name
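A minimal sketch of what that user-data script might contain is below; the install source, key contents and role name are placeholders rather than our actual values.
#!/bin/bash
# baremetal_bootstrap.sh - sketch of the script passed via --user-data
# Install the Chef client (in practice this could come from an internal mirror such as Artifactory)
curl -L https://omnitruck.chef.io/install.sh | bash
# Set up public keys for the build user
mkdir -p /root/.ssh && chmod 700 /root/.ssh
echo "ssh-rsa AAAA... build-user" >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
# Run the initial knife bootstrap against this host to set its run list
# (assumes a knife config and keys were staged on the image; the role name is hypothetical)
knife bootstrap "$(hostname -f)" --node-name "$(hostname -f)" --run-list 'role[hdp_datanode]'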
Server Configuration
The knife bootstrap portion of our user data script sends the instructions for Chef to build a certain type of node (Hadoop Data Node, Control Node, Edge Node, etc.). The node types are managed with role wrapper cookbooks, giving us an easy way to manage attributes and run lists without changing our core code. This makes it easy to deploy new features and bug fixes to a specific cluster, or to a specific role within a specific environment, for example.
Example role attributes:
# Hadoop
default['hdp']['ambari_server'] = 'ambari_server'
default['hdp']['ambari_cluster'] = 'cluster_name'
# MySQL
default['hdp']['mysql_server'] = 'mysql_host'
# Kerberos
default['hdp']['kerberos_realm'] = 'KDC_REALM'
default['hdp']['kdc_server'] = ['kdc_hosts']
# Software
default['hdp']['ambari']['repo_version'] = '2.0.2'
default['hdp']['repo_version'] = '2.2.4.12'
default['hdp']['util_repo_version'] = '1.1.0.20'
In the case of a Data Node role, the build runs through the following steps (a shell-level sketch of two of them follows the list):
- Setting up internal software repos
- RHS registration
- RAID controller installation/JBOD configuration
- Base Target OS build
- Hostname management
- Centrify rezoning
- Autofs (Unix home directories)
- OS tuning (Hadoop)
- Kerberos (client installation and configurations)
- Disk formatting, partitioning and mounting
- Ambari host registration
- Ambari client installation
- Ambari service installation
- Ecosystem Setup (R, Python, Java, etc.)
- Ambari host service startup
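To make a couple of those steps concrete, the OS tuning and disk preparation work boils down to commands along these lines. Chef drives this for us; the device names, mount points and tuning values shown here are illustrative assumptions, not our exact configuration.
# Hadoop-oriented OS tuning (illustrative values)
sysctl -w vm.swappiness=1
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# Format and mount one JBOD data disk (repeated per disk; device and path are hypothetical)
mkfs.ext4 -m 0 /dev/sdb
mkdir -p /grid/0
mount -o noatime /dev/sdb /grid/0
echo '/dev/sdb /grid/0 ext4 noatime 0 0' >> /etc/fstab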
Home Stretch
Now that we have a fully configured Hadoop Data Node, we needed a way to add it to our cluster without impacting existing production jobs. This is where Ambari comes into the picture. Ambari is our primary Hadoop administration tool, providing both a UI and RESTful APIs to manage the cluster. Using the UI was a no-go for us since that would mean manual intervention, so the final push was going through the API to complete the end-to-end automation. Starting with Ambari 2.0, you can integrate Ambari with your Kerberos KDC to have it manage principal and keytab creation. For this to work through the API, you need to ensure your KDC admin credentials are being passed. One way to do this is with a curl session cookie, seeded from a JSON file containing those credentials. That JSON file was created using a Chef template with an encrypted data bag to securely provide the credentials.
Chef template example (kdc_cred.erb):
{
  "session_attributes" : {
    "kerberos_admin" : {
      "principal" : "admin/admin@<%= node['hdp']['kerberos_realm'] %>",
      "password" : "<%= @password %>"
    }
  }
}
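For reference, the encrypted data bag backing the @password variable could be created with knife along these lines; the bag name, item name and secret file path here are hypothetical.
knife data bag create credentials kdc_admin --secret-file /etc/chef/encrypted_data_bag_secret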
You can create the cookie with the curl -c option, using the JSON file that contains your KDC admin credentials. With the session cookie created, you are all set to run through the calls needed to fully install and start a service.
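One way that might look is sketched below, assuming the rendered template was written out as kdc_cred.json and that the session is seeded with a cluster-level call; the resulting /tmp/cookiejar is what the -b flag in the commands that follow refers to.
# Seed an Ambari session with the KDC admin credentials and save the cookie (endpoint assumed per Ambari 2.0 Kerberos examples)
curl -c /tmp/cookiejar --user admin:<pass> -H 'X-Requested-By: ambari' -X PUT -d @kdc_cred.json http://ambari.vagrant.tgt:8080/api/v1/clusters/vagrant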
1. Service Addition
This step registers the new service on the specified host and leaves it in an “Install Pending” state. Example command for adding a NodeManager to host worker-1.vagrant.tgt:
curl -b /tmp/cookiejar --user admin:<pass> -i -H 'X-Requested-By: ambari' -X POST http://ambari.vagrant.tgt:8080/api/v1/clusters/vagrant/hosts/worker-1.vagrant.tgt/host_components/NODEMANAGER
2. Service Installation
This step continues from the registration and actually installs the service on the host. After the installation completes, the service is in a ‘Stopped’ state but is ready for use. Example command for installing a NodeManager on host worker-1.vagrant.tgt:
curl -b /tmp/cookiejar --user admin:<pass> -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' http://ambari.vagrant.tgt:8080/api/v1/clusters/vagrant/hosts/worker-1.vagrant.tgt/host_components/NODEMANAGER
3. Maintenance Mode
This step suppresses all alerts for the new service since it is installed but in a stopped state. You could start the service at this point, but the rest of our internal build process needs to complete, so the service stays down until the very end. Example command for putting the NodeManager on host worker-1.vagrant.tgt in maintenance mode:
curl -b /tmp/cookiejar --user admin:<pass> -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"maintenance_state": "ON"}}' http://ambari.vagrant.tgt:8080/api/v1/clusters/vagrant/hosts/worker-1.vagrant.tgt/host_components/NODEMANAGER
4. Service Startup
With the rest of our build completed, we can take the service out of maintenance mode and start it, so jobs can begin running on this host. Example command for starting the NodeManager on host worker-1.vagrant.tgt:
curl -b /tmp/cookiejar --user admin:<pass> -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "STARTED"}}' http://ambari.vagrant.tgt:8080/api/v1/clusters/vagrant/hosts/worker-1.vagrant.tgt/host_components/NODEMANAGER
The above examples walked you through the process to fully add a NodeManager service using Ambari’s API. The Chef recipes we created internally leverage that logic as the foundation, but optimize the process to handle multiple client and service installations. Additional information on Ambari’s API can be found on Ambari’s GitHub or the Ambari wiki.
The Results
With the above process we can complete a single data node build in 2 hours. The great thing about the nova client is the ability to script out your builds, which allowed us to run parallel builds and reach our expansion goals by adding hundreds of servers within a couple of weeks!
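For example, kicking off a batch of builds can be as simple as looping over hostnames; the hostname pattern below is a placeholder and the other variables match the earlier nova boot example.
# Launch a batch of bare metal builds in parallel
for host in worker-{01..20}.example.tgt; do
  nova boot --image $image_name --key-name $key_name --flavor $flavor_name --nic net-id=$network_id --user-data baremetal_bootstrap.sh "$host" &
done
wait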
Issues Encountered
Our automation efforts ran into issues like any other project. The two biggest challenges we faced were finding a way to register new hosts in our DNS and getting Ambari to scale with our builds.
On the DNS side, not having access to create new records, and not wanting to add change approvals into the process, forced us to look at a bulk DNS load option. DNS records were created for the new IP space ahead of time, and during the build the server would look up its hostname based on the IP address it was assigned. After setting the hostname to the new value and reloading Ohai, we were back in business.
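At a shell level, that hostname step amounts to something like the following sketch (the reverse lookup and the RHEL 6-style hostname file are assumptions about the environment); Chef then reloads Ohai so later resources see the new name.
# Look up the pre-created PTR record for this node's IP and apply it
ip_addr=$(hostname -I | awk '{print $1}')
new_name=$(dig +short -x "$ip_addr" | sed 's/\.$//')
hostname "$new_name"
sed -i "s/^HOSTNAME=.*/HOSTNAME=$new_name/" /etc/sysconfig/network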
Due to a known bug, Ambari’s host_role_command and execution_command tables were growing at an alarming rate. The more servers we added, the larger the tables became and the longer the installs took. With our systems freeze weeks away, we couldn’t afford to wait and go through an Ambari 2.1.x upgrade where that bug was being addressed. We ended up adjusting the indexes and purging those tables to continue with our builds.
What’s Next?
Community give-back (wiki updates, JIRAs for features and issues encountered), with the goal of helping to solve some of those problems ourselves. Beyond that, we want to continue to enhance our initial build MVP by refactoring some of the code and adding validations and build notifications. All of this will continue to mature our build process, getting us ready for more growth in 2016.