Puppet

by Nathan Tippy, Principal Software Engineer

May 2013

Introduction

Puppet is an open source, cross-platform software package for declarative configuration management. Puppet configuration files use the .pp extension and define the desired final state for each node. Puppet uses platform-specific package management tools to ensure compliance with the configuration files.
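As a rough illustration of what "desired final state" means, the following minimal sketch declares that a package is installed and that its service is running. The ntp package and service names are assumptions chosen for illustration; they are not part of the example developed later in this article.

```puppet
# Hypothetical example: describe the end state, not the steps to reach it.
package { 'ntp':
    ensure => installed,
}

service { 'ntp':
    ensure  => running,
    enable  => true,
    # The service can only be managed once the package is present.
    require => Package['ntp'],
}
```

If the node already matches this description, applying the manifest should change nothing.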

These declarative configuration files permit a single operations person to manage many hundreds of servers. These same files permit testers to perfectly duplicate production environments as needed. Of course developers can write these declarative files as a way to define the installation process and document the subtle dependencies that often exist between applications.

Puppet Labs produces both an enterprise and an open source edition of Puppet, but this article covers only the open source edition. The enterprise edition adds support and a few usability features, but most of this functionality is duplicated by third-party modules for the open source edition.

The flexibility of Puppet allows it to be used in many configurations for many purposes. Here are a few of the more common deployment strategies.

Single instance
Great for experimentation and learning how Puppet works
Single master with polling nodes
Scales up to a few dozen nodes using the default install settings
Single Git repository pushing changes to nodes
Scales to many more nodes than the default polling setup
Decentralized using MCollective and message queues
No central point of failure and provides massive scalability
Vagrant virtual deployment
VirtualBox instances can be started on the fly for testing and continuous integration.

Puppet is a cross-platform tool designed to enable the management of mixed environments including flavors of *nix, OS X and even Windows. In general, this mixed approach works well, with some notable exceptions. Windows support is more limited because only a subset of the resources that can be managed easily on other platforms is supported. Another significant limitation is that a Windows machine cannot be used as the Puppet master. Overall, however, Puppet provides a very consistent experience regardless of what operating system is in use.
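One common way to cope with a mixed environment is to branch on a fact inside a class. The sketch below is only an illustration: the webserver class name is hypothetical, and the package names are the usual Apache package names on Debian and Red Hat families rather than anything taken from this article.

```puppet
# Hypothetical class: choose the web server package by OS family.
class webserver {
    case $::osfamily {
        'Debian': { $pkg = 'apache2' }
        'RedHat': { $pkg = 'httpd' }
        # Stop catalog compilation rather than guess on other platforms.
        default:  { fail("webserver: unsupported osfamily ${::osfamily}") }
    }

    package { $pkg:
        ensure => installed,
    }
}
```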

Puppet Workflow

Puppet always follows the same repeatable workflow for applying changes, regardless of the manner of deployment. This process may be kicked off by a push event, by the default polling schedule (every 30 minutes), or by a manual request at the command line.

1. Manifests (modules) are written by you or your friends.
GitHub and Puppet Forge both have large collections of modules. Before starting a new module, a simple search for a similar one will likely save a lot of time. Keeping modules under revision control is highly advisable, as with any other software project. It may be tempting to modify the module files directly on the Puppet machine, but this defeats the purpose of using Puppet by making it impossible to repeat the deployment on other machines.
2. Clients (nodes) gather facts from facter.
When Puppet is installed it comes with ‘facter’: a command line utility that lists all the facts Puppet can use to drive decisions in modules. A long list of facts is supported, such as IP address, FQDN (Fully Qualified Domain Name), CPU speed, memory and operating system.
3. Applicable manifests are compiled into a catalog.
The catalog is built specifically for this node and its facts. The catalog is a directed acyclic graph of changes that must be applied or confirmed in order. The graph frequently contains branches that have no ordering requirements relative to one another. This is a result of the declarative nature of the *.pp files. In those situations, no assumptions should ever be made about the order in which changes are applied.
4. Ensure compliance with the newly built catalog.
The local Puppet agent walks through the catalog and applies configuration changes as needed. Puppet keeps track of what was done on previous runs and can quickly validate whether the current state has or has not changed. This greatly helps to speed up the process because installation work is not needlessly repeated.
5. Report generation
The Puppet master will be notified if the client installation is configured to report changes. The Puppet master can then be queried to determine the state of all the nodes. TheForeman is a great third-party add-on that provides a nice web-based GUI for this data. It has many other useful features and can be used to simplify the installation of Puppet.
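To make the role of facts in this workflow more concrete, here is a minimal sketch that interpolates a few of them into a message. The resource title node-summary is made up for illustration, and the facts used (fqdn, ipaddress, operatingsystem, architecture, memorysize) are assumed to be reported by the installed version of facter.

```puppet
# Facts are exposed as top-scope variables, hence the leading $:: prefix.
notify { 'node-summary':
    message => "${::fqdn} (${::ipaddress}) runs ${::operatingsystem} on ${::architecture} with ${::memorysize} of memory",
}
```

Running facter with no arguments on a node lists the facts that are available there.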

Starting and Testing New Nodes

When developing new manifests with Puppet, it is sometimes necessary to kick off a run before the next scheduled poll. Thankfully, Puppet provides a command for doing exactly this.

 >  sudo puppet agent --test --onetime

Notice that this command requires admin rights. This makes sense because Puppet is capable of installing or removing practically anything on a node. When the Puppet master is installed, access to module files is restricted to ensure the security of the cluster. When a new client node first talks to the master, a signed certificate is required before access is granted to any data. Although Puppet can be configured to sign certificate requests automatically, this is generally not a step that should be automated; a human should stay in the loop to confirm that access should be granted to the new client machine.

1. Client’s first request

> sudo puppet agent --test --onetime

2. Master lists the pending certificate requests

> sudo puppet cert list

3. Administrator signs the certificate

> sudo puppet cert sign <NodeName>

4. Client retries request

> sudo puppet agent --test --onetime --noop

An Example (installing Java 7 on Ubuntu)

Puppet starts with the /etc/puppet/manifests/site.pp file and from there it imports other files and makes use of installed modules. Nodes can be managed in node.pp files, which are normally imported at the top of site.pp. Tools such as TheForeman can be used as an ENC (External Node Classifier) to greatly simplify this work with an easy-to-use front end. For the example here, however, we do everything by hand to demonstrate how these files work.

Nodes are always defined by fully qualified domain names in Puppet. FQDNs are so important that you must ensure your machine has one before running any example or even installing Puppet.

In this example /etc/puppet/manifests/node.pp file, we define some node groups using simple regular expressions. Then we use a single include statement to declare which of our classes should be applied to each group. It is possible to use a single parameterized class with different parameters for each group. However, that might make the node file harder to read. Many of the decisions relating to how modules and nodes are organized come down to a matter of taste. Nevertheless, the DRY (Don’t Repeat Yourself) principle should always be respected.

# this will match www<number>.ociweb.com
node /^www\d+\.ociweb\.com$/ {
    include legacyJDK
}

# this will match qa<number>.ociweb.com
node /^qa\d+\.ociweb\.com$/ {
    include stableJRE
}

# this will match dev<number>.ociweb.com
node /^dev\d+\.ociweb\.com$/ {
    include stableJDK
}

# this will match the single node with the FQDN of experimental.ociweb.com
node 'experimental.ociweb.com' {
    include earlyAccessJDK
}
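For comparison, the parameterized alternative mentioned above might look like the following sketch, which declares the java class (shown later in this article) directly in two of the node groups. The hard-coded 64-bit tar file names are a simplifying assumption; this is not how the example in this article is actually organized.

```puppet
# Sketch only: parameters are repeated in the node file instead of
# being named once in a wrapper class such as stableJRE.
node /^qa\d+\.ociweb\.com$/ {
    class { 'java':
        version => '1.7.0_21',
        tarfile => 'jre-7u21-linux-x64.tar.gz',
    }
}

node /^dev\d+\.ociweb\.com$/ {
    class { 'java':
        version => '1.7.0_21',
        tarfile => 'jdk-7u21-linux-x64.tar.gz',
    }
}
```

The version string now appears once per node group, which is the readability and duplication cost the include-based layout avoids.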

Configuration declarations should be pushed down into the modules as much as possible for greater reuse. In the example below, only Java versions supported by this mythical enterprise have been defined in the site.pp file. This prevents duplication by defining the Java versions only once.

import 'node.pp'
 
class legacyJDK {
    class{ 'java':
        version => '1.7.0_17',
        tarfile =>  $::architecture ? {
            'amd64' => 'jdk-7u17-linux-x64.tar.gz',
            default => 'jdk-7u17-linux-i586.tar.gz',
        },
        force   => false
    }
}
 
class stableJDK {
    class{ 'java':
        version => '1.7.0_21',
        tarfile =>  $::architecture ? {
            'amd64' => 'jdk-7u21-linux-x64.tar.gz',
            default => 'jdk-7u21-linux-i586.tar.gz',
        },
        force   => false
    }
}
 
class stableJRE {
    class{ 'java':
        version => '1.7.0_21',
        tarfile =>  $::architecture ? {
            'amd64' => 'jre-7u21-linux-x64.tar.gz',
            default => 'jre-7u21-linux-i586.tar.gz',
        },
        force   => false
    }
}
 
class earlyAccessJDK {
    class{ 'java':
        version => '1.8.0',
        tarfile => $::architecture ? {
            'amd64' => 'jdk-8-ea-bin-b79-linux-x64-28_feb_2013.tar.gz',
            default => 'jdk-8-ea-bin-b79-linux-i586-28_feb_2013.tar.gz',
        },
        force   => true
    }
}

Every module is loaded starting from its init.pp file. Modules often include other files and templates. In this example all the files for the module are found in /etc/puppet/modules/java. Modules must be placed in a folder matching their class name, in this case “java”.

There are many features not addressed here, such as templates, but this simple example demonstrates the most common ones. The full example is available on GitHub.

Note the heavy use of metaparameters to enforce the order of work to be completed. It is easy to assume that the work will be done top-down as it is written, but that would be wrong. These files are all declarative, so any required dependencies must always be explicitly declared. Puppet Labs has an exhaustive reference online for further study.
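As a small, self-contained illustration of explicit ordering, the sketch below declares three resources deliberately "out of order". The myapp package, service and file path are hypothetical names used only for this example.

```puppet
# Written bottom-up on purpose: the order in the file does not matter.
service { 'myapp':
    ensure    => running,
    # Managed after the config file, and restarted whenever it changes.
    subscribe => File['/etc/myapp.conf'],
}

file { '/etc/myapp.conf':
    ensure  => file,
    content => "port=8080\n",
    # The package must be installed before its config is written.
    require => Package['myapp'],
}

package { 'myapp':
    ensure => installed,
}
```

Given these declarations, Puppet should apply the package first, then the file, then the service. The java module that follows uses require, before and subscribe in the same way.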

class java($version, $tarfile, $force=false) {
    # Takes 3 parameters and the third one has a default of false.
    # Variables in puppet can only be assigned once.
    # Here we build some simple strings we will need later.
 
    # These are all the binaries provided by the JRE.
    $jrebins = 'java,javaws,keytool,orbd,pack200,rmiregistry,servertool,tnameserv,unpack200'
 
    $jdk1bins = 'appletviewer,extcheck,idlj,jar,jarsigner,javac,javadoc'
    $jdk2bins = 'javah,javap,jconsole,jdb,jhat,jinfo,jmap,jps,jrunscript'
    $jdk3bins = 'jsadebugd,jstack,jstat,jstatd,native2ascii,policytool,rmic'
    $jdk4bins = 'rmid,schemagen,serialver,wsgen,wsimport,xjc'
 
    # Puppet does not have a concat operator for strings however it does have
    # interpolation when the " double quote is used.  Making use of the
    # variables defined above a single large string is built.
    # These are all the binaries provided by the JDK.
    $jdkbins =  "${jdk1bins},${jdk2bins},${jdk3bins},${jdk4bins}"
 
    # If the string 'jre' or 'jdk' is found in the tar file name we set the
    # appropriate values for $type and $bins
    # The file copy operation from the master to this node is done only if it's
    # recognized to be a jre or jdk.  Further down in the exec for untar the
    # subscribe metaparameter is used to continue the install ONLY if this
    # file gets created:         subscribe => File["/tmp/${tarfile}"]
    if 'jre' in $tarfile {
        $type = 'jre'
        $bins = $jrebins
 
        file { "/tmp/${tarfile}":
            ensure => file,
            source => "puppet:///modules/java/${tarfile}",
        }
    } elsif 'jdk' in $tarfile {
        $type = 'jdk'
        $bins = "${jrebins},${jdkbins}"
 
        file { "/tmp/${tarfile}":
            ensure => file,
            source => "puppet:///modules/java/${tarfile}",
        }
    } else {
        alert('ensure the tar file name contains substring jre or jdk')
        # File in temp folder is not created so the install stops.
    }
 
    # Warn users that this was only intended for Debian platforms but
    # the install will continue anyway
    if $::osfamily != 'Debian' {
        alert("This module has only been tested with the Debian osfamily but ${::osfamily} was detected, use at your own risk.")
    }
 
    # Ensure that the directory for jvm exists
    # Require is used by the exec for untar below to ensure the right ordering.
    file { '/usr/lib/jvm':
        ensure => directory,
        owner  => 'root',
        group  => 'root',
    }
 
    # The exec for untar uses the creates attribute to tell puppet not to
    # bother running the command again if the creates file exists.
    # When we need to change what's inside the tar we need to force it by
    # ensuring the expected folder name destination is absent.
    if $force == true {
        file { "/usr/lib/jvm/${type}${version}" :
            ensure => absent,
            force  => true,
            before => Exec["untar-java-${type}${version}"],
        }
    }
 
    # untar new Java distros into the right version named folder
    # Will not run if the creates=> folder already exists
    # Will not run if the require=> folder /usr/lib/jvm does not exist
    # Will not run if the subscribe=> file has not been created
    exec { "untar-java-${type}${version}":
        command   => "/bin/tar -xvzf /tmp/${tarfile}",
        cwd       => '/usr/lib/jvm',
        user      => 'root',
        creates   => "/usr/lib/jvm/${type}${version}",
        require   => File['/usr/lib/jvm'],
        subscribe => File["/tmp/${tarfile}"],
    }
 
    # Splits a string on the dot token and creates array versionarray
    $versionarray = split($version, '[.]')
    $jvmfolder = "/usr/lib/jvm/java-${versionarray[1]}-oracle"
 
    # Subscribes to the exec for untar so if executed
    # it will then add a symlink to the version folder
    # force must be used just in case a symlink already exists but
    # it is pointing at some old location.
    file { $jvmfolder:
        ensure    => link,
        force     => true,
        target    => "/usr/lib/jvm/${type}${version}",
        subscribe => Exec["untar-java-${type}${version}"],
    }
 
    # Parse the string of binaries on the comma and produce an array
    $binsarray = split($bins, '[,]')
 
    # This call to the define works like a macro.
    # The $binsarray values are each mapped to $name causing multiple
    # exec commands to get called based on what is in the array.
    altinstall{ $binsarray:
        jvmfolder => $jvmfolder
    }
}
 
# Using define to create a set of execs for each binary
# to update the alternatives for these bins
define altinstall ($jvmfolder) {
 
    # Install this alternative only if the symlink is created.
    # The command itself requires double quotes but double quotes are also
    # in use by puppet because the string is interpolated.  In order to make this
    # work the quotes needed by the command are escaped with \"
    exec { "alt-install-${name}":
        command   => "/usr/sbin/update-alternatives --install \"/usr/bin/${name}\" \"${name}\" \"${jvmfolder}/bin/${name}\" 1",
        subscribe => File[$jvmfolder],
    }
 
    # Set this version as the active default if it is installed
    exec { "alt-set-${name}":
        command   => "/usr/sbin/update-alternatives --set \"${name}\" \"${jvmfolder}/bin/${name}\"",
        subscribe => Exec["alt-install-${name}"]
    }
}
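One point to verify when adapting this module: an exec normally runs on every Puppet run unless it is guarded by an attribute such as creates, onlyif, unless or refreshonly, and subscribe on its own adds ordering plus a refresh trigger. The following sketch is not part of the original module. It shows a hypothetical variant of the define, with the invented name altinstall_refreshonly, that sets refreshonly so the commands run only when the symlink resource changes.

```puppet
# Hypothetical variant of altinstall (not from the original module).
define altinstall_refreshonly ($jvmfolder) {

    # Runs only when File[$jvmfolder] reports a change.
    exec { "alt-install-${name}":
        command     => "/usr/sbin/update-alternatives --install \"/usr/bin/${name}\" \"${name}\" \"${jvmfolder}/bin/${name}\" 1",
        refreshonly => true,
        subscribe   => File[$jvmfolder],
    }

    # Triggered by the same change, but ordered after the install step.
    exec { "alt-set-${name}":
        command     => "/usr/sbin/update-alternatives --set \"${name}\" \"${jvmfolder}/bin/${name}\"",
        refreshonly => true,
        subscribe   => File[$jvmfolder],
        require     => Exec["alt-install-${name}"],
    }
}
```

The trade-off is that an alternative removed by hand would not be restored until the symlink changes again, so whether this variant is preferable depends on how the nodes are managed.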

Due to its flexibility and broad platform support, Puppet has become a very popular way to eliminate tedious and error-prone tasks, especially now that it is necessary to support an ever-growing number of machines that are often in the cloud. Puppet helps leverage the knowledge from development, testing and operations in order to ensure consistent, repeatable deployments.

Puppet is an excellent tool for managing large clusters of machines. This article has demonstrated only one possible way to make use of Puppet. How do you plan to use it?

References

[1] Puppetlabs.com
http://puppetlabs.com/puppet/what-is-puppet/
[2] Download Puppet Learning VM
http://docs.puppetlabs.com/learning/
[3] Installing Puppet Enterprise for a Windows environment
http://docs.puppetlabs.com/windows/index.html
[4] TheForeman.org
http://theforeman.org/
[5] Puppet 3 Reference Manual
http://docs.puppetlabs.com/puppet/3/reference/index.html
[6] Puppet Code Validator
http://www.puppetlinter.com/
[7] Puppet-Java
http://github.com/objectcomputing/puppet-java