Linux Server Management: Five Signs You're Doing It Wrong
Recently, we observed someone noting that Xen doesn't provide the same physical-to-virtual (p2v) conversion capabilities that VMware does. Yet another reason Xen isn't as widespread as VMware, the theory goes. The question this raises, however, is, "why is p2v even necessary?" Assume the physical hardware had instead died; would this person have had to restore configuration files and settings manually from backup?
Postulate: if you cannot tell a server (be it physical or virtual) to reinstall itself, then simply walk away knowing it will reload and return to full service, you are doing it wrong. Every network service, configuration file, and setting required for a server to function must be automated and defined in your configuration management system.
Configured State Versus Running State
Server deployment systems, such as Kickstart, allow administrators to configure almost anything once a server has been installed. Kickstart is designed to install a set of packages, and maybe configure a few users who should be able to log in immediately after the operating system is installed. Indeed, you can also copy in configuration files and turn Kickstart into a robust system that sets up many network services on each server at install time. Another school of thought on this matter exists, however.
What happens after installation has completed? Immediately afterward, you are in a known configuration state. Every configuration file on the system is exactly as it was when you copied it in using Kickstart. From that moment on, however, the running state of the machine is unknowable. A sysadmin could log in and "fix" something, changing files and not documenting the change. The current running state of your Apache Web server, for example, has very likely diverged since you first configured it.
If the server were to crash, you might have backups of /etc/, but the restore process is lengthy, manual, and error-prone. Using a proper configuration management system means that all changes to the Apache configuration will be done in a central place, and then pushed out to the server. Upon re-installation, the server would immediately fetch its configuration files, required packages, and other bits again; the same configuration files you have been working from and know with certainty are the correct ones.
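The gap between configured state and running state can be made concrete with a small check. The following is a minimal Python sketch, not how any particular configuration management tool works: it assumes a hypothetical manifest that maps relative file paths to the SHA-256 digests recorded centrally, and reports which files on a server no longer match. The names `file_digest` and `find_drift` are invented for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_drift(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare files under root against a manifest of expected digests.

    The manifest (an assumption of this sketch) maps a path relative to
    root to the digest recorded in the central configuration store.
    Returns relative path -> "missing" or "modified" for every file whose
    running state differs from the configured state.
    """
    drift = {}
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            drift[rel_path] = "missing"
        elif file_digest(target) != expected:
            drift[rel_path] = "modified"
    return drift
```

An empty result means the files listed in the manifest still match what was deployed; anything else is an undocumented "fix" waiting to be lost on the next reinstall.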
Configuration management does much more than manage files. In fact, newer systems hold strongly to the belief that if you spend most of your time manually managing text files, you are doing something wrong. Puppet, for example, is a more abstract configuration management system that allows you to talk about users, groups, and packages rather than the files that manage them on each type of server you have.
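To illustrate what "talking about packages rather than files" can mean, here is a rough Python sketch of the idea, not Puppet's actual implementation. It assumes a hypothetical `install_command` helper and an illustrative two-entry table of OS families; the admin names a package once, and the platform-specific command is derived from it.

```python
# Illustrative mapping from OS family to its package install command.
# The family names used as keys are an assumption of this sketch.
INSTALL_COMMANDS = {
    "redhat": ["yum", "install", "-y"],
    "debian": ["apt-get", "install", "-y"],
}


def install_command(os_family: str, package: str) -> list[str]:
    """Build the argument list that would install a package on an OS family.

    Nothing is executed here; the function only translates an abstract
    "this package should be present" statement into a concrete command.
    """
    try:
        base = INSTALL_COMMANDS[os_family]
    except KeyError:
        raise ValueError(f"unsupported OS family: {os_family}") from None
    return [*base, package]
```

For example, `install_command("debian", "openssh-server")` yields the `apt-get` form and `install_command("redhat", "openssh-server")` the `yum` form, so the description of the server stays the same across platforms.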
Beyond knowing the state of your server's configuration files, which is just the simplest example, configuration management systems also allow you to represent complex dependencies and act on conditional states. They also store everything in a central place, which allows admins to quickly verify or change services across the network, automate their monitoring infrastructure, and gather data about the state and status of their network.
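Dependencies between resources are one place where a short example helps. The sketch below uses Python's standard `graphlib` module to work out a safe order in which to apply resources. The resource names and the `DEPENDENCIES` table are hypothetical, and real tools express this in their own languages; this only shows the underlying idea.

```python
from graphlib import TopologicalSorter

# Hypothetical resources: each key depends on the resources in its set.
DEPENDENCIES = {
    "package:httpd": set(),
    "file:/etc/httpd/conf/httpd.conf": {"package:httpd"},
    "service:httpd": {"package:httpd", "file:/etc/httpd/conf/httpd.conf"},
}


def apply_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return resources ordered so each comes after everything it depends on.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

With the table above, the package is applied first, then the configuration file, then the service that needs both.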
Without configuration management, at least two bad things happen. First, you have no idea how your systems are configured. Even if you take good notes about the files you've changed and the packages you've installed, you will never again reproduce the exact same system. With configuration management in place, by contrast, reproducing a system is routine. Testing an OS upgrade, for example, is simple with Puppet: ensure a new test server (or VM) gets the same configuration, and tell it to install with the new OS. Problems can be dealt with before performing the same reinstall on the production system.
And the second bad thing is that you will never know what/when/where/why/how network services are running. It is a very bad thing to discover an old FTP server lying around on a server that hasn't been patched in years. Proper configuration management practice doesn't eliminate the need for security audits, but it does mean you won't have many surprises.
Some clear indicators you are doing it wrong may be helpful. If you:
- SSH into servers after installing them, to configure services ...
- SSH in a for-loop to many servers at once and perform administration ...
- log in to Web servers to find out which virtual host sites are running on ...
- manually add new servers (and their services) to your monitoring ...
- make changes to a server without documenting the change and automating it for future reinstalls ...
...you are doing it wrong.
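Two of the signs above, logging in to find out what runs where and adding servers to monitoring by hand, go away once services are recorded in one central place. The following Python sketch assumes a hypothetical `INVENTORY` mapping of hostnames to service names; the hostnames and the `check_<service>` naming are illustrative only.

```python
# Hypothetical central inventory: hostname -> services it should run.
INVENTORY = {
    "web01.example.com": ["http", "ssh"],
    "ftp01.example.com": ["ftp", "ssh"],
}


def hosts_running(inventory: dict[str, list[str]], service: str) -> list[str]:
    """Answer "where does this service run?" without logging in anywhere."""
    return sorted(host for host, services in inventory.items() if service in services)


def monitoring_checks(inventory: dict[str, list[str]]) -> list[dict[str, str]]:
    """Derive one monitoring check per host and service from the inventory."""
    checks = []
    for host in sorted(inventory):
        for service in sorted(inventory[host]):
            checks.append({"host": host, "check": f"check_{service}"})
    return checks
```

Adding a server to the inventory is then the only manual step; its monitoring entries follow from the same data that configures it.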
Manage more than 50-100 servers, and this quickly becomes obvious. Scaling IT systems either breeds high levels of automation to make the infrastructure manageable, or it ends up breeding a huge mess.
Configuration management, then, is really about IT infrastructure management. ITIL preaches a mythical CMDB, or Configuration Management DataBase. In the pay-for-crap world of software vendors, this usually means someone will sell you something that inventories all software installed on every server, and generates a pretty report. You can say you are ITIL-compliant (in this area, at least), but you still have no handle on what the infrastructure is really doing, nor do you have automation and the ability to recreate systems from scratch. The technology exists, but it is a complex problem and requires a bit of work to get right. Once right, though, you can scale to more than 1,000 servers with little extra work.
When he's not writing for Enterprise Networking Planet or riding his motorcycle, Charlie Schluting works as the VP of Strategic Alliances at the US Division of LINBIT, the creators of DRBD. He also operates OmniTraining.net, and recently finished Network Ninja, a must-read for every network engineer.