Keep Tabs on Network Services with Nagios
Best of ENP: Nagios is a powerful and extensible tool that'll keep you up to speed on what your servers (and the services they're providing) are up to.
Nagios provides an advanced server and device monitoring solution. It has become the de facto standard among free service monitoring applications, and it is highly competitive with the non-free ones. This article explains why Nagios is useful, then covers some installation concepts to help you get started.
Previously known as the NetSaint project, Nagios morphed into its current form about four years ago. Nagios can monitor servers and their services, as well as network devices. More primitive monitoring solutions only allow for a simple ping to detect whether or not a server is still up and running. All too often, administrators find that a server responds to pings while none of its services are actually working. Nagios instead connects to the network services themselves to test that they function. To test a mail server, for example, it connects and waits for the SMTP greeting before it declares the service operational. Nagios monitors most common network-based services out of the box, and plug-ins exist for almost anything else.
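To make the mail server example concrete, here is a minimal sketch in Python of that kind of test: connect, wait for the banner, and only then call the service healthy. This is not Nagios's own code. The names check_smtp and is_smtp_greeting, the ten-second timeout, and the message wording are assumptions made for illustration; the one fixed point is that a healthy SMTP server greets with reply code 220.

```python
import socket

# Status values in the style Nagios plug-ins report (0 = OK, 2 = CRITICAL).
OK, CRITICAL = 0, 2


def is_smtp_greeting(line: str) -> bool:
    """Return True if the line looks like a successful SMTP greeting (code 220)."""
    return line.startswith("220")


def check_smtp(host: str, port: int = 25, timeout: float = 10.0):
    """Connect to host:port and wait for the SMTP banner.

    Returns a (status, message) tuple. Hypothetical helper, not part of Nagios.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The server speaks first in SMTP, so just read what it sends.
            banner = sock.recv(1024).decode("ascii", errors="replace").strip()
    except OSError as exc:  # refused, unreachable, timed out, and so on
        return CRITICAL, f"SMTP CRITICAL - {exc}"
    if is_smtp_greeting(banner):
        return OK, f"SMTP OK - {banner}"
    return CRITICAL, f"SMTP CRITICAL - unexpected greeting: {banner!r}"
```

Calling check_smtp("mail.example.com") against a host that answers pings but has a dead mail daemon would come back CRITICAL, which is exactly the case a ping-only monitor misses.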
Plug-ins are where Nagios really shines. People have written countless feature extensions for Nagios, from SNMP-based queries to instant messenger hooks that allow notices to be sent via ICQ. Nagios has the built-in ability to send notifications of outages to a group of administrators, normally via an email-to-pager gateway. By using the available plug-ins, you can configure Nagios's notifications in many other ways. One of the more popular plug-ins is a daemon that receives SNMP traps and generates alerts based on them. The Nagios Exchange is a forum for finding and exchanging useful plug-ins. Browse around and see what some creative people have done. People have even created a Nagios live CD that you can use for testing (it comes preinstalled on a Knoppix disc).
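Part of the reason so many plug-ins exist is that the interface is small: a plug-in is a program that prints one line of status text and reports its result through its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). The following Python sketch shows the shape of a disk usage check. The function names, the 80 and 90 percent thresholds, and the message format are illustrative assumptions, not taken from any shipped plug-in.

```python
import shutil

# Exit codes that Nagios plug-ins use to report state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(percent_used: float, warn: float, crit: float):
    """Map a usage percentage to a (status, message) pair."""
    if percent_used >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent_used:.0f}% used"
    if percent_used >= warn:
        return WARNING, f"DISK WARNING - {percent_used:.0f}% used"
    return OK, f"DISK OK - {percent_used:.0f}% used"


def main(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> int:
    """Check one filesystem, print the status line, return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    status, message = evaluate(100.0 * usage.used / usage.total, warn, crit)
    print(message)  # the single line of text shown next to the service
    return status

# A real plug-in would finish with: raise SystemExit(main())
```

Keeping the threshold logic in evaluate, separate from the measurement, makes the plug-in easy to test by hand before Nagios ever runs it.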
Nagios and its many plug-ins can monitor an astounding number of services. It can check, intelligently, all of the standard ones: Web, SSH, telnet (gasp), FTP, etc. But that's only the back end. Nagios wouldn't be very useful if it didn't notify people of outages. It does, and it also provides a very intuitive Web interface. This isn't just a simple page that displays some errors, a la syslog output. Nagios presents a control center from which you can monitor, acknowledge, and view the history of all your alerts.
From the Web page you can view all of your hosts, and their status, in an easy-to-read red-equals-bad, green-equals-good interface. But wait, there's more, as the infomercials would say. Nagios also allows users to comment on an event and includes a "schedule downtime" feature. When you comment on an alert, everyone else knows that you're working on it, or at least that you've acknowledged the failure and returned to bed. When you schedule downtime, Nagios suppresses all failure notifications during the window in which you expect things to be broken, so your pager battery isn't depleted.
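The downtime idea is simple enough to sketch in a few lines of Python. This is only an illustration of the logic, not how Nagios stores or evaluates downtime internally; the Downtime class and both function names are made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass
class Downtime:
    """A scheduled maintenance window for one host (hypothetical structure)."""
    host: str
    start: datetime
    end: datetime


def in_downtime(host: str, now: datetime, downtimes: Iterable[Downtime]) -> bool:
    """True if `now` falls inside any window scheduled for `host`."""
    return any(d.host == host and d.start <= now < d.end for d in downtimes)


def should_notify(host: str, is_failing: bool, now: datetime,
                  downtimes: Iterable[Downtime]) -> bool:
    """Page someone only for failures outside a scheduled window."""
    return is_failing and not in_downtime(host, now, downtimes)
```

With a window scheduled for a host from 02:00 to 04:00, a failure at 03:00 stays quiet while the same failure at 05:00 pages the on-call administrator.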
All of this software still wouldn't be useful if you had to hire an administrator just to maintain Nagios, as some commercial applications require. As you might have guessed, Nagios is fairly painless to get up and running. The concept is just like most other open source applications: download, compile, install, configure.
There are a few things to realize, though. For checks that have to run on the monitored machines themselves, Nagios needs to be able to SSH into your servers, and you probably want to run it as its own user. So use your account management system to create a 'nagios' user on all of your machines.
Welcome back. The next step is to compile and install Nagios on the monitoring server. That's easy enough: read the documentation to see which options you'd like to set. As the documentation explains, you can run 'make install-config' to copy sample configuration files into the Nagios etc/ directory. This is a good idea.
Next, install the default plug-ins from the Nagios site to get the core monitoring features. You can test each of them manually by running it on the command line; they're just ordinary programs and scripts. The final step to get Nagios ready for configuration is to rename all of the configuration files. Once you get rid of the -sample suffix, Nagios will be ready to configure. The hosts.cfg, hostgroups.cfg, services.cfg, and contacts.cfg files will all need to be modified before Nagios will start. The best approach to working with Nagios is to read through the configuration files and add the information it needs a little at a time. Once you have one server monitored, adding others is a snap. It isn't as difficult as it seems, and it makes sense that you'll have to spend a little time configuring it: the authors can't know a priori what you'd like monitored. They have done a wonderful job of assuming just enough of your configuration needs to make the Nagios experience pleasurable.
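Renaming the sample files by hand is tedious, so here is a small Python sketch that strips the -sample suffix from every file in a directory. The function name, the dry-run default, and the rule of skipping files whose target already exists are assumptions made for this example, and the directory you pass in depends on where you installed Nagios.

```python
from pathlib import Path


def strip_sample_suffix(config_dir: str, suffix: str = "-sample",
                        dry_run: bool = True):
    """Rename e.g. hosts.cfg-sample to hosts.cfg.

    Returns the (old name, new name) pairs that were, or would be, renamed.
    """
    renamed = []
    for path in sorted(Path(config_dir).iterdir()):
        name = path.name
        if not path.is_file() or not name.endswith(suffix) or name == suffix:
            continue
        target = path.with_name(name[: -len(suffix)])
        if target.exists():
            continue  # never overwrite a file that may already be edited
        if not dry_run:
            path.rename(target)
        renamed.append((name, target.name))
    return renamed
```

Run it once with the default dry_run=True to see what would change, then again with dry_run=False to actually rename the files.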
Don't get discouraged with Nagios. It really works the way its authors promise, and a few hours spent configuring monitoring will save many, many headaches down the road. Next week we'll focus on configuring and customizing Nagios, and take a look at a few interesting uses people have come up with for it.