Kick the Tires on Zenoss
The Zenoss installation process is relatively painless. After installing the dependencies, you download the Zenoss tarball and run the installer script that comes with it. The installation guides, while they do provide the necessary information, are painful to read. The Ubuntu guide, for example, suggests adding a user named "zenoss" with the password "zenoss" if security isn't a concern. The entire document also assumes you are new to Linux, which is extremely annoying. One is left to assume that this is community-contributed documentation.
The official documentation itself, however, is wonderful. It does an excellent job of explaining how the entire system works and provides some good examples. Some sections are a bit sparse, but no documentation can cover everything. For anything beyond the Zenoss Guide, the best place to look is, of course, Google, which frequently leads you back to an excellent community-contributed document on more advanced topics.
Back to the installation process: we may have praised it too hastily. The installation guides mention that you need Python 2.4 installed for the installer to run successfully, and they claim that you can switch back to 2.5 afterwards because the system itself will run just fine. This is not true at all; stick with 2.4 as the default Python on your Zenoss server. Most things work under 2.5, but unless you run it under 2.4 you will frequently see Python errors spewed onto the Web page. After the install is complete, you can browse to HTTP port 8080 on your Zenoss server.
The installation guide never mentions that Zenoss runs on port 8080, so we had to figure that out with lsof; once we did, we were presented with a login page. The account created during the install actually works! Hey, you learn not to take anything for granted with software, especially when you're bruised and battered from living in the OSS world for many years. Pleasantly surprised, we set out to look around and see what this thing can do.
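Before reaching for lsof, a plain TCP connect is a quick way to check whether anything is listening on the web port at all. Below is a minimal Python sketch; port 8080 is simply what our install used, and the function name port_is_open is our own.

```python
import socket


def port_is_open(host: str, port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the name and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, name not resolvable, or timed out
        return False
```

For example, port_is_open("zenoss.example.com") (a placeholder host name) should return True once the web interface is up; running lsof -i :8080 on the server itself then tells you which process owns the port.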
Clicking around aimlessly is certainly fun, but it is not recommended with Zenoss. Without at least a partial understanding of how Zenoss is organized, it is nearly impossible to make sense of the configuration options. A casual user clicking around will find devices and information about them with ease, but anyone trying to administer Zenoss that way will quickly become frustrated. Documentation to the rescue.
After reading the guide, it became clear that everything is organized into Classes. There are other organizational components, such as Systems, Locations, and Groups, but fundamentally, every device ends up in a Class. We recommend ignoring everything but Classes for now.
We added some devices manually, most important servers first. The only required information was the hostname and the Class the device belongs to, and ta-da, Zenoss fired off a discovery process and began enumerating the first server. This worked wonderfully, and before long we could see network interfaces (with utilization graphs), performance graphs of CPU and memory, and some other great tidbits of information. Since Zenoss was told this was a Solaris server, and its Production State is "Production," the benefits of having our devices in Zenoss were already apparent: all Solaris servers will live under /Devices/Server/Solaris, which provides an easy access point to their information.
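Because Classes are paths, one class can sit inside another, and "all servers" is simply everything under /Devices/Server. The toy model below illustrates that prefix relationship; DeviceTree is a hypothetical class of our own and is not how Zenoss stores devices internally.

```python
from collections import defaultdict


class DeviceTree:
    """Toy model of device Classes as slash-separated paths (hypothetical, not Zenoss's API)."""

    def __init__(self) -> None:
        # Maps a class path such as "/Devices/Server/Solaris" to its hostnames
        self._by_class: dict[str, list[str]] = defaultdict(list)

    def add(self, hostname: str, device_class: str) -> None:
        # Strip a trailing slash so "/Devices/Server/" and "/Devices/Server" match
        self._by_class[device_class.rstrip("/")].append(hostname)

    def under(self, prefix: str) -> list[str]:
        """Return every host whose class is prefix itself or any subclass of it."""
        prefix = prefix.rstrip("/")
        hosts = []
        for cls, names in self._by_class.items():
            # Require a "/" boundary so "/Devices/Server/Sol" does not match ".../Solaris"
            if cls == prefix or cls.startswith(prefix + "/"):
                hosts.extend(names)
        return sorted(hosts)
```

Asking for "/Devices/Server" returns the Solaris and Linux hosts together, while "/Devices/Server/Solaris" narrows the view to one platform.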
Before you add a device, make sure it is SNMP-capable. If you don't already have NET-SNMP installed and running on all your servers, now is a good time to start. Configuring and enabling a read-only community is quick on most operating systems. Once SNMP is enabled, feel free to add more devices manually, or, alternatively, go crazy.
Unleashing Zenoss Discovery
Adding nodes manually is extremely tedious. Zenoss is capable of Layer 3 discovery, which means it can find IP information, such as routing tables, and automatically enumerate your hosts; it cannot, however, provide Layer 2 information, such as which switch port a host is attached to. Starting discovery on a subnet is not straightforward, but the guide explains it all.
We decided to "discover" a subnet containing 55 Solaris and Linux servers. We browsed to the Network section, located the subnet (Zenoss already knew about it because it appeared in the routing table of a host we had added), clicked the secret magical arrow menu button, and selected "Discover Devices."
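Zenoss knew the subnet because it had read a routing table. The idea can be sketched as turning routing-table rows into a list of networks worth sweeping. This is an illustration of the concept, not Zenoss's discovery code; the (destination, netmask) input format is an assumption, modeled loosely on the ipRouteDest/ipRouteMask columns of the MIB-II routing table.

```python
import ipaddress


def subnets_to_discover(routes):
    """Turn (destination, netmask) routing-table rows into networks worth sweeping.

    The default route and host routes are skipped, and duplicates are removed.
    Assumes IPv4 rows in dotted-quad form (a hypothetical input format).
    """
    networks = set()
    for dest, mask in routes:
        net = ipaddress.ip_network(f"{dest}/{mask}", strict=False)
        # /0 means "everything" and /32 is a single host: neither is a subnet to sweep
        if net.prefixlen in (0, net.max_prefixlen):
            continue
        networks.add(net)
    return sorted(networks)
```

Feeding it a table with a default route, a host route, and two real subnets returns just the two subnets, which is exactly the list you would then offer up for "Discover Devices."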
After nearly an hour of churning, due mostly to timeouts on devices with no SNMP running, Zenoss reported that it was done. On the main Dashboard page we were able to click on /Devices/Discovered and find all of the SNMP-capable hosts from that subnet. Now the only thing left to do was set each device's production state (production, pre-production, test, etc.) and Class (e.g., /Devices/Server/Linux).
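Why do timeouts dominate? An address that answers SNMP costs a quick round trip, but a silent one waits out the full timeout, possibly several times. The back-of-the-envelope sketch below makes that explicit. The /24 subnet size, 3-second timeout, single retry, and worker count are illustrative assumptions, not Zenoss's actual discovery settings, and time spent on hosts that do answer is ignored.

```python
import ipaddress
import math


def worst_case_sweep_seconds(cidr: str, responders: int, timeout_s: float,
                             retries: int, workers: int = 1) -> float:
    """Estimate time spent waiting on addresses that never answer.

    Each silent address costs (retries + 1) full timeouts, and addresses are
    probed `workers` at a time. All parameters are hypothetical inputs.
    """
    total = sum(1 for _ in ipaddress.ip_network(cidr).hosts())
    silent = max(total - responders, 0)
    batches = math.ceil(silent / workers)
    return batches * timeout_s * (retries + 1)
```

With those assumptions, worst_case_sweep_seconds("10.0.0.0/24", 55, 3.0, 1) gives 1194.0 seconds (about 20 minutes) for a strictly serial sweep and 120.0 seconds with ten parallel workers; longer timeouts or more retries scale the figure linearly.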
At this stage of the game, most NMS products are extremely painful to deal with. Zenoss, on the other hand, consumed much more of our time than anticipated for the opposite reason: we could not stop playing with it. We added another subnet containing around 100 nodes and started organizing them into Classes and production states. It was extremely fun, and the best part is that after a day of Zenoss running, we had charts of some extremely useful information about our servers. Tools such as Cacti or Munin can collect this type of information, but anyone who has configured either of them knows it is barely worth the effort.
If you decide to give Zenoss the keys to your network gear (the SNMP community), watch out! You are in for a treat. Zenoss will automatically discover the Layer 3 topology, and clicking on "Network Map" brings up an extremely good, interactive map of your network. The accuracy rivals HP OpenView (HPOV), and the usability is unmatched. Oh, did we mention that simply pointing Zenoss at your routers and switches makes another piece of enterprise software obsolete? Cricket, the prevailing traffic statistics collector for network gear, just became useless to this Zenoss user. After Zenoss has crawled a switch, you can browse to the "OS" tab and view all of its interfaces and their descriptions. Clicking on an interface brings up graphs of bandwidth, packets per second, and errors, and the performance tab provides CPU and memory usage graphs for your Cisco. So long, Cricket.
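Those bandwidth graphs rest on simple arithmetic: SNMP exposes interface traffic as ever-increasing octet counters (ifInOctets and ifOutOctets in IF-MIB, or the 64-bit ifHCInOctets and ifHCOutOctets), so a rate is the difference between two polls divided by the polling interval. The sketch below shows that calculation under stated assumptions: at most one counter wrap between polls, and a 32-bit counter unless told otherwise. RRD-based collectors such as Cricket do something equivalent, but this is our illustration, not Zenoss's code.

```python
def bits_per_second(prev_octets: int, curr_octets: int, interval_s: float,
                    counter_bits: int = 32) -> float:
    """Average bit rate between two polls of an octet counter such as ifInOctets.

    A decrease is treated as a single wrap of a counter_bits-wide counter.
    Caveat: a device reboot (counter reset) would be misread as a wrap.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr_octets - prev_octets
    if delta < 0:
        delta += 2 ** counter_bits  # undo one wrap past the counter's maximum
    return delta * 8 / interval_s   # octets -> bits, averaged over the interval


def utilization_percent(bps: float, if_speed_bps: int) -> float:
    """Share of the interface's nominal speed (ifSpeed, in bits/s) in use."""
    return 100.0 * bps / if_speed_bps
```

On a busy gigabit link a 32-bit octet counter can wrap in well under a minute, which is why the 64-bit HC counters (counter_bits=64) are preferable on fast interfaces.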
These features are all great, but what we really want is to obsolete Nagios. Sorry, Nagios, you're a pain to configure and scale.
Zenoss already has a leg up, since it knows about most of our servers as a result of discovering them. It tries to generate events automatically when the CPU load goes too high or when a well-known service it has discovered ceases to function, but we certainly don't want pages in the middle of the night about things we couldn't care less about. Configuring monitoring in Zenoss is not straightforward, but we expect to find it both fun and productive. We're off to read the documentation, install some ZenPacks from the community pool, and see if we can really wean ourselves off Nagios.
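The paging policy we are after can be stated in a few lines: alert only when a threshold is crossed on a device whose production state says it matters. The sketch below is our own hypothetical logic with made-up names (LoadSample, devices_to_page); it is not Zenoss's threshold or alerting-rule API, just a way to pin down what "don't wake us for things we couldn't care less about" means before configuring the real thing.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    """One polled reading; the field names are hypothetical, not Zenoss's."""
    device: str
    production_state: str  # e.g. "Production", "Pre-Production", "Test"
    load_average: float


def devices_to_page(samples, max_load: float,
                    page_states=frozenset({"Production"})) -> list:
    """Devices over max_load whose production state justifies a 3 a.m. page."""
    return [
        s.device
        for s in samples
        if s.load_average > max_load and s.production_state in page_states
    ]
```

Over-threshold readings from Test or Pre-Production boxes are simply dropped here; a gentler policy would record them as low-severity events rather than discarding them.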