Hire a Babysitter for Your Windows Apps
After installing Windows-based applications in a production environment, there's typically some level of babysitting (monitoring) involved.
The number of products out there for monitoring systems, services, applications, and networks is mind-boggling. Luckily (or not), some Microsoft-based services and applications have monitoring requirements so specific that they narrow the field by their very nature.
NetIQ's AppManager Suite, Microsoft Operations Manager (MOM), VitalSuite, SiteScope, and of course HP OpenView and Tivoli are just a few of the commercial products available. In this overview, we'll cover Microsoft's solution (MOM) and NetIQ's AppManager, since the two share a common technology base and are both widely used in Windows-based application environments.
MOM's Relationship to NetIQ and Other Products
NetIQ is both a partner of and a competitor to Microsoft. Microsoft licensed the source code for NetIQ's Operations Manager in October 2000, and MOM is based on that NetIQ technology. MOM 2005 and MOM 2005 Workgroup Edition (formerly known as MOM 2005 Express) are part of Microsoft's ongoing Dynamic Systems Initiative, the company's long-term systems management strategy. MOM 2005 Workgroup Edition is intended for sites with 10 or fewer servers.
If your budget can't handle the hefty price tag of these commercial products, or if basic monitoring of network connectivity, uptime, CPU, memory, and disk usage is all you need for your Windows systems, an open source solution such as Nagios or Zabbix might suffice.
But for mission-critical Windows-based applications and services, open source solutions sometimes can't monitor the environment adequately. Monitoring Active Directory, Exchange, IIS, or SQL Server can require very specific capabilities that open source tools often don't provide "out of the box." Even Nagios needs some kind of proxy mechanism to communicate with Windows servers before it can monitor, and respond to, Windows Event Log entries and service failures, and these are fairly basic monitoring requirements for Windows-based systems.
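Basic checks such as disk usage are exactly what open source tools handle well. As a rough illustration, here is a minimal Nagios-style disk check in Python. It assumes the usual Nagios plugin convention (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, plus a single status line with optional performance data after a `|`). The 80%/90% thresholds and the function names are illustrative assumptions, not part of any product. On a Windows server, a check like this would still have to run through whatever agent or proxy Nagios uses to reach that host.

```python
import shutil

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_pct: float, warn_pct: float, crit_pct: float) -> int:
    """Map a usage percentage onto a Nagios state; thresholds are inclusive."""
    if used_pct >= crit_pct:
        return CRITICAL
    if used_pct >= warn_pct:
        return WARNING
    return OK


def check_disk(path: str, warn_pct: float = 80.0, crit_pct: float = 90.0) -> tuple[int, str]:
    """Return (exit_code, status_line) for the volume that contains `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:  # missing path, permission problem, etc.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero capacity"
    used_pct = 100.0 * usage.used / usage.total
    state = classify(used_pct, warn_pct, crit_pct)
    # Before "|": human-readable status. After "|": performance data in
    # label=value;warn;crit form, which Nagios can store and graph.
    line = (
        f"DISK {STATE_NAMES[state]} - {used_pct:.1f}% used on {path}"
        f" | used_pct={used_pct:.1f}%;{warn_pct:g};{crit_pct:g}"
    )
    return state, line


def main(argv: list[str]) -> int:
    """Print the status line and return the exit code for the scheduler."""
    path = argv[1] if len(argv) > 1 else "/"
    state, line = check_disk(path)
    print(line)
    return state
```

Run as a plugin, a small wrapper would end with `sys.exit(main(sys.argv))`; on Windows the path argument would be a drive such as `C:\`.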
Windows-based Distributed Applications: Problem Children?
Particularly important to many organizations is monitoring Windows-based distributed applications that use .NET, COM+, MTS, MSMQ, and similar technologies, which generally preclude the use of open source solutions.
NetIQ's AppManager is one of the most widely used commercial products for monitoring Windows-based distributed applications. AppManager can monitor such obscure items as MSMQ "Incoming Message Rate," "Active Queue Bytes," and "Active Queue Messages," or COM+ response times. Process memory usage is another item commonly monitored in Windows-based applications. For more obscure COM counters, Microsoft Premier Support offers a DLL called COMPSTAT2 that exposes advanced counters so that AppManager can handle them like any other Performance Monitor counters. Microsoft undoubtedly has a few more such DLLs up its sleeve!
As a monitoring tool for Windows-based application environments, NetIQ AppManager has some very useful capabilities. One of these is the Set Resource Dependency script for Microsoft-based clusters. When applied to both nodes of a cluster, it automatically prevents monitoring jobs from running against the inactive node, so that false alerts are not generated. If the cluster fails over, AppManager automatically switches its monitoring to the newly active node.
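The idea behind the Set Resource Dependency behavior can be modeled in a few lines. This is not NetIQ's implementation; it is a minimal Python sketch that assumes the monitoring system can learn which node currently owns the clustered resource (the `owner` argument). The `Job` type, node names, and job names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A monitoring job bound to one cluster node (names are hypothetical)."""
    name: str
    node: str


def runnable_jobs(jobs: list[Job], owner: str) -> list[Job]:
    """Keep only jobs that target the node currently owning the clustered resource.

    Jobs aimed at the inactive node are skipped rather than run, so they cannot
    raise false alerts about services that are legitimately stopped there.
    """
    return [job for job in jobs if job.node == owner]


# The same job is deployed to both nodes of a two-node cluster.
JOBS = [Job("sql_service_up", "NODE-A"), Job("sql_service_up", "NODE-B")]
```

Because the job list is filtered on every run instead of being reassigned, a failover only changes `owner` (from `"NODE-A"` to `"NODE-B"`, say), and monitoring follows the resource to the new active node.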
Maintenance Mode is another useful function. When a system is placed in Maintenance Mode, monitoring scripts do not run against it, which prevents false alerts during planned downtime and reboots.
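Conceptually, Maintenance Mode is a filter applied before scripts run. The Python sketch below is an assumption about how such a filter could work, not AppManager's design: each host may have maintenance windows, treated as half-open `[start, end)` intervals, and hosts inside a window are skipped. Host names and times are made up.

```python
from datetime import datetime, timedelta

# host -> list of (start, end) maintenance windows
Windows = dict[str, list[tuple[datetime, datetime]]]


def in_maintenance(host: str, now: datetime, windows: Windows) -> bool:
    """True if `now` falls inside any [start, end) window defined for `host`."""
    return any(start <= now < end for start, end in windows.get(host, []))


def hosts_to_check(hosts: list[str], now: datetime, windows: Windows) -> list[str]:
    """Drop hosts in maintenance so their planned downtime raises no alerts."""
    return [h for h in hosts if not in_maintenance(h, now, windows)]


# Example: APP02 has a one-hour reboot window (hosts and times are made up).
start = datetime(2024, 1, 6, 2, 0)
WINDOWS: Windows = {"APP02": [(start, start + timedelta(hours=1))]}
print(hosts_to_check(["APP01", "APP02"], start + timedelta(minutes=30), WINDOWS))  # ['APP01']
```

Once the window ends, APP02 is checked again automatically, which avoids the common mistake of forgetting to take a system back out of maintenance.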
AppManager also uses the concept of Knowledge Script Groups: groupings of monitoring scripts that can be applied to similar types of servers. This makes monitoring scripts easier to manage and run, and keeps them better organized.
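The benefit of grouping is easiest to see as data. In this Python sketch the group names and script names are hypothetical (they are not AppManager's Knowledge Script names); each server type is described once by the groups it belongs to, and the groups expand into a de-duplicated list of checks.

```python
# Hypothetical script and group names, for illustration only.
SCRIPT_GROUPS: dict[str, list[str]] = {
    "windows_base": ["cpu_usage", "memory_usage", "disk_space", "event_log_errors"],
    "sql_server": ["sql_service_up", "sql_log_space"],
    "iis": ["iis_service_up", "iis_queued_requests"],
}

# Each server type lists the groups that apply to it.
SERVER_TYPES: dict[str, list[str]] = {
    "web": ["windows_base", "iis"],
    "database": ["windows_base", "sql_server"],
}


def scripts_for(groups: list[str], catalog: dict[str, list[str]] = SCRIPT_GROUPS) -> list[str]:
    """Expand group memberships into an ordered list with no duplicate scripts."""
    scripts: list[str] = []
    for group in groups:
        for script in catalog[group]:
            if script not in scripts:
                scripts.append(script)
    return scripts
```

Adding a check to `windows_base` changes every server type that includes it, so a fleet-wide change is made in one place.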
Interviewing the Babysitter
There are a few obvious things to consider if you're selecting a monitoring solution for Windows-based systems and applications.
Since custom scripting is often involved in monitoring solutions, find out which scripting language each product uses. NetIQ, for example, uses VBScript. Beyond the SLAs you must meet, look at the problems that most commonly cause system or application downtime in your environment. What kind of trending and reporting do you need for SLA and other purposes? Will you need to justify buying new equipment based on increasing demands on existing equipment? Which product provides the most capabilities "out of the box" for your specific needs? How much custom scripting or extension is involved, and how difficult and costly is it likely to be? How will the product scale if it's to be used company-wide?
The best general advice for any monitoring solution is to plan wisely and keep things as simple as possible. Be frugal about what you monitor and how much data you collect. Plan your data retention and reporting needs carefully: databases can get unwieldy very quickly if you collect a lot of performance data and keep it for a long time. Also be stingy with alerts. They add up quickly, particularly when a large number of servers are being monitored, and the initial tendency is to send more alerts than you need, or even want!
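To put numbers on that advice, here is a small Python sketch with two hypothetical helpers. The first estimates how many performance samples a database accumulates. The second shows one common way to be stingy with alerts, chosen here for illustration rather than taken from any product discussed above: alert only after a threshold has been exceeded for several consecutive samples, and only once per sustained breach.

```python
SECONDS_PER_DAY = 86_400


def samples_stored(servers: int, counters: int, interval_s: int, retention_days: int) -> int:
    """Rows kept if every counter on every server is sampled every `interval_s` seconds."""
    per_day = SECONDS_PER_DAY // interval_s
    return servers * counters * per_day * retention_days


def alert_indices(samples: list[float], threshold: float, consecutive: int = 3) -> list[int]:
    """Return the sample indices at which an alert fires.

    The value must exceed `threshold` for `consecutive` samples in a row, and only
    one alert is raised per breach; dropping back under the threshold re-arms it.
    """
    fired: list[int] = []
    run = 0
    for i, value in enumerate(samples):
        if value > threshold:
            run += 1
            if run == consecutive:  # fire exactly once per sustained breach
                fired.append(i)
        else:
            run = 0  # recovery resets the count
    return fired
```

With made-up figures of 200 servers, 25 counters, a 5-minute interval, and 90 days of retention, `samples_stored(200, 25, 300, 90)` comes to 129,600,000 rows; sampling every minute instead multiplies that by five, to 648,000,000. Likewise, `alert_indices` ignores isolated spikes entirely and turns a long breach into a single alert.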
If it's your responsibility to choose a monitoring solution, evaluate available tools carefully. If it doesn't do the job, then you'll have a more difficult time doing yours.