RiverMuse has taken on the task of making legacy enterprise monitoring systems useful in today’s IT environment. No, it isn’t rewriting HP OpenView or IBM Tivoli; instead, RiverMuse has found a spot where it can add even more value: as a manager of managers. It’s much more than just another layer on top of existing monitoring systems, however.
RiverMuse started in 2008, founded by some of the remaining innovators from Riversoft and Micromuse. Micromuse acquired the struggling Riversoft in 2002, and IBM acquired Micromuse for considerably more money in 2005. Earlier, Riversoft had essentially become HP OpenView Advanced Edition, and with the IBM acquisition Micromuse became IBM Tivoli Netcool. Needless to say, this company has pedigree. So, what does this software do? The answer is complex, which is to say it is really difficult to explain. Open source manager of managers, uber event correlation, fault message aggregator, message filter and enhancer: all of those descriptions are true, yet none of them fully explains it. Instead, let’s start by defining a few of the problems.
Current Problems in Enterprise Systems Monitoring
Continuous change is more prevalent in today’s IT world than it has ever been. Administrators deploy new hardware seamlessly, but that’s not even half of the story. In virtualized environments, virtual machines can spring up with just a few simple commands. They can also migrate between physical hosts, which wreaks havoc on many monitoring and reporting systems.
Agility, then, is a weak point for traditional monitoring systems. What happens to your monitoring system when a node physically moves? Is all the historic data associated with the old MAC or IP address re-associated properly? Or does it linger behind and get attached to whatever new node takes over that address? Does your monitoring system even know when virtual machines migrate to different physical hardware?
This lack of agility, and the inability to handle continuous change, means that the two dominant enterprise monitoring systems, HP OpenView and IBM Tivoli, have blind spots. Silent failures can occur when the monitoring system fails to recognize how the environment has changed.
Finally, there is the cost. Large IT departments still pay more than six figures for maintenance contracts on this aging monitoring software. And that doesn’t include the specialized staff required to run these systems.
That is not to say RiverMuse replaces expensive, antiquated network management systems (NMSes). Instead, it gathers information from your NMS as well as other sources, and from that vantage point makes intelligent decisions about event and fault management. Let’s take a look at what it actually does.
RiverMuse’s Place in the Infrastructure
To avoid wrongly portraying RiverMuse as merely an add-on for the big two NMS providers, let’s briefly look at the open source solutions.
What is the number one complaint about the open source monitoring system Zenoss? It’s the noise. You cannot even define parent-child relationships to stop Zenoss from alerting on 100 hosts when a switch has failed. With the other big option, Nagios, the main complaint is configuration: alerting, and even adding new nodes, is a manual, time-consuming process. What if you could simply send all alerts and data from these systems to a smarter system? Because RiverMuse is a manager of managers, it can play exactly that role.
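To make the parent-child idea concrete, here is a minimal sketch of topology-based alert suppression. It is not code from RiverMuse, Zenoss, or Nagios; the function name `root_causes` and the `parent_of` dependency map (host to switch, switch to router) are hypothetical. A down node is reported only if none of its upstream dependencies is also down, so a failed switch produces one alert instead of a hundred.

```python
"""Hypothetical sketch of parent-child alert suppression (not RiverMuse or Zenoss code)."""


def root_causes(down_nodes: set[str], parent_of: dict[str, str]) -> set[str]:
    """Return the down nodes whose failure is not explained by a down ancestor.

    parent_of maps each node to the device it depends on (e.g. host -> switch).
    A node is suppressed if any ancestor is also down.
    """
    roots = set()
    for node in down_nodes:
        ancestor = parent_of.get(node)
        seen = {node}  # guards against accidental cycles in the topology map
        explained = False
        while ancestor is not None and ancestor not in seen:
            if ancestor in down_nodes:
                explained = True  # an upstream failure already accounts for this node
                break
            seen.add(ancestor)
            ancestor = parent_of.get(ancestor)
        if not explained:
            roots.add(node)
    return roots


if __name__ == "__main__":
    # Hypothetical topology: 100 hosts hang off switch sw1, which hangs off router r1.
    topology = {f"host{i}": "sw1" for i in range(100)}
    topology["sw1"] = "r1"
    down = {f"host{i}" for i in range(100)} | {"sw1"}
    print(root_causes(down, topology))  # {'sw1'}
```

Walking the ancestor chain for each down node is simple and adequate for a sketch; a production correlator would also need to cope with redundant paths, where a host has more than one parent.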
RiverMuse correlates events from monitoring systems, applications, and infrastructure components, so you are no longer constrained to information from only a few critical sources. SNMP traps can be sent directly to RiverMuse from various infrastructure and application sources. Your current NMS can likely do that too, but RiverMuse can also digest syslog messages and Windows events. As a result, event correlation goes much deeper, and actually identifying the root cause of an outage becomes possible from within the alerting system itself. RiverMuse, then, is somewhat like Splunk, but with more enterprise-oriented features and more robust alerting capabilities.
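Cross-source correlation generally starts by normalizing each feed into one event shape. The sketch below is a minimal illustration under stated assumptions, not RiverMuse’s data model: the input dictionary keys (`hostname`, `computer`, `agent`, `ts`, and so on), the four-level severity scale, and the helper names are all hypothetical. The syslog severity codes (0 to 7) follow RFC 5424, and the Windows levels (1 Critical through 5 Verbose) follow the Windows Event Log convention. Events for the same host that arrive close together are then grouped into one cluster.

```python
"""Hypothetical sketch: normalize syslog, Windows, and SNMP events, then group by host and time."""
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str      # "syslog", "windows", or "snmp"
    host: str
    severity: str    # assumed normalized scale: critical / major / minor / info
    message: str
    timestamp: float


# RFC 5424 syslog severities 0-7 mapped onto the assumed four-level scale.
SYSLOG_SEVERITY = {0: "critical", 1: "critical", 2: "critical", 3: "major",
                   4: "minor", 5: "info", 6: "info", 7: "info"}
# Windows Event Log levels (0 LogAlways, 1 Critical, ..., 5 Verbose) mapped the same way.
WINDOWS_LEVEL = {0: "info", 1: "critical", 2: "major", 3: "minor", 4: "info", 5: "info"}


def from_syslog(rec: dict) -> Event:
    return Event("syslog", rec["hostname"], SYSLOG_SEVERITY[rec["severity"]], rec["msg"], rec["ts"])


def from_windows(rec: dict) -> Event:
    return Event("windows", rec["computer"], WINDOWS_LEVEL[rec["level"]], rec["message"], rec["ts"])


def from_snmp_trap(rec: dict) -> Event:
    # Assumes the trap is already decoded and the agent address resolved to a hostname;
    # SNMP has no standard severity, so one is assumed to be assigned upstream.
    return Event("snmp", rec["agent"], rec.get("severity", "major"), rec["trap_oid"], rec["ts"])


def correlate_by_host(events: list[Event], window: float) -> dict[str, list[list[Event]]]:
    """Group each host's events into clusters of events at most `window` seconds apart."""
    clusters: dict[str, list[list[Event]]] = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.timestamp):
        host_clusters = clusters[ev.host]
        if host_clusters and ev.timestamp - host_clusters[-1][-1].timestamp <= window:
            host_clusters[-1].append(ev)  # close in time to the previous event: same incident
        else:
            host_clusters.append([ev])    # start a new cluster for this host
    return dict(clusters)
```

With a 30-second window, a syslog error, a Windows event, and an SNMP trap from `web2347` arriving within a few seconds of each other land in a single cluster, which is the kind of cross-source grouping that a traps-only view cannot provide.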
RiverMuse then enriches the information it gathers with data from configuration management databases (CMDBs). For example, an alert from your NMS that says “node web2347 is down!” is not very useful on its own. Even sysadmins struggle to remember whether a given host warrants panic mode. Business users and NOC staff less familiar with an infrastructure’s hostnames certainly won’t know whether that alert means anything. CMDB data can supply the missing context, such as which business service the node supports.
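A minimal sketch of CMDB enrichment is shown below. It assumes a CMDB export keyed by hostname; the `CMDB` contents, the `Alert` fields, and the `enrich`/`render` helpers are hypothetical illustrations, not RiverMuse’s schema or API. Unknown hosts pass through unchanged rather than being dropped.

```python
"""Hypothetical sketch of enriching an NMS alert with CMDB attributes."""
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Alert:
    host: str
    message: str
    service: Optional[str] = None
    criticality: Optional[str] = None
    owner: Optional[str] = None


# Hypothetical CMDB extract: hostname -> configuration item attributes.
CMDB = {
    "web2347": {"service": "online checkout", "criticality": "tier-1", "owner": "ecommerce-ops"},
}


def enrich(alert: Alert, cmdb: dict) -> Alert:
    """Return a copy of the alert with business context attached, if the CMDB knows the host."""
    ci = cmdb.get(alert.host)
    if ci is None:
        return alert  # unknown host: pass through unenriched
    return replace(alert, service=ci.get("service"),
                   criticality=ci.get("criticality"), owner=ci.get("owner"))


def render(alert: Alert) -> str:
    """Format the alert for a NOC console, appending business context when available."""
    if alert.service is None:
        return alert.message
    return f"{alert.message} [service: {alert.service}, {alert.criticality}, owner: {alert.owner}]"


if __name__ == "__main__":
    print(render(enrich(Alert("web2347", "node web2347 is down!"), CMDB)))
    # node web2347 is down! [service: online checkout, tier-1, owner: ecommerce-ops]
```

Using a frozen dataclass and `dataclasses.replace` keeps the original alert intact, so the raw NMS message remains available alongside its enriched copy.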
Finally, once events have been enriched with business information, they get filtered. This is where RiverMuse augments many half-baked NMS solutions. It de-duplicates and filters events, then summarizes only those that matter before passing them to operations staff for immediate attention. Some NMSes do this fairly well, and some ignore it completely. RiverMuse, however, treats this as its core function, and it bases its decisions on more information than is available within a traditional NMS.
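The de-duplicate, filter, and summarize steps can be sketched as follows. This is a hypothetical illustration, not RiverMuse’s rule engine: the `Event` fields, the severity ranking, the default threshold of `"major"`, and the notion of `"tier-1"` criticality (assumed to come from CMDB enrichment) are all assumptions.

```python
"""Hypothetical sketch of de-duplicating, filtering, and summarizing enriched events."""
from collections import Counter
from dataclasses import dataclass

SEVERITY_RANK = {"info": 0, "minor": 1, "major": 2, "critical": 3}


@dataclass(frozen=True)
class Event:
    host: str
    check: str
    severity: str
    criticality: str  # assumed to come from CMDB enrichment, e.g. "tier-1"


def triage(events: list[Event], min_severity: str = "major",
           critical_tiers: frozenset = frozenset({"tier-1"})) -> list[str]:
    # 1. De-duplicate: identical events collapse into one entry with an occurrence count.
    counts = Counter(events)
    # 2. Filter: keep events at or above min_severity that affect a critical tier.
    threshold = SEVERITY_RANK[min_severity]
    kept = {ev: n for ev, n in counts.items()
            if SEVERITY_RANK[ev.severity] >= threshold and ev.criticality in critical_tiers}
    # 3. Summarize: one line per surviving event, most severe first.
    ordered = sorted(kept.items(),
                     key=lambda item: (-SEVERITY_RANK[item[0].severity], item[0].host, item[0].check))
    return [f"{ev.severity.upper()} {ev.host} {ev.check} (x{n})" for ev, n in ordered]


if __name__ == "__main__":
    noise = ([Event("web2347", "ping", "critical", "tier-1")] * 50
             + [Event("web2347", "disk", "minor", "tier-1")]
             + [Event("dev7", "cpu", "major", "tier-3")] * 3)
    print(triage(noise))  # ['CRITICAL web2347 ping (x50)']
```

Fifty repeated pings become one line, while the minor disk warning and the events from a non-critical development box are held back. Filtering on CMDB-derived criticality, rather than severity alone, reflects the article’s point that these decisions benefit from information outside the NMS.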
Configuring this type of system is inevitably complex. The data and rules RiverMuse deals with can only be automated so far before it starts guessing at individual business requirements. Based on the demos we’ve seen, however, RiverMuse makes this process very easy. We all know that the big two NMSes require dedicated full-time staff, whose time is often spent updating the NMS with information about network changes and fighting with alerting rules. Given RiverMuse’s vision, it seems entirely possible that after deploying it, those staff will be freed up to focus more on the business logic of how the infrastructure interacts.
RiverMuse is partially open source. RiverMuse core includes all the basic components, but it seems fairly bare-bones compared to the pro offering. Still, RiverMuse says it has “gifted” the core to the open source community. Misunderstandings of “open source” aside, this is a fascinating system built by a fascinating company composed of veterans of the NMS space. It’s worth a look.