Log Analysis: Splunk Digs Deep, Displays Shallow
Splunk, Inc. has received plenty of accolades lately, mostly as a result of its effort to make Splunk, its flagship product, available with an open source API. Splunk seeks to parse every log file within your IT infrastructure and then correlate the data in a meaningful way. After it consumes all of that data, Splunk's Web interface makes it very easy to grok the root cause of most issues without having to manually peruse many separate log files.
Splunk's main purpose is to figure out relationships between disparate and seemingly unrelated data. During the development of Splunk, its developers studied how expert systems administrators made correlations in their heads, and realized that this could all be done in software. For example, if an administrator notices a slow Web server, the first place she might begin investigating is the Web server logs. Many times, the cause of the slowness isn't simply that the server is handling too many requests. Sometimes there are underlying problems with backend services, like a slow-to-respond SQL server or a file server with a degraded RAID array. All of these things can be seen at a glance with Splunk's correlated display, and you won't have to think of all the possible causes before you start hunting.
It's not that Splunk will make the expert admins more useful than they already are, since they would be able to find the issue after grepping through a few log files anyway, but Splunk certainly makes the process faster. It also allows non-experts and people unfamiliar with a datacenter's infrastructure to extract some meaning out of log entries.
Most people would summarize this software in a relatively vague manner: "Splunk is a search engine that indexes your log files." Simply setting up a syslog server accomplishes nearly the same thing, so Splunk must be doing something more, and indeed, it is. CEO Michael Baum has been quoted numerous times asserting that people were using a "gross oversimplification," usually to debunk the media's tendency to describe Splunk as a Google-like search of your log file data. Splunk is completely different.
"... IT data changes every millisecond--it's streaming, rapid-fire information that isn't hyperlinked to anything," Baum explains. Splunk is clearly in a new realm, and its special algorithms focus on correlating data into events, rather than simply indexing things.
To accomplish all this, Splunk takes log file data, SNMP traps, and even log data stored in a database and puts it all together. The underlying log format doesn't matter--Splunk doesn't depend on W3C standards or anything like that to parse data in a pre-determined format, because that would be too limiting. Instead, Splunk makes heavy use of timestamps and keywords to correlate data into events that matter. Of course, all of this would suffer from the same pitfalls as "one huge log file" if Splunk didn't take great care in creating a usable interface.
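To make the idea of timestamp-based correlation concrete, here is a minimal Python sketch. It is not Splunk's actual algorithm, which the article does not describe; the sample log lines, the `parse` and `correlate` helpers, and the two-minute window are all assumptions made purely for illustration. Entries from unrelated sources are merged, sorted by time, and grouped into one "event" whenever they occur close together.

```python
from datetime import datetime, timedelta

# Hypothetical sample lines from three unrelated sources (formats invented).
SAMPLE_LOGS = {
    "web": ["2006-08-01 10:00:05 GET /report took 9200ms"],
    "sql": ["2006-08-01 10:00:03 slow query: 8900ms on table orders"],
    "raid": [
        "2006-08-01 09:58:47 md0: array degraded, disk sdb failed",
        "2006-08-01 07:15:00 md0: routine check completed",
    ],
}


def parse(source, line):
    """Return (timestamp, source, message); assumes a leading
    'YYYY-MM-DD HH:MM:SS' stamp followed by one space."""
    stamp, message = line[:19], line[20:]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), source, message


def correlate(logs, window=timedelta(minutes=2)):
    """Merge all sources, sort by time, and start a new event whenever
    the gap to the previous entry exceeds the window."""
    entries = sorted(parse(src, line) for src, lines in logs.items() for line in lines)
    events = []
    for entry in entries:
        if events and entry[0] - events[-1][-1][0] <= window:
            events[-1].append(entry)
        else:
            events.append([entry])
    return events


if __name__ == "__main__":
    for number, event in enumerate(correlate(SAMPLE_LOGS), start=1):
        print(f"event {number}")
        for when, source, message in event:
            print(f"  {when:%H:%M:%S} [{source}] {message}")
```

With this sample data, the degraded RAID array, the slow SQL query, and the slow Web request land in the same event, while the unrelated early-morning RAID check stays on its own. A real tool would also weigh keywords, as the article notes; this sketch uses time proximity only.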
Splunk's Web interface is truly innovative. Similar to Google's Gmail, Splunk uses AJAX to make everything clickable. If you see, for example, a message ID from the e-mail you're trying to trace in your sendmail logs, all you have to do is click on it and all the log entries associated with that message ID will be displayed. Yes, it's much like plain old grep in this example, only these entries have been indexed, and you don't have to wait for grep to parse your entire log file again. The Web interface will even plot histograms for you if you hover the mouse over an event ID, showing you exactly how widespread an event is by comparing it to the complete index.
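The difference between an indexed lookup and re-running grep can be sketched in a few lines of Python. This is a toy inverted index, not a description of how Splunk stores its data; the sendmail-style sample lines, the tokenizing pattern, and the `build_index`, `lookup`, and `share` helpers are assumptions for illustration. The log is scanned once to build the index, after which finding every line for a message ID is a dictionary lookup rather than another pass over the file.

```python
import re
from collections import defaultdict

# Hypothetical sendmail-style lines (invented for this example).
MAIL_LOG = [
    "Aug  1 10:00:01 mx1 sendmail[411]: k71A01x1: from=<alice@example.com>, size=1204",
    "Aug  1 10:00:02 mx1 sendmail[412]: k71A02y2: from=<bob@example.com>, size=880",
    "Aug  1 10:00:04 mx1 sendmail[413]: k71A01x1: to=<carol@example.org>, stat=Sent",
]

TOKEN = re.compile(r"[A-Za-z0-9_.@-]+")


def build_index(lines):
    """Scan the log once, mapping each lowercased token to the set of
    line numbers that contain it."""
    index = defaultdict(set)
    for lineno, line in enumerate(lines):
        for token in TOKEN.findall(line):
            index[token.lower()].add(lineno)
    return index


def lookup(index, lines, token):
    """Return matching lines in file order without rescanning the log."""
    return [lines[i] for i in sorted(index.get(token.lower(), ()))]


def share(index, lines, token):
    """Fraction of all indexed lines that contain the token."""
    return len(index.get(token.lower(), ())) / len(lines)


if __name__ == "__main__":
    idx = build_index(MAIL_LOG)
    for line in lookup(idx, MAIL_LOG, "k71A01x1"):
        print(line)
    print(f"share of log: {share(idx, MAIL_LOG, 'k71A01x1'):.0%}")
```

The `share` helper hints at how an interface could show how widespread a value is relative to the complete index, which is the kind of comparison the hover histogram presents.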
Splunk is available in a free, or "trial," version. The free version is limited to indexing 500MB per day, which is more than enough data to see that using Splunk is advantageous. The installation process is a snap, and its Website truthfully advertises "less than 15 minutes to download and install." Usage is a snap as well. You can configure Splunk many ways, but in its simplest form you can simply select a log file to analyze from the Web interface. You can even drop a tar file into the special Splunk directory; it will be noticed and read in no time.
That's really all there is to Splunk installation. After you've loaded some data you can start searching through your logs. As soon as you begin typing, you notice a drop-down auto-completion box, and realize that Splunk has really thought of everything. Search for a hostname, search for an error message, search for anything that appears in your logs. If you get too many results, there's no need to rewrite your search (unless it was truly a horrible search); you can simply hold ctrl-alt and click on a word to have similar entries removed.
Previously we referred to the open API that Splunk provides. There are a few (seven, to be exact) plugins available at splunkforge that have been created by Splunk users. The open source community website is only nine months old, but the seven plugins listed are very promising. Most of them focus on "cleaning" specific logs, like those from Oracle or MySQL, to give Splunk an easier set of data to parse. Perhaps the lack of plugins indicates that Splunk already takes care of most people's needs, or that there simply isn't a large developer following for a mainly non-free product. It's likely the former.
All in all, Splunk is a very useful product. It's painless to test, so give the free download a try; it's available at splunk.com. Many smaller sites could even get by on 500MB per day, so it's actually a free product for some.