Splunk may be better known because of its recent IPO, but the company has been around since 2004. Its product – also called Splunk – is designed to collect and index the vast amounts of machine data generated by servers, applications, hypervisors, networking devices, websites and any other part of an organization’s IT infrastructure, plus external sources like GPS systems, stock market feeds, Twitter streams, or simply servers operating in a public cloud. It then turns this mass of machine data into useful information that can be understood and acted upon, presenting it in the form of graphs and dashboards.
According to the company, Splunk’s core capabilities are:
- Real-time visibility – searching, correlating, monitoring and alerting on events as they happen within the IT environment.
- Search and navigation – searching across potentially billions of machine data events in seconds on a single commodity server.
- Historical analytics – analyzing trends, statistics and metrics about the behavior of customers, users, transactions, applications, web servers, app servers, and networks.
Splunk can be downloaded free with a perpetual license to index up to 500 megabytes of data per day, and in many organizations the product is first used in the IT department to troubleshoot problems, according to Sanjay Mehta, Splunk’s senior director of product marketing. After that it often spreads to other parts of an organization by word of mouth, he says.
The most common uses for Splunk, says Mehta, are:
- Application management – analyzing application and server performance issues and causes of failures
- Security and compliance – monitoring security systems, investigating security incidents and proving compliance controls
- Infrastructure management – including server and network management and managing virtualized environments to ensure uptime and identify problems
- Web analytics and business intelligence – such as examining site visitors’ behavior in real time and spotting trends and patterns in transactions
Before Splunk can do any of this it has to access the sources of machine data and index it all, and the product is designed to make this relatively easy. “Splunk will hold your hand and help you point it at the data. You select what problems you want to investigate, and it will suggest the sorts of data sources you should be looking at,” says Mehta.
Data can be captured directly over the network, but in cases where that’s not possible a common way to get at the data is to use Splunk’s Universal Forwarder. This is a lightweight application that can be deployed on many machine data sources to provide real-time data collection. It can monitor local application log files, capture the output of status commands on a schedule, capture performance metrics from virtual or non-virtual sources, or watch the file system for configuration, permissions and attribute changes. Versions are available for operating systems including Windows, OS X, Linux, Solaris, FreeBSD, AIX and HP-UX.
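The log-monitoring idea can be sketched in a few lines of Python. This is not Splunk's forwarder code or API, only an illustration of the general technique of remembering a byte offset and picking up whatever complete lines have been appended since; the function name `read_new_lines` and its parameters are hypothetical.

```python
def read_new_lines(path, offset=0):
    """Return (complete lines appended since `offset`, new offset).

    Illustrative only: a real forwarder also has to cope with log
    rotation, truncation and network delivery.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Keep only complete lines; a partially written last line is left
    # in place and picked up on the next call.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

A collector would call this on a schedule, passing back the offset returned by the previous call, and ship each new line on for indexing.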
When data flows into Splunk it is compressed (typically to half its original size) and stored on a server where it is indexed and readied for analysis and correlation. Splunk supports five types of analysis:
- Time-based correlations, to identify relationships based on time, proximity or distance.
- Transaction-based correlations, to track a series of related events as a single transaction and measure its duration, status or other attributes.
- Sub-searches, taking the results of one search and using them in another.
- Lookups, correlating with external data sources outside of Splunk.
- Joins, to support SQL-like inner and outer joins.
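Two of the analysis types listed above, transaction-based correlations and lookups, can be illustrated outside Splunk with a short Python sketch. This is not Splunk's search language or API; the events, session IDs and lookup table are invented for illustration, and `build_transactions` and `enrich` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical machine data: (timestamp, session id, event name)
EVENTS = [
    ("2012-05-01T10:00:00", "s1", "login"),
    ("2012-05-01T10:00:05", "s2", "login"),
    ("2012-05-01T10:00:12", "s1", "add_to_cart"),
    ("2012-05-01T10:00:30", "s1", "checkout"),
    ("2012-05-01T10:01:05", "s2", "logout"),
]

# Hypothetical external lookup table: session id -> user name
SESSION_OWNERS = {"s1": "alice", "s2": "bob"}


def build_transactions(events):
    """Group events sharing a session id into one transaction each."""
    grouped = defaultdict(list)
    for ts, session_id, name in events:
        grouped[session_id].append((datetime.fromisoformat(ts), name))

    transactions = {}
    for session_id, items in grouped.items():
        items.sort()  # order the events by timestamp
        start, end = items[0][0], items[-1][0]
        transactions[session_id] = {
            "steps": [name for _, name in items],
            "duration_s": (end - start).total_seconds(),
        }
    return transactions


def enrich(transactions, lookup):
    """Add a field from an external table to each transaction."""
    return {
        session_id: {**tx, "user": lookup.get(session_id, "unknown")}
        for session_id, tx in transactions.items()
    }
```

Calling `enrich(build_transactions(EVENTS), SESSION_OWNERS)` yields one record per session, with its ordered steps, its duration in seconds and the user name pulled from the lookup table.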
In large organizations, access to Splunk’s analytics is role based, using roles that administrators define. For example, system administrators have access to a command line interface to carry out any analysis they require; security staff can be provided with security dashboards; compliance officers can generate ad-hoc reports; and executives can order the types of business intelligence dashboards they require from IT staff. Access control points that limit functionality by role restrict the searches, alerts, reports, dashboards and views that different Splunk roles can see. The software also integrates with external LDAP and Active Directory servers to enforce enterprise-wide security policies.
Although Splunk offers a version of its application for free, this version is limited. The principal benefit of the paid-for version is that it allows the indexing of more than 500MB of data per day, but it also offers additional features (some of which have been discussed above) such as monitoring and alerting on real-time events, authentication and role-based access controls, integration with single sign-on (SSO) solutions and the ability to search across distributed Splunk deployments. The cheapest license, for 500MB of indexed data per day, costs around $8,000, with licenses for greater volumes of indexed data getting progressively costlier.
This pricing model aims to take advantage of the fact that most companies are generating ever-increasing volumes of data: research house IDC projects that the volume of digital data stored on systems around the world is set to grow by 45% annually from 1.8 trillion gigabytes in 2011 to 7.9 trillion gigabytes by 2015. It also projects that the market for technology like Splunk’s and for services that handle these huge volumes of data will grow by 40% a year to $16.9 billion over the next three years.
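The IDC storage figures are consistent with simple compound growth, as a quick check shows. The sketch below assumes four annual growth steps between 2011 and 2015; the function name `project` is illustrative.

```python
def project(start, annual_rate, years):
    """Compound `start` at `annual_rate` for `years` annual steps."""
    return start * (1 + annual_rate) ** years


# 1.8 trillion GB in 2011, growing 45% a year for four years
projected_2015 = project(1.8, 0.45, 4)
print(round(projected_2015, 2))  # about 7.96 trillion GB
```

That lands within rounding distance of the 7.9 trillion gigabytes quoted for 2015.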
Cheap and Easy IT Management
Julie Craig, an analyst at research house Enterprise Management Associates, says that two important selling points for Splunk are that the product is relatively cheap, and that it is easy to get up and running. “Products like Tivoli and Compuware can certainly correlate information from multiple sources, but it’s certainly not trivial to deploy these products in mid-sized companies, and not everyone has the resources to do that. By contrast Splunk is fairly inexpensive and easy to deploy,” she says.
For the moment, it doesn’t look like Splunk is going to face much competition, she believes. “Splunk has been out there for four or five years, so I would have thought that if it was going to face competition we would have seen it already,” she says.
In terms of how Splunk is being used, Craig’s experience is that most companies are still using it within the IT department for application and infrastructure troubleshooting rather than for web analytics and business intelligence. “I thought the business intelligence side would take off far more quickly than it has, but I expect that that will become more popular in the next year or two,” she says.
Given the popularity of the product among IT staff, she concludes that the successful IPO should not have come as a surprise. “When Splunk gets into an enterprise, it spreads like a virus. It’s like a fork or a spoon: once you have used one you can’t imagine what you did without it.”