Advances in SAN technology finally make infrastructures more manageable and convenient to use. At the top of the list is, of course, virtualization, which can get confusing. In this article we’ll explain what storage virtualization is, and what types may be right for your environment.
First, a definition: virtualization always means abstraction. We make something transparent by adding layers that handle translations, causing previously important aspects of a system to become moot. With some types of storage virtualization, for instance, you can forget about where you have allocated data, because migrating it somewhere else is much simpler. In fact, many systems will migrate data based on utilization to optimize performance.
The greatest benefit of (some types of) storage virtualization is that you abstract away that previously troublesome spindle count. If the workload requires more IO than the current number of spindles can handle in an acceptable amount of time, the virtualization controller can migrate data and spread it out across more spindles. The lengthy performance analysis and workload evaluations don’t need to be that difficult any longer.
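To make the idea of spreading a workload across more spindles concrete, here is a minimal Python sketch. It is only an illustration under assumptions of our own: the `Spindle` class, the `rebalance` function, the extent names, and the IOPS figures are all hypothetical, and real controllers use far more sophisticated placement logic.

```python
from dataclasses import dataclass, field


@dataclass
class Spindle:
    name: str
    max_iops: int  # sustained IOPS the disk can deliver (made-up figure)
    extents: dict = field(default_factory=dict)  # extent id -> observed IOPS

    @property
    def load(self) -> int:
        return sum(self.extents.values())


def rebalance(spindles):
    """Move the busiest extents off overloaded spindles onto the least-loaded one."""
    moves = []
    for src in spindles:
        others = [s for s in spindles if s is not src]
        while others and src.load > src.max_iops and len(src.extents) > 1:
            # Pick the hottest extent on the overloaded spindle.
            extent, iops = max(src.extents.items(), key=lambda kv: kv[1])
            dst = min(others, key=lambda s: s.load)
            if dst.load + iops > dst.max_iops:
                break  # no spindle can absorb it; more spindles are needed
            del src.extents[extent]
            dst.extents[extent] = iops
            moves.append((extent, src.name, dst.name))
    return moves


# Hypothetical pool: disk0 is asked for 210 IOPS but can only sustain 150.
pool = [
    Spindle("disk0", 150, {"e1": 90, "e2": 80, "e3": 40}),
    Spindle("disk1", 150, {"e4": 20}),
    Spindle("disk2", 150),
]
moves = rebalance(pool)
print(moves)
```

In this toy pool, a single move of the hottest extent to the idle disk brings every spindle back under its limit. The host never sees the move, which is the point of the abstraction.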
There are two kinds of virtualization in the storage world: file-level and block-level. Block-level virtualization takes over before the file system even exists: it replaces or augments existing controllers and takes over at the disk level. File-level virtualization requires that some software be installed on the server that uses the storage, but it enables such things as file-level tracking of usage, which is used to determine which data can be aged to slower storage.
Storage Virtualization Methods
Host-based storage virtualization means that a driver of some sort is installed on the host operating system, and it intercepts and possibly redirects IO requests. The File Area Network (FAN) concept uses this, but software RAID and volume managers are also examples of host-based storage virtualization.
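The "intercept and redirect" step is easiest to see in software RAID. The sketch below shows the address translation a striping (RAID 0 style) driver performs. The function name `map_block` and its parameters are our own, chosen for illustration; this is not any particular volume manager's code.

```python
def map_block(logical_block: int, num_devices: int, stripe_blocks: int):
    """Translate a logical block number into (device index, block on that device).

    Blocks are laid out in stripes of `stripe_blocks` blocks, and stripes are
    dealt round-robin across `num_devices` devices.
    """
    stripe_index, offset = divmod(logical_block, stripe_blocks)
    device = stripe_index % num_devices   # which disk holds this stripe
    row = stripe_index // num_devices     # how many stripes precede it on that disk
    return device, row * stripe_blocks + offset


# Hypothetical layout: 3 disks, 4 blocks per stripe.
for lb in (0, 4, 13):
    print(lb, "->", map_block(lb, num_devices=3, stripe_blocks=4))
```

The file system above the driver only ever sees one flat range of logical blocks. Which disk a given block lands on is decided entirely inside the translation layer.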
Network-based storage virtualization is extremely interesting. A fibre channel switch that sits between the host and storage will actually virtualize all requests, redirecting IO unbeknown to the user. This method doesn’t rely on the operating system; in fact, the operating system on the host doesn’t even know it’s happening. This is true virtualization, though you must be extremely careful about interoperability between your switch and storage array vendors.
Array-based virtualization means that one “master” array will take over all IO for all other arrays. It must be fast enough to handle all your aggregate storage traffic, and it must also interoperate with all your existing disk arrays to realize the benefits. This method of virtualization provides the most benefits, including centralized management and seamless data migration.
The largest benefits of storage virtualization are:
- non-disruptive data migration
- centralized management
- increased utilization
- better visibility
Moving data around should not require downtime. Indeed, most people have found a way to migrate data without an outage using software RAID mirrors, but this is a very manual process and it only works between similar platforms. With virtualized storage, data migrations can be seamless and even automated.
The other big selling point for virtualization is that you can manage many existing arrays through one central point: the virtualizing controller. You still need to have the controller, and you still create RAID sets on the other arrays, but you can allocate storage from a central place and view all available storage from a single interface. Most virtualization solutions work with many vendors’ products, but interoperability is still a concern.
Utilization, as with server virtualization, is also addressed by storage virtualization. By migrating data to cheaper storage, and spreading out the workloads more appropriately, you can increase both utilization and your visibility into the overall workload across all your storage arrays. Thin provisioning, where you allocate less physical space than the operating system actually sees, is also possible with most virtualization solutions. This allows you to dynamically grow your storage to meet the real needs, instead of the inflated needs your storage customers generally cite.
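Thin provisioning is simpler than it sounds: physical blocks are handed out on first write rather than when the volume is created. The Python sketch below illustrates that allocate-on-write idea. The `Pool` and `ThinVolume` classes, the 512-byte block size, and the pool sizes are all assumptions made for the example, not a vendor's implementation.

```python
BLOCK_SIZE = 512  # assumed block size for the example


class Pool:
    """Shared physical capacity behind several thin volumes."""

    def __init__(self, physical_blocks: int):
        self.free = list(range(physical_blocks))
        self.data = {}  # physical block number -> payload

    def allocate(self) -> int:
        if not self.free:
            raise RuntimeError("physical pool exhausted; add capacity")
        return self.free.pop()


class ThinVolume:
    """Advertises `virtual_blocks` to the host but consumes space only on write."""

    def __init__(self, pool: Pool, virtual_blocks: int):
        self.pool = pool
        self.virtual_blocks = virtual_blocks
        self.mapping = {}  # virtual block -> physical block

    def _check(self, block: int) -> None:
        if not 0 <= block < self.virtual_blocks:
            raise IndexError(f"block {block} is outside the volume")

    def write(self, block: int, payload: bytes) -> None:
        self._check(block)
        if block not in self.mapping:  # first write: allocate now
            self.mapping[block] = self.pool.allocate()
        self.pool.data[self.mapping[block]] = payload

    def read(self, block: int) -> bytes:
        self._check(block)
        phys = self.mapping.get(block)
        # Never-written blocks read back as zeros and cost no physical space.
        return self.pool.data[phys] if phys is not None else bytes(BLOCK_SIZE)


pool = Pool(physical_blocks=100)
vol_a = ThinVolume(pool, virtual_blocks=1000)
vol_b = ThinVolume(pool, virtual_blocks=1000)
vol_a.write(999, b"x")
vol_b.write(0, b"y")
print(len(pool.free), "physical blocks still free")
```

Here two hosts each believe they own 1,000 blocks while only 100 exist, and only two have been consumed. The `RuntimeError` path is the catch: someone has to watch the pool and add capacity before the real usage catches up.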
The problem always exists, with all types of virtualization, that you may be consolidating too much. Care must be taken to ensure that the virtualizing controller can handle the combined load of all the disk systems it is virtualizing. In essence, you must get an extremely fast, redundant, and memory-rich device to benefit from storage virtualization.
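A quick back-of-the-envelope check can flag over-consolidation before you buy. The sketch below compares a controller's rated throughput against the summed peak load of the arrays behind it. The function name, the 25 percent safety margin, and every IOPS figure are invented for illustration; use your own measurements and your vendor's rated numbers.

```python
def controller_headroom(controller_iops, array_peak_iops, safety_margin=0.25):
    """Return spare IOPS after reserving a safety margin; negative means undersized."""
    demand = sum(array_peak_iops.values())
    usable = controller_iops * (1 - safety_margin)
    return usable - demand


# Hypothetical site: one controller in front of three existing arrays.
arrays = {"array_a": 60_000, "array_b": 45_000, "array_c": 50_000}
headroom = controller_headroom(200_000, arrays)
print(f"headroom: {headroom:+,.0f} IOPS")
```

Summing the peaks is deliberately conservative, since the arrays rarely peak at the same moment. Even so, a negative result like this one is a signal to look at a faster controller or to virtualize fewer arrays behind it.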
Virtualizing also means extra dependence on the “host” device, or in the storage case on the controllers that do the virtualizing.
Information Lifecycle Management (ILM) is an invaluable tool for maximizing your investment in storage. If certain data is not used within a certain amount of time, you can configure a policy to automatically migrate it to slower, less expensive storage. With file-level virtualization, this can be done on a per-file basis, which is extremely effective. If you choose to use only block-level virtualization, then the only migration supported by most vendors is the relocation of an entire LUN. Within that LUN, some data may be important and some may not, so everything moves together regardless.
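A per-file ILM policy boils down to "find files nobody has touched in N days." The Python sketch below shows that selection step. The function names, the 90-day threshold, and the sample paths are hypothetical. Last-access times can also be unreliable on file systems mounted with access-time updates disabled, so treat it as a sketch rather than a production policy engine.

```python
import os
import time

DAY = 86400  # seconds


def scan(root):
    """Yield (path, last-access epoch seconds) for every file under root."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            yield path, os.stat(path).st_atime


def select_cold(records, max_idle_days, now=None):
    """Return paths whose last access is older than max_idle_days, sorted."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * DAY
    return sorted(path for path, atime in records if atime < cutoff)


# Hypothetical records, so the example runs without touching a real file system.
now = 1_000 * DAY
sample = [
    ("/data/report.doc", now - 200 * DAY),
    ("/data/db.log", now - 1 * DAY),
    ("/data/old.iso", now - 91 * DAY),
]
cold = select_cold(sample, max_idle_days=90, now=now)
print(cold)
```

Against a real tree you would pass `scan("/some/path")` in place of `sample`. A block-level product has no equivalent of this list, because it cannot see files at all; it can only observe that a LUN as a whole is busy or idle.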
The pros certainly outweigh the cons. With all types of virtualization, be they server or storage, you must pay attention to what’s taking place. More single points of failure are introduced, which must be accounted for, and resource utilization is always more than you estimate. It’s interesting that the main benefit of virtualization is higher utilization, but once you start scaling up a bit more, you quickly run out of resources.
The best way to combat resource limitations is to not go overboard. Also, make sure to take advantage of automated resource balancing features. Make sure to ask your storage vendor in-depth questions about how they support data migration and ILM, and of course, don’t forget about interoperability.
Storage virtualization will change the way your storage is managed. Most importantly, it will change the things the storage administrators focus on. Instead of countless data migration tasks to free up resources on one or two things at a time, they can pay more attention to the big picture: overall site scalability and performance.