Life in the Real World

The Perfect (Virtual) Marriage: Deduplication and VMware
By Larry Freeman and Bill May
Article adapted with permission from NetApp's Tech OnTap, January 2008

VMware® has become one of the more popular use cases we've seen for NetApp deduplication. Shortly after the release of deduplication with Data ONTAP® 7.2.2, customers began reporting great success deduplicating VMware virtual machines (VMs), in both traditional VI3 environments and emerging VDI environments.

Naturally, we wanted to take a closer look and discover why they were so excited. The answer came quickly: They were consistently seeing space savings of 50% or more with virtually no performance impact. Some were obtaining storage savings as high as 90%. Here's how:

NetApp Deduplication

The unique advantage of NetApp deduplication is that it can take any NetApp flexible volume (FlexVol® volume), regardless of how the data was written into the volume, and easily identify and eliminate duplicate blocks within that volume. If two or more blocks are the same, we eliminate the duplicate blocks and change the data pointers so that all the duplicates are redirected to a single data block.

It doesn't matter what the blocks are or what application they belong to; if the blocks are the same, the duplicates are eliminated. This is in sharp contrast to most other deduplication products, which are predominantly limited to use with a single application, typically backup.

Another NetApp advantage is that you can deduplicate existing data volumes. You don't have to have deduplication running from the start. You can take a volume that's been in use for a long time and recover significant disk space through deduplication.

How NetApp Deduplication Works

When deduplication is enabled on a volume, the system creates a list of the digital fingerprints that represent all blocks in use. These fingerprints are already part of the Data ONTAP metadata, so it is not necessary to compute a new fingerprint for each block. By comparing these fingerprints, it is relatively easy (that is, system overhead is low) to determine which blocks are duplicates. (Possible duplicates are compared to ensure they are indeed the same.) Then, it's just a matter of bookkeeping to change the reference pointers and eliminate the duplicates.

The fingerprints are only used to identify duplicate blocks; they are not used to look up or access data. Thus, data access remains fast and is not subject to data corruption due to the deduplication process. By the way, this is the same basic process we've been using with our Snapshot™ technology for over a decade: using one "physical" data block to represent many "logical" data blocks. The deduplication process is simply run on a volume periodically, whenever you need to reclaim storage space. Because of its low overhead, NetApp deduplication can be used with a wide range of workloads.
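The fingerprint, verify, and redirect steps described above can be sketched in a few lines of Python. This is only an illustrative model, not NetApp's implementation: the volume is represented as a hypothetical dictionary of physical blocks plus a list of logical pointers, and CRC32 stands in for the fingerprint, whose real format the article does not describe.

```python
from __future__ import annotations

import zlib

BLOCK_SIZE = 4096  # 4KB, the WAFL block size mentioned in the article


def fingerprint(block: bytes) -> int:
    # Assumption: CRC32 is a stand-in; the real fingerprint format is not given.
    return zlib.crc32(block)


def deduplicate(physical: dict[int, bytes], pointers: list[int]) -> int:
    """Redirect pointers to one copy of each duplicate block.

    `physical` maps a block id to its contents; `pointers` lists the block id
    behind each logical block. Both are modified in place. Returns the number
    of physical blocks released.
    """
    by_fingerprint: dict[int, list[int]] = {}
    redirect: dict[int, int] = {}

    for block_id in sorted(physical):
        data = physical[block_id]
        candidates = by_fingerprint.setdefault(fingerprint(data), [])
        for keeper in candidates:
            # A matching fingerprint only marks a possible duplicate;
            # compare the actual contents before redirecting.
            if physical[keeper] == data:
                redirect[block_id] = keeper
                break
        else:
            candidates.append(block_id)

    # Bookkeeping: repoint logical blocks, then release the duplicates.
    for i, block_id in enumerate(pointers):
        pointers[i] = redirect.get(block_id, block_id)
    for block_id in redirect:
        del physical[block_id]
    return len(redirect)
```

Note that the fingerprints are consulted only while hunting for duplicates. Reads still follow the pointer list directly, which mirrors the article's point that data access does not depend on the fingerprints.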

VMware Environments

VMware is a terrific technology that reduces the number of servers needed in the data center by consolidating several physical servers into one "virtual" server. VMware accomplishes this by allowing users to first create a master template for each application environment, then to "clone" these templates into many VM images. Once the clones are created, they are installed concurrently as "guests" on a single server. By virtualizing your server environment, you can utilize your servers much more efficiently. VMware users typically run six to 10 VM guest operating systems per physical server, although we have heard from some customers that they are running up to 70 VMs on a single server.

The Perfect Marriage

While VMware provides a valuable cost benefit by consolidating your servers, it is not quite so efficient at consolidating the storage used by VMware clones. That's where deduplication comes into the picture.

Each cloned VM image requires the same amount of physical storage space as the template from which it was created, but its contents are largely redundant. This makes cloned images good candidates for space reduction through deduplication. However, because VMware is a primary storage application, users are reluctant to impose any additional load on these servers that might degrade end user read/write response times.

NetApp deduplication solves this problem. Because it provides deduplication with minimal system performance intrusion, users can substantially reduce the amount of storage capacity required to house VMware clone copies without affecting business workflow.
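To see why clones deduplicate so well, a rough back-of-the-envelope model helps. The sketch below is a simplifying assumption, not a NetApp sizing formula: it supposes that every image is the same size as the template, that a fraction `unique_fraction` of each image is unique to it, and that the rest is identical across all images and is stored only once after deduplication. The function name and parameters are hypothetical.

```python
def clone_savings(num_images: int, unique_fraction: float) -> float:
    """Estimated fraction of space saved when deduplicating cloned images.

    Assumes equal-sized images whose shared portion is stored once and
    whose unique portion is stored once per image.
    """
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    if not 0.0 <= unique_fraction <= 1.0:
        raise ValueError("unique_fraction must be between 0 and 1")

    template_size = 1.0  # sizes are relative, so the template is 1 unit
    logical = num_images * template_size
    shared = template_size * (1.0 - unique_fraction)
    unique = num_images * template_size * unique_fraction
    return 1.0 - (shared + unique) / logical


if __name__ == "__main__":
    for images, unique in [(2, 0.0), (10, 0.0), (10, 0.05)]:
        print(images, unique, round(clone_savings(images, unique), 3))
```

Under this model, savings grow with the number of images and shrink as each image diverges from the template. Real results depend on the actual data in the volume.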

How is this possible? NetApp deduplication is an intrinsic part of Data ONTAP and its WAFL® file system. Unlike other forms of deduplication, NetApp deduplication utilizes many characteristics inherent within the storage operating system. There is no need to create complicated hashing algorithms, no lookup tables to search for and reconstitute data, and no rewriting of data during the actual deduplication process. All that's required is a small digital fingerprint for each 4KB WAFL block (these already exist in the system), a quick comparison of these fingerprints, and a simple block-redirect process to re-reference the original data block. Duplicate data blocks are then released back to the system.

NetApp deduplication is performed as a low-priority background process. This process can be run automatically any time the VMware data grows beyond a predetermined threshold, or it can be scheduled to run only during convenient off-peak hours.
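The two ways of triggering a run described above (a data-growth threshold, or a fixed off-peak window) can be modeled as a small decision function. This is a hypothetical sketch for illustration only: the function names, the `threshold` parameter, and the window representation are assumptions and do not correspond to Data ONTAP settings.

```python
from __future__ import annotations

from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end), allowing windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_run_dedup(
    new_data_fraction: float,
    threshold: float,
    now: time,
    window: tuple[time, time] | None = None,
) -> bool:
    """Decide whether to start a deduplication pass.

    With a `window`, run only during those off-peak hours.
    Without one, run whenever new data reaches the threshold.
    """
    if window is not None:
        return in_window(now, *window)
    return new_data_fraction >= threshold


if __name__ == "__main__":
    off_peak = (time(22, 0), time(6, 0))  # hypothetical 10 p.m. to 6 a.m. window
    print(should_run_dedup(0.25, 0.20, time(14, 0)))
    print(should_run_dedup(0.25, 0.20, time(14, 0), window=off_peak))
    print(should_run_dedup(0.05, 0.20, time(23, 30), window=off_peak))
```

Treating the two modes as alternatives keeps the sketch simple; a real deployment would choose whichever mode suits how quickly its data changes.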

Sounds Good, How Do I Start?

To get started, you first have to add NearStore® and deduplication licenses to your system. Then you can enable deduplication on your desired volume(s) with a simple CLI command. This will trigger the process of gathering fingerprints in each enabled volume. If you have existing data in the volume, NetApp deduplication can optionally scan that data too. Once deduplication has been enabled, it's simply a matter of deciding how often you want to reduce your volume space requirements by running the deduplication process. Most customers run deduplication nightly, since their daily data change rate is normally low enough that the deduplication process will run quickly.

© 2008 VLSystems, Inc. All rights reserved.