The Magic of Space Saving Technology

Scott Delandy and I teamed up for another fun video to share a simple explanation of technology, this time with a magic trick! In this video, we cover the space-saving technology typically used in storage arrays: thin provisioning, snapshots, deduplication and compression. More technical details on all of these are below, but please check out the video:

What I love about this video is that it is simple, storage-agnostic and fun. Anyone with a basic understanding of computers should be able to understand the concepts at a high level. So let’s discuss the importance of each, how they are used in the industry, and how they apply to EMC technology.

Thin Provisioning

Thin provisioning is used just about everywhere today. Simply put, thin provisioning presents a disk to the host at whatever size you choose, say 1 TB, but only consumes physical capacity for the data the host actually writes. It’s pretty common in the industry to see people claim that thin provisioning will gain you 2:1 in storage efficiency. Why? Because hosts rarely use all of the capacity we give them. Thanks to virtual provisioning, we can take the data the host writes and distribute it across a large pool of disks. The devices given to the host share the resources of a larger pool of disks instead of being localized. The data location for each device is tracked separately with pointers called metadata. In the early days, we could get into a bit of trouble in the database space when DBAs would write 0’s to newly created database volumes. This was initially solved with “zero space reclaim,” where the array could go back post-process and clean up the zeros that had been written. Now, we just don’t write the zeros. We also eventually ran into issues with the way operating systems handle deletion of data: marking it for deletion instead of writing 0’s. This is now handled by OS-aware intelligence that clears this “deleted” data for us. The most exciting advance in thin provisioning for me is the introduction of all-flash array technology, which means no performance compromise when using thin devices instead of fully allocated devices.
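
If you like to think in code, here is a minimal Python sketch of the idea. It is purely illustrative (the class names, block size and pool layout are mine, not how any particular array implements it): the volume advertises a large logical size, pool blocks are only allocated for chunks the host actually writes, and all-zero writes are simply not stored.

```python
BLOCK_SIZE = 128 * 1024  # illustrative extent size only


class Pool:
    """A shared pool of physical blocks, backed by a plain dict."""
    def __init__(self):
        self.blocks = {}
        self.next_free = 0

    def allocate(self):
        block_id = self.next_free
        self.next_free += 1
        return block_id

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def fetch(self, block_id):
        return self.blocks[block_id]


class ThinVolume:
    def __init__(self, pool, logical_size):
        self.pool = pool                  # shared pool of physical blocks
        self.logical_size = logical_size  # size reported to the host
        self.metadata = {}                # logical block -> physical block (the pointers)

    def write(self, logical_block, data):
        if data == b"\x00" * BLOCK_SIZE:
            # "We just don't write the zeros": drop any mapping so the
            # block simply reads back as zeros without consuming space.
            self.metadata.pop(logical_block, None)
            return
        if logical_block not in self.metadata:
            self.metadata[logical_block] = self.pool.allocate()
        self.pool.store(self.metadata[logical_block], data)

    def read(self, logical_block):
        if logical_block not in self.metadata:
            return b"\x00" * BLOCK_SIZE   # unallocated blocks read as zeros
        return self.pool.fetch(self.metadata[logical_block])

    def consumed(self):
        # Physical space used, regardless of the advertised logical size.
        return len(self.metadata) * BLOCK_SIZE


pool = Pool()
vol = ThinVolume(pool, logical_size=1 * 2**40)   # host sees 1 TB
vol.write(0, b"a" * BLOCK_SIZE)
print(vol.logical_size, vol.consumed())          # 1 TB advertised, 128 KB consumed
```

The `metadata` dictionary is the pointer map described above: it is the only thing that grows until the host writes real data.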

Snapshots

Replication technology has been at the heart of our customers’ business needs for longer than the 10 years I have been in the storage industry. Since the introduction of snapshots, which are point-in-time, pointer-based copies of a given source volume, customers have wanted to use them exclusively because of the space savings they provide. The problem with traditional means of storing data is that those volumes all point back to the same spinning disks, which causes a significant performance degradation on both production and the copies. With all-flash systems, we can now provide the performance of a full copy without sacrificing the performance of production or the subsequent copies. We can make more copies with equal performance and save a ton of space! What happens when new data is written to the source or a snapshot? It just gets redirected to new space. Simple. The space savings you get with snaps isn’t a fixed X:1, because of the changes written to the source and snapshot volumes, but depending on the workload it can be very significant.
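
Here is a tiny, purely illustrative sketch of the pointer mechanics (again my own naming, not any specific array’s implementation): taking a snapshot copies only the metadata, and any new write, to the source or to the snap, is redirected to new space.

```python
import itertools

_block_ids = itertools.count()   # hands out "new space" in a shared pool
_pool = {}                       # physical blocks: block id -> data


class Volume:
    def __init__(self, metadata=None):
        # metadata maps a logical address to a physical block id
        self.metadata = dict(metadata or {})

    def write(self, address, data):
        # Redirect-on-write: never touch a block a snapshot may share;
        # land the new data in new space and repoint the metadata.
        block_id = next(_block_ids)
        _pool[block_id] = data
        self.metadata[address] = block_id

    def read(self, address):
        return _pool.get(self.metadata.get(address))

    def snapshot(self):
        # The "copy" is just a copy of the pointers, so it costs almost nothing.
        return Volume(self.metadata)


source = Volume()
source.write(0, b"original")
snap = source.snapshot()
source.write(0, b"changed")      # redirected to a new block
assert snap.read(0) == b"original" and source.read(0) == b"changed"
```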

Deduplication

Deduplication works by inspecting small chunks of data to see if they match, then only writing that data once. This means that multiple hosts sharing the same storage system can share the same chunk of data using pointers similar to those used for snapshots. Deduplication works really well with some workloads, specifically VDI/VSI, where we have seen ratios in the 12:1 to 15:1 range. XtremIO is our flagship product in this space. XtremIO is awesome because deduplication is done in-line, all the time, with consistently low latency. In-line deduplication is an important differentiator because sorting through the data and consolidating it after the fact is inefficient, takes up extra space when you need it most (when the array is busy) and adds unnecessary utilization to the system and drives. For database workloads, deduplication doesn’t work as well; for those workloads, better reduction can be achieved with compression.
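
To make the mechanism concrete, here is a minimal sketch of in-line deduplication. This shows the general idea of fingerprinting chunks before they are written, not XtremIO’s actual data path, and the names are mine.

```python
import hashlib


class DedupStore:
    def __init__(self):
        self.chunks = {}      # fingerprint -> chunk data (written once)
        self.refcounts = {}   # fingerprint -> number of logical references

    def write(self, chunk):
        # Fingerprint the chunk in-line, before anything hits the drives.
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in self.chunks:
            self.chunks[fingerprint] = chunk          # first copy: store it
        self.refcounts[fingerprint] = self.refcounts.get(fingerprint, 0) + 1
        return fingerprint                            # the pointer handed back

    def read(self, fingerprint):
        return self.chunks[fingerprint]

    def ratio(self):
        # Logical chunks referenced vs. unique chunks actually stored.
        logical = sum(self.refcounts.values())
        unique = len(self.chunks)
        return logical / unique if unique else 1.0


store = DedupStore()
# Two "hosts" writing the same golden-image chunk only consume space once.
p1 = store.write(b"guest OS boot block")
p2 = store.write(b"guest OS boot block")
assert p1 == p2 and store.ratio() == 2.0
```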

Compression

Compression reduces data by applying an algorithm that shrinks the number of 1’s and 0’s written to disk without losing any information. It’s more complex than that under the covers; there’s some serious work that goes into making compression perform well on a given technology. Again, mileage varies here based on your workload and data. Some things compress well, others not so much, especially if the data is already compressed at the host level. All of our all-flash array technologies will offer compression. XtremIO has it available, and completely in-line, today. In-line compression will be available on the VMAX All Flash later this year. The reasons to prefer in-line compression over post-process (or in-line that falls back to post-process when the array is busy) are the same as those outlined above for deduplication. A rule-of-thumb reduction estimate here is 1.8:1 to 2:1, but it depends on the data you are storing. In most hybrid array models, compression comes with a significant performance penalty, but with the enabling technology of all-flash you don’t have to sacrifice nearly as much performance for space savings.
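
You can see why mileage varies with a quick experiment. Here zlib simply stands in for whatever algorithm a given array actually uses; the point is that repetitive data shrinks a lot, while random or already-compressed data barely shrinks at all.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Lossless ratio: original size divided by compressed size."""
    return len(data) / len(zlib.compress(data))


text_like = b"customer_id,order_id,status,amount\n" * 4000  # repetitive, compresses well
random_data = os.urandom(64 * 1024)                         # nothing to squeeze
host_compressed = zlib.compress(text_like)                  # already compressed at the "host"

print(f"text-like data:     {compression_ratio(text_like):.1f}:1")
print(f"random data:        {compression_ratio(random_data):.2f}:1")
print(f"host-compressed:    {compression_ratio(host_compressed):.2f}:1")
```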

While we employed a fun magic trick to illustrate the concept of no-compromise space-saving technology, at the end of the day there’s no magic to this stuff. It’s all just some well-written software that makes really cool technology even better. I hope you enjoy the demo. As always, I welcome your thoughts.