Demystifying Deduplication

With the emergence of flash technology, the need for deduplication rose due to the expense of the drives. Deduplication has become a blanket term to span a range of space-saving technology from snapshots to zero-space reclaim. Here, I am going to try to break it down in the way I do to my customers.

As usual, the TL;DR:
1. You say you offer deduplication. I do not think it means what you think it means. – Always ask for specifics.
2. As it turns out, you’ve got deduplication on that 10 year old array hidden in your garage! Congrats!
3. Wait? How important is this to me? The answer, as always: it depends.

Deduplication. You keep using that word.
When people ask if we offer deduplication, many vendors will say yes. In the minds of our customers, deduplication in its purest form is single-instance data storage. In order to do this successfully, the sweet spot is around 4k or 8k IO. Most workloads are bigger than that, which leads to the need to break the incoming write into smaller chunks to enable that level of data reduction. This means that CPU and memory resources have to be dedicated to this task and care must be taken to architect the technology to account for this additional workload. The benefits can be massive because it means that every small block of data that matches that 4 or 8k block can be written only once and accessed by all at the array level saving both drive writes as well as an incredible amount of space. Even then, the implementation of single-instance data storage varies: some will only do it when the array is under light load, others do it all the time. What good is deduplication if it is only performed when the array is not busy? What happens when you drive heavy workload consistently? It is important to ask for specifics in this space and learn what you are getting and how it may apply to your workload. So what does deduplication really mean and why such a slippery slope? Wikipedia, FTW: “In computing, data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data. Related and somewhat synonymous terms are intelligent (data) compression and single-instance (data) storage.”

As it turns out, you’ve got deduplication on that 10 year old array hidden in your garage! Congrats!
By the definition above, data deduplication can also include snapshots. Snaps have been around for over 10 years now. If you have an old decommissioned array in your garage, odds are you have “deduplication” on it in the form of snapshots. What makes snapshots so awesome now is the ability to use them at little to no performance penalty –mileage varies by vendor and platform, be sure to ask the question. Snapshots enable lun-level “deduplication” where the same lun or set of luns is copied and presented for backup, reporting or other needs as a performant alternative that enables the production host to focus all resources on production workload while the array seamlessly presents a safe, tracked copy of that volume to other hosts for necessary processing. By definition, this is a form of data deduplication, but it doesn’t quite scale to the same benefits as single-instance data storage. In fairness, with my customers I tend to classify snapshots as part of space efficiency along with compression, but some do not make that distinction.

And now for the dreaded “It Depends”

I could start a bit of a battle over technical religion and tell you what I think about deduplication and snapshots and what is really important, but the truth is that it all depends on your workloads and business needs. In the case of VDI, VSI and some other workloads even as the price of flash decreases, deduplication is absolutely worth the additional processor and memory resources thrown at it. Running a large block workload or a data warehouse? Snapshotting is probably good enough because that CPU and memory is probably better used servicing reads and writes. Many things fall in that in-between category and can go either way. What’s most important to your business needs? You can talk to your application owners and for once say “as you wish” or at the very least…

Stop the Insanity: A Practitioners Guide to Copy Data Management

I’ve been doing replication automation and application integration for a decade at EMC. I got into it early on because few people wanted to touch it and I was the new kid. Why?
I’ve kept it all these years because of the extreme value of this conversation with our customers. Many of my customers have application teams that need to enable quick reporting on their applications outside without taxing their production workload. The frequency of this need ranges anywhere every hour to weekly. The good news is that it is now easier than ever to give your application owners the ability to schedule and create on-demand, application-consistent replicas to meet your business needs with Appsync. Let’s take a look at how to do this right.

The TL;DR:
1. Standardize your builds. Integration problems happen with pet configurations.
2. Bring your DBAs and App owners in early. They know what they need and where their bodies are buried.
3. The devil is always in the details. Don’t assume that because a tool has App integration that it supports every configuration of that app.

Standardize your builds


While individuals on your team may be unique, your server/app builds should not. Implementation challenges come with introduction of any technology; the most challenges come with unique configurations and one-offs. If you adopt consistency across your builds and standardize, the problems you encounter will be minimal in comparison. Appsync will plug into your host environment and discover the devices related to your app, create matching storage for you and create an application consistent copy for Oracle, SQL, VMWare or Exchange. This eliminates a lot of the work in trying to match your backup and reporting environments to your source. Change your production app to include more storage? No problem! Appsync will discover that and make the adjustments in your target storage and reporting/backup environments too! So easy, no one has to lift a finger. You can chill like this awesome guy:

Bring your DBAs and App owners in early

Ever play telephone? It can be really fun to see what comes out the other end -unless you are counting on that kind of communication to run your business.
Bring your app owners and DBAs into the conversation. They know the special configurations and requirements of the app/DB environment that are going to impact your success. Odds are, right now they are going rogue and creating their own database copies the hard way. Why not be the hero and give them a seemless, automated technology that cuts their time to value by up to 5x? If you are running all flash, you can become a real storage hero by introducing 5-20x space efficiency while maintaining that high level of performance that meets the needs of the DBAs. Requirements such as RPO can be plugged into Appsync for reporting on SLAs. Appsync even gives you a cool dashboard to report on those SLAs. The configuration details can also be preserved and manipulated to fit your backup and reporting needs. You can give your DBAs access to their specific application environments within Appsync to allow them to set these advanced configuration parameters automatically without interfering with the other Applications in Appsync control. This keeps the storage admin from playing telephone with the application teams and allows the application teams to create the replicas they need with little more than a little policy setting on the part of your storage admin. These policies can limit application access, the number of copies created and limit the ability to restore to production for the ultimate in protection.

The devil is in the details.

One of the common mistakes that integrators make is making the assumption that support for a specific application or widget implies support for all configurations. This is not always the case.
Some application configurations cannot be supported and automated by tools for a variety of reasons. The inclusion of your DBA team as well as a good read of the release notes for any product that enables integration is advised before you deploy your solution. Don’t have a supported application? For tricky applications such as EPIC, you can create crash-consistent, file system replicas on the fly. It doesn’t have all of the cool bonus hooks, but it’s much better than scripting it all out and trying to support it. On the other side of the coin, many tools will only support specific storage platforms. Appsync covers the breadth of the Primary Storage portfolio at EMC: Recoverpoint, Unity, Vipr Controller, VMAX, VNX, VPLEX, and XtremIO.

With all of the options and features available in Appsync, why not choose this easy button for your backup and reporting environment? It intergrates tightly with the application and storage environments to allow you to standardize on a single solution for automating your replication needs. It offers the ability to give your app owners and DBAs control of their own environment without overrunning others leading to fewer integration issues and urgent storage team requests. Appsync will also work to automate your crash-consistent copies for unsupported applications. Copy Data Management doesn’t have to be hard anymore. Deploy the easy button for application-integrated reporting and back up and give your business time to focus on what’s truly important: the best experience for your customers.

Hard Easy Computer Keys Showing The Choice Of Difficult Or Simple Ways
Hard Easy Computer Keys Showing The Choice Of Difficult Or Simple Ways

Email is Dead! Long Live Email!

Saturday I woke up to a message from Twitter that said “You have reached 700 followers”, shortly followed by a message that updated the number of followers to 800. Given that I had been at barely 600 followers on Friday and it was early on the west coast, my initial thought was: “Go home Twitter, you’re drunk.” I started my morning routine, went for a run and later hopped on Twitter to confirm my follower count at 600… only to find it hovering around 800. What? I had been relatively quiet on Twitter, and as cool as the posts that I knew of referencing me have been, they aren’t the type to generate 200+ new followers on a Saturday. Naturally, my reaction started with concern: “I wonder what I’ll get to apologize for today! The possibilities are endless!”

storm trooper

(It really is too easy – I’ve spent too much time apologizing this year.) Then I thought “wake up your kids, I have a lesson in social media to share! It turns out all of that stuff I say to the school kids about the permanence and reach of your internet posts are true!” A simple Bing/Google search revealed the most likely source: this fun little Phrasee article. Cool Story. Oh and look there’s another one about the same Tweet! It’s like Christmas! why did I tell one of my best friends that email is dead for all to see on Twitter? Have I failed at Social Media? Isn’t the primary purpose for things like Twitter interacting with like-minded people about what we find important?

As usual, the TL;DR:
1. Email is not an effective two-way communication method for me and many of my peers in the industry.
2. Social media and mobile communications are a more effective form of communication if you want a rapid response from my demographic.

I put that tweet out there to exaggerate my opinion as many do in a forum that is limited to 140 characters or less, to socialize to my followers an effective way of communicating with me and to start a conversation about the way mobile is changing our lives. In a world where many of us are constantly staring at their mobile devices, I am inundated with information that varies in value. The value for me in email at work is as a means to search for internal information on the products I support, to send documented requests for assistance and to blast the occasional “in case you missed these 3-5 important things happening” email to my people. To me, email is an ineffective way of two-way communication because I receive a massive amount of email both professionally and personally every day. On the personal front, I can count on my email service to filter out spam and promotions with relative success. Professionally, what I get isn’t spam, but it doesn’t usually require an immediate action or attention either. It’s there as an FYI or item of note – and whatever it is will probably change at least a few more times before I really need to know it. I could try to read every email and commit all of this information that is constantly changing to memory, but if I did I wouldn’t be very good at doing my real job: talking to our customers, engaging/leading our people and of course making the number. I am still sitting on around 20,000 unread emails. I search for the important stuff in my free time. I read emails from key people in the company: my people, leaders and the occasional troublemaker. I check subjects to get a feel for what my day will be like much like people used to check newspaper headlines. While I do not think email is going anywhere anytime soon, I definitely think its use as an effective way of communicating in more than one direction has reached its “best before” date.

Social Media and mobile communications are a more effective form of interactive communication for me and many people like me. I travel a lot for work. I spend a significant amount of my time on planes or in transit. As good as technology has gotten, plane WiFi for things other than IM communications is a crap shoot at best. I get so many PowerPoints sent to my email that half of the time I don’t get the good email updates until I land.


IM, iMessage, Twitter, all of that goodness almost always works if Wifi is up. I can be an effective worker and communicator with those applications, email is too cumbersome. I can market who I am and what I am passionate about to anyone who cares to look and they can make the choice to engage me in conversation or not. The last time I took advantage of an email deal was about 5 months ago. I may open one ad a week. I’ve bought at least 5 things via targeted marketing on various social media platforms since then. What’s the difference? In my experience, the emails tend to be a wide blast with a catchy title that I always feel guilty for clicking Targeted advertising via social media already takes my interests, age and lifestyle into account and comes up with pictures of items that I may want to buy. Example: On Facebook my best female friend gets an add from a clothing company for a Doctor Who shirt. I get this:


It’s ok to laugh, I did. I value exercise over time in front of the TV and I find humor in these things. I am the type to wish someone a day based on their opinion of something I like… “If you don’t have something somewhat nice to say, well it’s your choice to have that kind of day.” I don’t have a PHD in marketing, but that is good marketing. That is how you reach a generation of young talent who is too busy focusing their time on making a difference: targeted ads with imagery that fits right into their lifestyle. I am cautious of anyone who touts any one answer as the best form of communication. I am an engineer and a leader. Show me statistics that say that email is the best form of marketing, I will show you a clever statistician that gets paid by an email marketing firm. The answer always lies between ones motives and “it depends”. As with most cases, the tools you need to do the job depend on the outcome you want to achieve. Social Media is a more effective way to engage with me and many of my peers. I prefer to engage my customers directly in person when I can because they appear to retain more information and have a better experience than they would through email or webcast. It shows them I care enough about their business to speak directly. There are other demographics that do better with TV, email and other forms of marketing. Who are you trying to target? Why should they care? What is your end goal? All of these questions and more lead you to the right tool to do the job.

Thanks to phrasee for the free press and the opportunity to have a more meaningful discussion! Open season readers! What works for you?