What does it mean to have your backup data globally de-duped using NetBackup Appliances?
Of course, de-dupe is a great thing. I first learned what de-dupe was 3 years ago, when I worked for a NetApp partner and found out how they do de-dupe on their SAN storage. I loved the ability to eliminate redundant data from the SAN.
But what does it mean to have your backups “globally” de-duped? I will tell you later why I placed “globally” in quotation marks.
I didn’t care much for backup de-dupe, to be honest. I knew that de-dupe is cool, but those are backups; they can safely be left without de-dupe, who cares, right? I didn’t realise how mistaken I was until 3 weeks ago, when I attended the NetBackup Appliances training and the same question was raised.
The trainer walked through an example that blew my mind. I hadn’t realised how much a company can save by de-duping its backup data globally. How? Let us see:
Assume a company that operates 20 TB of data (I made the example a little bigger to demonstrate how much you save). That data could be of any type (VMs, files, a mix, or anything else). Let us check the following table for 2 weeks' worth of backup data size (2 weeks, to demonstrate the effect of full backups):
|Run||Non-de-duped size||De-duped size|
|First week full backup||20 TB||Maybe 10 TB (the data is de-duped, and a 50 to 60% size reduction is expected)|
|Full week of differential data||5 TB||2.5 TB (de-duped data, size reduced)|
|Second week full backup||25 TB||Maybe 0, or only 1 day's worth of data (how much is that, 100 GB?)|
|Total||50 TB||Under 15 TB (about 12.6 TB with the estimates above)|
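To sanity-check the totals, here is a quick sketch in Python. The sizes are the table's own estimates, and the 0.1 TB for the second full backup is only a stand-in for the "maybe 100 GB" guess, so treat the result as a ballpark rather than a measurement.

```python
# Sizes in TB, taken from the table's estimates.
# 0.1 TB is an assumed stand-in for the "maybe 100 GB" second full backup.
runs = [
    ("First week full backup", 20.0, 10.0),
    ("Full week of differential data", 5.0, 2.5),
    ("Second week full backup", 25.0, 0.1),
]

raw_total = sum(raw for _, raw, _ in runs)
deduped_total = sum(deduped for _, _, deduped in runs)
saving_pct = 100 * (1 - deduped_total / raw_total)

print(f"non-de-duped: {raw_total:.1f} TB")
print(f"de-duped:     {deduped_total:.1f} TB")
print(f"saving:       {saving_pct:.0f}%")
```

With these figures the de-duped column adds up to about 12.6 TB against 50 TB, a saving of roughly three quarters.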
What? Why is that? Well, because a NetBackup Appliance with de-dupe sees the second full backup as data it already holds, so nearly all of it is de-duped away and only the data that has changed since the latest incremental backup is actually stored. (How much is that? For sure much less than the full backup.)
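The idea can be sketched in a few lines of Python. This is a toy illustration of de-dupe by fingerprint, not how NetBackup Appliances implement it: the `DedupeStore` class, the fixed 4-byte chunks and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose, so the example is easy to follow


class DedupeStore:
    """Hypothetical store that keeps each unique chunk only once."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def backup(self, data: bytes) -> int:
        """Back up data and return the number of bytes newly written."""
        new_bytes = 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in self.chunks:
                # Only chunks the store has never seen cost any space.
                self.chunks[fingerprint] = chunk
                new_bytes += len(chunk)
        return new_bytes


store = DedupeStore()
week1_full = b"AAAABBBBCCCCDDDD"
week2_full = b"AAAABBBBCCCCEEEE"  # same data, one chunk changed

first = store.backup(week1_full)
second = store.backup(week2_full)
print(first, second)
```

The first full backup writes all 16 bytes, while the second full backup writes only the 4 bytes of the changed chunk, even though it "backs up" all 16 again.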
Note: maybe the example is not fair; maybe your software already uses some sort of de-dupe technique. But is it a global de-dupe? Do you get the full de-dupe efficiency across all the data? Do you get it across sites? Is it combined with replication?
There is another angle. A lot of backup software can do de-dupe, but which of them can do it globally, across all the backup sets running within the environment? I think none. All the backup software I know of does the trick on a per-job basis, meaning that data within a single backup job, folder or disk is de-duped, not globally across all the backup jobs, and … and across the appliances themselves (the DR site scenario, or the remote sites with NBU appliances scenario).
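To make the per-job versus global difference concrete, here is a small Python sketch. It is purely illustrative: the job names, the 4-byte chunks and the `stored_bytes` helper are hypothetical and do not reflect any vendor's actual implementation.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, for readability


def chunk_fingerprints(data: bytes):
    """Yield (fingerprint, chunk_length) for each fixed-size chunk."""
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        yield hashlib.sha256(chunk).hexdigest(), len(chunk)


def stored_bytes(jobs: dict, global_index: bool) -> int:
    """Bytes written when the fingerprint index is shared or kept per job."""
    shared = set()
    total = 0
    for data in jobs.values():
        # Per-job de-dupe starts every job with an empty index.
        seen = shared if global_index else set()
        for fingerprint, size in chunk_fingerprints(data):
            if fingerprint not in seen:
                seen.add(fingerprint)
                total += size
    return total


# Two hypothetical jobs that share the same "OS" chunks.
jobs = {
    "fileserver": b"OS__OS__DOCS",
    "mailserver": b"OS__OS__MAIL",
}

per_job = stored_bytes(jobs, global_index=False)
global_total = stored_bytes(jobs, global_index=True)
print(per_job, global_total)
```

Out of 24 raw bytes, per-job de-dupe stores 16, because each job keeps its own copy of the shared chunk, while a global index stores 12. The gap grows with the amount of data that different jobs have in common.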
I loved backup de-dupe, I loved it so much. Starting tomorrow I will let you see NBU appliances in action: I have set up the lab, and you will see the NBU Appliances effect for yourself. Buckle up and enjoy the ride.