Understanding Backup Space Utilization

Posted July 10, 2017 by Soniya Shah, Information Developer

This blog post was authored by Soniya Shah.

Creating regular database backups is an important part of database maintenance. The vbr utility lets you back up, restore, and copy your database to another cluster. You can create full and incremental backups, and even back up objects, such as tables. Ideally, backups should match what is on-disk.

Sometimes, you might find that your backups take up more space than intended. You might also find that the files on-disk don’t match the backup, which points to a larger backup issue. This post shows how to evaluate that space utilization.

If you have more than one backup host, you will need to run these steps on each host.

1. First, let’s create a table that we will use in this example that stores files: => CREATE TABLE files (sz int, file varchar(200), path varchar(2000), dt timestamp, storage_id varchar(200) default substr(file,0,49) ) ORDER BY storage_id SEGMENTED BY hash (storage_id ) ALL nodes ksafe; 2. Next, we’ll create two tables. The first stores snapshot information and the second stores backup information: => CREATE TABLE snapshot_info ( manifest_file varchar(200), storage_id varchar(200), type varchar(50) , storage_location_id int, length int, digest varchar(50)) ORDER BY storage_id SEGMENTED BY hash(storage_id) ALL nodes ksafe; => CREATE TABLE backup_info ( manifest_file varchar(200), storage_id varchar(200), type varchar(50) , ref_count int, length int, digest varchar(50)) ORDER BY storage_id SEGMENTED BY hash(storage_id) ALL nodes ksafe; 3. Load the actual file system information into the table, where /location is the user’s own path to the backup directory: find /location -maxdepth 10 -printf "%s|%f|%p|%TY-%Tm-%Td %TT\n" | vsql -c "copy files (sz,file, path,dt) from stdin abort on error;" 4. Load the backup file metadata into two tables. The first is for the entire global manifest, and the second is for the snapshot manifest: for file in $(find /location -name *.manifest); do cat $file | sed -ne "/\[objects\]/,$ p;" | grep -v "\[objects\]" | sed 's/, /|/g' | sed "s/^/${file//\//_}|/g" ; done | vsql -c "copy snapshot_info from stdin abort on error;" for file in $(find /location -name backup_manifest); do cat $file | sed -ne "/\[Objects\]/,$ p;" | grep -v "\[Objects\]" | sed 's/, /|/g' | sed "s/^/${file//\//_}|/g" ; done | vsql -c "copy backup_info from stdin abort on error;"; 5. Take a look at the space used. The following query shows 28 TB is available for the backup: => SELECT sum(length)//10^12 TB FROM backup_info ; TB ---- 28 1 row) The next query shows the disk usage information matches the backup_info table. Each has 28TB of space: => SELECT sum(sz)//10^12 TB from files WHERE substr( path, position (files.storage_id in path), length(path)) = file; TB ---- 28 (1 row) 6. Let’s drill down further to see how much space is common to all snapshots: => SELECT sum(length)//10^12 common_size_TB FROM backup_info WHERE ref_count > 1; common_size_TB ---------------- 19 (1 row) 7. We can also see how much space is available for only one snapshot: => SELECT sum(length)//10^12 common_size_TB FROM backup_info WHERE ref_count = 1; common_size_TB ---------------- 9 (1 row) One snapshot takes nearly half the space that is available for all snapshots.

8. Now, look at how many files are accounted for in the backup: => SELECT count(*) FROM files left outer join backup_info ON (files.storage_id = backup_info.storage_id) WHERE backup_info. storage_id is not null; count -------- 663401 (1 row) Ideally, this number should match the number of files on-disk. We’ll take a look at that in a few steps.

9. Are any files not accounted for? Let’s take a look. In this case, there are no unaccounted files: => SELECT count(*), sum (sz)//10^12 size_TB from files left outer join backup_info ON (files.storage_id = backup_info.storage_id) WHERE substr( path, position (files.storage_id in path), length(path)) != file -- these are directories AND backup_info.type is null; -- unmatched count | size_TB -------+--------- 0 | (1 row) 10. What about missing files? In this case, there are no missing files: => SELECT count(*) , sum (length)//10^12 sz_TB FROM backup_info left outer join files ON (files.storage_id = backup_info.storage_id) WHERE files.file is null; count | sz_TB -------+------- 0 | (1 row) 11. Let’s confirm that files are accounted for in the backup: => SELECT count(*), sum(sz)//10^12 on_disk_TB, sum (length)//10^12 on_backup_TB from files left outer join backup_info ON (files.storage_id = backup_info.storage_id) WHERE backup_info. storage_id is not null; count | on_disk_TB | on_backup_TB --------+------------+-------------- 663401 | 31 | 31 (1 row) That number matches the number we saw previously, 663,401.

12. Suppose you’re now interested in seeing which snapshots cost the most. Run a statement to determine this: => SELECT distinct substr(manifest_file,43,15), sum(length)//10^12 TB FROM snapshot_info GROUP BY 1 ORDER BY 1; group by 1 order by 1; substr | TB -----------------+---- 20170128_160314 | 20 20170129_221407 | 19 20170131_000208 | 19 20170131_052621 | 19 20170201_020003 | 19 20170202_020003 | 19 (6 rows) 13. The following query shows space consumed by snapshots that is not shared with other snapshots: => SELECT distinct substr(snapshot_info.manifest_file,43,15), sum(backup_info.length)//10^12 backup_sz, sum (snapshot_info.length)//10^12 snapshot_sz FROM backup_info join snapshot_info using (storage_id, type, digest) WHERE ref_count = 1 GROUP BY 1 ORDER BY 1; substr | backup_sz | snapshot_sz -----------------+-----------+------------- 20170128_160314 | 9 | 9 20170129_221407 | 0 | 0 20170131_000208 | 0 | 0 20170131_052621 | 0 | 0 20170201_020003 | 0 | 0 20170202_020003 | 0 | 0 Perhaps the first snapshot in this list is the oldest and that’s why it takes up so much space.

14.Finally, we can ask – do the snapshots add up to the backup space? => SELECT count( distinct storage_id) FROM snapshot_info ; count -------- 663014 (1 row) => SELECT count( distinct storage_id) FROM backup_info ; count -------- 663014 (1 row) Yes, they do. These numbers match, which means no files are missing or leaked between the backup and what is stored on-disk.

For more information, see Backing Up the Database in the Vertica documentation.