Why Do My License Audits Show A Significant Size Reduction?

Posted July 24, 2015 by nunziato


When you run a database audit, you may notice that your Vertica audit size is smaller than the audit size calculated before you upgraded to version 7.1 SP2. Don’t worry, you haven’t lost any data. The reduction comes from a change in how Vertica calculates database size for licensing.

Vertica computes the effective size of the database based on the export size of the data.

  • Prior to Release 7.1 SP2, Vertica calculated the effective size of the database by computing the character width of each column value and adding a 1-byte cost for a delimiter. The delimiter cost is most noticeable with short values (for example, single-digit integers) and especially with nulls, which have zero width apart from the delimiter.
  • As of Release 7.1 SP2, Vertica no longer counts a 1-byte delimiter value in the effective size of the database. The Vertica audit license size is now based on the data width alone.

Because the 1-byte delimiter is no longer added for each value, your audit size may be much smaller than the size reported by the previous version, particularly if your tables hold many short or null values. Under the new sizing rules, null values are free.
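The difference between the two sizing rules can be illustrated with a small sketch. This is a simplified model for intuition only, not Vertica's actual audit code: the function names, the sample rows, and the use of `str()` as a stand-in for the exported character width are all assumptions.

```python
from typing import Iterable, Optional, Sequence

DELIMITER_COST = 1  # bytes per value under the pre-7.1 SP2 rule


def value_width(value: Optional[object]) -> int:
    """Character width of one value as text; a null (None) has width 0.

    Assumption: str() approximates the exported form of the value.
    """
    return 0 if value is None else len(str(value))


def audit_size(rows: Iterable[Sequence[Optional[object]]],
               count_delimiters: bool) -> int:
    """Sum the width of every value, optionally adding 1 byte per value."""
    total = 0
    for row in rows:
        for value in row:
            total += value_width(value)
            if count_delimiters:
                total += DELIMITER_COST
    return total


# Hypothetical sample data: short values and nulls dominate.
rows = [(1, None, "abc"), (22, "x", None)]

old_size = audit_size(rows, count_delimiters=True)   # pre-7.1 SP2 rule
new_size = audit_size(rows, count_delimiters=False)  # 7.1 SP2 and later
print(old_size, new_size)  # prints "13 7"
```

In this toy data set the six values contribute 7 bytes of data width, and the old rule adds 6 more bytes of delimiters, so nearly half of the old audit size was delimiter cost. The two nulls cost 1 byte each under the old rule and nothing under the new one.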

In addition, because the audit size is now smaller while the physical size of the database is unchanged, compression ratios appear lower than they did in previous versions.

You can find detailed information on how Vertica calculates database size in the Calculating the Database Size section of the Administrator’s Guide.

What if I’m Still Seeing Discrepancies?

If you are seeing audit discrepancies that the dropped delimiter cost does not account for, consider the following:

  • AUDIT uses a random sampling mechanism. Expect some variation when performing audits, especially if you have, for example, wide VARCHAR columns.
  • The data audit size is not the same as your actual database size. Your actual database size includes the data excluded from the audit. If you compute your compression ratio by dividing the audit size of your database by the physical size of the database, add the separators back in for an accurate computation.
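As a rough sketch of the compression-ratio adjustment described in the last bullet, the following helper adds one byte per value back to the audit size before dividing by the physical size. The function name, its parameters, and the sample figures are hypothetical, and the value count (rows times columns of audited data) is something you would have to supply yourself.

```python
def compression_ratio(audit_bytes: int,
                      physical_bytes: int,
                      value_count: int = 0,
                      delimiter_cost: int = 1) -> float:
    """Ratio of (audit size + delimiters added back) to physical size.

    value_count is the number of column values in the audited data
    (an assumption: one delimiter byte per value, as under the old rule).
    Leave it at 0 to get the unadjusted ratio.
    """
    if physical_bytes <= 0:
        raise ValueError("physical_bytes must be positive")
    return (audit_bytes + value_count * delimiter_cost) / physical_bytes


# Hypothetical figures, chosen only to show the effect of the adjustment.
unadjusted = compression_ratio(audit_bytes=700, physical_bytes=100)
adjusted = compression_ratio(audit_bytes=700, physical_bytes=100,
                             value_count=600)
print(unadjusted, adjusted)  # prints "7.0 13.0"
```

The unadjusted figure is what you get from the new audit size alone. The adjusted figure is the one to compare against ratios computed before the upgrade.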