Post-upgrade Tasks for Saving Catalog Space

Posted October 25, 2016 by Sarah Lemaire, Manager, Vertica Documentation

Selective focus high angle view of businesswoman drinking cup of coffee and working on laptop at office desk
 

When you upgrade to 7.2 and later, not only can you take advantage of the new features, you can perform tasks to save substantial space in your Vertica catalog.

Specifically, you can save space by:

  • Bundling data files
  • Consolidating unsegmented projection names
  • Reclaiming catalog space

Imagine you’re moving. You might take this time to go through your possessions and donate some things that you don’t use anymore. You do this to make moving easier and to reclaim unused space. Why not do the same with your Vertica catalog? Just remember three things: bundle, consolidate, reclaim. In moving terms, you might bundle your fluffy winter jackets into a vacuum bag and consolidate your half-empty boxes of cereal into one container. After doing so, you can reclaim your space!

In Vertica, you can bundle your data files and/or consolidate unsegmented projection names, and then reclaim that newly acquired space. Let’s look at these three aspects in more detail.

Bundle data files

You’ll want to bundle data files because it reduces the number of files in your file system. How much does it reduce it? By at least 50%, and often much more! Bundling also improves the performance of file-intensive operations like backups, restores, mergeouts, and moveouts.

With 7.2 and later, newly created storage is automatically bundled. Here’s how to bundle existing data files:

Use the COMPACT_STORAGE meta-function to bundle existing data (.fdb) and index (.pidx) files to the .gt format. The .gt format is automatically enabled for any files created in version 7.2 or later.

=> SELECT COMPACT_STORAGE (object_name, min_ros_filesize_kb, small_or_all_files, simulate);

Vertica creates small files for the following reasons:

  • When you have tables with hundreds of columns containing NULL valued columns.
  • When your partition ranges are small.
  • When tables are wide and sparsely populated. In this case, columns compress very well.

Setting the simulate parameter to TRUE allows you to see the results of bundling, without actually performing the task. The following example shows what happens when you bundle the my__table table:

=> SELECT COMPACT_STORAGE('my_table',1024,'small','true');
Task: compact_storage
On node my_db_node0001:
Projection Name :public.my_table_b0 |
selected_storage_containers :1 |
selected_files_to_compact :6 |
files_after_compact : 1 |
modified_storage_KB :0 

On node my_db_node0002:
Projection Name :public.my_tables_b0 |
selected_storage_containers :2 |
selected_files_to_compact :12 |
files_after_compact : 2 |
modified_storage_KB :0

See our documentation for parameter specifications.

Consolidate unsegmented projection names

You can also save space by consolidating unsegmented projection names.

When you run the CREATE PROJECTION statement with the UNSEGMENTED ALL NODES clause, Vertica creates instances (buddies) of the unsegmented projection on all cluster nodes.

Before version 7.2, Vertica gave each instance a unique name and created a projection object for each node. Each projection object had its own statistics.

Now, Vertica uses the same name to identify all instances of the unsegmented projection, as seen here:

postupgrade2

 

Mapping all unsegmented projection buddies to a single identifier saves significant space in the database catalog. To upgrade existing projections to take advantage of this benefit, run the MERGE_PROJECTIONS_WITH_SAME_BASENAME meta-function with one of the following arguments:

  • An empty string, which tells Vertica to consolidate all unsegmented projection buddy names under their respective projection base names.
  • The base name of the projection whose buddy names you want to convert.

Reclaim catalog space

Now that you’ve bundled your data files and consolidated unsegmented projection names, reclaim your newly-acquired catalog space. If you don’t, on restart Vertica reads all the old projections into memory (and then throws them away). However, the catalog allocator doesn’t free memory.

To reclaim catalog space:

  1. Reset the CatalogCheckpointPercent configuration parameter.The CatalogCheckpointPercentage parameter specifies the threshold at which a new checkpoint is created for the database catalog. By default, it is set to 50%, meaning that when transaction logs reach 50% of the size of the last checkpoint, Vertica creates a new checkpoint. Each checkpoint demarcates all changes to the catalog since the last checkpoint. To take advantage of the newly-acquired catalog space, you can temporarily set the percentage to 0 to set a checkpoint for each transaction:=> ALTER DATABASE mydB SET CatalogCheckpointPercent = 0;
  2. Set the next checkpoint.The following example show how to tell Vertica to set new checkpoint(s) now, using the new parameter setting:
    => SELECT hurry_service('SYSTEM','catalogcheckpointer');
    hurry_service
    -----------------
    Service hurried
    (1 row)
  3. Reset the CatalogCheckpointPercentage to the default value:=> ALTER DATABASE mydB SET CatalogCheckpointPercent = null;
  4. Restart the Vertica database.

To learn more about post-upgrade tasks and Vertica in general check out the links below: