Share this article:

Vertica Partitions: The FAQs

The Vertica partitioning capability divides one large table into smaller pieces based on values in one or more columns. As you will learn in this article, partitions can make data lifecycle management easier and improve the performance of queries whose predicate is included in the partition expression.

This document answers the most important questions you might have about partitions in your Vertica database. These questions fall into the following categories:

Partition Basics

How do partitions work?

Suppose you need to create a retention policy for your data. For example, you might need to retain data for only five years. You want to delete all data that is older than five years.

The Vertica partitioning capability makes it easy to manage this data efficiently. Let's see how this works.

Suppose you have a table named trade that contains data for:

  • Trade date (tdate)
  • Ticker symbol (tsymbol)
  • Trade time (ttime)

Using the following CREATE TABLE statement, specify that Vertica partition the data based on the year that the trade took place:

=> CREATE TABLE trade ( 
   tdate DATE NOT NULL, 
   tsymbol VARCHAR(8) NOT NULL,
   ttime TIME) 
   PARTITION BY EXTRACT(year FROM tdate);

When you load data into the trade table, Vertica segregates the data based on the partition expression, in this example, the calendar year:

PARTITION BY EXTRACT(year FROM tdate)

This partitioning allows you to easily manage a subset of the data that you might need to move or delete using the partition management functions.

For the trade table example, suppose you first load data for 2008. Vertica stores the data in ROS (read optimized store) containers.

Vertica creates a partition in which to store all the ROS containers for the 2008 data. The most recently created partition is called the active partition. The following graphic shows two rows of 2008 data loaded into the partition.

partition_basics0a.png

When you then load 2009 data for the first time, Vertica creates a new partition that becomes the active partition. In the following graphic, Vertica loads two rows of 2009 data into the new partition. Inactive partitions contain data that may not need to be accessed frequently.

partition_basics0b.png

When performing a mergeout, for inactive partitions, the Tuple Mover combines all the ROS containers into a single ROS container. In the active partition, the Tuple Mover uses a strata-based algorithm to combine the ROS containers.

The following graphic shows combined ROS containers for the 2008 and 2009 partitions, and newly loaded data in the active partition.

partition_basics3.png

Note For more information about active partitions and ROS containers, see Partitions and ROS Files and ROS Containers later in this article.

What is the difference between partitioning and segmentation?

Segmentation helps you split your data evenly across nodes in your cluster to take advantage of the MPP (massively parallel processing) architecture.

Partitioning helps organize data on each node into different storage containers. Partitioning reduces I/O and improves query performance.

Suppose a fact table has at least the following two columns:

  • Transaction ID (trans_id)
  • Transaction date (trans_date)

When you create the fact table, you should:

  • Segment the table projections on HASH (trans_id) to evenly distribute data across all nodes.
  • Partition the table by date.

With this table definition, the data for each partition is evenly split across all cluster nodes. This segmentation allows queries targeted for a specific partition to execute in parallel. The following simple example shows how the data might be distributed:

seg_vs_part.png

What are active partitions?

As mentioned earlier, the active partition for a table is the partition that was created last, not the partition that was updated last. The active partition contains data that is being loading frequently.

You can change the number of active partitions per table by changing the ActivePartitionCount configuration parameter. The default value is 1.

The Tuple Mover combines storage containers for inactive partitions into a single ROS container, creating one ROS container per partition. The Tuple Moves merges storage containers for active partitions using the strata algorithm, which determines how to merge the data.

If you frequently load data into the last two partitions, change ActivePartitionCount to 2. This is a global configuration parameter and affects every table. If you set it to 2, the Tuple Mover applies the strata algorithm to the last two created partitions.

Increasing ActivePartitionCount reduces the number of Tuple Mover operations. However, you may end up with too many ROS containers in your projections.

Use the following query to find the active partitions for a projection:

=> SELECT DISTINCT partition_key FROM strata
   WHERE projection_name ILIKE '%sktest%' 
   AND schema_name ILIKEe '%public%';
 partition_key
---------------
             5
             8
             9
(3 rows)

Which tables should be partitioned?

Vertica recommends that you partition only large fact tables. Do not partition small tables or dimension tables. Doing so creates a large number of ROS containers, which can quickly increase the catalog size and impact query performance.

How can I archive rarely used data?

Instead of deleting data, you may need to keep data around for a long time for historical purposes.

If that's the case, consider creating a storage policy for historical and older partitions. You can store older partitions on slower and less expensive storage.. This option frees up faster and costly storage for quick retrieval of frequently accessed partitions.

For more information, see Creating Storage Policies in the Vertica documentation.

What should I consider when defining partition schemes for a table?

When you define a partition expression for a table, consider the following:

  • Data retention policy
  • Frequently used query predicates
  • The impact that partition granularity can have on the number of ROS containers per projection per node and the total number of ROS files per node

Partitions and Storage Pruning

How do partitions affect data lifecycle management?

Partitioning large fact tables in your Vertica database makes data lifecycle management easier and improves query performance.

Vertica provides capabilities to:

  • Drop a partition from a table.
  • Move a rarely used partition to an archive table.
  • Move a rarely used partition to less expensive storage.
  • Restore a partition from an archive.

Use partitions and the associated SQL capabilities to manage your data retention policies.

What is storage pruning and how is it related to partitioning?

Data for partitions is segregated into individual storage containers. As a result, queries that use predicates on partitioned columns can significantly benefit from the Vertica storage-pruning capabilities. Read on to learn how storage pruning works.

During the query-planning phase, the database optimizer identifies storage containers that do not contain data required by that query. The Optimizer bases this information on the minimum and maximum values of the partitioning column of each container. During query processing, the Vertica execution engine omits storage containers that do not store applicable values, reducing I/O and improving query performance.

Suppose you have a table with three years of data that is partitioned by month. This table has 36 partitions of data. Here are four scenarios:

Query Storage Container Pruning
Query analyzes data for a specific week or month. Vertica prunes the storage containers that contain the remaining 35 partitions.

Query analyzes data for a three-month quarter.

Vertica uses storage containers belonging to three partitions—one for each month—and prunes the rest.
Query has a predicate on a column that is not involved with the partitioning.

This query does not make effective use of storage pruning because the data for the predicate column may be present across multiple partitions.

Example: Table is partitioned by transaction_date, but the query has a predicate on the zip_code column.

Query has a predicate on a column that is correlated with the partitioning column.

This query takes advantage of storage pruning where applicable.

Example: The query has a predicate on the ship_date. The ship_date typically falls in the same month as the transaction_date, or the month after. In this case, Vertica can prune all but the two partitions for those two months.

How can I find out if my query is taking advantage of partitioning and storage pruning?

To find the number of pruned storage containers for a query, take these steps:

  1. Profile the query to get the transaction_id and statement_id:

    => PROFILE SELECT * FROM <table_name> WHERE <column_name> BETWEEN 5 AND 7;
    NOTICE 4788:  Statement is being profiled
    HINT:  Select * from v_monitor.execution_engine_profiles where 
    transaction_id=54043195528458555 and statement_id=1;
    NOTICE 3557:  Initiator memory for query: [on pool general: 19543 KB, minimum: 19543 KB]
    NOTICE 5077:  Total memory required by query: [19543 KB]
     C1
    ----
     6
     7
     5
    (3 rows)
  2. Check the QUERY_EVENTS system table for that transaction, and identify where the query plan indicates that partitions are eliminated from processing when that query executes:

    => SELECT node_name , event_details FROM query_events WHERE 
       event_type = 'PARTITIONS_ELIMINATED' AND transaction_id = 54043195528458555
       AND statement_id=1;
        node_name     |                       event_details
    ------------------+-----------------------------------------------------------
    v_vmart_node0003  | Using only 1 stores out of 3 for projection public.tab_b0
    v_vmart_node0002  | Using only 1 stores out of 5 for projection public.tab_b0
    v_vmart_node0001  | Using only 1 stores out of 2 for projection public.tab_b0
    (3 rows)

Partitions and ROS Files and ROS Containers

What is the difference between ROS files and ROS containers?

When a user issues COPY statement with DIRECT, Vertica creates one ROS file per column. A ROS container is a logical grouping of ROS files. Vertica created ROS containers as a result of a COPY DIRECT or Tuple Mover operation.

Why should I consider the number of ROS containers per projection for each node?

Vertica segregates partitioned data into different storage containers. Because Vertica does not merge data across partitions, having too many small partitions can push the number of storage containers close to the maximum number of ROS containers: 1024.

If you reach this limit for a specific projection, and try to load new data into that projection, the load fails with a "Too many ROS containers" error. If you have that error, do one of the following:

  • Use ALTER_PARTITION to change the partition expression to a less finer grained partitioning scheme.
  • Move older partitions to an archive table using MOVE_PARTITION.

How do the total number of ROS containers per node impact my database?

The number of ROS files per node is:

 (# storage containers) x (# columns per projection)

The number of ROS files can be a major contributor to a large catalog size. Having a large catalog consumes system memory and slows down other database operations such as system table queries, database startup, database backup, and scratch recovery.

If you have a catalog with 1 million ROS files per node, the catalog size is approximately 3–4GB. If the number of ROS files grows before you have it under control, catalog memory may not be released. After Vertica reacquires the released catalog memory, the memory is available list for future use. However, the node does not actually free that memory until you reboot that node.

Let's look at two use cases.

Use Case 1: Large Catalog

You have a database with

  • 1,000 tables
  • 2 projections per table
  • 50 columns per table (on average)
  • 50 ROS containers per node
  • Approximately 5 million ROS files per node.

You have 2000 projections and 5,000,000 ROS files:

(1000 tables) x (2 projections per table) = 2000 projections
(50 columns) x (2000 projections ) = 10,0000 x (50 ROS per projection) = 5,000,000 ROS files 

You can partition 300 tables by date and retain the data for one year. Doing so allows you to manage a single date's worth of data using Vertica partition management functions. However, that bumps the ROS count by another ~10 million, doubling the size of the catalog:

(300 tables) x (2 projections) = 600 projections
(365 days) x 600 = 219,000 x (50 columns) = 10,900,000 ROS files

In this situation, you have two options:

  • Partitioning by date adds at least 365 ROS containers per projection per node, so you want only a small number of tables partitioned by date. Avoid partitioning small tables and if possible, instead, partition by week or month for most tables.
  • If partitioning by date is required, identify where you can reduce the number of ROS containers by combining similar tables into a single table. Schemas are often inherited from legacy OLTP systems that split tables into multiple tables by geographical location to improve query performance. For best results, combine these tables into a single table.

Use Case 2: Too Many ROS Containers Per Projection

You have a database with:

  • 100 tables
  • 2 projections per table
  • 50 columns per table
  • 50 ROS containers per node
  • Approximately 500,000 ROS files per node

You have 200 projections and 500,000 ROS files:

(100 tables) x (2 projections per table)     = 200 projections
(Average of 50 columns) x (200 projections)  = 
 10,000 x (50 ROS containers per projection) = 
 500,000 ROS files

Suppose you want to partition 10 tables by date and retain the data for three years. Doing so allows you to manage the data for a single date using the partition management functions. In this scenario, you may not accumulate a large number of ROS files. However, you hit ROS pushback (1024 ROS containers) when loading data before those three years are up.

1 table x (365 x 3) daily for 3 years = 1095  ROS containers for inactive partitions 

As a result, there is no room left for data to be received in active partitions.

Alternatively, if you partition by week:

1 table x (52 x 3) weekly for 3 years = 156 ROS containers for inactive partitions 

This alternative creates significantly fewer ROS containers. In this situation, consider partitioning the data by week instead of by date.

Consider these issues when planning for future data growth. Your organization may move data for other analytic applications to Vertica or decide to retain more data after seeing the storage optimizations and performance capabilities of your database.

Repartitioning and Reorganizing

Can I partition a non-partitioned table? Can I change the partition expression for a table?

You partition an existing table or change the partition expression for a table using:

=> ALTER TABLE <table_name> PARTITION BY <partition_expression>

When you run this statement, any existing partition key information for storage containers is immediately erased. You have to reconstruct this information according to the new partition expression using the REORGANIZE keyword.

Caution Do not alter table partitioning when nodes are down.

What happens when I use the REORGANIZE keyword?

Use the PARTITION BY and REORGANIZE keywords separately or together to partition or repartition a table as follows:

=> ALTER TABLE <table_name> PARTITION BY <partition_expression> REORGANIZE;

When you run this statement, Vertica drops any existing partition keys, repartitions the table, and reorganizes the table.

The reorganize operation is a variation of the Tuple Mover operations that run in the background. The reorganize operation reads the data in chunks, so as not to affect database performance. Then, REORGANIZE writes the data to ROS containers according to the new partitioning scheme and adds the partition key to the ROS container objects.

Can I delay REORGANIZE? What is the effect of such a delay?

In order to minimize the performance of a running database, reorganize only works on a subset of ROS containers at a time.

Delaying the reorganize operation after repartitioning results in the following limitations:

  • You cannot run the partition functions on a table whose partition expression has been altered but not reorganized.
  • For partitioned tables, ROS containers without partition keys do not participate in Tuper Mover mergeout. This can contribute to ROS pushback.

Reorganize as soon as possible after repartitioning. Monitor the progress of the reorganize operation until it completes for all projections anchored on the altered table.

How can I monitor the status of the REORGANIZE process ?

Monitor the reorganize background progress using the following system tables:

  • VS_TUPLE_MOVER_OPERATIONS
  • PARTITION_STATUS
  • PARTITION_REORGANIZE_ERRORS

To review the partition history for a given table, query the system table CATALOG_EVENTS.

How do I remove table partitioning?

To alter a table so that it is no longer partitioned, use the following statement:

=> ALTER TABLE <table_name> REMOVE PARTITIONING;

After you remove the partitioning from a table, Vertica treats the table like any other non-partitioned table. Vertica merges the ROS containers using the strata algorithm.

Partition Limit Considerations

Is there a limit on the number of partitions per table?

There is no limit on the number of partitions per table. However, data for partitions are segregated into ROS containers. Vertica limits the number of ROS containers per projection per node to 1024, so the limit is essentially 1024 partitions per table.

Vertica prevents a single COPY DIRECT statement from loading more than 1024 partitions.

If you have more than 365 partitions (approximately one-third of the limit), watch the ROS container per projection counts and monitor Tuple Mover mergeout operations. With more than 365 partitions, you may need to rethink the partitioning scheme for that table or move partitions that are not queried to an archive table.

How can I determine the size of the database catalog in memory?

The following query returns the size of the database catalog:

=> SELECT node_name, MAX (ts) AS ts, MAX(catalog_size_in_MB)
   AS catlog_size_in_MB
   FROM
   (SELECT node_name,
    TRUNC((dc_allocation_pool_statistics_by_second."time")::TIMESTAMP,
    'SS'::VARCHAR(2)) AS ts, 
    SUM((dc_allocation_pool_statistics_by_second.total_memory_max_value
    - dc_allocation_pool_statistics_by_second.free_memory_min_value))/(1024*1024) 
    AS catalog_size_in_MB from dc_allocation_pool_statistics_by_second GROUP BY 1,
    TRUNC((dc_allocation_pool_statistics_by_second."time")::TIMESTAMP,
    'SS'::VARCHAR(2))
    )
    subquery_1 GROUP BY 1 ORDER BY 1 LIMIT 50;

How can I check to see if partitioning is contributing to a large catalog?

There are several situations that can cause a large database catalog. A large catalog is one that is 10 GB or greater.

The following query returns the top 50 projections that have the maximum ROS containers. If you see that the tables in the results are partitioned and the partition count is high, consider modifying your partitioning scheme so that Vertica creates fewer partitions.

=> SELECT s.node_name, p.table_schema, s.projection_name,
   COUNT(DISTINCT s.storage_oid) storage_container_count,
   COUNT(DISTINCT partition_key) partition_count, 
   COUNT(r.rosid) ros_file_count
   FROM storage_containers s LEFT OUTER JOIN PARTITIONS p 
   ON s.storage_oid = p.ros_id JOIN vs_ros r 
   ON r.delid = s.storage_oid
   GROUP BY 1,2,3 ORDER BY 4 DESC LIMIT 50;

For More Information

For more information about partitioning, see the following sections in the Vertica documentation:

Share this article: