Rebalance Taking a Long Time

Posted June 14, 2018 by Soniya Shah, Information Developer

After you add a node to your Vertica cluster or remove a node from your cluster, Vertica rebalances the data across all the nodes. If rebalancing is taking a long time, review these steps to find out the probable cause.

Pre-Requisites

To ensure a successful rebalance of your cluster, before you start the rebalance, take these steps: 1. Schedule the rebalance at a time that does not conflict with ETL jobs.2. Back up the database.3. Drop older or unused table partitions.4. Verify that local segmentation is disabled. If local segmentation is not disabled, run this command to disable it: => SELECT DISABLE_LOCAL_SEGMENTS();5. Check the CPU and network bandwidth using vioperf and vnetperf, respectively. If the available bandwidth is lower than your initial benchmarks, contact your system administrator to find and fix the problems causing the lower benchmarks. 6. Check if you have storage that is at least 40% of the size of your database available to perform the rebalance. To identify storage usage, run the following query: => SELECT node_name, storage_path, disk_space_used_mb, disk_space_free_mb FROM DISK_STORAGE; Check the available disk space on your Linux file system: $ df -h Get a snapshot of each node from the HOST_RESOURCES system table: => SELECT host_name, disk_space_used_mb, disk_space_total_mb FROM HOST_RESOURCES; If you do not have enough storage, take steps to reduce your catalog size:
  • • Delete unneeded, temporary, and staging data.
  • • Clean up the log files.
  • • Drop unneeded tables or partitions.
  • • Add a new drive, add a storage location, and migrate some catalog objects to the new location.
  • • Add a temporary storage location for temp space to be used during the rebalance.
Check the settings for the built-in REFRESH resource pool. => SELECT name, is_internal, plannedconcurrency, maxmemorysize FROM RESOURCE_POOLS WHERE name='REFRESH'; If necessary, adjust the resource pool settings to accommodate the rebalance operation.   7. Minimize any DML operations (COPY, INSERT, UPDATE, DELETE) on tables to be rebalanced. If rebalance has a lock on a table, the load fails. If the load has a lock on a table, the rebalancing pauses. If you think your rebalance might contend with ETL jobs, increase the LockTimeout value. The default value is 300 seconds (5 minutes). => SELECT SET_CONFIG_PARAMETER('LockTimeOut, 600)8. Purge deleted data as described in Purging Deleted Data.9. Configure the hosts you are adding to the cluster.10. Add the hosts to the cluster.11. Add the nodes to the database. Note: For more information, see Managing the Database.12. If you want your rebalance to run uninterrupted, give preference to the rebalance process by setting the DMLCancelTM parameter to false: => SELECT SET_CONFiG_PARAMETER('DMLCancelTM', 0);Now you’re ready to start the rebalance operation. Here are the steps to get started and to monitor the process to help it complete successfully. Start the rebalance as described in Rebalancing Data Using SQL Functions.        
Step Task Results
1 Monitor which tables Vertica is rebalancing: => SELECT * FROM REBALANCE_TABLE_STATUS; The query returns information about which tables Vertica is rebalancing and how much data has been transferred.
2 If any DML or DDL operations interfere with the rebalancing of certain tables, you see this error message: ERROR 3007: DDL statement interfered with this statement If you see this error, stop the DML or DDL operations and restart the rebalance. Alternatively, postpone the rebalance until a time when these operations will not be running.
3 Monitor the following system tables to see the progress of individual tables and projections: => SELECT table_name, separated_percent, transferred_percent FROM REBALANCE_TABLE_STATUS;=> SELECT anchor_table_name, separated_percent, transferred_percent FROM REBALANCE_PROJECTION_STATUS; Monitor the percent values. If they do not progress as expected, your rebalance may be running slowly. If rebalance is slow, consider stopping the rebalance process, and preparing for a new rebalance using the steps at the beginning of this rebalance.
4 Check if rebalancing completed successfully. => SELECT operation_status FROM REBALANCE_OPERATIONS; If operation_status = COMPLETE, the rebalance completed without errors. If all the nodes in the cluster are rebalanced, this is the end of your checklist. If the nodes in the cluster are not rebalanced, try the following:
  • • Check the pre-requisites listed in this checklist and restart the rebalance.
  • • If the nodes are still not rebalanced, contact Vertica Technical Support.
 
5 Check if the K-safety is correct. If K-safety is not set to 1 or 2, follow the instructions in Designing for K-Safety.
6 Evaluate the performance of your database after the rebalance. If there is a significant reduction in performance, contact Vertica Technical Support.

Learn More

See the Understanding Rebalancing 1: What Happens During Rebalancing and Understanding Rebalancing 2: Optimizing for Rebalancing in the Vertica Knowledge Base.