
Catalog Size Debugging

You have a large catalog if it is more than 10 GB in size or changes by more than 5% per day. This checklist gives you suggestions and recommendations for monitoring and reducing your database catalog size.

The database catalog contains metadata that tracks objects such as tables, projections, users, nodes, ROS containers, and so on. There are two types of objects in the catalog:

  • Global objects are not node-specific. They include tables, users, and nodes. Because any node can serve as a query initiator, global objects are fully replicated.
  • Local objects are node-specific. They include ROS and WOS containers and dependencies. Because storage layout can vary by node, local objects are not replicated; each node tracks its storage layout independently.

When a node starts, the catalog is cached in memory.

Step 1: Check the catalog size on all nodes:

=> SELECT DATE_TRUNC('day', ts) AS date, node_name,
       MAX(catalog_size_in_MB)::INT AS end_catalog_size_mem_mb
   FROM (
       SELECT node_name,
              TRUNC(("time")::TIMESTAMP, 'SS'::VARCHAR(2)) AS ts,
              SUM(total_memory_max_value - free_memory_min_value) / (1024*1024)
                  AS catalog_size_in_MB
       FROM dc_allocation_pool_statistics_by_second
       GROUP BY 1, TRUNC(("time")::TIMESTAMP, 'SS'::VARCHAR(2))
   ) foo
   GROUP BY 1, 2
   ORDER BY 1 DESC, 2;

If you have a large catalog that is growing gradually and might affect the performance of your database, consider upgrading to Vertica 7.2.3-15 or later. Alternatively, work through this checklist to find the root cause of the catalog size.

If the catalog size in memory is below 5% of your total memory and Vertica performance is acceptable, this is the end of your checklist.

If your catalog is within 5% of memory but is more than 10 GB, use this checklist to find the root cause of your large catalog and consider reducing its size using the techniques described here.
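The two thresholds in this step can be expressed as a small helper. This is an illustrative sketch, not part of the checklist itself; the function name and inputs are hypothetical:

```python
def catalog_check(catalog_gb: float, total_memory_gb: float,
                  daily_growth_pct: float) -> str:
    """Classify catalog size per Step 1 (illustrative only).

    A catalog is "large" if it exceeds 10 GB or grows more than 5% per
    day; it is a memory concern if it uses more than 5% of node memory.
    """
    pct_of_memory = 100.0 * catalog_gb / total_memory_gb
    if pct_of_memory <= 5.0 and catalog_gb <= 10.0 and daily_growth_pct <= 5.0:
        return "ok"            # end of checklist
    if pct_of_memory <= 5.0:
        return "investigate"   # large catalog, but within the memory budget
    return "reduce"            # catalog is crowding query memory
```

For example, a 15 GB catalog on a 512 GB node is under the 5% memory threshold but over 10 GB, so it falls into the "investigate" branch and the rest of the checklist applies.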
Step 2: To see whether your catalog is growing and whether your database has too many objects, use the following queries:

=> SELECT COUNT(*) FROM tables;
=> SELECT COUNT(*) FROM projections;
=> SELECT COUNT(*) FROM columns;

If you have a large number of objects relative to your catalog size, consider reducing unnecessary objects. You can reduce the objects by:

• Dropping objects that are not used.
• Reducing object dependencies. For example, analyze statistics only on columns that are used in joins, predicates, and GROUP BY clauses.

If the number of objects cannot be reduced, go to Step 10 to adapt your resource pools.

If you do not have a large number of objects for your catalog size, go to Step 3.

Step 3: Check whether your catalog is growing due to an increase in delete vectors:

=> SELECT node_name, COUNT(*) FROM delete_vectors GROUP BY 1;

If you see a large number of delete vectors relative to the size of your database, consider cleaning them up. For more information, see the Delete Vectors checklist.

Cleaning delete vectors lowers the catalog size in memory. After you clean delete vectors, you must restart the database or the node for the catalog to shrink.

If there are not too many delete vectors, go to Step 4.
Step 4: Check whether there are many ROS containers:

=> SELECT node_name, COUNT(*) FROM storage_containers GROUP BY 1;

If the ROS container count is high, go to Step 5 to see whether some projections have too many ROS containers. You can also contact Vertica Technical Support to perform a deeper analysis of why the catalog is so large.
Step 5: Identify projections with high ROS container counts:

=> SELECT node_name, COUNT(*) AS num_proj, AVG(ros_count) AS avg_ros_count
   FROM projection_storage
   WHERE ros_count > 300
   GROUP BY node_name;

If there are many projections with too many ROS containers but the average ROS count is lower than 300, go to Step 11.

If there are not many such projections but avg_ros_count is above 500, go to Step 6.
Step 6: Check whether you have too many partitions, based on how you designed your tables:

=> SELECT projection_id, table_schema, projection_name,
       COUNT(DISTINCT partition_key)
   FROM partitions
   WHERE node_name = (SELECT local_node_name())
   GROUP BY 1, 2, 3
   ORDER BY 4 DESC;

If projections are not partitioned, or the number of partitions is less than 600, go to Step 7.

If some projections have many unique partition keys, change the table partition key to produce fewer partitions per projection, and then go to Step 7.
Step 7: Monitor whether mergeout is working for that particular projection:

=> SELECT operation_start_timestamp, node_name, operation_status
   FROM tuple_mover_operations
   WHERE projection_id = <ID of step 6>
     AND operation_name = 'Mergeout'
   ORDER BY 1 DESC;

If the last status is not COMPLETE, wait until mergeout completes, then count ROS containers with this query:

=> SELECT node_name, projection_id, projection_schema, projection_name, ros_count
   FROM projection_storage
   WHERE ros_count > 30
   ORDER BY ros_count DESC
   LIMIT 10;

If the last status is COMPLETE but mergeout has not run in the last hour, go to Step 8. If mergeout is working properly but the projection still has too many ROS containers, contact Vertica Technical Support to do a deeper analysis.
Step 8: Run a manual mergeout:

=> SELECT DO_TM_TASK('mergeout', 'projection_schema.projection_name');

When it completes, verify the number of ROS containers. If mergeout does not reduce the number of ROS containers, contact Vertica Technical Support.
Step 9: Restart the node to recover memory after adjusting catalog objects, and verify the catalog size:

=> SELECT node_name, DATE_TRUNC('day', time::TIMESTAMP) AS date,
       MAX(size / 1024^2)::INT AS catalog_disk
   FROM dc_catalog_persistence_events
   WHERE event_action = 'write complete'
   GROUP BY 1, 2
   ORDER BY 1, 2 DESC;

If restarting the node is not possible, go to Step 10.
Step 10: If your Vertica version is earlier than 8.0.1, adjust the resource pool settings to avoid memory swapping.

Size the new pool as 100 – (General pool memory size) – (% of memory used by the catalog).

For example, if the catalog in memory is 8% and the General pool memory size is 95% (the default), the calculation is 100 – 95 – 8 = –3: the catalog needs 3% more memory than is available outside the General pool. Allocate an extra 2% in case the catalog grows, and reserve 5%:

=> CREATE RESOURCE POOL CATALOG_MEM MEMORYSIZE '5%';

This pool can be adjusted without restarting, to avoid swapping or out-of-memory conditions. In Vertica 8.0.1 and later, Vertica automatically adjusts the METADATA pool.

Adjust the pool to reflect the catalog size so that the resource manager does not plan queries with this memory, which avoids out-of-memory errors and swapping.

This pool adjustment does not resolve performance issues; to resolve those, work through the steps in this checklist that reduce catalog size.
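The sizing arithmetic above can be sketched as a helper function. This is an assumption-laden illustration (the function name is hypothetical, and the 2% buffer comes from the example in this step):

```python
def catalog_pool_pct(general_pool_pct: float, catalog_pct: float,
                     buffer_pct: float = 2.0) -> float:
    """Size a reserved catalog-memory pool (illustrative sketch).

    100 - general_pool_pct is the memory left outside the General pool.
    If the catalog needs more than that, reserve the deficit plus a
    small buffer so the catalog has room to grow.
    """
    deficit = catalog_pct - (100.0 - general_pool_pct)
    return max(deficit, 0.0) + buffer_pct

# Example from this step: General pool 95%, catalog 8% -> 5% pool.
```

With the defaults from the example (95% General pool, 8% catalog), the deficit is 3%, plus the 2% buffer, giving the 5% used in the CREATE RESOURCE POOL statement above.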
Step 11: If there are too many partitioned tables, but the number of partitions per table is still as recommended, do one of the following:

• Partition only the tables with the most rows, to limit catalog growth.
• Partition the tables with less granularity.
• If leaving small tables unpartitioned is not possible, go to Step 10 to adjust resource pools.
Step 12: Identify partitions with few rows, because those tables should not be partitioned. Partition only tables that average more than 100,000 rows per partition:

=> SELECT projection_id, table_schema, projection_name,
       COUNT(DISTINCT partition_key), AVG(ros_row_count)
   FROM partitions
   GROUP BY 1, 2, 3
   HAVING AVG(ros_row_count) < 100000
   ORDER BY 5;

Consider repartitioning tables that have many partitions with few rows each. Alternatively, do not partition those tables at all.

If modifying the tables is not an option, go to Step 10 to adjust resource pools.
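The decision rule behind the query above can be sketched as follows; this is an illustrative helper (hypothetical name), using the query's 100,000-rows-per-partition threshold:

```python
def should_repartition(total_rows: int, partition_count: int,
                       min_avg_rows: int = 100_000) -> bool:
    """Flag tables whose partitions are too small (illustrative sketch).

    Mirrors the Step 12 query: tables averaging fewer than 100,000 rows
    per partition are candidates for coarser partitioning, or for no
    partitioning at all.
    """
    if partition_count == 0:
        return False  # unpartitioned tables are not flagged
    return total_rows / partition_count < min_avg_rows
```

For example, a table with 50,000 rows spread over 10 partitions averages 5,000 rows per partition and would be flagged, while 10 million rows over 20 partitions would not.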

    Learn More

See Database Performance Is Slow in the Vertica Troubleshooting Checklists.