Best Practices for Statistics Collection

The query optimizer requires representative statistics in order to choose the best query plan. Vertica regularly collects statistics incrementally, which is generally sufficient for most optimizer requirements. If necessary, you can explicitly request that Vertica collect full table statistics.

Collecting Partial Statistics

For many applications, statistics do not need to be accurate to the latest minute. The ANALYZE ROW COUNT operation regularly collects partial statistics and supplies enough data to satisfy many optimizer requirements.

If desired, you can explicitly invoke ANALYZE ROW COUNT by calling the function DO_TM_TASK with the argument analyze_row_count. For example:

vmart=> SELECT DO_TM_TASK('analyze_row_count'); 
                                  DO_TM_TASK---------------------------------------------------------------------------------------
row count analyze for projection 'call_center_dimension_DBD_27_seg_temp_init_temp_init'
row count analyze for projection 'call_center_dimension_DBD_28_seg_temp_init_temp_init'
row count analyze for projection 'online_page_dimension_DBD_25_seg_temp_init_temp_init'
row count analyze for projection 'online_page_dimension_DBD_26_seg_temp_init_temp_init'
row count analyze for projection 'online_sales_fact_DBD_29_seg_temp_init_temp_init'
row count analyze for projection 'online_sales_fact_DBD_30_seg_temp_init_temp_init'
row count analyze for projection 'customer_dimension_DBD_1_seg_temp_init_temp_init'
row count analyze for projection 'customer_dimension_DBD_2_seg_temp_init_temp_init'
row count analyze for projection 'date_dimension_DBD_7_seg_temp_init_temp_init'
row count analyze for projection 'date_dimension_DBD_8_seg_temp_init_temp_init'
row count analyze for projection 'employee_dimension_DBD_11_seg_temp_init_temp_init'
row count analyze for projection 'employee_dimension_DBD_12_seg_temp_init_temp_init'
row count analyze for projection 'inventory_fact_DBD_17_seg_temp_init_temp_init'
row count analyze for projection 'inventory_fact_DBD_18_seg_temp_init_temp_init'
row count analyze for projection 'product_dimension_DBD_3_seg_temp_init_temp_init'
row count analyze for projection 'product_dimension_DBD_4_seg_temp_init_temp_init'
row count analyze for projection 'promotion_dimension_DBD_5_seg_temp_init_temp_init'
row count analyze for projection 'promotion_dimension_DBD_6_seg_temp_init_temp_init'
row count analyze for projection 'shipping_dimension_DBD_13_seg_temp_init_temp_init'
row count analyze for projection 'shipping_dimension_DBD_14_seg_temp_init_temp_init'
row count analyze for projection 'vendor_dimension_DBD_10_seg_temp_init_temp_init'
row count analyze for projection 'vendor_dimension_DBD_9_seg_temp_init_temp_init'
row count analyze for projection 'warehouse_dimension_DBD_15_seg_temp_init_temp_init'
row count analyze for projection 'warehouse_dimension_DBD_16_seg_temp_init_temp_init'
row count analyze for projection 'store_dimension_DBD_19_seg_temp_init_temp_init'
row count analyze for projection 'store_dimension_DBD_20_seg_temp_init_temp_init'
row count analyze for projection 'store_orders_fact_DBD_23_seg_temp_init_temp_init'
row count analyze for projection 'store_orders_fact_DBD_24_seg_temp_init_temp_init'
row count analyze for projection 'store_sales_fact_DBD_21_seg_temp_init_temp_init'
row count analyze for projection 'store_sales_fact_DBD_22_seg_temp_init_temp_init'
(1 row)

Collecting Full Statistics

Running ANALYZE_STATISTICS or ANALYZE_HISTOGRAM is an efficient but potentially long-running operation that analyzes each unique table column across all projections. You can run it concurrently with queries and loads in a production environment.

Statistics gathering can incur considerable overhead on system resources (CPU and memory), at the expense of queries and loads. Vertica recommends that you gather full table statistics only after one or more of following events occur to a table:

Tip: You can analyze statistics on a single table column, rather than on the entire table. Running statistics on a single important column (such as the predicate column) is useful for large tables, which could take a long time to compute. It's also a good idea to run statistics on a column after you call ALTER TABLE to add or change a column.

 

Other Tools

You can diagnose and resolve many statistics-related issues by calling ANALYZE_WORKLOAD, which returns tuning recommendations.

If you update statistics and find that the query still performs poorly, run your query through the Database Designer and choose incremental as the design type. See Incremental Design.

 


Was this topic helpful?