Getting Statistics

ANALYZE_STATISTICS collects and aggregates data samples and storage information from all nodes that store projections associated with the specified table. You call this function as follows:

ANALYZE_STATISTICS ('[schema.]table' 
                     [, 'column[,...]'] 
                     [, percent ] )

If you call this function with a single empty string argument, Vertica collects statistics for all database tables and their projections:

=> SELECT ANALYZE_STATISTICS ('');

Operations

ANALYZE_STATISTICS performs the following operations:

Sampling Size

ANALYZE_STATISTICS constructs a column histogram from a set of rows that it randomly selects from all collected data. Regardless of the percentage setting, the function always creates a statistical sample that contains up to (approximately) the smaller of:

If a column has fewer rows than the maximum sample size, ANALYZE_STATISTICS reads all rows from disk and analyzes the entire column.

Note: The data collected in a sample range does not indicate how data should be distributed.

The following table shows how ANALYZE_STATISTICS, when set to different percentages, obtains a statistical sample from a given column:

Number of column rows   % Number of rows read Number of sampled rows

<=max‑sample‑size

20

All

All

400K

10

max‑sample‑size

max‑sample‑size

4000K

10

400K

max‑sample‑size

Note: When a column specified for ANALYZE_STATISTICS is first in a projection's sort order, the function reads all data from disk to avoid a biased sample.

Data Collection Percentage

If ANALYZE_STATISTICS omits specifying a percentage , Vertica collects a fixed 10-percent sample of statistical data from disk. Specifying a percentage of data to read from disk gives you more control over deciding between sample accuracy and speed.

The percentage of data you collect affects collection time and accuracy:

Evaluating Results

After you collect the desired statistics, run the Workload Analyzer to retrieve hints about under-performing queries and their root causes, and obtain tuning recommendations. For more information, see Analyzing Workloads.

Examples

Compute statistics on all projections in the VMart database:

=> SELECT ANALYZE_STATISTICS (''); 
 ANALYZE_STATISTICS 
--------------------
                  0
(1 row)
					

Compute statistics on a single table:

=> SELECT ANALYZE_STATISTICS ('shipping_dimension'); 
 ANALYZE_STATISTICS 
--------------------
                  0
(1 row)

Compute statistics on a single column across all projections for a table:

=> SELECT ANALYZE_STATISTICS('shipping_dimension','shipping_key'); 
 ANALYZE_STATISTICS 
--------------------
                  0
(1 row)

Collect data from the entire disk by setting the percent parameter to 100:

=> SELECT ANALYZE_STATISTICS ('shipping_dimension', 'shipping_key', 100);
 ANALYZE_STATISTICS 
--------------------
                  0
(1 row)

Collect data on all projections for shipping_dimension from 20 percent of the disk:

=> SELECT ANALYZE_STATISTICS ('shipping_dimension', 20); 
 ANALYZE_STATISTICS 
-------------------
                 0
(1 row)