Looking Under the Hood at Vertica Queries

Posted March 2, 2016 by Sarah Lemaire, Manager, Vertica Documentation

Three 3D arrows, different colors pointing in different directions

When you submit a query to Vertica, you want it to execute as quickly and efficiently as possible. The query optimizer creates a plan that is designed to do just that. The directives in the query plan determine your query’s run-time performance and resource consumption, but the properties of your projections and the system parameters also impact the query’s performance.

But the query plan is just that—a plan. You really want to know what actually happened while the query executed. What projections did the query use? What system resources did the query acquire? Can adjust any parameters to make the query run faster?

Those are a lot of questions. Read on to get some answers.

What projections did your query access?

Projections, of course, are the physical location where Vertica stores your data. For best performance, you should always run Database Designer to optimize the projections that your frequently executed queries need to access.

To find out which projections were used for a particular query, you could review the query plan, but there’s an easier way: query the QUERY_PROFILES and PROJECTION_USAGE system tables. These system tables will give you information about specific queries, as well as usage information. When joined, these two tables provide you with an overall picture of what projection a query accessed.

Identifying the query

To see the projections used by a particular query, first, get the statement and transaction IDs from the QUERY_PROFILES system table. The following results list the executed queries in reverse chronological order, with the latest query listed first:
=> SELECT query, transaction_id, statement_id, query_start
FROM query_profiles
ORDER BY query_start DESC;
-[ RECORD 1 ]--+------------------------------------------------------------------------------
query | SELECT avg(employee_age) FROM public.employees GROUP BY employee_location;
transaction_id | 45035996278252303
statement_id | 1
query_start | 2016-02-23 10:03:00.270725-05
-[ RECORD 2 ]--+------------------------------------------------------------------------------
query | SELECT * FROM query_profiles;
transaction_id | 45035996278252113
statement_id | 18
query_start | 2016-02-23 10:01:59.563359-05
-[ RECORD 3 ]--+------------------------------------------------------------------------------
query | GRANT EXECUTE ON FUNCTION public.GetData() TO PUBLIC;
transaction_id | 45035996278251057
statement_id | 1
query_start | 2016-02-23 05:10:15.164605-05 …

Now that we have a statement and transaction id for a couple queries, we can use the ids to find the projections associated with them.

Identifying a query’’s associated projections

Let’s identify the projection used by the first query from the QUERY_PROFILES example in the previous section:

=> SELECT avg(employee_age) FROM public.employees GROUP BY employee_location;

To find the projection associated with this query, join the QUERY_PROFILES and PROJECTION_USAGE system tables on the statement_id and transaction_id columns, and then specify the statement id and transaction id in a WHERE clause.

The following natural join returns the text of the executed query and the name of the projection it accessed. You can see that the SELECT query used the employees_b0 projection.
=> SELECT query_profiles.transaction_id,
query_profiles.statement_id,
query_profiles.query,
projection_usage.projection_name
FROM query_profiles, projection_usage
WHERE (
projection_usage.transaction_id = query_profiles.transaction_id
AND projection_usage.statement_id = query_profiles.statement_id
)
AND
(
projection_usage.transaction_id = 49539595901078494
AND projection_usage.statement_id = 20
);
-[ RECORD 1 ]---+---------------------------------------------------------------------------
transaction_id | 49539595901078494
statement_id | 20
query | SELECT (avg(employee_age) FROM public.employees GROUP BY employee_location;
projection_name | employees_b0

Did my query insert data?

The PROJECTION_USAGE system table can also tell you whether data was inserted into or retrieved from a projection by a particular query. The io_type column contains one of two possible values:

  • input—SELECT statements
  • output—INSERT and COPY statements

Let’s return to our example query:

=> SELECT avg(employee_age) FROM public.employees GROUP BY employee_location;

Not surprisingly, for this SELECT query, the io_type column is input because that query retrieved data from the employees_b0 projection. In the following results, you can see that other queries inserted or copied data into the specified projections:
=> SELECT transaction_id, statement_id, io_type, projection_name FROM projection_usage ORDER BY query_start_timestamp DESC;
-[ RECORD 1 ]---+------------------------------------------------------
transaction_id | 49539595901078494
statement_id | 20
io_type | output
projection_name | employees_b0
-[ RECORD 2 ]---+------------------------------------------------------
transaction_id | 45035996278259395
statement_id | 3
io_type | input
projection_name | summary_b0
-[ RECORD 3 ]---+------------------------------------------------------
transaction_id | 45035996278256653
statement_id | 1
io_type | input
projection_name | sales_b0
-[ RECORD 4 ]---+------------------------------------------------------
transaction_id | 45035996278256651
statement_id | 1
io_type | output
projection_name | locations_b0 ...

The PROJECTION_USAGE system table has a lot of information about the projections your queries used. For complete information, see PROJECTION_USAGE in the Vertica documentation.

What system resources did my query use?

Now suppose you are evaluating your system and query performance. What system resources do your queries acquire? Which queries consume resources that might affect performance?

You can write a natural join that provides resource usage information from the RESOURCE_ACQUISITIONS system table. The following query retrieves the:

  • Number of open file handles used
  • Number of threads used
  • Amount of memory (kb) acquired
  • Duration of the resource request (ms)

=> SELECT query_profiles.transaction_id,
query_profiles.statement_id,
query_profiles.query,
resource_acquisitions.open_file_handle_count,
resource_acquisitions.thread_count,
resource_acquisitions.memory_inuse_kb,
resource_acquisitions.duration_ms
FROM query_profiles, resource_acquisitions
WHERE (
resource_acquisitions.transaction_id = query_profiles.transaction_id AND
resource_acquisitions.statement_id = query_profiles.statement_id
)
AND
(
query_profiles.transaction_id = 49539595901078494 AND
query_profiles.statement_id = 1
);
-[ RECORD 1 ]----------+------------------------------------------------------------------------
transaction_id | 49539595901078494
statement_id | 1
query | SELECT (avg(employee_age) FROM public.employees GROUP BY employee_location; open_file_handle_count | 3
thread_count | 6
memory_inuse_kb | 70760
duration_ms | 22
Can I adjust any parameters to make the query run faster?

You can use the information in the RESOURCE_AQUISITIONS system table to determine if you want to adjust your resource pool or system parameters for queries that consume more or fewer resources. Read more about resource pool parameters and how you can use the CREATE RESOURCE POOL and ALTER RESOURCE POOL statements to configure these parameters.

For More Information

For complete information about the system tables used in these example, in the Vertica documentation, see: