Resource Management

Posted April 9, 2018 by Soniya Shah, Information Developer

Programmer
This blog post was authored by Soniya Shah. A Vertica database runs on a cluster of hardware. All loads and queries running against the database take up system resources, such as CPU, memory, disk I/O, bandwidth, file handles, and more. Query performance depends on how many resources are allocated to it. In a single-user environment, the system can devote all resources to a single query, which gets the most efficient execution for the query. It’s more likely your environment needs to run multiple queries at the same time. When you run more than one query on the system, the queries share the resources. Therefore, each query can take longer to run than it would if it were running by itself. The resource manager helps you prioritize resource usage.

The Resource Manager

The resource manager operates on resource pools. A resource pool is a pre-allocated subset of the system resources, with an associated queue.

Built-in Pools

Resource Pool Description
GENERAL Answers requests that have no specific resource pool associated with them.
BLOBDATA Controls resource usage for in-memory blobs.
DBD Controls resource usage for Database Designer processing.
JVM Controls Java Virtual Machine resources used by Java UDXs.
METADATA Tracks memory that is allocated for catalog data.
RECOVERY Used when recovering another node of the database.
REFRESH Used when refreshing newly added projections.
SYSDATA Reserved for temporary storage of intermediate results of queries against system monitoring and catalog tables.
SYSQUERY Reserved for temporary storage of intermediate results of queries against system monitoring and catalog tables.
TM (Tuple Mover) Controls resources for queries associated with the Tuple Mover.
WOS (Write Optimized Store) Controls WOS usage.

User-Defined Resource Pools

You can supplement the built-in resource pools with your own resource pools, to handle needs that are specific to your database. User-defined resource pools are useful if you have competing resource requirements across different classes of workloads. With user-defined resource pools, you can: • Handle workload classes • Limit memory usage, concurrency, and priority • Restrict users to specific pools

Example: Managing High-Priority Queries

Imagine your business CEO frequently runs the following query through a third-party BI tool: => SELECT DISTINCT s.product_key, p.product_description FROM store.store_sales_fact s, public.product_dimension p WHERE s.product_key = p.product_key AND s.product_version = p.product_version AND s.store_key IN (SELECT store_key FROM store.store_dimension WHERE store_state = 'MA') ORDER BY s.product_key; You want to ensure the query always has resources to run. When this query runs, Management Console shows that it uses the general resource pool, which it shares with other queries. Queries that are queued to use the general pool occasionally time out. This is not acceptable for queries that the CEO needs to run. To ensure that CEO queries always run to completion, create a resource pool that is dedicated to them: 1. Determine how much memory to allocate to CEO queries. Run the PROFILE command on the query shown earlier: => PROFILE SELECT DISTINCT s.product_key, p.product_description FROM store.store_sales_fact s, public.product_dimension p WHERE s.product_key = p.product_key AND s.product_version = p.product_version AND s.store_key IN (SELECT store_key FROM store.store_dimension WHERE store_state = 'MA') ORDER BY s.product_key; NOTICE 4788: Statement is being profiled HINT: Select * from v_monitor.execution_engine_profiles where transaction_id=45035996274783380 and statement_id=1; NOTICE 3557: Initiator memory for query: [on pool general: 693504 KB, minimum: 640011 KB] product_key | product_description -------------+---------------------------------------- 1 | Brand #1 butter 1 | Brand #2 bagels 2 | Brand #3 lamb 2 | Brand #4 brandy 2 | Brand #5 golf clubs ... 2. The query returns with a hint and estimate on how much memory the query requires. Copy the transaction and statement IDs from the hint, and use them in the following query to confirm how much memory the query actually used: => SELECT MAX (memory_kb) memory_kb FROM dc_resource_acquisitions WHERE transaction_id = 45035996274783380 AND statement_id = 1; memory_kb ----------- 696213 (1 row) 3. As superuser, create a resource pool with at least that amount of memory: => CREATE RESOURCE POOL ceo_pool MEMORYSIZE '700000K'; Note: Vertica supports many options for customizing a resource pool. These include: modifying memory size, priority and concurrency. After you create a resource pool, you can modify it later with ALTER RESOURCE POOL. 4. Grant user ceo usage privileges on ceo_pool: => GRANT USAGE ON RESOURCE POOL ceo_pool TO ceo; 5. Associate this user with the pool: => ALTER USER ceo RESOURCE POOL ceo_pool; After you associate user ceo with resource pool ceo_pool, the resources allocated for that pool are reserved for queries from that user only. If desired, you can extend usage of this resource pool to other users or roles. Now, switch to the ceo user and run the query again: => \c - ceo You are now connected as user "ceo". => SELECT DISTINCT s.product_key, p.product_description FROM store.store_sales_fact s, public.product_dimension p WHERE s.product_key = p.product_key AND s.product_version = p.product_version AND s.store_key IN (SELECT store_key FROM store.store_dimension WHERE store_state = 'MA') ORDER BY s.product_key; When you check Management Console, it confirms that the query uses resources from resource pool ceo_pool.

Example: Allocating Resources for Batch Loads

Batch loads usually require more time and memory than routine queries like the one shown earlier. By using resource pools for batch loads, the general pool is free to process queries. The batch loads are assigned to their own resource pool, which is allocated the resources it requires to process load operations efficiently. 1. First, we will create a user for the resource pool: => CREATE USER etluser; 2. Create a resource pool for batch loads: => CREATE RESOURCE POOL batch_pool MEMORYSIZE '2G' MAXMEMORYSIZE'6G' MAXCONCURRENCY 3; 3. Next, grant usage privileges to etluser on the new resource pool, and associate this user with the resource pool: => GRANT USAGE ON RESOURCE POOL batch_pool TO etluser; => ALTER USER etluser RESOURCE POOL batch_pool; When etluser runs batch loads, you can use Management Console to confirm that the load operation uses resources from batch_pool.

Managing System Resource Usage

In addition to using Management Console, you can query Vertica system tables to monitor usage of resource pools. For more information, see the Vertica documentation.