Inside the Secret World of the Workload Analyzer

Posted May 6, 2014 by Curtis_1


When I’m on a flight sitting next to someone and we’re making polite conversation, the question often comes up: “What do you do?” In these situations, I have to assess whether the person works in the IT industry or is otherwise familiar with the lingo. If not, my stock response is “I fix databases.” This usually warrants a polite nod, and then we both go back to sleep. This over-simplified explanation generally suffices, but in truth it is wholly inadequate. My job is really to ensure that databases don’t get broken in the first place; more specifically, a Vertica database. But because our clients have different, complex goals in mind, they sometimes configure their systems in ways that don’t suit the work they’re doing. I’m constantly looking for ways to empower clients to spot problems before they become bigger problems.

That’s why I’m a huge fan of the Vertica Workload Analyzer (also referred to as WLA). The WLA is a great tool in the war against problems. The WLA’s goal in life (if it has one) is to find and report on problems – 20 specific ones, to be exact. If you are using Management Console, the WLA runs once per day and produces a report. This report indicates actions to take, like tables that should have statistics updated, or queries that aren’t performing adequately. This report (actually just entries in a table) is displayed in the Management Console in a window called “Tuning Recommendations”.

But you don’t need Management Console to take advantage of the Workload Analyzer – you can run it manually with the following command:

SELECT ANALYZE_WORKLOAD('');

Interestingly, even though it’s called “Workload Analyzer”, the actual command is ANALYZE_WORKLOAD. You could toss this into CRON and run it multiple times per day if you’d like, but it’s probably not necessary.
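
If you only care about part of the system, ANALYZE_WORKLOAD also accepts a scope and an optional "since" argument. A minimal sketch (the exact arguments can vary by Vertica version, and the table name here is just a placeholder):

SELECT ANALYZE_WORKLOAD('public.my_table');          -- analyze activity against a single table
SELECT ANALYZE_WORKLOAD('', '2014-05-01 00:00:00');  -- analyze activity recorded since a point in time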

The output of this command comes back to your screen – it reports on all the various “problems” it is programmed to identify. It also creates entries in two separate tables: V_MONITOR.TUNING_RECOMMENDATIONS and TUNING_RECOMMENDATION_DETAILS.
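
Because the recommendations land in ordinary tables, you can review them later with a simple query. A minimal sketch (these column names come from my own installation and may differ slightly across versions):

SELECT tuning_description, tuning_command, tuning_cost
FROM v_monitor.tuning_recommendations;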

As I mentioned earlier, there are 20 queries that WLA runs in order to look for problems. These queries are called rules. WLA rules can be disabled or enabled, and many of them have parameters that can be modified, increasing or decreasing the scope of the rule. These rules have clever names like sys_r1 and sys_r13. That’s OK though – while the WLA doesn’t let you run individual rules directly, I’ll break down the most useful ones, and how they can be modified for maximum effect.

sys_r1, sys_r5, and sys_r6: Stale Statistics and Spill Events

sys_r1 looks for tables that have stale or missing statistics and recommends updating statistics on those tables. sys_r5 and sys_r6 find queries that have reported JOIN_SPILLED or GROUP_BY_SPILLED events. These events indicate that no projection design exists that would let those queries run at maximum efficiency. These three rules generate the most typical output I see at client sites.
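
As a rough illustration of the follow-up these rules suggest (the table name is a placeholder, and the query_events check is my own shortcut, not the rule's internal query), you can refresh statistics yourself and count recent spill events:

SELECT ANALYZE_STATISTICS('public.my_table');  -- refresh statistics on a flagged table

SELECT event_type, COUNT(*)
FROM v_monitor.query_events
WHERE event_type IN ('JOIN_SPILLED', 'GROUP_BY_SPILLED')
GROUP BY event_type;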

sys_r16 and sys_r20: System Utilization

sys_r16 and sys_r20 are rules that check for system utilization. sys_r16 checks for a CPU utilization that exceeds a 95% threshold for more than 20 minutes. sys_r20 looks for memory utilization of 99% for more than 10 minutes.
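
If you want to eyeball the same signals yourself, the monitoring schema keeps per-interval utilization history. A minimal sketch, assuming the v_monitor.cpu_usage and v_monitor.memory_usage tables (column names may differ by version):

SELECT node_name, start_time, average_cpu_usage_percent
FROM v_monitor.cpu_usage
ORDER BY start_time DESC
LIMIT 10;

SELECT node_name, start_time, average_memory_usage_percent
FROM v_monitor.memory_usage
ORDER BY start_time DESC
LIMIT 10;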

Many of the other rules cover very specific edge-case scenarios and are unlikely to fire at most client sites; we hope to cover those in future blogs. For now, let’s stick to this handful and talk about how we can modify these rules and add our own!

Altering Rules

What you may not know about the WLA is that you can modify certain rules. That is, you can change some parameter settings, such as the threshold values a rule uses to find problems. Let’s take rules sys_r16 and sys_r20, which look for high CPU and memory utilization, respectively. The default thresholds might be too high for your liking – but that’s OK, you can adjust them.

dbadmin=> ALTER TUNING RULE sys_r16 SET USAGE_THRESHOLD_PCT=80;
ALTER TUNING RULE
dbadmin=> ALTER TUNING RULE sys_r16 SET DURATION_MIN=10;
ALTER TUNING RULE

Now this rule will fire under less extreme circumstances. Instead of looking for a CPU usage of 95% for 20 minutes, it will now report on a CPU usage of 80% for 10 minutes. All the tuning rule parameters (if applicable) are defined in another system table called VS_TUNING_RULE_PARAMETERS. You can use that table to determine which parameters are available for which rules, and tweak them if you desire.
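
For example, a quick look at the parameters available for a single rule might look like this (a minimal sketch; VS_TUNING_RULE_PARAMETERS is an internal table, so its exact columns can vary by version):

SELECT tuning_rule, parameter, current_value
FROM vs_tuning_rule_parameters
WHERE tuning_rule = 'sys_r16';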

Furthermore, you can disable or enable rules with the following commands:

dbadmin=> ALTER TUNING RULE sys_r16 disable;
ALTER TUNING RULE
dbadmin=> ALTER TUNING RULE sys_r16 enable;
ALTER TUNING RULE

Creating your own rules

The coolest part of the WLA is that you can actually create your own rules. The first step is to write your rule, which is just a regular SQL query whose purpose is to find a problem. The example I’m going to include here is not currently a rule, but it would make a great addition – finding too many delete vectors. This rule has a threshold defined as the number of deleted rows as a percentage of the total rows in a table. When that percentage exceeds 20%, the rule fires.
CREATE TUNING RULE dv_rule (DELETE_VECTORS, PURGE, dv_threshold=20) AS
SELECT CURRENT_TIME AS time
, 'Delete vectors account for ' || (round(dv.deleted_row_count/ps.row_count, 2)*100)::char(4) || '% of rows for projection ' || ps.projection_name AS observation_description
, ps.anchor_table_id AS table_id
, ps.anchor_table_name AS table_name
, s.schema_id AS table_schema_id
, ps.projection_schema AS table_schema
, NULL AS transaction_id
, NULL AS statement_id
, (SELECT current_value || '%' FROM VS_TUNING_RULE_PARAMETERS
   WHERE tuning_rule = 'dv_rule' AND parameter = 'dv_threshold')
  AS tuning_parameter
, 'run purge operations on tables with delete vectors' AS tuning_description
, 'SELECT PURGE_PROJECTION(''' || ps.projection_name || ''');' AS tuning_command
, 'LOW' AS tuning_cost
FROM delete_vectors dv
JOIN projection_storage ps ON dv.projection_name = ps.projection_name
JOIN schemata s ON s.schema_name = dv.schema_name
WHERE ps.row_count > 0
AND dv.deleted_row_count / ps.row_count >
  (SELECT current_value::numeric/100 FROM VS_TUNING_RULE_PARAMETERS
   WHERE tuning_rule = 'dv_rule'
   AND parameter = 'dv_threshold') ;
The first line sets up my rule. I give it a name, “dv_rule”, and define some parameters. The first two are like labels – they are arbitrary. Technically, these are referred to as the “observation type” and the “tuning type”; you can see examples by looking in the TUNING_RECOMMENDATION_DETAILS table. All remaining parameters are optional – here, I’ve defined my 20% threshold for delete vectors as a tuning parameter. The query itself follows the order of the columns in the TUNING_RECOMMENDATION_DETAILS table. If you compare the SELECT list with the columns in that table, you’ll note the similarities.
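
Once the rule exists, its threshold can be adjusted like any built-in rule's parameter. A minimal sketch that follows the same ALTER syntax shown earlier (I haven't verified custom-rule parameters across every version, so treat this as an assumption):

dbadmin=> ALTER TUNING RULE dv_rule SET dv_threshold=30;
ALTER TUNING RULE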

My query actually references the VS_TUNING_RULE_PARAMETERS table in order to get my % threshold. This creates a bit of a chicken-and-egg problem – the query won’t work until the parameter exists in that table. So, the first thing you have to do is create the rule, which also creates the parameter. Then you can modify the query as needed. If you need to drop the rule, you can do so with the following command:
dbadmin=> DROP TUNING RULE dv_rule ;
DROP TUNING RULE
Because you might go through this process several times, I recommend putting your work into a file so you can reference it easily.
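
After recreating the rule, the easiest way to verify it is to run the Workload Analyzer again and look for your observation type in the details table. A minimal sketch (the observation_type column is an assumption based on my own installation):

SELECT ANALYZE_WORKLOAD('');
SELECT * FROM tuning_recommendation_details
WHERE observation_type = 'DELETE_VECTORS';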

And that’s it! That’s everything you need to know in order to create and manage WLA rules. Not only can you use these rules to find problems in your Vertica cluster, you can also use them to look for issues in your own data.

In the community forums, I’d like to continue the discussion – come up with your own rules and post them. Who knows – we might include the good ones in a future Vertica release!