Operational Considerations for Database Administrators in Eon Mode

Posted March 6, 2018 by Soniya Shah, Information Developer

Programmer
This blog post was authored by Shrirang Kamat. The purpose of this document is to familiarize advanced Vertica Enterprise mode users about some of the behavioral similarities and differences between Vertica Enterprise mode and Vertica Eon mode. This document assumes you have a basic understanding about Eon mode architecture. You can find more details about Eon mode architecture in the Vertica documentation.

Projection Design

Eon mode, like Enterprise mode supports segmented, replicated, and aggregate projections. You do not need to create buddy projections for segmented projections. Projection definitions, using projections, and the process of creating and optimizing projections is similar to Enterprise mode. An anchor table must have at least one super projection and can have 0 or more query-specific projections. Vertica provides the Database Designer, a tool that helps you create optimized projection designs. The design files that are exported from an Enterprise mode database can be imported into an Eon mode database without changing the file. The following is an example of a segmented projection in Eon mode: => CREATE PROJECTION values_p (a, b, c) AS SELECT a, b, c FROM values SEGMENTED BY HASH(a) ALL NODES; –-KSAFE keyword is optional

K-Safety and High Availability

In Eon mode, segmented projections are split into shards. The number of shards in a database is not tied to the number of permanent nodes in the cluster, but to what you specified when you created your Eon mode database. In a database with a K-Safety of 1, a segment shard will have at least two different nodes subscribing to it, and a replica shard that all nodes subscribe to. Each shard has one primary node subscription. For example, if your database has three nodes and three shards, use the following system table query to view the node subscriptions: testeon=> SELECT node_name , shard_name ,subscription_state,is_primary FROM node_subscriptions ORDER BY 1,2; node_name | shard_name | subscription_state | is_primary --------------------+-------------+--------------------+------------ v_testeon_node0001 | replica | ACTIVE | f v_testeon_node0001 | segment0001 | ACTIVE | f v_testeon_node0001 | segment0002 | ACTIVE | t v_testeon_node0002 | replica | ACTIVE | t v_testeon_node0002 | segment0001 | ACTIVE | t v_testeon_node0002 | segment0003 | ACTIVE | f v_testeon_node0003 | replica | ACTIVE | f v_testeon_node0003 | segment0002 | ACTIVE | f v_testeon_node0003 | segment0003 | ACTIVE | t (9 rows) When you open a session, Vertica assigns a set of nodes and shard subscriptions that serve your queries. If a node that was part of the session subscription goes down, then the nodes and/or shard subscriptions for a session are automatically adjusted. This operation is hidden from users, but you will have 100% of your data available to query. Our test database has three nodes running in Eon mode, with three shards. When all the nodes are UP, we have the following session subscriptions: testeon=> SELECT node_name , shard_name FROM session_subscriptions WHERE is_participating; node_name | shard_name --------------------+------------- v_testeon_node0002 | replica v_testeon_node0001 | replica v_testeon_node0003 | replica v_testeon_node0001 | segment0001 v_testeon_node0003 | segment0002 v_testeon_node0002 | segment0003 (6 rows) After v_testeon_node0003 went DOWN, the session subscriptions changed. Shard segment0002 is now available through node v_testeon_node0001 (highlighted). The subscription for shard segment0003 on v_testeon_node0002 will automatically switch to being primary. testeon=> SELECT node_name , shard_name FROM session_subscriptions WHERE is_participating; node_name | shard_name --------------------+------------- v_testeon_node0002 | replica v_testeon_node0001 | replica v_testeon_node0001 | segment0001 v_testeon_node0001 | segment0002 v_testeon_node0002 | segment0003 (5 rows) If v_testeon_node0002 and v_testeon_node0003 go DOWN , then the database will shutdown in UNSAFE mode because shard segment0003 is not subscribed to by any UP nodes.

Node Restart/Recovery

When a node is restarted in Eon mode, the node will move to the UP state after it joins the cluster. In Eon mode, there is no replay of catalog events, recovery of historical data, or replay of deletes. After the node moves to the UP state, the shard subscription_state for the shards that node subscribes to changes from INACTIVE to PENDING. In the PENDING state, the node receives shard metadata from another node that subscribes to the same shard. After the node received shard metadata from another node, the shard subscription state transitions to the PASSIVE state. In the PASSIVE state, the node will queue and fetch missing files for the shard subscription to the warm depot. Once all the queued files are fetched, the shard subscription state is marked as ACTIVE. When all shard subscriptions for a node are in the ACTIVE state, the node will be available to run queries. The configuration parameter EnableDepotWarmingFromPeers can be used to skip warming the depot during node recovery. When this parameter is set to 0, a node will skip warming its depot and transition to ACTIVE state from PENDING state. In this case, queries that hit data that is not found in the depot will exhibit a performance degradation upon the first hit. Data files that are not found in the depot are automatically fetched from communal storage for future access.

Node Down Performance

In Enterprise mode, the performance of some queries could be significantly impacted when all nodes are not UP because query plans generated by the Vertica optimizer can be different. Eon mode does not face the same issues with node down performance. In Eon mode, query plans are the same, irrespective of whether or not all nodes are UP, because there are no buddy projections. As a node goes down, the running session automatically picks or subscribes to the required shard from an UP node. This conceptual change in Eon Mode has removed the need for buddy projections, which are responsible when a node goes down in Enterprise mode.

Adding and Removing Nodes (Rebalancing)

In Enterprise mode, when new nodes are added or removed from your cluster, the number of segments and segment boundaries changes. ROS containers on existing nodes must be separated and transferred to match changes in segment layouts. This process is known as rebalancing and is an I/O intensive operation that can take a significant amount of time, depending on the size of the data on each node. Running the rebalance operation on all nodes in your cluster requires that all nodes are in an UP state. Therefore, any nodes that are DOWN cannot be removed from the cluster. In Eon mode, when nodes are added or removed from the cluster, you can run the rebalance_shards function to redistribute shard subscriptions among nodes. When a node subscribes to a new shard, it will have warmed the depot before it is available to run queries. This process does not require splitting ROS files and is significantly faster than rebalance_cluster. You can add or remove nodes to the cluster when all nodes are not UP because changes to shard subscriptions are allowed when all nodes are not UP. The following shows the node subscription for a three node cluster with four shards: testeon=> SELECT node_name , shard_name ,subscription_state,is_primary FROM node_subscriptions ORDER BY 1,2; node_name | shard_name | subscription_state | is_primary --------------------+-------------+--------------------+------------ v_testeon_node0001 | replica | ACTIVE | f v_testeon_node0001 | segment0001 | ACTIVE | t v_testeon_node0001 | segment0002 | ACTIVE | t v_testeon_node0001 | segment0004 | ACTIVE | f v_testeon_node0002 | replica | ACTIVE | t v_testeon_node0002 | segment0001 | ACTIVE | f v_testeon_node0002 | segment0003 | ACTIVE | f v_testeon_node0002 | segment0004 | ACTIVE | t v_testeon_node0003 | replica | ACTIVE | f v_testeon_node0003 | segment0002 | ACTIVE | f v_testeon_node0003 | segment0003 | ACTIVE | t (11 rows) You can add a new node to the cluster using admintools, and then run rebalance_shards to make it a four node cluster: testeon=> SELECT rebalance_shards(); rebalance_shards ------------------- REBALANCED SHARDS (1 row) The following shows the node subscription after a new node was added and the shards were rebalanced: testeon=> SELECT node_name , shard_name ,subscription_state,is_primary FROM node_subscriptions ORDER BY 1,2; node_name | shard_name | subscription_state | is_primary --------------------+-------------+--------------------+------------ v_testeon_node0001 | replica | ACTIVE | f v_testeon_node0001 | segment0001 | ACTIVE | f v_testeon_node0001 | segment0002 | ACTIVE | t v_testeon_node0002 | replica | ACTIVE | t v_testeon_node0002 | segment0001 | ACTIVE | t v_testeon_node0002 | segment0004 | ACTIVE | f v_testeon_node0003 | replica | ACTIVE | f v_testeon_node0003 | segment0002 | ACTIVE | f v_testeon_node0003 | segment0003 | ACTIVE | t v_testeon_node0004 | replica | ACTIVE | f v_testeon_node0004 | segment0003 | ACTIVE | f v_testeon_node0004 | segment0004 | ACTIVE | t (12 rows) You can shrink the cluster to a single node by reducing the K-Safety to 0 and then removing all but one node: testeon=> SELECT mark_design_ksafe(0); WARNING 6022: Setting K-safety to 0 could result in catastrophic data loss in the event of a failure. Do not use k=0 in a production environment. For test, dev or other non-production environments, K=0 may be acceptable however Vertica still recommends a minimum value of K=1 mark_design_ksafe ---------------------- Marked design 0-safe (1 row) Then, view the new shard subscriptions: testeon=> SELECT node_name , shard_name ,subscription_state,is_primary FROM node_subscriptions ORDER BY 1,2; node_name | shard_name | subscription_state | is_primary --------------------+-------------+--------------------+------------ v_testeon_node0001 | replica | ACTIVE | t v_testeon_node0001 | segment0001 | ACTIVE | t v_testeon_node0001 | segment0002 | ACTIVE | t v_testeon_node0001 | segment0003 | ACTIVE | t v_testeon_node0001 | segment0004 | ACTIVE | t (5 rows)

Fault Groups and Elastic Cluster Scaling

In Enterprise mode, you can create fault groups to make Vertica rack aware so that the buddy nodes are not placed on the same rack. You can also take advantage of terrace routing, which helps reduce the buffer requirements of queries running on large clusters. In Eon mode, adding nodes to the cluster will have a positive effect on query throughput. You can create fault groups to take advantage of elastic throughput scaling. If the number of nodes in a fault group are equal to or greater than the total number of shards, Vertica sessions that connect to a node in a fault group will only use nodes in that fault group to run queries. Vertica sessions that connect to a node in a fault group may use nodes in other fault groups only if there are not enough nodes in an UP state to answer the query. testeon=> CREATE FAULT GROUP group1; CREATE FAULT GROUP testeon=> CREATE FAULT GROUP group2; CREATE FAULT GROUP testeon=> ALTER FAULT GROUP group1 ADD node v_testeon_node0001; ALTER FAULT GROUP testeon=> ALTER FAULT GROUP group1 ADD node v_testeon_node0002; ALTER FAULT GROUP testeon=> ALTER FAULT GROUP group1 ADD node v_testeon_node0003; ALTER FAULT GROUP testeon=> ALTER FAULT GROUP group2 ADD node v_testeon_node0004; ALTER FAULT GROUP testeon=> ALTER FAULT GROUP group2 ADD node v_testeon_node0005; ALTER FAULT GROUP testeon=> ALTER FAULT GROUP group2 ADD node v_testeon_node0006; ALTER FAULT GROUP The following shows the updated node subscriptions: testeon=> SELECT node_name , shard_name ,subscription_state,is_primary FROM node_subscriptions ORDER BY 1,2; node_name | shard_name | subscription_state | is_primary --------------------+-------------+--------------------+------------ v_testeon_node0001 | replica | ACTIVE | f v_testeon_node0001 | segment0001 | ACTIVE | f v_testeon_node0001 | segment0002 | ACTIVE | f v_testeon_node0002 | replica | ACTIVE | t v_testeon_node0002 | segment0001 | ACTIVE | f v_testeon_node0002 | segment0003 | ACTIVE | f v_testeon_node0003 | replica | ACTIVE | f v_testeon_node0003 | segment0002 | ACTIVE | f v_testeon_node0003 | segment0003 | ACTIVE | f v_testeon_node0004 | replica | ACTIVE | f v_testeon_node0004 | segment0001 | ACTIVE | t v_testeon_node0004 | segment0003 | ACTIVE | f v_testeon_node0005 | replica | ACTIVE | f v_testeon_node0005 | segment0001 | ACTIVE | f v_testeon_node0005 | segment0002 | ACTIVE | t v_testeon_node0006 | replica | ACTIVE | f v_testeon_node0006 | segment0002 | ACTIVE | f v_testeon_node0006 | segment0003 | ACTIVE | t (18 rows) When we connect to the database using vsql via node v_testeon_node0004, we only get nodes from fault group group2. testeon=> SELECT node_name , shard_name FROM session_subscriptions WHERE is_participating; node_name | shard_name --------------------+------------- v_testeon_node0004 | replica v_testeon_node0005 | replica v_testeon_node0006 | replica v_testeon_node0005 | segment0001 v_testeon_node0006 | segment0002 v_testeon_node0004 | segment0003 (6 rows)

WOS & Load Strategies

In Eon mode, WOS is not available. Bulk loading is recommended for Eon mode.

Tuple Mover

In Eon mode, the Tuple Mover operations are global operations wherein a node with a primary subscription to a shard is responsible for running the mergeout operation. The output of the mergeout operations are shipped to communal storage and the other nodes that subscribe to the shard. The mergeout algorithm that qualifies and picks projections for mergeout does not change in Eon mode. In Eon mode, the moveout operation is disabled because there is no WOS.

Epochs

In Eon mode, because there is no WOS, the last_good_epoch value is always current (-1). The AHM epoch is advanced to the last_good_epoch, every three minutes, unless you have changed the value of the AdvanceAHMInterval configuration parameter.

Delete, Update & Replay Delete

In Eon mode, the delete and update plans are identical to Enterprise mode. The replay delete plan is only applicable to the mergeout operations because there is no moveout operation, recovery, rebalance, or adding of buddy projections. Because there is no WOS, all update and delete plans write to ROS or DVROS files.

Resource Pools and Workload Management

Resource pools, system tables related to resource pools, and configuring resource pools for managing workloads is the same in Eon mode as it is for Enterprise mode. Eon mode allows you to take advantage of elastic throughput scaling by configuring multiple fault groups. DML queries will reserve very small amount of memory on nodes that are not part of session subscriptions but will receive files as part of peer to peer cache refill.

Clearing dropped ROS files from S3

When ROS files are dropped as a result of the DROP TABLE, TRUNCATE TABLE or mergeout operation, the files are immediately removed from the depot and dropped from the communal storage location after two hours. Stay tuned for part 2 of this document, which will cover topics including system tables, Vertica catalog, backup and restore, migration between the two Vertica modes of operations (Eon and Enterprise), and upgrades.