Vertica OEM Hardware Sizing Guide

Vertica is a massively parallel processing (MPP) database designed to run across a cluster of similar nodes. Ideally, these nodes are identical. However, in some cases that is not possible. For example, if you expand a cluster that was originally purchased more than two years earlier, the new nodes are probably not identical to the originals. In that case, configure the new systems with the same number of cores, at the same or faster clock speed, and with the same amount of memory and disk space as each existing node.

When specifying hardware for your Vertica cluster, there are several important issues to take into account. To start, let's look at what hardware profiles work well for Vertica.

Cores

Vertica is designed to perform best with 16–28 cores clocked at or over 2.6 GHz. The number of cores is important because Vertica attempts to multithread any operation that it can. The extra cores help with concurrency. For analytical queries, each query uses at least one core per node to execute, but depending on resource pool configuration, queries can utilize the entire system.

Vertica can execute more than 16 queries at a time on a cluster built from 16-core systems, thanks to hyperthreading and context switching. However, there is enough per-thread overhead in the operating system that Vertica does not allocate memory for more than about 28 threads per query; doing so would require an extremely large amount of memory.
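Concurrency targets are governed per resource pool rather than hard-wired to the core count. As a quick sanity check, here is a minimal sketch of inspecting the relevant pool settings (assuming the standard RESOURCE_POOLS system table):

    -- PLANNEDCONCURRENCY is commonly left at AUTO, which is derived from the core count.
    SELECT name, memorysize, maxmemorysize, plannedconcurrency, maxconcurrency
    FROM v_catalog.resource_pools
    ORDER BY name;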

Clock Speed

High clock speed is important to Vertica workloads because some capabilities of Vertica (and RHEL, for that matter) are single-threaded and benefit greatly from faster per-thread performance. For best performance, use two 8–14 core processors; the E5-2687W v4, for example, has 12 cores at 3.0 GHz.

I/O

Given that Vertica is an I/O-intensive application, you should avoid bottlenecks in the I/O subsystem. Use a large RAID array of 10k rpm, typically small form factor (SFF), drives. For best results, allocate one 10k rpm drive per core: 10 cores = 10x 10k rpm drives, or about 6x 15k rpm drives. This gives Vertica fast access for reading and writing data to disk, and works out to about 80–100 MB/s of write throughput per core, which you can verify with the vioperf tool in /opt/vertica/bin.

When running vioperf, the most important test to review for complex ingest workloads is the rewrite speed. The rewrite speed is a good estimate of the load that the Vertica Tuple Mover puts on the I/O subsystem when merging ROS containers. Running vioperf, you should see 40–60 MB/s per core. The RAID card required to reach these numbers is standard: a 12 Gb/s SAS controller with approximately 4 GB of cache.

RAID

RAID 10 is standard, but for OEM implementations, RAID 50 is an increasingly popular option. In large clusters, RAID 50 can save on storage costs; however, the rebuild time of a RAID 50 array is significantly longer than that of a RAID 10 array, and the more data you put on each node, the longer it takes Vertica to rebuild a node after a failure. Typically, we see a maximum of 16 TB of data mounted per node, but in larger clusters we see multiple 16 TB partitions, up to two 16 TB partitions per node, added as separate storage locations (see the sketch below).
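Adding that second partition as an additional data storage location is a single statement. A minimal sketch, assuming the new array is mounted at the hypothetical path /data2/vertica:

    -- Register a second data partition on every node as an extra storage location.
    -- The mount path is a placeholder; use wherever the new array is mounted.
    CREATE LOCATION '/data2/vertica' ALL NODES USAGE 'DATA';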

Network

For your network, you need 10 Gb per node of non-blocking bandwidth (no oversubscription). If you fully load the system with drives, putting 32 TB of disk on each node, use dual 10 Gb connections to speed up node recovery.

Memory

Typically, on modern hardware, 256 GB of RAM is the right amount. However, you can size the RAM at 8 GB per core, rounding up to 128 GB or 256 GB; these are the clean options from a DIMM-size perspective.

Depending on the memory utilization of each query, 512 GB of RAM per node is an option for heavy analytical workloads. If this is the best option for your cluster, use resource pools to manage query contention and to make sure that machine learning queries get at least 50% of the resources (see the sketch below).
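A minimal sketch of such a pool on a 512 GB node; the pool and user names are hypothetical, and the concurrency settings are illustrative only:

    -- Reserve roughly half of a 512 GB node for machine learning queries.
    CREATE RESOURCE POOL ml_pool
        MEMORYSIZE '256G'        -- memory guaranteed to this pool
        MAXMEMORYSIZE '256G'     -- cap so the pool cannot borrow beyond its share
        PLANNEDCONCURRENCY 4
        MAXCONCURRENCY 8;

    -- Route the machine learning users to the pool.
    ALTER USER ml_user RESOURCE POOL ml_pool;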

Now that you have the basic components, consider the hardware profiles that fit these requirements.

Hardware Profiles

The DL380 and the DL360 are, respectively, 2U and 1U servers; the 2U option holds 24 drives and the 1U option holds 10. For large or dense clusters, the Apollo 4200, which has two sockets, up to 768 GB of RAM, and 48 SFF drive bays, allows Vertica OEMs to build solid ‘dense’ systems using 2x 12-core processors, 256–512 GB of RAM, and upwards of 50 TB of disk.

If you want to take this configuration and segment it into two virtual nodes (using containerization or virtualization technology to break the system into two equal parts), use higher-core-count processors such as the E5-2697A v4 with 16 cores at 2.6 GHz, or 20- or 22-core processors clocked at 2.2 GHz. When doing so, you can build a 2U server that is twice as powerful as the DL380s by giving the system 32+ cores, 512 GB of RAM, and 32+ TB of disk.

The same concept can be applied to a four-socket system if you can provide it with enough disk for four nodes. Be aware that Vertica is not a NUMA-aware application, so the containerization or virtualization technology that breaks the machine into pieces also needs to handle all of the NUMA functionality. Otherwise, NUMA effects have a negative performance impact on Vertica.

In addition, it is imperative that there be no oversubscription at any layer on a four-socket system. You should have 10 Gb of network per node, which means 40 Gb of aggregate bandwidth for a four-socket system. If you configure a system with multiple Vertica nodes per physical host, make sure to configure fault groups (see the sketch below) so that Vertica distributes buddy projections across the cluster in a way that protects the integrity of all data from the loss of any one host and its projections.
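A minimal sketch of the fault group definitions, using hypothetical host and node names for a layout with two Vertica nodes per physical host:

    -- Group the nodes that share a physical host so buddy projections are
    -- placed on different hosts.
    CREATE FAULT GROUP host01;
    ALTER FAULT GROUP host01 ADD NODE v_vmart_node0001;
    ALTER FAULT GROUP host01 ADD NODE v_vmart_node0002;

    -- Repeat for each physical host, then rebalance so projection placement
    -- honors the new groups.
    SELECT REBALANCE_CLUSTER();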

Cluster Size

Vertica can run in clusters of up to 300–400 nodes. However, clusters can lose efficiency at that scale; for best results, keep large Vertica clusters in the 100–200 node range. The right configuration allows for multi-petabyte deployments: with 32 TB (two 16 TB partitions) of disk on each node and 4x compression, you’re looking at 12–24 PB on 100–200 nodes.
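A back-of-envelope check of that arithmetic, using the per-node disk and node counts from above (this is raw capacity; reserving disk for temp space and the catalog brings it down toward the 12–24 PB quoted):

    -- Raw logical capacity = nodes x disk per node (TB) x compression ratio.
    SELECT 100 * 32 * 4 / 1000.0 AS low_end_pb,    -- 100 nodes -> ~12.8 PB
           200 * 32 * 4 / 1000.0 AS high_end_pb;   -- 200 nodes -> ~25.6 PB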

This sizing can also go the other way: nodes for smaller applications can be configured with as few as two cores, 4 GB of RAM, and two disks.

Vertica performance also depends on your workload and catalog size. Make sure to test the configuration of your Vertica cluster to verify that you are achieving the desired performance.

For More Information

For more info on these systems, check out these hardware configuration guides on the Vertica Community: