Live Aggregate Projections with Vertica

Posted July 1, 2014 by Steve Sarsfield, Vertica Senior Product Marketing Manager – Partner Ecosystem

The Dragline release of Vertica offers an exciting new feature that is unique in the world of big data analytics platforms: Live Aggregate projections are now part of the platform. The impact is that you can fly through certain types of big data analytics, aggregate-heavy queries in particular, that typically grind down any analytics system.

Before I get into that, however, it's important to back up and give some background on Vertica projections. Many databases use indexes and materialized views to improve query performance. However, these secondary structures have drawbacks. Materialized views and indexes can bloat and become a very inefficient way to optimize data analytics. They can be time-consuming to keep up to date during data loading, can require frequent rebuilding, and can be tedious to manage.

Vertica has always had a better alternative to materialized views and indexes. Vertica has no raw uncompressed base tables, no materialized views, and no indexes. Our optimizations consist of optimized collections of table columns, which we call "projections". There are several different types of projections. At the core, a projection is an optimized collection of pre-sorted columns that may contain some or all of the columns of one or more tables. A projection that joins two or more tables is called a pre-join projection, with the benefit of speeding up joins. A projection that contains a pre-calculated aggregate function such as average, top-K, sum, etc. is called an aggregate projection, which is a new feature of our Dragline release.

What's cool about aggregate projections is that queries that rely on aggregate functions like SUM, MIN/MAX and COUNT no longer bog down the system with excessive I/O and calculation. Now, these aggregates can be computed and updated as data loads. The Vertica query optimizer creates the projections and always keeps them up to date, ready to answer your aggregate queries without having to grind and churn through the data.
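To make the idea concrete, here is a minimal Python sketch of the general technique: keeping per-group aggregates current as rows arrive, so that a query reads a small summary instead of rescanning the raw data. It is an illustration only, not Vertica's implementation or API. The names `RunningAggregate`, `LiveAggregate`, `load` and `query`, and the meter readings in the usage lines, are all hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningAggregate:
    """COUNT, SUM, MIN and MAX for one group, updated one value at a time."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, value: float) -> None:
        # Each update is O(1); no previously loaded rows are revisited.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


class LiveAggregate:
    """Hypothetical stand-in for an aggregate kept current during loading."""

    def __init__(self) -> None:
        self.groups = defaultdict(RunningAggregate)

    def load(self, rows) -> None:
        # rows is an iterable of (group_key, value) pairs, e.g. one load batch.
        for key, value in rows:
            self.groups[key].add(value)

    def query(self, key):
        # Answer from the maintained summary; return None for unknown groups.
        agg = self.groups.get(key)
        if agg is None:
            return None
        return {
            "count": agg.count,
            "sum": agg.total,
            "min": agg.minimum,
            "max": agg.maximum,
        }


# Usage: two load batches of made-up meter readings, then a query.
readings = LiveAggregate()
readings.load([("meter_a", 2.0), ("meter_a", 3.5), ("meter_b", 1.0)])
readings.load([("meter_a", 0.5)])
print(readings.query("meter_a"))
```

The point of the sketch is the cost model: the work is paid in small increments at load time, so the query itself touches one summary record per group regardless of how many rows have been loaded.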

In real-life analytics situations, this new feature improves performance by computing metrics on the data as it arrives, enabling targeted and personalized analytics without programming accelerator layers. It's particularly powerful if you're implementing smart metering applications, for example, where you are helping your customers understand their usage and compare it to others in the neighborhood. The aggregate information is available in the projection without having to recalculate it over and over again, so your data analytics system is free to take on other workloads without the fuss.

Speeding up aggregate functions should help with many use cases, today and tomorrow. We live in a world where data volumes from smart buildings, mobile phones, GPS devices and sensors are ever-increasing. We're finding value in leveraging this data to predict usage based on history, predict equipment failure, minimize heating/cooling/lighting costs, detect fraud and more. Vertica continues to believe that projections offer a superior alternative to materialized views and indexes. Projections remove the trade-off between performance and data size and offer the ultimate in flexibility for fast big data analytics.