|Share this article:|
Vertica Integration with Looker: Tips and Techniques
About Vertica Tips and Techniques
Vertica develops Tips and Techniques documents to provide you with the information you need to use Vertica with third-party products. This document provides guidance using one specific version of Vertica and one version of the third-party vendor's software. While other combinations are likely to work, those versions may not have been tested.
This document provides in-depth guidance for configuring Looker to connect to Vertica. Looker uses the JDBC client driver to connect to Vertica. Engineers have tested:
- Looker on-premises connecting to Vertica on-premises and in the cloud.
- Looker in the cloud connecting to Vertica on-premises and in the cloud.
The testing used Vertica 7.0.x, 7.1.x, and 7.2.x with Looker 3.35.15 on Linux. For more information about connecting Looker to Vertica, see the Vertica Integration with Looker: Connection Guide.
Looker Tips and Recommendations
Looker has two types of derived tables. These tables are frequently used to generate complex queries with pre-aggregated data and calculations:
- Ephemeral derived table
- Persistent derived table (PDT)
If you need to write derived tables, use ephemeral derived tables because they are temporary and built at query time. These tables work like subqueries and are not written in Vertica. Run your derived table queries directly in Vertica using vsql. Doing so verifies that your queries run quickly and are optimized. Then, write the derived table queries in Looker.
For more information about derived tables, see the Looker documentation.
Do not use the persistent version of the derived tables (PDTs) with Vertica. Use PDTs with caution because they can have a negative impact on query performance, as described in the implications that follow.
PDTs write the results of queries into a separate schema in Vertica. These tables are regenerated regularly. To use PDTs, you must create a scratch schema and set write permissions for this schema. The scratch schema is specified when you create a connection to Vertica from Looker.
The following is a list of implications that could arise from using PDTs:
- Additional disk storage costs because additional tables are created and the query results are stored in the database.
- Impact on Vertica licensing, because licensing is based on the volume of data.
- Additional database configuration and database maintenance by the database administrator, resource management, and user permissions.
- Impact on other user’s queries.
- Data in PDTs is not up to date and should be recreated periodically to fetch the latest data.
- Refresh scheduling can affect performance.
- Nested PDTs create additional overhead.
- Multiple concurrent users and multiple PDTs can produce overhead on your Vertica database.
You can use the PDT Panel to monitor the PDTs in your application. This panel provides an overview for the current state of all PDTs. For more information, see PDTs in the Looker documentation.
Looker View Files
Looker automatically creates view files from tables in your database. However, Looker does not create view files from view objects in the database. If you need to connect to a view in Vertica, you must manually change the name of the table to correspond with the name of the view in the view file.
For example, suppose you have a Vertica flex table that you populate and calculate. You want to connect to the view of the flex table from Looker. Follow these steps to add a new view file for your table:
- Go into developer mode.
- Navigate to the LookML section.
- Click the + sign.
- Select Create View from Table.
- Select the table or tables you want to create views from by checking the box to the left of the table name.
- Click Create Views.
- Change the table name to the flex view name in Looker.
For more information about flex tables, see the Understanding Flex Tables in the Vertica documentation.
Table Joins and Relationships
Looker guesses relationships and links between fact tables and dimension tables based on column names. Carefully review the LookML model to verify that the relationships Looker created are correct. Manually fix the relationships if they are incorrect in your code.
By default, Looker caches information every five minutes. After data is in the cache, queries return data from the cache instead of requesting the same data from Vertica repeatedly. If the query requests data that is not in the cache, it requests the data from Vertica. Users do not receive updated data for queries that return data in the cache within the five minute window. If the data does not change within the five minute window, then the data is the most up-to-date.
You can configure the cache value. If the data changes frequently within the five-minute window, you can decrease the persistence of the cache. Decreasing the cache value may cause queries to execute more slowly. You can increase or decrease the cache value based on whether or not your data gets updated frequently and if you need the most recent data.
If you want Looker to ignore the cache and execute the query in Vertica, click Clear Cache & Run in the context menu.
For more information, see Query Caching Behavior in Looker.
Looker is highly integrated with GitHub. The GitHub repositories contain LookML files of your Looker project, not the actual data. LookML is a language for modeling dimensions, fields, aggregates, and relationships in SQL. Looker uses models written in LookML to write SQL queries against Vertica.
For an example application of Looker working with Vertica, go to GitHub site for Vertica QuickStart for Looker and search for VerticaQuickStartforLooker. This application contains LookML files from example reports that use the Vertica VMart sample database. You can pull this project from GitHub into your Looker instance. For more information, see the QuickStart documentation.
Avoid using the Calculations button in Looker. This button is used to create table calculations and is not efficient for Vertica users because the calculations are processed in Looker and not in Vertica.
Vertica Tuning Recommendations
Vertica has tuning mechanisms that optimize query performance and leverage the speed of your database. This section details those recommendations.
Create a Physical Design with Database Designer
To get the best performance from your Vertica database, create a physical design for your database that optimizes both query performance and data compression. The Vertica Database Designer automatically optimizes your physical design in the following ways:
- Analyzes your logical schema, sample data, and sample queries
- Creates a physical schema design (projections) that can be deployed automatically or manually
- Can be run any time for additional optimization without stopping the database
- Uses strategies to provide optimal query performance and data compression
Database Designer minimizes the time you spend on manual database tuning and provides the ability to redesign the database incrementally to optimize for changing workloads over time.
For more information, see Workflow for Running Database Designer in the Vertica documentation. If you cannot run Database Designer, for manual database design, following the recommendations in Redesigning Projections for Query Optimization.
Create a Separate Resource Pool for Looker
To verify that resources are allocated and available for Looker and not impacted by other process running on Vertica create a separate resource pool and user for your Looker application. Creating a separate resource pool isolates the Looker-related workload from other workloads. This resource pool can manage the resources assigned to and needed by Looker. The actual impact of using a separate resource pool depends on the amount of memory on the machine and other factors, such as how many other resource pools are created for other users.
When creating this resouce pool, consider the following:
- How much memory is needed exclusively for Looker (MEMSIZE parameter)?
- How much memory can each node borrow from another process (MAXMEMORYSIZE parameter)?
- How many reports need to execute concurrently (MAXCONCURRENCY parameter)? Maxconcurrency refers to the maximum concurrent queries. Do not set this parameter to a value higher than the number of machine cores. Consider that other users pools may be running statements at the same time.
For details on the resource pool parameters, see CREATE RESOURCE POOL in the Vertica documentation.
The following example shows how to create and manage access to the LOOKER_POOL resource pool. This example also shows the commands a database administrator could execute using VSQL:
=> CREATE RESOURCE POOL LOOKER_POOL MEMORY SIZE ‘4G’ MAXMEMORYSIZE ‘84G’
MAXCONCURRENCY 16; => DROP USER looker; => DROP SCHEMA looker_s; => CREATE SCHEMA looker_s; => CREATE USER looker IDENTIFIED BY ‘my_password’ SEARCH_PATH looker_s; => GRANT USAGE ON SCHEMA looker_s TO looker; => GRANT USAGE ON SCHEMA PUBLIC TO looker; => GRANT USAGE ON SCHEMA online_sales TO looker; => GRANT USAGE ON SCHEMA store TO looker; => GRANT SELECT ON ALL TABLES IN SCHEMA PUBLIC TO looker; => GRANT SELECT ON ALL TABLES IN SCHEMA store TO looker; => GRANT SELECT ON ALL TABLES IN SCHEMA online_sales TO looker; => GRANT ALL PRIVILEGES ON SCHEMA looker_s TO looker WITH GRANT OPTION; => GRANT CREATE ON SCHEMA looker_s TO looker; => GRANT USAGE ON RESOURCE POOL LOOKER_POOL TO looker; => ALTER USER looker RESOURCE POOL LOOKER_POOL;
For more information about resource pools, see
Enable Native Connection Load Balancing
Native connection load balancing is a Vertica feature that distributes the overhead of query planning among the nodes. Native connection load balancing also spreads the CPU and memory overhead caused by client connections across the hosts in the database. By default, Vertica does not use native connection load balancing.
If you want to use native connection load balancing, you must enable native connection load balancing on both the server side and the client side. To enable native connection load balancing on the server side, follow the steps in About Native Connection Load Balancing in the Vertica documentation.
To enable connection load balancing on the client side, you must enable it when you create a connection to Vertica. See the HPE Vertica Integration with Looker: Connection Guide for detailed steps about connecting to Vertica.
Enable connection load balancing using the JDBC connection parameter ConnectionLoadBalance=1.
You can pass several parameters one after another, separated by an ampersand, as shown in the following graphic.
For more information about native connection load balancing, watch the Vertica Best Practices: Native Connection Load Balancing video.
Use Live Aggregate Projections
When you create a live aggregate projection for a table, Vertica automatically aggregates data from that anchor table and loads it into the live aggregate projection. Because the data is already aggregated, retrieving the data directly from the live aggregate projection is faster than retrieving it from the anchor table.
For more information, see Live Aggregate Projections in the Vertica documentation.
Use Flattened Tables to Optimize Join Performance
You can speed up your queries significantly by denormalizing your schema using Vertica flattened tables.
Consider using flattened tables if your queries operate on a large fact table and require many joins from multiple dimension tables.
Vertica flattened table functionality allows you to add the columns from your dimension tables into the fact table at load time, avoiding joins at execution time.
Here are some considerations when using flattened tables:
- Denormalized columns utilize additional disk storage because they query the dimension table for its values and materialize them in the flattened table.
- Data in denormalized columns does not count towards your raw data license limit.
- Data changes in the source tables are not automatically propagated to the flattened table. If data changes in the source tables, your flattened table won’t be immediately updated.
- Denormalized columns created as DEFAULT columns are populated with every new INSERT into the flattened table.
- Denormalized columns created as SET USING columns allow you to decide when to refresh the denormalized data. You can refresh the flattened table at a convenient time with minimal impact to your database.
For more information, see Flattened Tables in the Vertica documentation.
For More Information
- Looker website
- Looker documentation
- Using Looker to Leverage Vertica Capabilities: ORC Reader, Flex Tables, and Place
- Vertica Integration with Looker: Connection Guide
- Technology Partner Integrations
- Vertica QuickStart for Looker
- Looker-Vertica Demo
- Vertica Community Edition
- Vertica Documentation