Share this article:

Vertica Integration with Pentaho Business Analytics: Connection Guide

Applies to Vertica 8.0.x and earlier 

About Vertica Connection Guides

Vertica connection guides provide basic information about setting up connections to Vertica from software that our technology partners create. These documents provide guidance using one specific version of Vertica and one specific version of the third party vendor’s software. Other versions of the third-party product may work with Vertica. However, other versions may not have been tested. This document provides guidance using the latest versions of Vertica and Pentaho as of November, 2016.

Pentaho BI Server Overview

Pentaho is a platform that offers tools for data movement and transformation, as well as discovery and ad hoc reporting with the Pentaho Data Integration and Pentaho Business Analytics products. This guide focuses on the business analytics component of the platform. For more information about ETL, see the Pentaho DI Tips and Techniques document.

Pentaho offers an open source as well as an enterprise edition of their products. You can use the community edition free of charge and upgrade to the enterprise edition if required. You can download the Business Analytics Platform from the Pentaho Community site.

Pentaho connects to your Vertica database in the cloud or on-premises using the Vertica JDBC driver. You must install the Vertica JDBC driver that is compatible with your Vertica server version.

This document is based on the results of testing Pentaho Business Analytics Community Edition 6.1 on Microsoft Windows Server 2012 (64-bit) with Vertica 8.0.

Prerequisites

Install and Configure a JRE

Before you can connect Pentaho BA to Vertica, you must install and configure a Java Runtime Environment (JRE) that is compatible with both Apache Tomcat and your version of Pentaho:

  1. Download Java JDK 1.7 from the Oracle site.
  2. Install Java JDK 1.7.
  3. Set the JAVA_HOME environment variable.
  4. Verify the JAVA_HOME environment variable is set correctly using the ECHO command on the Windows command prompt:
echo%JAVA_HOME%
C:\Program Files\Java\jre7

VMart Example Database

To follow the examples used in this document, install the Vertica example database, VMart by following the instructions in the Vertica documentation.

Download and Install Pentaho Business Analytics (BA) Platform

Download Pentaho BA Platform

Follow these steps to download and install the Pentaho Community Edition Business Analytics Platform. The BA platform consists of an Apache Tomcat web and Java Servlet application server that operates in port number 8080.

  1. Navigate to http://community.pentaho.com/.
  2. Scroll down to the Downloads section.
  3. Expand the Business Analytics section and click the All OS button.
  4. Save the downloaded .zip file to your computer.

Set up Paths and Environment Variables

  1. Unzip the file inside a location of your choice, such as C:\PentahoCE\.
  2. Open a Windows Command Prompt and change to the folder containing the Pentaho Community Edition you unzipped. For example, C:\PentahoCE\biserver-ce.
  3. Execute the set-pentaho-env.bat program to set up the necessary paths and environment variables that Pentaho needs to work properly. The set-pentaho-env.bat program is located in the directory where your Pentaho Community Edition is installed. In this example, that location is C:\PentahoCE\biserver-ce.

The final two lines of output should be:

D:\Source\PentahoCE\biserver-ce>echo 
DEBUG: _PENTAHO_JAVA_HOME=C:\Program Files\Java\jre7 
DEBUG: _PENTAHO_JAVA_HOME=C:\Program Files\Java\jre7 D:\Source\PentahoCE\biserver-ce>echo 
DEBUG: _PENTAHO_JAVA=C:\Program Files\Java\jre7\bin\java.exe 
DEBUG: _PENTAHO_JAVA=C:\Program Files\Java\jre7\bin\java.exe

This output indicates that the _PENTAHO_JAVA_HOME and _PENTAHO_JAVA variables were set correctly.

If you see a message that a JRE cannot be found, verify that JRE 1.7 was installed and that the JAVA_HOME environment was set up correctly.

Startup Pentaho BA Server

To initiate the Pentaho BA server, you must execute the start-pentaho.bat program. This script starts Apache Tomcat on port 8080 and starts the HSQLDB server. Follow these steps to execute the script:

  1. Open a Windows console.
  2. Navigate to the directory where your Pentaho Community Edition is installed. In this example, C:\PentahoCE\biserver-ce.
  3. Execute the start-pentaho.bat program.
  4. If it is the first time you are initiating the server, the program prompts you with a message about Version Control. Click OK to continue.
  5. To connect to Pentaho, open a web browser and connect to the machine running Pentaho BA server on port 8080, such as: http://localhost:8080/pentaho/Login
  6. The Pentaho user console opens. Login as an administrator using these credentials:
    • username: admin
    • password: password

Download Vertica Client Drivers

Pentaho BA connects to Vertica using the JDBC driver. Follow these instructions to download the JDBC driver:

  1. Navigate to the Vertica Client Drivers page.
  2. Download the version of the JDBC driver that is compatible with the architecture of your operating system and Vertica server version.
  3. Copy the JAR file you downloaded.
  4. Locate the directory where your Pentaho Community Edition is installed. In this example, C:\PentahoCE\biserver-ce.
  5. Paste the Vertica JAR file in the folder \tomcat\lib.
  6. Restart the Pentaho BA server.

Note Vertica drivers are forward compatible, so you can connect to the Vertica server using previous versions of the client. For more information about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.

Connect Pentaho BA to Vertica

  1. With the Pentaho BA server running, open a web browser and navigate to http://localhost:8080/pentaho/Login. This is the default BA server URL where the Tomcat server is running.
  2. The Pentaho User Console opens. Click Login as an Evaluator.
  3. Under Administrator, click Go.
  4. Login using the username "admin" and the password "password".
  5. To set up a data source that points to your Vertica database, click the Manage Data Sources button.
  6. Click the gear and select New Connection.
  7. Name the connection and enter the following information to connect to your database:
    • Database type: Vertica 5+
    • Access: Native (JDBC)
    • Host Name: Vertica server name or IP address.
    • Database Name: Your database name, such as vmart.
    • Port Number: The default port is 5433.
    • User Name: Your Vertica user name.
    • Password : Your database password. databaseconnection.png
  8. To verify the connection to Vertica is successful, click Test.
  9. In the left panel, click Options.
  10. Enter advance connection properties to enable JDBC settings, such as session label and connection load balancing.  databaseconnectionoptions.png
  11. Click OK to close the window. A new connection is listed.

Using Pentaho Community Dashboard Editor

After you create a connection to Vertica, you can use Pentaho Community Dashboard Editor (CDE) to create visualizations of your data stored in Vertica.

To open Pentaho CDE, follow these steps:

  1. Login to the Pentaho User Console using the browser http://localhost:8080.
  2. On the home page, click the Create New > CDE Dashboard. The CDE environment page opens.

The CDE environment has four main views you can use to build your dashboard:

  • Layout panel: This view lets you define the structure and organization of the dashboards in terms of titles, panels for visuals, headers, footers, and spacers.
  • Components panel: This view contains the collection of objects that you use to display data in the dashboard, such as grids and charts.
  • Data sources panel: This view contains the collection of objects that you can use to retrieve data from your database. For some objects you can use the connection you previously created to connect to Vertica.
  • Preview of your dashboard panel: This view displays a preview of the dashboard you are building.

Create a Dashboard

The following example explains how to create a dashboard named Inventory Overview, as shown in the following graphic: inventoryoverview.png

The queries used in these examples use the inventory data in the VMart example database. The first query is the Average Inventory in Stock Overtime and corresponds to the chart on the left of the Inventory Overview dashboard image:

=> SELECT
TO_CHAR(date_dimension.date, 'YYYY-MM') || '-01' AS Date,AVG(inventory_fact.qty_in_stock) as "Average quantity in stock"
FROM public.inventory_fact as inventory_fact
INNER JOIN public.date_dimension as date_dimension
ON inventory_fact.date_key = date_dimension.date_key
INNER JOIN public.warehouse_dimension as warehouse_dimension
ON inventory_fact.warehouse_key = warehouse_dimension.warehouse_key
WHERE TO_CHAR(date_dimension.date, 'YYYY-MM-DD') >= '2005-01-01' AND TO_CHAR(date_dimension.date, 'YYYY-MM-DD') <= '2007-12-31'
GROUP BY 1
ORDER BY 1

The second query is the Average Inventory Overtime by Warehouse Name and corresponds to the chart on the right of the Inventory Dashboard image:

=> SELECT
TO_CHAR(date_dimension.date, 'YYYY-MM') || '-01' as Date, warehouse_dimension.warehouse_name as Warehouse,
AVG(inventory_fact.qty_in_stock) as "Average quantity in stock"
FROM public.inventory_fact as inventory_fact
INNER JOIN public.date_dimension as date_dimension
ON inventory_fact.date_key = date_dimension.date_key
INNER JOIN public.warehouse_dimension as warehouse_dimension
ON inventory_fact.warehouse_key = warehouse_dimension.warehouse_key
WHERE TO_CHAR(date_dimension.date, 'YYYY-MM-DD') >= '2003-01-01' AND TO_CHAR(date_dimension.date, 'YYYY-MM-DD') <= '2005-12-31'
GROUP BY 1, 2
ORDER BY 1

Follow these steps to create this dashboard:

Step 1: Create a Data Source

You must first create two data sources, one for each query that displays on the Inventory Overview dashboard.

  1. Click the Data Sources panel.
  2. Click SQL Queries.
  3. Select either sql over sqljndi or sql over sqljdbc. For best results, choose sql over sqljndi.
  4. If you choose sql over sqljndi, in the Properties dialog, enter the following:
    • Name: Name your connection. In this example, sqljndi1.
    • Jndi: Select the connection JNDI you created when connecting to Vertica.
    • Query: Your Vertica query. This example uses the Average Inventory Overtime query listed in the preceding section.

    If you choose sql over sqljdbc, in the Properties dialog, enter the following:
    • Name: Name your connection. In this example, sqljdbc1.
    • Driver: com.vertica.jdbc.Driver
    • Password: Your Vertica database password.
    • User name: Your Vertica user name.
    • URL: The JDBC connection URL, following the format: jdbc:vertica://my_vertica_server:<my_vertica_port>/<my_database_name>
    • Query: Your Vertica Query. This example uses the Average Inventory Overtime query listed in the preceding section.
  5. Repeat steps 1 - 5 for the second query, called Average Inventory Overtime By Warehouse.

Step 2: Specify your Dashboard Layout

  1. Click on the Layout panel.
  2. Click Apply Template.
  3. Select a built-in template, such as 2 columns template.
  4. To change the title of your dashboard, click the layout element Header > Html and change the property HTML to Inventory Overview.

You now have a layout with two panels. Panel_1 corresponds to chart 1 for the query Average Inventory Overtime. Panel_2 corresponds to chart 2 for the query Average Inventory Overtime by Warehouse.

Step 3: Create a Visualization

In this step, you create two visualizations:

  • A bar chart that displays the Average Inventory Overtime query
  • A line chart that displays the Average Inventory Overtime by Warehouse name query

The following graphic shows the Average Inventory Overtime bar chart. The X axis indicates time and the Y axis indicates the measure of average quantity in stock:

barchart.png

To create the bar chart, follow these steps:

  1. Click the Components panel.
  2. Click Charts.
  3. Add a CCC bar chart.
  4. Enter the following properties:
    • Name: Name the bar chart. In this example, bc1.
    • Title: Enter a title that displays on the dashboard.
    • Datasource: The data source object created in step 1.
    • HtmlObject: The panel number on the layout panel where the chart is displayed.

The following graphic shows the Average Inventory Overtime by Warehouse Query. The X axis indicates time and the Y axis indicates the measure of average quantity in stock. Each warehouse is indicated by a separate line.

linechart.png

To create the line chart, follow these steps:

  1. Click the Components panel.
  2. Click Charts.
  3. Add a CCC line chart.
  4. Enter the following properties:
    • Name: Name the line chart. In this example, lc1.
    • Title: Enter a title that displays on the dashboard.
    • Datasource: The data source object created in step 1.
    • HtmlObject: The panel number on the layout panel where the chart is displayed.
    • crosstabMode: Change the value to false. The default value is true.
    • seriesInRows: Change the value to true. The default value is false. For this chart, you need one line for each warehouse name.
    • timeSeries: Change the value to true. The default value is false.

Step 4: Preview the Dashboard

  1. Save the dashboard.
  2. To visualize the data, click Preview your Dashboard.

Add Parameters to your Dashboards

You can use input controls to make your dashboard dynamic and more interactive. In the following example, you will learn how to improve the dashboard built in the previous section by adding a start date and end date input that the user can change. This example also shows how you can pass the input from the user as parameters to a query in Vertica.

Step 1: Modify Layout

First, you must modify the layout of your panel to include parameters:

  1. Click the Layout panel.
  2. Between the header and the body, add a new row and two new columns.
  3. Change the column names to StartDateParam and EndDateParam.

Step 2: Create Date Parameters

  1. Click the Components panel.
  2. Click Generic.
  3. To add the component, click the Date parameter.
  4. In the properties field enter the following:
    • Name: Enter a component name, such as StartDate.
    • Property date value: Specify a default value to display.
  5. To add another component, click the Duplicate symbol.
  6. In the properties field, enter the following:
    • Name: Enter a component name, such as EndDate.
    • Property date value: Specify a default value to display.

Step 3: Create Date Inputs

  1. Click the Components panel.
  2. Click Select.
  3. Click Date input Component
  4. In the properties field, enter the following:
    • Name: Name your Date input component, such as Start_Date.
    • Parameter: Select the name of the date parameter you created in the previous section.
    • HtmlObject: The name of the column in the layout panel, such as StartDateParam.
  5. Repeat steps 1-4 for the End_Date component.

Step 4: Apply Date Parameters to Charts

  1. In the Components panel, select the chart for which you want to apply the date parameter component. In this example, we chose the CCC line chart.
  2. In the properties field, enter the following:
    • Name: In this example, lc1.
    • Title: In this example, Avg qty in stock by warehouse.
    • Listeners: Select both StartDate and EndDate from the dropdown menu.
    • Parameters: Enter the parameters by specifying names and values
      • Arg: An arbitrary name. In this example, SD and ED. These must be the same names used in the Parameters property in the Datasource component.
      • Values: Select the name of the date parameter created previously. In this example, StartDate and EndDate.
    • Datasource: In this example, sqljndi2.
    • HtmlObject: The place in the layout panel where you display your chart. In this example, Panel_2.

  3. Repeat steps 1-2 for your bar chart.

Step 5: Apply Date Parameters to Queries

  1. Click the Data Sources panel.
  2. Click the sql over sqljndi component you created with the name sqljndi.
  3. In the properties field, enter the following:
    • Name: sqljndi
    • Jndi: The JNDI connection you created when connecting to Vertica.
    • Query: Your Vertica query, as shown in the following example. The parameters in the query use the syntax $(parameter_name).
      => SELECT 
      TO_CHAR(date_dimension.date, 'YYYY-MM') || '-01' as Date,warehouse_dimension.warehouse_name as Warehouse
      AVG(inventory_fact.qty_in_stock) as "Average quantity in stock"
      FROM public.inventory_fact as inventory_fact
      INNER JOIN public.date_dimension as date_dimension
      ON inventory_fact.date_key = date_dimension.date_key
      INNER JOIN public.warehouse_dimension as warehouse_dimension
      ON inventory_fact.warehouse_key= warehouse_dimension.warehouse_key
      WHERE TO_CHAR(date_dimension.date,'YYYY-MM-DD')>=${SD}AND
      TO_CHAR(date_dimension.date,'YYYY-MM-DD')<=${ED}
      GROUP BY 1,2
      ORDER BY 1
      
    • Parameters: Enter the following parameters to specify names and values:
      • Name: In this example, SD and ED. These must be the same names used in the Parameters property in the CCC line chart component.
      • Values: Select the name of the date parameter created previously. In this example, StartDate and EndDate.
      • Type: Leave as string.
  4. Repeat steps 1-3 for the sql over sqljndi component you created with the name sqljndi2.

Step 6: Preview your Dashboard

  1. Save the dashboard.
  2. Click Preview your Dashboard.

You can change the start and end dates and specify a different time period by using the date calendar pickers. The charts dynamically update when you change the values on each date input control.

Data Type Limitations

The following section describes data type limitations:

  • Pentaho BA displays binary, varbinary, and long varbinary data types in a different format than Vertica.
  • Pentaho BA does not display time zone offset for the TIMESTAMPTZ data type.
  • Pentaho BA does not display milliseconds for the TIME data type.
  • Pentaho BA displays TIMETZ values in a different time zone.
  • Pentaho BA displays a maximum of 16 digits for numeric values.
  • Pentaho BA truncates large numeric values to 16 digits. For example, 79228162514264337593543950335 is displayed as 7.922816251426434e+28.
  • Pentaho BA truncates large integer values to 16 digits.
  • Pentaho BA truncates digits to the right of a decimal point. For example, 7.9228162514264337593543950335 is displayed as 7.9228162514264335.
  • Pentaho BA rounds some decimal values. For example, -99999999999999.9999 rounds to -100000000000000
  • Pentaho BA does not support NaN, infinity and -infinity float values.

For More Information

For More Information About… … See
Pentaho http://www.pentaho.com/
Vertica Community https://my.vertica.com/community/
Vertica Documentation http://my.vertica.com/docs/latest/HTML/index.htm
Big Data and Analytics Community https://my.vertica.com/big-data-analytics-community-content/

Share this article: