Vertica Integration with Dataiku: Connection Guide
Applies to Vertica 7.2.x and earlier
About Vertica Connection Guides
Vertica connection guides provide basic information about setting up connections to Vertica from software that our technology partners create. Each guide is written against one specific version of Vertica and one specific version of the third-party vendor’s software. Other versions of the third-party product may work with Vertica but may not have been tested. This document provides guidance using the latest versions of Vertica and Dataiku Data Science Studio as of November 2015.
Dataiku Data Science Studio (DSS) is an analytic workbench that allows data scientists to build an end-to-end workflow that transforms raw data into visualizations of predictions. For more information, view a sample use case that shows how Dataiku Data Science Studio used Medicare data stored in Vertica for analysis and prediction.
This document is based on the results of testing Vertica 7.2.x with Dataiku Data Science Studio 2.0.1.
Download and Install Dataiku
Dataiku Data Science Studio is a web-based application available for Linux. A beta version is available for Mac OS X but is not recommended for a production environment. Data Science Studio uses the JDBC driver to connect to Vertica and is compatible with Chrome and Firefox.
Before you install Dataiku Data Science Studio, review the requirements for installing on Linux.
Download the latest version of Dataiku Data Science Studio that corresponds to your Linux distribution and architecture. After the download is complete, follow the instructions for installation.
Download and Install the Vertica Client Drivers
Before you can connect to Vertica using Dataiku Data Science Studio, you must download and install the Vertica client package. The .rpm packages both contain the 32-bit and 64-bit versions of the client package.
Download Vertica Client Drivers
- Go to the Vertica Client Drivers page.
- Download the version of the Vertica client package that is compatible with the architecture of your operating system and Vertica server version.
Note: Vertica drivers are forward compatible, so you can connect to the Vertica server using previous versions of the client. For more information about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.
Install Vertica Client Drivers
Based on the client package you downloaded, follow the steps for installation from the Vertica documentation.
Place the Client .jar File in an External Library Product Directory
For Data Science Studio to connect to Vertica through the JDBC driver, you need to place the .jar file you downloaded with the client package into the product directory for external libraries.
Note: Do not modify the CLASSPATH.
You must stop Data Science Studio while installing the JDBC drivers. Navigate to the directory where Data Science Studio is installed, which by default is DATA_DIR. Stop the application using the following command line:
$ DATA_DIR/bin/dss stop
Locate the Vertica JDBC .jar file in the driver location. In the file name, replace X.X with the version of your Vertica database.
Copy the Vertica driver’s .jar file into the DATA_DIR/lib/jdbc folder.
For example, on Linux CentOS with a user called Dataiku:
Replace X.X with your Vertica database version.
Restart Data Science Studio with the following command:
$ DATA_DIR/bin/dss start
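The stop, copy, and restart sequence above can be sketched as a single shell function. This is only an illustration under stated assumptions: the function name `install_vertica_jdbc`, the `vertica-jdbc-*.jar` file-name pattern, and both argument values are hypothetical and are not taken from the Vertica or Dataiku documentation. Only the `bin/dss stop`, `bin/dss start`, and `lib/jdbc` paths come from this guide.

```shell
#!/bin/sh
# Sketch: install a Vertica JDBC driver into Data Science Studio.
# Assumptions: the helper name, the vertica-jdbc-*.jar name pattern,
# and the two directory arguments are illustrative placeholders.

install_vertica_jdbc() {
    data_dir=$1      # Data Science Studio directory (DATA_DIR in this guide)
    driver_dir=$2    # directory that holds the Vertica client driver files

    # Locate the driver .jar file (the name pattern is an assumption).
    jar=$(find "$driver_dir" -type f -name 'vertica-jdbc-*.jar' | head -n 1)
    if [ -z "$jar" ]; then
        echo "No Vertica JDBC .jar file found under $driver_dir" >&2
        return 1
    fi

    # Data Science Studio must be stopped while the driver is installed.
    "$data_dir/bin/dss" stop || return 1

    # Copy the driver into the directory for external JDBC libraries.
    mkdir -p "$data_dir/lib/jdbc"
    cp "$jar" "$data_dir/lib/jdbc/" || return 1

    # Restart Data Science Studio so that it loads the new driver.
    "$data_dir/bin/dss" start
}
```

You would call it as `install_vertica_jdbc DATA_DIR DRIVER_DIR`, substituting your own directories. The function copies the file rather than changing the CLASSPATH, in line with the note above.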
Connect to Vertica from Dataiku
- Open Dataiku from your web browser.
- Click Create a New Project.
- In the upper right corner of the screen, click the gear button.
- Click Connections > New Connection and select HP Vertica.
- Enter your connection information. Data Science Studio automatically tests your connection. The following fields are required:
- Connection name
- Click Create.
- Use this connection to explore data stored in Vertica.
Creating a Dataset
After you have an established connection, follow these steps to create a dataset:
- From your project screen, click Datasets.
- Click the New Dataset icon.
- From the drop-down menu, select HP Vertica.
- On the Connection tab, enter the following required fields:
- Connection: Your connection to HP Vertica
- Mode: Choose whether to connect to a table or write a query
- Table: Table name
- Schema: Schema name
- Click Test to see a preview of the data.
- Enter a dataset name and click Create.
Data Type Limitations
Dataiku supports and correctly displays all Vertica data types. However, you might see the following behavior when you preview the data:
- Dataiku truncates CHAR, VARCHAR, and LONG VARCHAR values with more than 32,767 characters to 32,767 characters.
- Dataiku might not support TIMETZ and TIMESTAMPTZ values.
- BINARY, VARBINARY, and LONG VARBINARY values are displayed in hexadecimal format.
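To illustrate what a hexadecimal rendering of binary data looks like, the following sketch converts a byte string to hex digits with standard tools. The helper name `to_hex` is hypothetical, and the exact letter case or prefix that Dataiku uses in its preview is not specified in this guide, so treat the output only as an approximation of the format.

```shell
#!/bin/sh
# Sketch: render raw bytes as a continuous string of hexadecimal digits.
# to_hex is an illustrative helper, not a Vertica or Dataiku command.

to_hex() {
    # od prints each input byte as two hex digits; tr removes the spacing.
    od -An -v -tx1 | tr -d ' \n'
}

# The seven ASCII bytes of "Vertica" become 14 hex digits.
printf 'Vertica' | to_hex    # prints 56657274696361
```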
You might see the following behavior when you load data into Vertica:
- Empty values are loaded as NULL.
- All date values must have a time zone. Date values that are not assigned a time zone default to UTC.
- TIMETZ values might be loaded in the client time zone.
- BINARY, VARBINARY, and LONG VARBINARY values are loaded in the VARCHAR hexadecimal format.
- If you have a string that is longer than 16,200 characters, change the Table Creation Mode (located in Settings > Advanced) from Automatically generate to Manually define to load all the characters.
- Interval values are loaded as VARCHAR. To change the value, change the Table Creation Mode (located in Settings > Advanced) from Automatically generate to Manually define and change the value to Interval.
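Before loading, it can help to know whether any string exceeds the 16,200-character threshold mentioned above, so that you can decide whether to switch the Table Creation Mode. The following is a minimal sketch that assumes the data is available as a tab-separated text file; the helper name `long_fields` and the file layout are assumptions, not part of Dataiku or Vertica.

```shell
#!/bin/sh
# Sketch: list fields in a tab-separated file that are longer than a limit.
# long_fields is an illustrative helper; the default limit of 16200 is the
# threshold quoted in this guide.

long_fields() {
    file=$1
    limit=${2:-16200}
    # Note: some awk implementations count bytes rather than characters,
    # so results for multi-byte text may be conservative.
    awk -F '\t' -v limit="$limit" '
        {
            for (i = 1; i <= NF; i++)
                if (length($i) > limit)
                    printf "line %d, field %d: %d characters\n", NR, i, length($i)
        }' "$file"
}
```

Running `long_fields export.tsv` (with `export.tsv` as a placeholder for your own file) prints one line per oversized field and prints nothing if every field fits.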
For More Information
- Vertica Community Edition
- Big Data and Analytics Community