Share this article:

Vertica Integration with Informatica: Connection Guide

For Vertica 7.x 

About Vertica Connection Guides

Vertica connection guides provide basic information about setting up connections to Vertica from software that our technology partners create. These documents provide guidance using one specific version of Vertica and one specific version of the third-party vendor’s software. Other versions of the third-party product may work with Vertica. However, other versions may not have been tested.

About This Document

This document provides guidance using the latest versions of Vertica and Informatica as of September, 2015.

Informatica PowerCenter Overview

Informatica offers the PowerCenter platform for connecting to Vertica. PowerCenter is a scalable, high-performance enterprise ETL platform that provides the following major capabilities:

  • Data delivery: Performing ETL (extract, transform, load) and ELT (extract, load, transform) operations, moving data among various data sources
  • Data transformation: Basic, intermediate, and complex data transformation
  • Design and development environment: Team-based development capabilities
  • Metadata and modeling: Physical-to-logical data mapping, including graphical attribute-level mapping

PowerCenter 9.6.1 HotFix 2 includes the PowerExchange (PWX) Connector for Vertica. The PWX Connector for Vertica includes additional capabilities and performance improvements when connected to your Vertica database.

This document assumes that the reader is familiar with both Informatica PowerCenter and Vertica.

For this document, Hewlett Packard Enterprise tested connecting to Vertica 7.x using Informatica PowerCenter 9.6.1 HotFix 2.

History of Integration Between Informatica and Vertica

Prior to PowerExchange (PWX) for Vertica, Hewlett-Packard developed and supported two connectors for Informatica and Vertica:

  • In 2009, Hewlett-Packard developed the Vertica Plug-in for Informatica. This connector used the native method of loading data from Informatica into Vertica.
  • In 2013, Hewlett-Packard replaced the earlier Vertica Plug-in for Informatica with a Java plug-in that supports all operating system platforms. This plug-in runs on generic JDBC and ODBC connections and includes new and improved features compared to the native plug-in.

In 2014, Informatica released PWX Connector for Vertica. This connector includes enhancements to the partitioning and pushdown capabilities of Informatica. Hewlett Packard Enterprise recommends that you use the new PWX Connector for Vertica with Informatica PowerCenter 9.6.1 HotFix 2 to connect to your Vertica database.

The following timeline shows the history of plug-ins for connecting from Informatica to Vertica:

history_informatica_vertica.PNG

PowerExchange Features for Vertica

IT organizations can access data sources without having to develop custom data access utilities using a high-performance tool—PowerExchange Connector for Vertica .

PowerExchange Connector for Vertica provides connectivity between Informatica PowerCenter and Vertica.  PWX uses the Vertica ODBC driver to write large volumes of data into your Vertica database.

The major features of PWX for Vertica include:

PWX for Vertica Feature Description

Bulk mode or relational mode for processing Vertica data

Bulk mode: Write large amounts of data to Vertica from multiple data sources. Bulk mode is only available for writing to Vertica.

Relational mode: Read data from a Vertica source and write data to a Vertica target. Relational mode supports pushdown optimization.

Design-time features

  • Preview source and target data before mapping.
  • Generate and execute DDL on Vertica.
  • Import Vertica tables.

Run-time features

  • Set the commit type, interval sources, and interval targets.
  • Update strategy: Data-driven approach to inserting, deleting, and updating data in Vertica.

Partitioning support

  • Key range: One or more ports make up a partition key.
  • Round robin: Data is distributed to one or more partitions.
  • Hash-based: Data is distributed to partitions in groups.

Uses Vertica COPY LOCAL statement for various partitions in the mapping pipeline.

SQL transformation

Flexibility to process logic per input row.

Lookup (cached and uncached)

Supports Vertica lookup in the existing data integration logic.

Connection resiliency

High availability

Compatibility with Vertica Java Plug-in for Informatica

Support for existing features

SQL pushdown

  • Source-side
  • Target-side
  • Full pushdown optimization

Supported pushdown transformations

  • Aggregator, filter, joiner, union, sort, router, and lookup transformation as source.
  • Expression transformation as source, target, and full.
  • Update transformation as full. (When the source and target database are the same, all transformations are pushed to the target.)

Configure and Connect to Vertica using the PWX Connector

PowerCenter includes the PowerExchange for Vertica connector. Informatica supports this connector from Informatica PowerCenter 9.6.1 HotFix 2 onwards. 

Install and Configure PowerExchange for Vertica

Before you start these procedures, you must install Vertica using the instructions in Installing Guide.

To install and configure PowerExchange for Vertica , perform the following steps on your Informatica server:

  1. Ask Informatica Support for a license file that includes PowerExchange for Vertica license.
  2. Ask Informatica Support for the Informatica PowerCenter 9.6.1 HotFix 2 software.
  3. Using the license file that contains PWX for Vertica , install PowerCenter 9.6.1 HotFix 2.
  4. Install the Vertica ODBC driver (7.0 and higher). For Vertica driver/server compatibility, review the information in Vertica Analytic Database Driver/Server Compatibility in the Vertica documentation.
  5. Create a configuration file named vertica.ini. This file defines the Vertica specific settings required by the ODBC drivers. Here’s an example vertica.ini file:
[VerticaDriverName]
 DriverManagerEncoding=UTF-16
 ODBCInstLib=<Informatica install Dir>/ODBC7.1/lib/libodbcinst.so
 ErrorMessagesPath=/opt/vertica/lib64 ## Installed Vertica Client Directory
 LogLevel=4
 LogPath=/tmp

For more information about the content of the vertica.ini file, see Setting ODBC Driver Settings for Linux and UNIX-like Platforms in the Vertica documentation. 

  1. Set the VERTICAINI environment variable to the path to the vertica.ini file.
  2. To use the PowerCenter bulk mode, register the PWX for Vertica connector:
  3. Make sure you have the PWX for Vertica XML file:
    <Informatica_installation_folder>\server\bin\plugin\VerticaConnector.xml
  4. Follow the instructions to register the plug-in on the Informatica server in Registering the Plug-In's Metadata. You must specify Verticaconnector.xml as the plug-in file.
  5. After you have registered the plug-in, in the Repository, on the Plug-Ins tab, make sure you see the line that reads “PowerExchange for Vertica ”.
  6. Before you continue, on the Repository Properties tab, you must switch the Operating mode back to Normal mode.

For detailed instructions about registering PWX for Vertica , see the Informatica PowerExchange for Vertica —User Guide for PowerCenter (Version 9.6.1 Hotfix 2 and above) document, available from Informatica Support.

Configure Vertica as Source and Target

Use PowerCenter Designer to import Vertica source and target definitions.

Before you import a source or target definition, you must create an ODBC data source for your Vertica database:

  1. To configure an ODBC data source on Windows, set up a DSN for your Vertica database(s) using the Informatica Control Panel. Under Administrative Tool, click Data Source (ODBC). Make sure to test the connection to verify that you can connect.
  2. To configure an ODBC data source on Linux or other non-Windows platform, add the Vertica database entries to the <Informatica_Installation_Directory>\ODBC7.1\odbc.ini file. Here is an example odbc.ini file:
[ODBC Data Sources]
vertica_odbc=libverticaodbc.so
[ODBC]
InstallDir=/opt/infa/Informatica/9.6.1/ODBC7.1
Trace=0
TraceFile=/opt/infa/Informatica/9.6.1/odbctrace.out
TraceDll=/opt/Infa/Informatica/9.6.1/ODBC7.1/lib/DWtrc27.so

[vertica_odbc]
Description = Vmart Database
Driver = /opt/vertica/lib64/libverticaodbc.so
Database = <database_name>
Servername = <server_name>
UID= <user_ID>
PWD= <password>
Port = <port_number>
ConnSettings=
SSLKeyFile=
SSLCertFile=
Locale=UTF-8
  1. Set the ODBCINI environment variable to the path to the ini file.
  2. Using Informatica Designer, import the Vertica source and target definition. In the Source Analyzer, on the Source menu, select Import from the Database.source_import_table.png
  3. Select the tables you want to import and click OK. The table names you imported now include the (ODBC) designation.

     source_in_mapping_designer.png

  1. In the Target Analyzer, on the Target menu, select Import from the Database.
  2. Select the tables you want to import and click OK. The table names you imported include the (ODBC) designation.

Create a Mapping

After you have configured the source and target tables, use PowerCenter Designer to create a mapping and create the desired transformations: 

  1. Open the Mapping Designer.
  2. On the Mapping menu, select Create.
  3. Name the new mapping and click OK.
  4. Drag the source and target tables into the Mapping Designer. Create the transformation between the source and the target.

    mapping_in_designer.png

  5. Save the mapping.

Create a Workflow

To create a workflow, you can use the Workflow Manager or the Mapping Designer. The following example shows how to use the Mapping Designer to create the workflow and configure the PWX Connector:

  1. In the Mapping Designer, right-click and select Generate Workflow. The Workflow Generation wizard opens.generate_workflow.png
  2. Step 1: Select the desired Workflow Generation Option and click Next.
  3. Step 2: From the drop-down list, select the Informatica Integration Service you want to run the workflow on.  If needed, under Connection Object, change the connection setting for your Vertica database. Click Next.
  4. Step 3: Make any necessary changes to the workflow settings and click Next.
  5. Step 4: When you see that the workflow generated successfully, click Finish.
  6. Make changes to the connection, memory, properties, and other settings using the Workflow Manager.
  7. After you have saved the changes, right-click the folder for the connector and click Connect.
  8. In the Workflow Manager, reconnecting to the folder allows you to view the newly created workflow.

Configure the Workflow

  1. Drag the workflow into the Workflow Designer.
  2. Double-click the task. The Edit Tasks window opens.
  3. Select the Mapping tab.configure_mapping.png
  4. In the left-hand pane, select your source. Under Readers, the value should be Relational Reader. Under Connections > Value, check that the Vertica connector is listed.rekational_reader.png
  5. If needed, under Properties, change the attribute values.
  6. In the left-hand pane, select your target. Under Writers, the value should be Relational Writer. Under Connections > Value, check that the Vertica connector is listed.relational_writer.png
  7. If needed, under Properties, change the attribute values.
  8. In the Edit Tasks dialog box, select the Properties Change the Commit Interval from the default (10000) to 100000, 1000000, or 10000000, depending on how many data rows you are loading. When writing large amounts of data to Vertica, larger commit intervals can improve the load performance.properties_commit_interval.png
  9. Make any other desired changes to the attributes. When you have completed your changes, click Apply and OK.
  10. Save the workflow.

Run the Workflow

  1. To run your workflow, right-click in the Workflow Manager and select Start Workflow.start_workflow.png
  2. To check the status of your running workflow, open the Workflow Monitor and select the folder for your workflow. The Workflow Monitor displays the status of the running workflow. The following window shows that the wf_m_demo_mapping workflow is running (no errors have occurred) and another workflow has completed successfully.workflow_log_in_progress.png
  3. When the running workflow completes, the Status reads either Succeeded or Failed.workflow_log_complete.png
  4. To review the workflow logs, right-click s_m_<mapping_name> and select Get Session Log.
  5. Review the session logs. Here is a sample from a successful workflow run:session_log.png

The session logs list any error or issue that occurred while the workflow was running. The session logs also show the number of rows that the workflow committed to the Vertica target, depending on the value of the commit interval.

Pushdown Optimization

PowerExchange for Vertica supports three types of pushdown optimization: 

  • Source Side—PowerCenter pushes certain transformations to the source database only. Use source-side pushdown optimization where you are downloading data into a file from your Vertica database.
  • Target Side—PowerCenter pushes certain transformations to the target database only. Use target-side pushdown optimization when you want to load data from a file or other database into Vertica.
  • Full—PowerCenter pushes all tranformations to the target database when the source and target database are the same. This situation requires a relational mode connection. Use full pushdown optimization to load data from Vertica, use Informatica to transform the data, and then write it back to the database.

Pushdown Considerations

Consider the following when using pushdown optimization with Informatica and Vertica:

  • Vertica strips the padding spaces from CHAR column values when you push down a function that takes a CHAR column as an argument.
  • Pushdown compatibility for connection properties have to be identical for the source and target databases. The connection properties are:
    • Code Page
    • Connect String
    • Connection environment SQL
    • Transaction environment SQL
  • Qualifying name for a table is <database_name>.<schema_name>.<table_name>

Support for Pushdown Optimization Expressions

Operators

+ - * /

%

||

= > < >= <= <>

!=

^=

NOT   AND   OR 

Variables

SESSSTARTTIME

SYSDATE

Functions

Functions listed with a * can be pushed to Vertica using source-side pushdown optimization.

ABS() FLOOR() MIN() SYSDATE()

ADD_TO_DATE()

GET_DATE_PART()

MOD()

SYSTIMESTAMP()

ASCII()

IIF()

POWER()

TAN()

AVG()

INITCAP()

ROUND(DATE)*

TANH()

CEIL()*

INSTR()

ROUND(NUMBER)*

TO_BIGINT

CHR()

ISNULL()

RPAD()

TO_CHAR(DATE)

CONCAT()

LAST_DAY()

RTRIM()

TO_CHAR(NUMBER)

COS()

LENGTH()

SIGN()*

TO_DATE()

COSH()

LN()

SIN()

TO_DECIMAL()

COUNT()

LOG()*

SINH()

TO_FLOAT()

DATE_COMPARE()

LOOKUP

SOUNDEX()

TO_INTEGER()

DATE_DIFF()

LOWER()

SQRT()

TRUNC(DATE)*

DECODE()

LPAD()

STDDEV()

TRUNC(NUMBER)*

EXP()

LTRIM()

SUBSTR()

UPPER()

MAX()

SUM()

VARIANCE()

Transformations

PWX for Vertica supports the following transformations:

Transformation Pushdown type

Aggregator transformation

Source-side, Full

Expression transformation

Source-side, Target-side, Full

Filter transformation

Source-side, Full

Joiner transformation

Source-side, Full

Lookup transformation

Source-side, Full

Router transformation

Source-side, Full

Sorter transformation

Source-side, Full

Source qualifier transformation

Source-side, Full

Target transformation

Target-side, Full

Union transformation

Source-side, Full

Update strategy transformation

Full

For more details, see the Informatica PowerCenter—Advanced Workflow Guide (Version 9.6.1 HotFix 2 onwards).

For More Information

For More Information About… …See

Tips and techniques for optimizing your Vertica and Informatica connection

Vertica and Informatica: Tips and Techniques

Informatica

http://www.informatica.com/

PowerCenter

https://www.informatica.com/products/data-integration/powercenter.html#fbid=4SOHvlwepTa

Vertica Community Edition

https://my.vertica.com/community/

Share this article: