Vertica Integration with Informatica: Tips and Techniques

About Vertica Tips and Techniques

Vertica Tips and Techniques documents provide you with the information you need to use Vertica with third-party products. This document provides guidance using one specific version of Vertica and one version of the vendor's software. While other combinations are likely to work, they may not have been tested.

About Informatica

Informatica offers a scalable and high-performance enterprise ETL platform—PowerCenter—that provides the following major capabilities:

  • Data delivery capabilities—Performing ETL and ELT operations, moving data among various data sources
  • Data transformation capabilities—Basic, intermediate, and complex data transformation
  • Design and development environment—Team-based development capabilities
  • Metadata and modeling capabilities—Physical-to-logical data mapping, including graphical attribute-level mapping

Tips & Techniques for Using the PowerExchange Connector for Vertica

We recommend that you use the PowerExchange (PWX) for Vertica connector. This connector, specific for Vertica, provides similar capabilities to other PWX connectors. These capabilities include:

  • Design-time and run-time features
  • Pushdown optimization (source side, target side, and full)
  • Support for Vertica-specific features, including:
    • Writing data to disk directly with Direct COPY option
    • Capability to use native connection load balancing to parallelize loading data throughout an Vertica cluster

To get the best performance using PowerExchange with Vertica, consider the tips in the following sections.

Enable Load Balancing in the ODBC Connection

Native connection load balancing is a feature built into the Vertica server and client libraries. Load balancing spreads the CPU and memory overhead caused by client connections across the hosts in the database. Native connection load balancing only has an effect when it is enabled by both the server and the client. To enable native connection load balancing on Vertica, follow the steps in Enabling and Disabling Native Connection Load Balancing in the Vertica documentation.

You must also enable the Load Balance attribute in Informatica so that PowerCenter works with Vertica native connection load balancing. In the Workflow Manager, depending on whether you are using bulk mode or relational mode, there are two ways to enable this setting.

To enable native connection load balancing for a bulk mode connector, take these steps:

  1. Select Connections > Relational.
  2. Under Select Type, select the name of your bulk mode connection from the drop-down list. In this example, the connection name Is Vertica Bulk Connection.

    bulk_mode.png

  3. Click Edit. In the Connection Object Definition dialog box, under attributes, check the box next to the Load Balance. Then click OK and Close.

    bulk_load_balance.PNG

 

To enable native connection load balancing for a relational mode connection, take these steps:

  1. Select Connections > Relational.
  2. Under Select Type, select the name of your relational mode connection from the drop-down list. In the following  example, the connection name is Vertica.

    relational_connections.PNG

  3. Click Edit. In the Connection Object Definition dialog box, check Load Balance.

    relational_load_balance.png

  4. Click OK and Close.

Relational Mode and Bulk Mode

There are two ways to write data to Vertica using PowerExchange for Vertica:

    • Bulk mode allows you to write large amounts of data to Vertica from multiple data sources. Bulk mode is only available for writing to Vertica.
    • Relational mode allows you to read data from an Vertica source and write data to a Vertica target. Relational mode supports pushdown optimization.

Using Bulk Mode to Load Data from Data Sources

The following steps describe how to create a bulk mode connection. In this example, you are loading data from files or other data sources.

  1. Using the Mapping Designer, create a mapping from the data source to your Vertica table:initial_mapping.png
  2. In Workflow Designer, create a workflow using the mapping you just created:workflow_first.png
  3. To set the properties for the target table, right-click the task for your mapping and select Edit. The Edit Tasks dialog box opens.
  4. On the Mapping tab, in the left-hand pane, select the target table:mapping_tab_target_table.png
  5. On the Writers drop-down menu and select Vertica Bulk Writer.writers.png
  6. Before executing the workflow, edit other attributes. Make sure to consider the attributes described in this document such as partitioning and load balancing. Modify them for improved performance loading the data into Vertica.
  7. To run the workflow, select Workflow > Start Workflow. Alternatively, right-click in the Workflow Designer window and select Start Workflow.

Using Relational Mode for Pushdown Optimization

PowerCenter implements pushdown optimization. This feature allows you to push the transformation operations to the database in order to take advantage of Vertica features.

There are three types of pushdown optimization:

  • Source Side—PowerCenter pushes certain transformations to the source database only. Use source-side pushdown optimization where you are downloading data into a file from a Vertica database.
  • Target Side—PowerCenter pushes certain transformations to the target database only. Use target-side pushdown optimization when you want to move data from a file or other database and load it into Vertica.
  • Full—PowerCenter pushes all tranformations to the target database when the source and target database are the same. This situation requires a relational mode connection. Use full pushdown optimization to do the following:
    1. Load data from Vertica
    2. Send the SQL command to Vertica to perform the transformation.
    3. Write the transformed data back to the database.

Before you continue, verify that you have set the locale in the DSN section of the ODBC.INI file to UTF-8, as in the last line of the following example. 

[VERTICA_DSN_NAME]
 Description=VerticaDSN ODBC driver
 Driver=/opt/vertica/lib64/libverticaodbc.so
 DriverUnicodeType=2
 Database=<DatabaseName>
 Servername=<IP ADDRESS OR HOSTNAME OF VERTICA>
 Port=5433
 Locale=UTF-8

The following steps describe how to use relational mode to enable full pushdown optimization. The source and target tables are both in Vertica.

  1. Using the Mapping Designer, create a mapping between the source and target tables, both of which are in the same Vertica database:map_source_target.png
  2. Using this mapping, in Workflow Designer, create a new workflow:workflow_source_target.png
  3. To set the attributes for the source table, right-click the task for your mapping and select Edit. The Edit Tasks dialog box opens.
  4. On the Mapping tab, in the left-hand pane, select the source table:readers.png
  5. In the Readers drop-down list, select Relational Reader. In a relational mode connection, always configure the source to use the relational reader to read data from the source:
  6. Under Connections, select your Vertica database relational Type:value_reader.PNG
  7. Now you need to configure the target table. In the Edit Tasks dialog box, in the left-hand pane, select the target table:

    mapping_tab_target_table.png

  8. In the Writers drop-down list, select Relational Writer. In a relational mode connection, always configure the target to as a relational writer:

    target_as_relational_writer.png

  9. Under Connections, select your Vertica database as the relational Type:value_writer.PNG
  10. In the Edit Tasks dialog box, in the left-hand pane, select Pushdown Optimization.

The Pushdown Optimization Viewer displays, and Pushdown Option is set to Full. At the bottom of the window, notice the SQL statement. When you run this workflow, Informatica sends the SQL statement to Vertica and Vertica executes that SQL statement to perform the transformations. That SQL statement performs the four tasks that comprise Group 1

Use PowerCenter Session-Level Partitioning to Improve Reads and Writes

When you configure partitioning in Informatica, PowerCenter executes multiple COPY statements, depending on how many partitions you have defined. The number of partitions you configure must be less than or equal to the number of nodes in your Vertica cluster.

There five types of partitioning:

      • Key range
      • Round robin
      • Pass-through
      • Hash user keys
      • Database partition

Note If you have a very large amount of data to copy, consider creating partitions for each memory core on each node. Make sure to do this at a time when the database is not in heavy use, because doing this prevents reads while the data is loading.

For example, if you have configured six partitions, PowerCenter creates six pipes and runs six COPY LOCAL statements. Each COPY statement reads from one pipe and writes the data to the Vertica target, as follows.

To configure key range partitioning for a session, define one or more ports for each partition key. Each port is based on a range of values. PowerCenter uses this information to distribute database rows to the relevant partition.

In the following example, five partitions have been configured, based on the value in the L_LINENUMBER column. If the value is between 1 and 2, the operation executes in Partition 1. If the value is between 2 and 3, the operation executes in Partition 2, and so on.

partitions.png

Tune the Memory Setting

When you run the workflow with the default settings, PowerCenter automatically calculates the amount of memory required to execute that workflow. The amount of memory is based on the data transformation that the mapping requires.

The default memory setting does not consider the amount of memory available on the host machine. To improve performance and make sure that you have enough memory, you can manually set the size of the default buffer block and DTM buffer size for a session, as follows:

  1. In the Workflow Manager, right-click the task and select Edit.
  2. On the Mapping tab, in the left-hand pane, select Memory Properties:memory_attributes.png
  3. Modify the DTM buffer size and Default buffer block size attributes as needed.

For more information about setting these memory parameters, see Setting Memory Properties for a Task in the Vertica documentation.

Tune the Commit Interval

The commit interval allows you to batch a transaction for execution in your Vertica database. By default, the commit interval is 10,000 rows. For this value, PowerCenter commits the data being written to the Vertica target every 10,000 rows. Higher commit intervals can improve the performance of the writes. For best performance, adjust the commit interval based on the amount of data you are writing to Vertica. If you are loading a large amount of data, we recommend that you set the commit interval to 100,000.

To change the commit interval, follow these steps:

  1. Right-click the task and select Edit.
  2. Select the Properties tab and edit the Commit Interval attribute:commit_interval.png 

Migrate Mappings and Workflows from Vertica Plug-in for Informatica to the PWX Connector

Informatica PowerCenter 9.6HF2 includes the new PowerExchange Connector for Vertica. If you have been using the Vertica Plug-In for Informatica, you can configure your mappings and workflows to take advantage of the new features in the PowerExchange Connector for Vertica.

Configure the Mappings with Mapping Designer

To configure a mapping to work with the PowerExchange Connector for Vertica, you must change the database types of both the source and the target:

  1. Open your mapping in the Mapping Designer. In this example, the source and target tables both display (VERTICA), indicating that the mapping uses the Vertica Plug-in for Informatica:mapping_plugin.png
  2. To work with the PowerExchange Connector for Vertica, you need to change the database type from Vertica to ODBC. Drag the source table into the Source Analyzer:old_source_table.png
  3. In the Source Analyzer, right-click the table and select Edit. The Database type is VERTICA.source_db_type.png
  4. To specify that this source use the PowerExchange Connector for Vertica, set the Database Type to ODBC. Click Apply and OK. In the Source Analyzer, the source table now displays (ODBC).

new_source_table.png

Now make the same changes for the target table.

  1. Drag the target table into the Target Designer.

    target_table.png

  2. Right-click the table and select Edit.
  3. Change the Database Type to ODBC. Click Apply and OK. In the Target Designer, the target table now displays (ODBC).

target_renamed.png

  1. Return to the Mapping Designer, where both tables display (ODBC):
  2. Save your mapping. If saving the mapping gives an error, delete the Source Qualifier, reload the source table, and remap it to the next transformation, in the same position as before.

Configure the Workflow with Workflow Designer

To migrate your workflow to the PowerExchange Connector for Vertica, follow these steps:

  1. Disconnect from the folder. To do this, right-click the folder name and select Disconnect.
  2. To reconnect to the folder, right-click the folder name and select Open.
  3. Drag your workflow into the Workflow Designer.

  4. A red circle with an exclamation point appears on the task. This icon indicates that the workflow is invalid.

    exclamation_point_screenshot.png

In addition, the logging window at the bottom of the dialog box contains a message that the session may have been invalidated:

logging_window.png

To validate the workflow:

  1. Delete the Source Qualifier, reload the source table, and remap it to the next transformation, in the same position as before.
  2. If the workflow is valid, right-click the task and select Edit.
  3. Open the Mapping tab of the Edit Tasks dialog box.
  4. In the left-hand pane, select your source. The reader is set to Relational Reader:relational_reader.png
  5. Under Connections, in the Value drop-down list, under Select Type, select Vertica, and then select the database object:db_object.png
  6. Click OK.
  7. Under Attributes, change the values as needed. In particular, verify that the Owner Name and Source Table Name reflect the correct database instance for your source.owner_table_name.PNG
  8. Now make similar changes for the target table. The Writers value is Relational Writer.relational_writer.png
  9. Under Connections, in the Value drop-down list, under Select Type, select Vertica, and then select the database object:db_object_writers.png
  10. Click OK.
  11. Under Attributes, change the values as needed. In particular, make sure the Table Name Prefix and Target Table Name are correct.name_prefix_target.PNG
  12. Click Apply and then click OK.
  13. Save the workflow. If errors display when you save the workflow, your workflow is invalid.
  14. If your workflow is invalid, delete the Source Qualifier, reload the source table, and remap it to the next transformation, in the same position as before.
  15. Before running the workflow, make any needed changes to the memory parameters and commit interval, as described earlier in this document.
  16. To run the workflow, in the Workflow Designer, right-click and select Start Workflow.

For More Information

For More Information About… …See

Connecting to Vertica using Informatica PWX Connector for Vertica

Vertica and Informatica Connection Guide

Informatica

http://www.informatica.com/

PowerCenter

https://www.informatica.com/products/data-integration/powercenter.html#fbid=4SOHvlwepTa

Vertica Community Edition

https://vertica.com/community/