Share this article:

Vertica Integration with IBM InfoSphere DataStage: Connection Guide

About Vertica Connection Guides

Vertica connection guides provide basic information about setting up connections to Vertica from software that our technology partners develop. These documents provide guidance using one specific version of Vertica and one specific version of the third-party vendor’s software. Other versions of the third-party product may work with Vertica. However, other versions may not have been tested. This document provides guidance using the latest versions of Vertica and IBM InfoSphere DataStage as of August, 2017.

IBM InfoSphere DataStage Overview

IBM InfoSphere DataStage is an ETL tool that integrates and transforms source system data into data marts and data warehouses.

The IBM InfoSphere DataStage client has three components:

  • Administrator client allows you to manage tasks for DataStage projects.
  • Designer client is the tool you use to design your DataStage jobs.
  • Director client manages the jobs in the DataStage engine.

This document is based on the results of testing IBM InfoSphere DataStage 11.5 and Vertica client 7.2.1.

The operating systems used during the testing are:

  • DataStage: Window Server 2012 R2 (server and client) Standard
  • Vertica: CentOS 6.2 (server only)

Download and Install IBM InfoSphere DataStage

To download IBM InfoSphere Information Service 11.5, go to the IBM Support downloads page. Scroll down the page to the appropriate table and click the name of your operating system. Before starting the download, review the information provided.

To view the release notes for the tested version of IBM InfoSphere Information Server 11.5, go to the IBM Knowledge Center.

Download and Install the Vertica Client Drivers

Before you can connect to Vertica using IBM InfoSphere DataStage, you must install the Vertica client package. This package includes the Vertica client driver for establishing the JDBC and ODBC connections that IBM InfoSphere DataStage needs to connect to Vertica.

  1. Go to the Vertica Client Drivers page.
  2. Download the version of the Vertica client package that is compatible with the architecture of your operating system and Vertica server version.

    Note Vertica drivers are forward compatible, so you can connect to the Vertica server using previous versions of the client.

    For more information about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.

  3. Based on the client package you downloaded, follow the steps for installation from the Vertica documentation.

Connecting to Vertica from DataStage

Before you configure your connection from DataStage to Vertica, make sure you are connected to IBM WebSphere Services.

ODBC Configuration on Windows

To test the ODBC driver:

  1. Open Control Panel > Administrative Tools.
  2. Open the ODBC Data Source Administrator for your system.
  3. Make sure your data source is configured to use the Vertica driver as shown.

  4. Double-click the connection.
  5. In the window that opens, click Test Connection.

 

ODBC Configuration on Linux

You need three files to configure the ODBC driver on Linux:

  • dsenv: The DataStage environment file. The default location for this file is DSHOME. On Linux, DSHOME is /opt/IBM/InformationServer/Server/DSEngine for Linux.
  • odbc.ini: The ODBC driver manager file, contains details related to your ODBC connection. The default locations for this file are DSHOME and IBM/InformationServer/Server/branded_odbc. At the DSHOME location, the odbc.ini file is a hidden file. The odbc.ini file needs to be configured at both locations.
  • uvodbc.config: Vertica file that contains information about the DSN and related DBMS. The default location for this file is DSHOME. Configure uvodbc.config for individual projects at the location DSHOME/Projects, as follows:

    <ds_src>
    DBMSTYPE = ODBC
     
    <ds_tgt>
    DBMSTYPE = ODBC

JDBC Configuration (Windows and Linux)

To configure the JDBC driver on Linux, you need the DataStage isjdbc.config file. This file is required to establish the JDBC connection. It contains information about CLASSPATH and CLASS_NAMES:

CLASSPATH=/opt/vertica/vertica-jdbc-7.2.1-0.jar;
CLASS_NAMES=com.vertica.jdbc.Driver

Configure the Environment Variables for Your Project

Before you create a project in DataStage, set up your environment variables:

  1. Open the Administrator Client. To do so, on the Windows Apps page, double-click IBM InfoSphere DataStage and QualityStage Administrator.

  2. On the Attach to DataStage window, enter your credentials and click Login.
  3. On the DataStage Administration page, select the Projects tab.

  4. Select your project name and click Properties.

  5. In the middle of the Project Properties page, click Environment.

  6. The environment variables are organized under categories. Set the DataStage environment variables as follows:

    Environment VariableCategory Setting
    APT_PM_SHOW_PIDSParallel > ReportingTrue
    APT_PM_PLAYER_TIMINGParallel > ReportingTrue
    APT_DEFAULT_TRANSPORT_BLOCK_SIZEParallel10000000
    APT_USE_IPV4ParallelTrue
    APT_PM_SHOWRSHParallel > ReportingTrue
    APT_DUMP_SCOREParallel > ReportingTrue
    APT_STARTUP_STATUSParallel > Reporting True
    APT_PMCONDUCTOR_HOSTNAMEParallelMachine name
    APT_IO_MAXIMUM_OUTSTANDINGParallel2097152
    APT_PM_CONDUCTOR_TIMEOUTUser-Defined320
    APT_PM_PLAYER_TIMEOUTUser-Defined120

Note For a full list of DataStage environment variables, see Environment Variables in the IBM Knowledge Center.

Configure the Designer Client

  1. Open the Designer Client. To do so, on the Windows Apps page, double-click IBM InfoSphere DataStage and QualityStage Designer:

  2. On the Attach to Project page, enter your credentials and click Login.
  3. On the main Designer window, select Import > Table Definitions > ODBC Table Definitions.

Import the Data Definitions into DataStage

To define the source and target table definitions in DataStage, you can import the Meta Data for those objects.

  1. Select Import > Table Definitions > ODBC Table Definitions.

    The Import Meta Data window opens.

  2. On the Import Meta Data window, under DSN, check to make sure that the drop-down menu lists both your source and target databases.

  3. For both the source and target databases, do steps 4, 5, and 6.
  4. Select the database from the DSN drop-down menu.
  5. Click OK.

    DataStage verifies the ODBC configuration and lists the tables that you have defined on that database.

    Note In this example, the source and the target Meta Data are the same. so you only have to import tha Meta Data from the source.

    The following image lists the target database tables.

    If the DSN entries appear in the drop-down list, the configuration of the odbc.ini and uvodbc.config files are correct. Otherwise, correct them in your Linux environment as described in ODBC Configuration on Linux .

  6. Click Close.

Create an ODBC DataStage Job

The steps for creating ODBC or JDBC jobs are the same, except for certain property settings.The JDBC job properties are explained in JDBC Configuration (Windows and Linux).

When you open a client, the Palette panel is inactive. When you create or open a job, the Palette panel becomes active.

To create a job for use with an ODBC connector:

  1. On the left panel, right-click the Jobs folder, and select New > Parallel Job.

    Now the Palette panel is active. To the left of the Palette panel is the Job window. At top-left of the Job window, the word “Parallel" indicates that this is a parallel job.

  2. Select File > Save.

  3. Name the job and save it in the Jobs folder.
  4. On the Palette panel, click Database.

  5. In the Database drop-down list, select the name of the connector you're using. This example uses the ODBC connector.
  6. Click in the Job window. The ODBC connector icon appears. This will be the source database (ODBC_connector_3 in this example).

  7. Repeat step 5 to add the target database (ODBC_connector_4 in this example) to the Job window.

  8. On the Palette panel, select General > Link.
  9. Create a connection by dragging the link from the source to the target.

  10. To display the source database information, double-click the source icon.

  11. On the Properties tab, specify the following:

    • Data source name
    • Username
    • Password
    • Generate SQL: Yes
    • Table name
    • CodePage: Unicode
  12. Click the Columns tab.
  13. To load the database schema from the source, click Load in the lower-right corner. The Table Definition window appears.
  14. Expand the Table Definitions folder tree.

  15. To list the available columns in that table, double-click the table name. Verify the column data types. If DataStage does not support a specific data type, SQL type is listed as 110, 111, and so on. Use the arrow keys to select the columns you need.

  16. After you've selected the columns, click OK and Save to save the changes you've made.
  17. If your source and target databases have identical schemas, as in this example, you're done. If they are not identical, repeat steps 11 and 12 for the target database.

Compile and Run a Job

To compile and run the job you just created:

  1. Go to the main DataStage and QualityStage Designer window.
  2. To compile the job, click .
  3. After the job compiles, to run the job, click .

    While the job is running, the link in the Job window is blue. If the job completes successfully, the link is green.

    If the link is red, there are some errors in the job that you need to fix.

Create a JDBC Job

The steps to create a JDBC job are mostly the same as for an ODBC job. The differences are noticable when you configure the JDBC_connector in the Job window.

For a JDBC job, you must enter a URL instead of a DSN. In addition, specify the user name and password. The Attributes field is optional.

Troubleshooting

This section describes issues you might encounter when trying to connect to Vertica from DataStage.

Empty DSN list

Explanation: When connecting to ODBC, the DSN drop-down list on the Import Meta Data page does not show the connection value.

Solution: Check the uvodbc.config file in the DSHOME/Projects directory. Depending on your operating system, add the following entries to the uvodbc.config file.

For Windows:

DSHOME=C:\IBM\InformationServer\Server\DSEngine for Windows

For Linux:

DSHOME=/opt/IBM/InformationServer/Server/DSEngine for Linux

Configuration file JDBC cannot be accessed

Explanation: This error occurs when you try to access the source or target database using JDBC.

Solution: Provide read access to the dsadm user, or give read access to all users. The isjdbc.config file should be located in the DSEngine direcctory

String Data Too Big Error

The text of this error is as follows:

target,0: ODBC function "SQLPutData()" reported:  SQLSTATE = 22001: Native Error Code = 10,170: Msg = [Vertica][ODBC] (10170) String data right truncation on data from data source: String data is too big for the driver's data buffer. (CC_OdbcLobSupport::writeLobData, file CC_OdbcLobSupport.cpp, line 261)

Explanation: If you are loading string data or similar data types with high precision, you might see the preceding message.

Solution: Set the value of the environment variable APT_DEFAULT_TRANSPORT_BLOCK_SIZE to 10000000. Note that this solution does not work for huge string sizes like 50000 characters.

ODBC Connector Array Size Error

The text of this error is as follows:

ODBC_Connector_0,0: The array size must be set to 1 when reading LOBs (CC_OdbcDBRecordDataSetProducer::dbsAllocateBindBuffers, file CC_OdbcDBRecordDataSetProducer.cpp, line 587)

Explanation: If you are loading string data or similar data types with high precision, you might see the preceding message.

Solution: In DataStage, change the array size value to 1.

Pxbridge Error

The text of this error is as follows:

main_program: Syntax error: Error in "pxbridge" operator: Error in output redirection: Error in output parameters: Error in modify adapter: Error in binding: Error parsing field type: Parsing parameters "1024,0" for schema type "decimal": Precision must be > 0 and <= 255, got: "1024"

Explanation: You see this error if you trying to load decimal data or similar data types with a precision value greater than 255 but the data inside the column is less than 255 characters.

Solution: Change the precision value in the DDL to an integer between 1 and 255.

SQLGetPrivateProfileString Error

The text of this error is as follows:

[S1000][unixODBC][DSI] The error message NoSQLGetPrivateProfileString could not be found    in the en-US locale. Check that /en-US/ODBCMessages.xml exists.
[ISQL]ERROR: Could not SQLConnect.

Explanation: You might see this error in a Linux environment when you configure the ODBC driver. It is related to odbc.ini file configuration.

Solution: Usually ErrorMessagePath is set as: /opt/vertica/lib64. Add the en-US dir under the lib64 entry.

Data Type Support

The following list identifies known data type issues when connecting to Vertica from DataStage:

  • DataStage interprets the following Vertica data types as follows:

    Vertica Data Type SQL Type
    INTERVAL HOUR TO SECOND(0) 112
    INTERVAL HOUR TO SECOND 112
    INTERVAL HOUR TO MINUTE 111
    INTERVAL 110
    INTERVAL YEAR TO MONTH 107

    These data types are unique to Vertica. DataStage cannot parse these values.

  • Change in value for milliseconds when passing TIME data types while using JDBC connection.

    For the same data type in an ODBC connection, loss of milliseconds value occurs. For example, if the value is 14:30:45.666666, the JDBC connector reads and writes it as 14:30:45.384. The ODBC connector reads and writes it as 14:30:45.

  • TIMETZ and TIMESTAMPTZ values also change when passed from Vertica to DataStage.

For More Information

 

Share this article: