|Share this article:|
Vertica Integration with SyncSort DMExpress: Connection Guide
For: Vertica 8.0
Vertica connection guides provide basic information about setting up connections to Vertica from software that our technology partners create. These documents provide guidance using one specific version of Vertica and one specific version of the third-party vendor’s software. Other versions of the third-party product may work with Vertica. However, other versions may not have been tested. This document provides guidance using Syncsort DMExpress 8.2.0 with Vertica 8.0 using the ODBC driver (bridge) on a Windows platform.
Syncsort DMExpress (DMX) is a data integration application tool that can extract large amount of data from any data source and load it into Vertica. DMX also allows you to transform that data while loading the data into a target data warehouse.
This document describes how to connect to an Vertica database—both as a source and as a target—from Syncsort DMExpress and transform the data.
The requirements for running DMExpress are:
- 64-bit Windows
- Windows XP/7, Windows Server 2003 or higher
- At least 15 GB of disk
- VMware Player or VMware Workstation, configured as described in the Welcome Kit (referenced below).
Before you download and install Syncsort DMExpress, review the DMExpress Installation Guide. To access this guide, log on to the Syncsort portal using your login credentials, or if needed, create an account. Follow the instructions for installing DMExpress.
For more information, see:
Before you can connect to Vertica using DMExpress, you must install the Vertica client package. This package includes the ODBC driver that DMExpress uses to connect to Vertica.
Install the client drivers:
- Go to the Vertica Client Drivers page.
- Download the version of the Vertica client package that is compatible with the architecture of your operating system and Vertica server version.
Note Vertica drivers are forward compatible, so you can connect to the Vertica server using previous versions of the client. For more information about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.
Based on the client package you downloaded, follow the steps for installation from the Vertica documentation.
A DMExpress job requires these components:
- DMExpress Job Editor
- DMExpress Task Editor
- DMExpress Server
Access these components from the Start > DMExpress menu.
Use the Job Editor to configure your data flow. In the Job Editor, you can add jobs and tasks to the canvas and create sequences between them. Doing so helps you develop, manage, and oversee the data flow from your sources to your targets.
Use the Task Editor to define your transformations. DMExpress allows you to perform the following set-level transformations on your data:
In the Task Editor, you can also specify how to perform transformations like the following on a field-level basis:
- Date/time functions
- String manipulations
- Mathematical operations
The DMExpress Server provides information about the server itself and about jobs that are scheduled, executing, or complete.
While working in DMX, at any time you can click the Status button to get information about your connection.
In this window, you can:
- View information about scheduled, executing, and completed jobs
- Monitor job execution status
- Modify DMExpress environment variables
The following steps describe how to use Vertica as a data source, transform the data, and run the job using Syncsort DMExpress.
Select Start > DMExpress > DMExpress Task Editor. In this example, you use the Task Editor to define the Vertica source, Vertica target, and transformation.
After the Task Editor launches, the default information for creating a new task should appear. If not, click New:
To define the source Vertica database, right-click the source in the list and select Add Database Table.
In the Source Database Table dialog box, click Add new in the upper-right corner.
In the Database Connection dialog box, enter the connection information about the Vertica source database. Choose Vertica from the DBMS drop-down list.
- After you have entered all the connection information, click Verify connection in the bottom-right corner.
- After you have verified the connection, click OK to exit the Database Connection dialog box.
The Source Database Table dialog box redisplays with information about the source database filled in. For this example, the connection is called DatabaseConnection1.
Select the source Vertica database table and columns that you want to transform and define the desired transformation.
DMExpress creates and saves the transformation and closes the Source Database Table dialog box.
The steps to add the target database information are almost identical to the steps for adding the source database information. For this example, the target database is also a Vertica database.
In the DMExpress Task Editor, right-click the target in the list and select Add Database Table.
- In the Target Database Table dialog box, click Add New in the upper-right corner.
- In the Database Connection dialog box, enter the information about the target Vertica database. Choose Vertica from the DBMS drop-down list.
- After you have entered all the data, click Verify connection in the bottom-right corner.
- After the connection has been verified, click OK to exit the Database Connection dialog box.
The Target Database Table dialog box redisplays with information about the source database filled in. For this example, the connection is called DatabaseConnection2.
Select the target Vertica database table and specify the desired column mappings between the source and the target.
DMExpress saves the column mapping and closes the Target Database Table dialog box.
Now you want to extract the data from the source, transform the data, and save it in the target.
Click the Run button to execute the job using the designated source and target Vertica databases.
When you run the job, DMExpress connects to the Vertica source, extracts the data, transforms it, and then connects to the Vertica target where it saves the data.
To check on the status of your job, click the Status button. On the Jobs tab, look under the Job Status field for the status of your job.
Possible values for the job status are:
- Completed successfully
- Completed with exceptions
- Ended in error
- Lost contact
Known limitations for data types using the ODBC driver with Syncsort DMExress are:
- DMExpress does not support INTERVAL SECOND and INTERVAL MONTH.
- For the following data types, the ODBC driver truncates the values or does not load those columns:
- LONG VARCHAR
To fix this issue, you must enable the Report Unicode columns as char setting:
Select Start > Control Panel > Administrative Tools > Data Sources (ODBC).
Click System DSN.
Select your DSN and click Configure.
Click the Client Settings tab.
Select Report Unicode columns as char.
This setting tells the ODBC driver to tell Syncsort DMExpress that Vertica uses the ODBC CHAR data type.
|For More Information About…||… See|
|Vertica Community Edition||https://my.vertica.com/community/|
|Big Data and Analytics Community||https://my.vertica.com/big-data-analytics-community-content/|