Vertica QuickStart for Informatica PowerCenter

Applies to Vertica 7.2.x and earlier 

 

What Is a QuickStart Application?

The Vertica QuickStarts are sample applications that show complementary technologies working together to deliver outstanding benefits to end users. Each QuickStart uses the Vertica Analytic Database with a different BI or ETL tool from a Vertica technology partner.

The QuickStarts are available for download on the Big Data Marketplace in the QuickStart Examples category.

Important The Vertica QuickStarts are freely available for demonstration and educational purposes. They are not intended for deployment in production environments and are not governed by any license or support agreements.

About This Document

This document provides installation, configuration, and deployment instructions for the Vertica QuickStart for Informatica PowerCenter on Windows. It includes an introduction to the ETL functionality offered by the QuickStart user interface.

Details about the ETL processing and the source and target data sets are provided in the companion document, Vertica VHist ETL Overview. The QuickStart download package includes both documents. They are also posted in the Vertica Knowledge Base.

About the QuickStart for Informatica PowerCenter

The Vertica QuickStart for Informatica PowerCenter is a sample ETL application powered by the Vertica Analytic Database. The QuickStart extracts data from Vertica system tables and loads it into a data warehouse called VHist (Vertica History).

You can learn more about the QuickStarts by watching our short introduction video.

VHist ETL Overview

VHist ETL occurs in two steps:

  1. Data is extracted from system tables in the V_CATALOG and V_MONITOR schemas and loaded into a staging schema called VHIST_STAGE. Only minimal transformation occurs during this step.
  2. Data is extracted from VHIST_STAGE, transformed, and loaded into the VHist star schema.
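Conceptually, the first step amounts to an INSERT … SELECT from each system table into its staging counterpart. The Python sketch below builds that kind of statement as an illustration only; the helper function is hypothetical, and the real mappings (including any added columns such as batch identifiers) ship in the QuickStart workflows:

```python
def staging_load_sql(table: str, source_schema: str = "v_monitor") -> str:
    """Build an INSERT ... SELECT that copies a Vertica system table
    into the VHIST_STAGE staging schema with minimal transformation."""
    return (
        f"INSERT INTO vhist_stage.{table} "
        f"SELECT * FROM {source_schema}.{table};"
    )

# Example: stage the user_sessions system table from the V_MONITOR schema.
print(staging_load_sql("user_sessions"))
```

The second step applies the real transformations (surrogate keys, date and time dimensions) before loading the star schema, so it cannot be reduced to a single statement like this.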

For more information about VHist ETL and Vertica system tables, see the companion document, VHist ETL Overview.

Requirements

The Vertica QuickStart for Informatica PowerCenter requires a Vertica database server with the VHist schemas, a Vertica client, and Informatica PowerCenter 10 or later.

The QuickStart was created using Informatica PowerCenter 10 and Vertica 7.2.x.

Vertica also has a plug-in for Informatica PowerCenter. This plug-in makes integrating the two products easier. For more information, see Vertica Plug-In for Informatica in the Vertica documentation.

For more information about Informatica, see the Informatica documentation.

Install the Software

To install the software that is required for running the QuickStart, follow these steps:

  • Install Informatica PowerCenter
  • Install the Vertica Database Server
  • Install the Client Driver
  • Install the QuickStart Application

Install Informatica PowerCenter

Informatica PowerCenter is a data integration tool that promotes automation, reuse, and agility. To install a free version of Informatica PowerCenter, navigate to the Informatica Marketplace and click Download Now.

Install the Vertica Database Server

The Vertica database server runs on Linux platforms. If you do not have Vertica installed, you can download the Community Edition free of charge from MyVertica or from the Big Data Marketplace.

To download the Vertica Community Edition from MyVertica:

  1. Go to https://my.vertica.com/.
  2. Click the Register box.
  3. Provide your information and click Register.
  4. Follow the on-screen instructions to download and install Vertica Community Edition.

To download the Vertica Community Edition from the Big Data Marketplace:

  1. Go to Big Data Marketplace and sign in with your Marketplace credentials.
  2. Select the Vertica Platform category.
  3. Select Vertica Analytic Database Server.
  4. Click Get It!.
  5. Provide your information and click Register.
  6. Follow the on-screen instructions to download and install the Vertica Community Edition.

Install the Client Driver

Informatica PowerCenter uses the ODBC driver to connect to Vertica. Before you can connect Informatica PowerCenter to Vertica, you must download and install the Vertica client package that includes the driver.

To download and install the Vertica client package:

  1. Go to the Vertica Client Drivers page on MyVertica.
  2. Download the Vertica client package that matches your operating system and the version of Vertica that you are using.
  3. Follow the steps for installing the Vertica client as described in the Vertica documentation.

Note Vertica drivers are forward compatible. You can connect to the Vertica server using previous versions of the client. For more information, see Client Drivers and Server Version Compatibility in the Vertica documentation.

Create a Data Source Name (DSN)

You must create a DSN before you can use the ODBC driver to connect Informatica PowerCenter to Vertica.

  1. From the Start Menu, open Data Sources (ODBC).
  2. Click System DSN > Add.
  3. Select Vertica and click Finish.
  4. In the Basic Settings tab, enter the following:
      • DSN name
      • Database name, server, and port (default port is 5433)
      • Database username
      • Database password
  5. To verify the connection, click Test Connection.
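Once the DSN exists, any ODBC client can use it by name. The Python sketch below builds a DSN-based ODBC connection string; the DSN name and credentials are placeholders, and the pyodbc package mentioned in the comment is an assumption on our part, not part of the QuickStart:

```python
def vertica_dsn_conn_str(dsn: str, user: str, password: str) -> str:
    """Build an ODBC connection string that references the System DSN
    created above; user and password can also be stored in the DSN itself."""
    return f"DSN={dsn};UID={user};PWD={password}"

# Placeholder values; substitute the DSN name and credentials you configured.
conn_str = vertica_dsn_conn_str("VerticaDSN", "dbadmin", "secret")
print(conn_str)

# With the (assumed) pyodbc package installed, you could then connect:
# import pyodbc
# conn = pyodbc.connect(conn_str)
```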

Install the QuickStart Application

The Vertica QuickStart for Informatica PowerCenter download package includes:

  • The QuickStart application
  • Two PDFs:
    • Vertica QuickStart for Informatica PowerCenter, which provides installation and deployment instructions (this document).
    • VHist ETL Overview, which provides details about VHist ETL and the source and target schemas.

To install the QuickStart:

  1. Go to the Big Data Marketplace and sign in with your Marketplace credentials.
  2. Select the QuickStart Examples category.
  3. Select Vertica QuickStart for Informatica PowerCenter.
  4. Click Download.
  5. Save the compressed file, VHIST_ETL_INFORMATICA.zip, on your machine.
  6. Extract the contents of the file to any local directory. You will see the following files:
  • seq_Batch_Id.xml: A reusable sequence generator transformation.
  • wf_vhist_ddl.xml: A workflow to drop and create the database tables required for VHist. You run this workflow only once, to create the required tables; it is not part of the run schedule.
  • wf_stg_vhist.xml: A workflow to populate the staging tables from system tables. This workflow triggers the next phase of VHist and is critical for the job scheduler.
  • wf_star_vhist.xml: A workflow to populate dimension and fact tables.
  • Param_stg_vhist.txt: A parameter file for the wf_stg_vhist workflow. You can edit this file if you want to customize connection names.
  • Param_star_vhist.txt: A parameter file for the wf_star_vhist workflow. You can edit this file if you want to customize connection names.
  • Param_ddl_vhist.txt: A parameter file for the wf_vhist_ddl workflow. You can edit this file if you want to customize connection names.
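If you rename the default connection, each of the three parameter files must reflect the new name. The Python sketch below shows the kind of textual substitution involved; the parameter line used in the example is illustrative, not copied from the shipped files:

```python
def rename_connection(param_text: str, old: str, new: str) -> str:
    """Return parameter-file text with every occurrence of the old
    connection name replaced (a plain textual substitution)."""
    return param_text.replace(old, new)

# Illustrative parameter line; the real files ship in the download package.
line = "$DBConnection_Source=vhist_stage"
print(rename_connection(line, "vhist_stage", "my_vertica_conn"))
```

You could apply the same substitution to the contents of Param_stg_vhist.txt, Param_star_vhist.txt, and Param_ddl_vhist.txt so all three stay consistent.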

Create a Connection to Vertica

  1. Open the Informatica PowerCenter Workflow Manager.
  2. From the top menu, select Connections > Relational > New > Vertica > OK.
  3. Enter the following parameters:
    • Connection name: The default connection name is vhist_stage for both the source and target. You can change the connection name but must update it in all parameter files.
    • Database user name and password
    • Connection string: DSN value
    • Code page: UTF-8 encoding of Unicode
  4. Click OK.

Create the Data Warehouse

To create the data warehouse, you must import wf_vhist_ddl.xml into the Repository Manager Client and execute the workflow in the Workflow Manager.

Import the .xml File

  1. Open the Repository Manager Client.
  2. Connect to the repository where you want to import objects.
  3. Select the repository.
  4. Select Folder > Create.
  5. Name the folder vhist_DDL and click OK.
  6. Select the vhist_DDL folder you created.
  7. Select Repository > Import Objects.
  8. Click Browse and select the wf_vhist_ddl.xml file.
  9. Click Next.
  10. Click Add All.
  11. Click Next. Verify that both the destination folder and the folder in the .xml file are the same.
  12. Select Next > New Rule.
  13. Under Select Resolution, select Replace.
  14. Click Next.
  15. Under Status, verify that Resolved is displayed. If yes, click Import.
  16. Click Done.
  17. Right-click the vhist_DDL folder and click Refresh to view imported objects and their dependent child objects.

After a successful import, the Repository Manager displays the imported objects.

Execute the Workflow

  1. Open the Workflow Manager and connect to the Repository where you imported the objects.
  2. Right-click the vhist_DDL folder and click Open.
  3. Double-click Workflows and select the wf_vhist_ddl workflow.
  4. Place the Param_ddl_vhist.txt parameter file into the local machine folder C:\Informatica\10.0.0\server\infa_shared\BWParam.
  5. Under the Workflow tab, click Edit.
  6. In the General tab, check the box for your integration service.
  7. Click OK.
  8. Click the Workflow tab and select Start Workflow.

In the Workflow Monitor, verify that the workflow runs successfully.

At the database level, you should see the following tables:

  • vhist_stage.batch_dim
  • vhist_stage.VHIST_STAGE_LOAD_LOG
  • vhist_stage.nodes
  • vhist_stage.users
  • vhist_stage.system_resource_usage
  • vhist_stage.user_sessions
  • vhist_stage.QUERY_PROFILES
  • vhist_stage.LOAD_STREAMS
  • vhist_stage.DISK_STORAGE
  • vhist_stage.RESOURCE_REJECTIONS
  • vhist_stage.PROJECTION_STORAGE
  • vhist_stage.PROJECTION_USAGE
  • vhist.batch_dim
  • vhist.VHIST_LOAD_LOG
  • vhist.time_dim
  • vhist.date_dim
  • vhist.nodes_dim
  • vhist.users_dim
  • vhist.sru_fact
  • vhist.user_sessions_fact
  • vhist.QUERY_PROFILES_FACT
  • vhist.LOAD_STREAMS_FACT
  • vhist.DISK_STORAGE_FACT
  • vhist.RESOURCE_REJECTIONS_FACT
  • vhist.PROJECTION_STORAGE_FACT
  • vhist.PROJECTION_USAGE_FACT

Import Data Warehouse Objects

Before you import the wf_stg_vhist.xml file, you must import the seq_Batch_Id.xml file, because the stage-related mappings and workflows depend on the shared seq_Batch_Id sequence generator transformation.

Import the seq_Batch_Id File

  1. Open the Repository Manager Client.
  2. Connect to the Repository where you want to import objects.
  3. Click the Repository.
  4. Select Folder > Create.
  5. Name the folder infa_shared.
  6. Under Options, check Allow Shortcut.
  7. Click OK.
  8. Select the infa_shared folder you created.
  9. Select Repository > Import Objects.
  10. Click Browse and select the seq_Batch_Id.xml file.
  11. Click Next.
  12. Click Add All.
  13. Click Next. Verify that both the destination folder and the folder in the .xml file are the same.
  14. Select Next > New Rule.
  15. Under Select Resolution, select Replace from the drop-down menu.
  16. Click Next.
  17. Under Status, verify that Resolved is displayed. If yes, click Import.
  18. Click Done.
  19. Right-click the infa_shared folder and click Refresh to view the imported object and its dependent child objects.

After you successfully import the object, follow these steps to verify the properties of the object in Designer:

  1. Log in to the environment where the object was imported.
  2. Open the infa_shared folder.
  3. Navigate to the transformation developer.
  4. Open the seq_Batch_Id shared object by double-clicking it.
  5. Click Properties.
  6. Change the current value field to 1 and click Save.

Import the wf_stg_vhist File

  1. Open the Repository Manager Client.
  2. Connect to the Repository where you want to import objects.
  3. Select the Repository.
  4. Select Folder > Create.
  5. Name the folder vhist_stage.
  6. Click OK.
  7. Select the vhist_stage folder you created.
  8. Select Repository > Import Objects.
  9. Click Browse and select the wf_stg_vhist.xml file.
  10. Click Next.
  11. Click Add All.
  12. Click Next. Verify that both the destination folder and the folder in the .xml file are the same.
  13. Select Next > New Rule.
  14. Under Select Resolution, select Replace from the drop-down menu.
  15. Click Next.
  16. Under Status, verify that Resolved is displayed. If yes, click Import.
  17. Click Done.
  18. Right-click the vhist_stage folder and click Refresh to view the imported object and its dependent child objects.

Import the wf_star_vhist File

To configure the population of the star schema fact and dimension tables, you must import the wf_star_vhist.xml file. This imports all the star-related mappings and workflows. You can then load data from the source system tables to both the stage and star tables.

  1. Open the Repository Manager Client.
  2. Connect to the Repository where you want to import objects.
  3. Click the Repository.
  4. Select Folder > Create.
  5. Name the folder vhist_star.
  6. Click OK.
  7. Select the vhist_star folder you created.
  8. Select Repository > Import Objects.
  9. Click Browse and select the wf_star_vhist.xml file.
  10. Click Next.
  11. Click Add All.
  12. Click Next. Verify that both the destination folder and the folder in the .xml file are the same.
  13. Select Next > New Rule.
  14. Under Select Resolution, select Replace from the drop-down menu.
  15. Click Next.
  16. Under Status, verify that Resolved is displayed. If yes, click Import.
  17. Click Done.
  18. Right-click the vhist_star folder and click Refresh to view the imported object and its dependent child objects.

After a successful import, the imported objects appear in the Repository Manager, the Designer, and the Workflow Manager.

Loading Data into the VHist Data Warehouse

The first time you run the jobs, you must populate the staging tables, fact tables, and dimension tables by following these steps.

Execute the Workflow

The first time you execute this workflow, it loads from source tables to stage tables and then loads from stage tables to fact tables. Follow these steps to execute the workflow:

  1. Open the Workflow Manager and connect to the Repository where you previously imported objects.
  2. Right-click the vhist_stage folder and click Open.
  3. Double-click Workflows and select the wf_stg_vhist workflow.
  4. Place the Param_stg_vhist.txt parameter file into the local machine folder C:\Informatica\10.0.0\server\infa_shared\BWParam.
  5. Under the Workflow tab, click Edit.
  6. Under the General tab, check the box for your integration service.
  7. Click OK.
  8. Click the Workflow tab and select Start Workflow.
  9. Upon successful completion, the next VHist job starts.

In the Workflow Monitor, verify that the workflow runs successfully.

Validate the ETL

The VHist ETL process records events in log tables that you can query to determine the success or failure of the data load.

To query the ETL log tables:

  1. Connect to the target database using vsql or a client tool like DBVisualizer.
  2. Run this query to validate the vhist_stage schema:

    SELECT *
       FROM vhist_stage.vhist_stage_load_log
          WHERE batch_id = (SELECT MAX(batch_id)
             FROM vhist_stage.vhist_stage_load_log);
  3. Run this query to validate the vhist schema:
    SELECT *
       FROM vhist.vhist_load_log
          WHERE batch_id = (SELECT MAX(batch_id)
             FROM vhist.vhist_load_log);

Schedule Incremental Loads

Once the data warehouse has been created and populated, you can perform incremental loads to keep the warehouse up to date and start accumulating historical data. To refresh the data warehouse, schedule incremental loads to run at intervals.

Schedule the wf_stg_vhist workflow to run incremental loads at an interval that suits your needs.

Take care when scheduling incremental loads to avoid placing undue demands on system resources or causing the data warehouse to grow too large. The amount of data stored in Vertica system tables depends on many factors, and the individual tables are not flushed at the same rate. Keep the following in mind:

  • To avoid running incremental loads more often than necessary, start with daily loads, then review the results in the log tables. If there are gaps in the results, decrease the interval between loads until you find an optimal balance.
  • Repeated incremental loads increase the size of the data warehouse over time. The growth amount varies depending on system activity and frequency of collection.
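One way to schedule the incremental load on Windows is a scheduled task that invokes Informatica's pmcmd command-line utility to start the wf_stg_vhist workflow. The Python sketch below composes such a command; the integration service name, domain, credentials, start time, and task name are placeholders you would replace with your own values:

```python
def daily_load_task(task_name: str, pmcmd_args: str, start_time: str = "01:00") -> str:
    """Compose a Windows schtasks command that runs pmcmd once a day.
    All values passed in are placeholders for your own environment."""
    return (
        f'schtasks /Create /SC DAILY /ST {start_time} '
        f'/TN {task_name} /TR "pmcmd startworkflow {pmcmd_args}"'
    )

# Placeholder service, domain, and credentials; substitute your own.
cmd = daily_load_task(
    "VHistIncrementalLoad",
    "-sv IntSvc -d Domain_demo -u Administrator -p <password> "
    "-f vhist_stage wf_stg_vhist",
)
print(cmd)
```

Running the printed command in an elevated prompt would register the task; adjust the schedule as you tune the load interval.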

Note The data that you load into VHist counts towards the limit specified in your Vertica license.

Tip If you are using the Community Edition, your license allows up to one terabyte of free storage. If you already have a licensed installation of Vertica, you can build and maintain the VHist warehouse using the Community Edition in a separate cluster.

Find More Information

For more information, see:

  • Informatica: www.informatica.com
  • HPE Big Data Marketplace: https://saas.hpe.com/marketplace/big-data
  • Vertica Community: https://my.vertica.com/community/
  • Vertica Documentation: http://my.vertica.com/docs/latest/HTML/index.htm
  • Big Data and Analytics Community: https://my.vertica.com/big-data-analytics-community-content/

Contact Us

We welcome your feedback. If you have questions, comments, or suggestions, please contact us by clicking the Contact developer button on the Big Data Marketplace download page for the QuickStart.
