Share this article:

Denodo Connection Guide and Use Cases

Vertica connection guides provide basic information about setting up connections to Vertica from software that our technology partners create. These documents provide guidance using one specific version of Vertica and one specific version of the third party vendor’s software. Other versions of the third-party product may work with Vertica. However, we may not have tested these other versions.

Denodo and Vertica Product Versions

The Denodo Connection Guide and Use Cases provides guidance using the software listed as follows:.

Software Version

Denodo

Denodo 6.0

Client Platform

Windows Server 2008 R2 Standard

Vertica Client

Vertica JDBC 8.0

Vertica Server

Vertica Analytic Database 8.0

Denodo Overview

Denodo is a software platform that lets you integrate data from different sources in real time. Supported data sources include relational databases, web services, XML documents, spreadsheets, and flat files.

The Denodo Platform includes the following modules:

  • Virtual DataPort—Executes the data federation, provides a JDBC and ODBC interface to run queries, and publishes views as SOAP and REST web services. The Virtual DataPort is the core of the Denodo Data Virtualization Platform.
  • ITPilot module—Extracts user-specified information from a website..
  • Aracne —Web crawler that indexes user-specified text in a website.
  • Scheduler—Schedules tasks in the Virtual DataPort, ITPilot, and Aracne modules.

The following diagram illustrates the basic architecture of the Denodo Platform:

Note The Denodo Platform supports an optional cache that you can configure to store query results for faster retrieval. You can configure Vertica to run as the cache, or as a data source, or both. For details, see Using Vertica as the Denodo Cache.

For details about Denodo, visit the Denodo website.

Install Denodo

To install Denodo:

  1. Follow the installation steps in the Denodo Platform 6.0 Installation Guide.

    Important Be sure to follow all the steps for configuring the Virtual DataPort module.

  2. When Denodo installation is complete, install the latest update.

Install the Vertica Client Driver

Prerequisites

Before you install the Vertica client driver, verify that your client system meets the following requirements.

Requirement Description
Operating system

For the list of supported platforms, see Vertica Client Drivers in the Vertica documentation.

UTF-8, UTF-16 and UTF-32 support

Configure the driver to use the required encoding method. This encoding allows strings to be passed between the driver and the application without intermediate conversion.

Install the JDBC Driver on Windows

Denodo uses the Vertica JDBC driver to connect to Vertica. To install the client driver, follow the instructions in Installing the JDBC Client Driver for Windows in the Vertica documentation.

Note Vertica drivers are forward compatible, so you can connect to the Vertica server using previous versions of the client. For more information about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.

Connect Vertica to Denodo

To connect your Vertica database to Denodo:

  1. Create a JDBC data source for Vertica.
  2. Create base views that represent your Vertica tables.

Create a JDBC Data Source

To create a JDBC data source, follow these steps:

  1. Go to the directory DENODO_HOME\bin.
  2. Start Denodo by double-clicking the Denodo executable, denodo_platform.exe.
  3. In the Denodo Platform Control Center, click Virtual DataPort.
  4. Click the Launch button for the Virtual DataPort Administration tool.

  5. Log in to the Denodo Platform using the default credentials:
    • User name: admin
    • Password: admin
  6. Select File > New > Data source >JDBC.
  7. In the Create New JDBC Data Source dialog box, select the Connection tab.
  8. Type the connection details for your Vertica database.

    pp

  9. Click OK.
    If the connection information is correct, a list of the schemas in your Vertica database displays.

Create a Base View

After you create the JDBC data source, follow these steps to create a base view of the data you want to query:

  1. Click the plus sign (+) to expand the schema that contains the tables.
  2. Select the tables.
  3. Select Create selected base views.
    After several seconds, the list of views displays.

Query a Base View

To query a base view, follow these steps:

  1. Right-click the base view name and click Execute to display the Execute tab.
  2. If you want to add conditions to the query, edit the SELECT statement.
  3. Click Execute to execute the query.

Accessing Views from a JDBC Client

For information about working with views in Denodo, see the Denodo Virtual DataPort Developer Guide.

Default Join Processing in Denodo

By default, Denodo executes SELECT statements in each data source and then performs joins and other connecting operations in the Denodo server without caching.

The Denodo server pushes down as much processing as it can to the data sources, but it still has to retrieve the data from them and perform the join. In addition, it does not have the processing speed or capacity of Vertica.

Use Cases for Vertica with Denodo

The process of joining columns from tables in different databases is the phase of data virtualization that places the most demand on system resources. Use cases that offload join processing to Vertica can significantly improve query performance. In the following topics, we discuss two approaches:

Using Vertica With Denodo Statistics

In Denodo, you can gather statistics to enable query optimization. Statistics include the average size of a field, maximum and minimum values, and the number of distinct values. When statistics are gathered, Denodo calculates the relative cost of different execution plans and chooses the plan that is the least costly.

The overall goal of optimization is to push processing down to the data source. Optimization reduces data movement across the network and leverages the power of the underlying data sources.

Use Case

When joining a small data set with a large data set, Denodo creates a temporary table in the source that contains the large data set and populates it with the contents of the small data set from the other source.

When Vertica is one of the data sources, it typically contains the largest amount of data. For example, a large fact table in Vertica may be joined with dimensions from a different data source. Denodo statistics in this scenario cause the dimensions to be loaded into temporary tables in Vertica, where the join is processed and the combined data set returned to the Denodo server.

Example

Try the following example to learn about Denodo statistics with Vertica.

First, execute a query without statistics:

  1. Create views in Denodo for the call_center_dimension and online_sales_fact tables in Vertica.
  2. Create a date_dimension view for the date dimension table in a different JDBC data source.
  3. Query the views as follows:

    SELECT a.category_description, co.calendar_year, SUM(c.sales_quantity)
    FROM online_sales_fact
    c LEFT OUTER JOIN product_dimension a
    ON c.product_key =a.product_key and c.product_version=a.product_version
    LEFT OUTER JOIN date_dimension co
    ON co.date_key=c.sale_date_key
    GROUP BY a.category_description,co.calendar_year
    ORDER BY a.category_description,co.calendar_year;

    Without optimization, Denodo executes two separate queries and then joins the results in the Denodo engine.

    The Vertica log shows the query as follows:

    SELECTt1.category_description AS category_description, t0.sales_quantity AS sales_quantity
    t0.sale_date_key AS sale_date_key
    FROM Vmart.online_sales.online_sales_fact t0
    LEFT JOIN Vmart.public.product_dimension t1
    ON ((t0.product_key = t1.product_key and t0.product_vendor=t1.product_vendor) )
    ORDER BY sale_date_key DESC

    The Denodo execution plan log looks like this:

    For each one of the views, date_dimension, product_dimension and online_sales_fact, do the following:

    1. Open the view:
    2. Click the Options button.
    3. Click the Statistics tab.
    4. Select Complete missing statistics executing SELECT queries.
    5. Click Gather statistics for selected fields.

  4. Open one of the JDBC data sources from the list and go to the Read & Write tab. Check can be data movement target and click Save.

  5. Repeat Step 4 for the other JDBC data source.

  6. On the Tools menu, select VQL Shell.

    Here you can execute the query that you previously executed without optimizations. This time, Denodo optimizes the query because you have gathered statistics.

    7. Query the date_dimension view and insert the results in a temporary table in the Vertica data source.

    The Vertica log shows the statement that creates the temporary table:

    CREATE TABLE T_ (date_key DECIMAL(38, 0), field_0 TIMESTAMP, full_date_description VARCHAR(18), 
       day_of_week VARCHAR(9),day_number_in_calendar_month DECIMAL(38, 0), 
       day_number_in_calendar_year DECIMAL(38, 0), 
       day_number_in_fiscal_month DECIMAL(38, 0), 
       day_number_in_fiscal_year DECIMAL(38, 0), 
       last_day_in_week_indicator DECIMAL(38, 0),
       last_day_in_month_indicator DECIMAL(38, 0), 
       calendar_week_number_in_year DECIMAL(38, 0), 
       calendar_month_name VARCHAR(9), 
       calendar_month_number_in_year DECIMAL(38, 0), 
       calendar_year_month VARCHAR(7), calendar_quarter DECIMAL(38, 0), 
       calendar_year_quarter VARCHAR(7), calendar_half_year DECIMAL(38, 0), 
       calendar_year DECIMAL(38, 0), holiday_indicator VARCHAR(10), weekday_indicator VARCHAR(7),
      selling_season VARCHAR(32) )
    

    Now that the data of the three views is in the same database, the two left joins can be pushed down to Vertica.

    The Vertica log shows the query of the temporary table followed by the DROP TABLE command:

    SELECTs0. category_description  AS category_description, 
      s0.calendar_year AS calendar_year, sum( s0.sales_quantity) AS sum_1
    FROM (SELECT t1.category_description AS category_description, t2.calendar_year AS calendar_year, 
      t0.sales_quantity AS sales_quantity 
      FROM (Vmart.online_sales.online_sales_fact 
      t0 LEFT JOIN Vmart."public".product_dimension t1 
      ON ((t0.product_key = t1.product_key AND t0.product_version = t1.product_version) ))  
      LEFT JOIN t2 ON ((t2.date_key = t0.sale_date_key) )) s0 
    GROUP BY s0.category_description, s0.calendar_year
    ORDER BY category_description ASC, calendar_year ASC
    Drop table

Using Vertica as the Denodo Cache

The Denodo cache contains views of the source data. When you select from cached views, Denodo creates copies of the source data in the cache and pushes the join processing into the cache database.

When Vertica is configured as the cache, the processing-intensive work of data virtualization is pushed into Vertica, where it can be performed quickly. This use case is generally more performant than the use case with statistics, because Denodo does not create temporary tables every time it runs a query.

Example

Let's say you want to join the date dimension in Oracle with the online_sales_fact table in Vertica while using Vertica as the cache.

Assuming you already have a view in Denodo called date_dimension that maps to the date_dimension table in Oracle, follow these steps:

  1. Create the base view of Vertica in the vdpcachedatasource data source of the Denodo admin database.
  2. In vdpcachedatasource, create a view of online_sales_fact in Vertica.

  3. Enable caching for the date_dimension view.

  4. Execute a join of online_sales_fact and date_dimension.

Denodo pushes the join processing down to Vertica.

The query appears in the Vertica log as follows:

SELECT
   s0.calendar_year AS calendar_year,
   sum( s0.sales_quantity) AS sum_1
FROM
   (SELECT
       s0.calendar_year AS calendar_year,
	t0.sales_quantity AS sales_quantity
    FROM
	online_sales.online_sales_fact t0
    LEFT JOIN
	   (SELECT
	      t1.date_key AS date_key,
             t1.calendar_year AS calendar_year
           FROM
             C_DATE_DIMENSION t1
	    WHERE
    	      t1.rowStatus = 'V'
           AND
	      (t1.expirationDate = 0 OR
             t1.expirationDate >
	      1490949990861)) s0
	    ON
	      ((s0.date_key = t0.sale_date_key) 
	      )) s0
	  GROUP BY
	     s0.calendar_year
	  ORDER BY
	     calendar_year ASC	

The Denodo log looks like this:

Use Case: Vertica as Cache with Update

When you are using Vertica as the cache, you must explicitly load the cache in full mode. To ensure data consistency, you must set cache_invalidate to matching_rows. When these settings are used together, Vertica performs updates during the cache load.

  • full mode—Caching in full mode means that the data in the view is always retrieved from the cache engine instead of from the source. This mode always requires explicit cache loads.
  • cache_invalidate setting—When set to matching_rows, Denodo invalidates any row that matches the WHERE condition of the query before loading new data. If the query does not have a WHERE clause, it invalidates all the rows.
    An alternative is cache_invalidate=all_rows. This setting invalidates all the rows regardless of the WHERE clause of the query.
  • Denodo refreshes the cache by invalidating any row for which new data exists before loading the new data. Thus each row of data in the cache is an exact duplicate of the corresponding row in the data source.

For this example, you would load the date dimension from Oracle into the Vertica cache as shown here:

When this query loads the cache, Denodo stores the rows with status Processing and later changes the status to Valid. This indicates that Vertica has executed updates during the load. The updates have a negative impact on performance.

The Vertica log shows the updates:

UPDATE C_DATE_DIMENSION 
   SET rowStatus = 'I3' , expirationDate = '1490261596508'::Integer 
WHERE rowStatus = 'V''
				
UPDATE C_DATE_DIMENSION
  SET expirationDate = '1490265196102'::Integer ,rowStatus = 'V' 
WHERE rowStatus = 'P2'

Use Case: Vertica as Cache with No Update

Because of the impact of Vertica update operations on cache load performance, Hewlett Packard Enterprise recommends setting the cache_atomic_operation setting to false. This setting prevents Denodo from saving rows with status Processing before saving them as Valid and thus prevents updates from occuing.

Therefore, the most performant query for loading the cache would look like this:

SELECT * FROM date_dimension 
	CONTEXT ( 
	'cache_preload'='true' 
	, 'cache_invalidate'='matching_rows' 
	, 'cache_wait_for_load'='true' 
	, 'cache_atomic_operation'='false'
	, 'cache_return_query_results'='false'); 							

With the parameter cache_return_query_results=false, the query does not return the result of the query to the client; it just stores the result in the cache database. This makes the query much faster and the memory footprint is also much lower.

The Vertica log would look like this:

INSERT INTO C_CHAR1_TABLE 
   (datatypeset, valuedesc, char1_column, expirationDate, rowStatus) 
   VALUES ( ? , ? , ? , ? , ?)'

However, the performance gain would be offset by possible inconsistency in the data. There is no way to invalidate existing records and ensure data consistency in the cache while also benefiting from the high performance capabilities of Vertica.

Use Case: Vertica as Cache for Other Data Sources

You can boost query performance by using Vertica as the cache even if you do not have any data stored in Vertica. For example, say your data is stored in MySQL, SQL Server, and Oracle and your queries are running too slow. You could speed things up by deploying Vertica as the cache.

Known Limitations

Denodo does not recognize schema organization in the source. Therefore if you have two tables with the same name in different schemas i Vertica and you want to include them both in a Denodo view, you must rename one of the views in Denodo.

For More Information

For more information, see:

Share this article: