Analyze Mismatched Series with Event Series Joins

Posted June 20, 2016 by Sarah Lemaire, Manager, Vertica Documentation

Event series occur in tables with a time column, most typically a TIMESTAMP data type. In Vertica, you perform an event series join to analyze two series in different tables when their measurement intervals don’t align, such as with mismatched timestamps.

With the event series join functionality (similar to that of an outer join), you can compare values from the two series directly, rather than having to first normalize the series to the same measurement interval.

This blog assumes you are familiar with joins in Vertica. If you are not familiar with joins, or want to learn more, see our Working with Joins topic in the Get Started with Vertica series.

Example schema

For this blog we’ll use the following two tables to represent our series of events. In this case the events include a series of bid values at certain times. You might recognize the XYZ_bid table below as the TickStore table in our Time Series Analytics blog.

Table XYZ_bid

time symbol bid
2016-04-30 10:00:00 XYZ 0
2016-04-30 10:01:00 XYZ 32
2016-04-30 10:04:00 XYZ 38
2016-04-30 10:06:00 XYZ 36
2016-04-30 10:10:00 XYZ 33

Table UVW_bid

time symbol bid
2016-04-30 10:00:00 UVW 50
2016-04-30 10:03:00 UVW 51
2016-04-30 10:05:00 UVW 51
2016-04-30 10:07:00 UVW 52
2016-04-30 10:08:00 UVW 49

As you can see, both tables have periods of inactivity where no trades occur (i.e., missing events). Missing events can cause problems if you want to compare two stocks whose timestamps don?’t match. Such is the case with our example tables. In fact, the only two events that do align occur at 10:00:00. Let? see how to successfully join and compare these two event tables using event series joins.

Performing event series joins

Event series joins are an extension of outer joins, but instead of padding the non-preserved side with NULL values when there is no match, the event series join pads the non-preserved side with values that it interpolates from the previous value.

Here’?s what our example data set looks like when we perform a full outer join:

In regular full outer joins, Vertica fills in the missing data with NULL values. For example, performing a full outer join on our example tables results in the following output:

=> SELECT * FROM XYZ_bid x FULL OUTER JOIN UVW_bid u ON (x.time=u.time);
event1

 

But how do we compare XYZ?’s stock with UVW’?s stock at time 10:03:00?

If you haven?’t caught on, the answer is: with an event series join.

INTERPOLATE predicate

To change the regular join to an event series join, simply add an ON clause with the INTERPOLATE predicate. The interpolate predicate is used to join two event series together using some ordered attribute. The predicate’?s general syntax is as follows:

expression1 INTERPOLATE PREVIOUS VALUE expression2

Because event series joins require both tables to be sorted on columns in the equality predicates (in our example, the time column), you may find improved query performance if your data is already sorted prior to issuing the event series join.

The following statement tells Vertica to perform a full outer join on the time columns from tables XYZ_bid and UVW_bid, and to use the previous values in the respective table to interpolate data that would otherwise be NULL.

=> SELECT * FROM XYZ_bid x FULL OUTER JOIN UVW_bid u ON (x.time INTERPOLATE PREVIOUS VALUE u.time);

The green rectangles in the output below show which values Vertica interpolated:

 

event2

 

Other event series joins options

As with regular joins, event series joins have inner, right outer, and left outer join modes.

For example, here?’s a regular inner join with its event series join counterpart:

=> SELECT * FROM XYZ_bid x INNER JOIN UVW_bid u ON x.time 
            INTERPOLATE PREVIOUS VALUE u.time;
          time         | symbol | bid |        time         | symbol | bid 
  ---------------------+--------+-----+---------------------+--------+-----  
   2016-04-30 10:00:00 | XYZ    |   0 | 2016-04-30 10:00:00 | UVW    |  50 
(1 row)   

=> SELECT * FROM XYZ_bid x INNER JOIN UVW_bid u ON x.time=u.time; 
        time         | symbol | bid |        time         | symbol | bid 
---------------------+--------+-----+---------------------+--------+-----  
 2016-04-30 10:00:00 | XYZ    |   0 | 2016-04-30 10:00:00 | UVW    |  50 
(1 row)

Do you know why the above two queries produce the same results? If you do, you?’ve got a good grasp on what the INTERPOLATE PREVIOUS VALUE does, or in this case, what it doesn?’t do. With inner joins, there is nothing to interpolate!

Learn more

Vertica provides many more opportunities for you to employ joins in for your business needs. Check out our other resources to learn more:

Blogs: