Flattened Tables

Posted December 4, 2017 by Soniya Shah, Information Developer

This blog post was authored by Soniya Shah.

Before release 8.1., Vertica users could denormalize their data by combining all fact and dimension table columns in a single ‘fat’ table. These tables facilitated faster query execution. However, this approach required users to maintain redundant sets of normalized and denormalized data, which incurred its own overhead.

Vertica 8.1 addressed this issue by introducing flattened tables. Flattened tables include columns that get their values by querying other tables. Operations on the source tables and flattened table are decoupled: changes in one are not automatically propagated to the other. Instead, columns are populated with queried data only when the column is created, or when you explicitly refresh it. This approach minimizes the overhead that is otherwise typical of conventional denormalized tables.

Example

Flattened tables are typically fact tables where one or more columns query other tables for their values, using DEFAULT or SET USING constraints.

The following example uses two normalized tables:

• Dimension table custDim: => SELECT * FROM custDim; cid | name | age | gender -----+-------+-----+-------- 1 | Alice | 25 | F 2 | Bob | 30 | M 3 | Eve | 28 | F (3 rows) • Fact table orderFact: => SELECT * FROM orderFact ORDER BY order_id; order_id | cid | amount ----------+-----+--------- 100 | 1 | 15.00 200 | 1 | 1000.00 300 | 2 | -50.00 400 | 3 | 100.00 500 | 2 | 200.00 (5 rows) You can create a flattened table that combines data from both tables, as follows:

1. Convert the orderFact table to a flattened table. Add a column with DEFAULT constraint that queries customer names from the custDim table: => ALTER TABLE orderFact ADD COLUMN cust_name VARCHAR(20) DEFAULT (SELECT name FROM custDim WHERE orderFact.cid = custDim.cid); ALTER TABLE => SELECT * FROM orderFact ORDER BY order_id; => SELECT * FROM orderFact ORDER BY order_id; order_id | cid | amount | cust_name ----------+-----+---------+----------- 100 | 1 | 15.00 | Alice 200 | 1 | 1000.00 | Alice 300 | 2 | -50.00 | Bob 400 | 3 | 100.00 | Eve 500 | 2 | 200.00 | Bob (5 rows) 2. Next, insert a new row of data into orderFact: => INSERT INTO orderFact VALUES (600, 3, 10.00, DEFAULT); OUTPUT -------- 1 (1 row) => SELECT * FROM orderFact ORDER BY order_id; order_id | cid | amount | cust_name ----------+-----+---------+----------- 100 | 1 | 15.00 | Alice 200 | 1 | 1000.00 | Alice 300 | 2 | -50.00 | Bob 400 | 3 | 100.00 | Eve 500 | 2 | 200.00 | Bob 600 | 3 | 10.00 | Eve (6 rows) The value Eve is inserted automatically into the cust_name column.

3. After you insert values, update custDim with a name change: => UPDATE custDim SET name ='Charlene' WHERE cid = 3; OUTPUT -------- 1 (1 row) => COMMIT; COMMIT => SELECT * from custDim; cid | name | age | gender -----+---------+-----+-------- 1 | Alice | 25 | F 2 | Bob | 30 | M 3 | Charlene | 28 | F (3 rows) 4. Propagate the name change to the orderFact table: => UPDATE orderFact SET cust_name = DEFAULT WHERE cid = 3; OUTPUT -------- 2 => SELECT * FROM orderFact ORDER BY order_id; order_id | cid | amount | cust_name ----------+-----+---------+----------- 100 | 1 | 15.00 | Alice 200 | 1 | 1000.00 | Alice 300 | 2 | -50.00 | Bob 400 | 3 | 100.00 | Charlene 500 | 2 | 200.00 | Bob 600 | 3 | 10.00 | Charlene (6 rows) 5. Now add another column to orderFact that queries orderDim with a SET USING constraint: => ALTER TABLE orderFact ADD COLUMN cust_gender VARCHAR(1) SET USING (SELECT gender FROM custDim WHERE orderFact.cid = custDim.cid); ALTER TABLE Initially, the new column is empty: => SELECT * FROM orderFact ORDER BY cid; order_id | cid | amount | cust_name | cust_gender ----------+-----+---------+-----------+------------- 100 | 1 | 15.00 | Alice | 200 | 1 | 1000.00 | Alice | 300 | 2 | -50.00 | Bob | 500 | 2 | 200.00 | Bob | 400 | 3 | 100.00 | Charlene | 600 | 3 | 10.00 | Charlene | (6 rows) 6. Populate the SET USING column by calling the Vertica function REFRESH_COLUMNS: => SELECT REFRESH_COLUMNS ('orderFact', 'cust_gender'); REFRESH_COLUMNS --------------------------- refresh_columns completed (1 row) => SELECT * FROM orderFact ORDER BY order_id; order_id | cid | amount | cust_name | cust_gender ----------+-----+---------+-----------+------------- 100 | 1 | 15.00 | Alice | F 200 | 1 | 1000.00 | Alice | F 300 | 2 | -50.00 | Bob | M 400 | 3 | 100.00 | Charlene | F 500 | 2 | 200.00 | Bob | M 600 | 3 | 10.00 | Charlene | F (6 rows)

DEFAULT versus SET USING

The following table outlines the differences between DEFAULT and SET USING constraints:

DEFAULT SET USING
Initialize from source table -During load (COPY, INSERT)
-ALTER COLUMN…ADD COLUMN DEFAULT
-REFRESH_COLUMNS
-ALTER COLUMN…ALTER COLUMN SET USING
Refresh from modified source table -UPDATE flattened table -REFRESH_COLUMNS


Note: A column can have both DEFAULT and SET USING constraints.

For More Information

See Flattened Tables in the Vertica documentation.