Using Apache Kafka with Vertica

This topic assumes that you have installed and configured Apache Kafka. For details on installing and using Kafka, see the Apache Kafka site.

How Kafka and Vertica Work Together

Kafka is designed for streaming use cases: high volumes of data delivered with low latency. In Vertica, you can approximate streaming by running a series of COPY statements, each of which loads a small amount of data into your database. However, managing this process by hand can become tedious and complex. Instead, you can use the Kafka integration feature to load data into your database automatically as it streams through Kafka. For more information, see How Vertica and Data Streaming Work Together in the Vertica documentation.
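To make the manual approach concrete, the following is a minimal Python sketch of how a stream of records might be split into small batches, each of which would be loaded by one COPY statement. The function names, the table name public.web_events, and the comma-delimited record format are hypothetical and chosen only for illustration. The sketch prints the statements rather than connecting to a database.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(records: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def copy_statement(table: str) -> str:
    """Build a COPY statement that would load one batch from STDIN.

    The table name is supplied by the caller; it is hypothetical here.
    """
    return f"COPY {table} FROM STDIN DELIMITER ','"


if __name__ == "__main__":
    # Hypothetical stream of comma-delimited records.
    stream = (f"{i},event_{i}" for i in range(7))
    for batch in micro_batches(stream, batch_size=3):
        # A real loader would send the statement and the batch to Vertica here.
        print(copy_statement("public.web_events"), f"-- {len(batch)} rows")
```

Even in this small form, the caller is responsible for choosing batch sizes, scheduling each load, and recovering from failures, which is the bookkeeping that the Kafka integration feature takes over.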

The Job Scheduler

Vertica includes a built-in Kafka job scheduler that continuously loads data from Kafka into Vertica. The job scheduler uses a user-defined load (UDL) library to consume data from Kafka with exactly-once semantics. For more information, see Data Streaming Job Scheduler in the Vertica documentation.

Using Kafka with Vertica

To stream data through Kafka into Vertica, run the vkconfig script from your Vertica database, following the steps described in Using Streaming Data with Vertica in the Vertica documentation.

Learn More

To learn how to do more with Kafka and Vertica, including Using COPY with Data Streaming, adding Streaming Utility Options, and viewing Data Streaming Schema Tables, see the Vertica documentation.