The User-Defined Load (UDL) feature lets you develop one or more functions that change how the COPY statement operates in Vertica. While the COPY statement offers many options and settings to control how you load data, these options may not suit every type of data load you want to perform. You write User-Defined Load functions using the C++ or Java SDK.
UDx examples are available on GitHub at https://github.com/vertica/UDx-Examples
How COPY Uses UDLs
You can specify that the COPY statement use one or more UDLs to load data. For more information, see Loads.
You implement a source by subclassing UDSource. The UDSource class acquires data from an external source and produces that data in an output stream for filtering and parsing.
For the specific methods to implement, see the Vertica documentation.
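The role of a source can be illustrated with a minimal, self-contained sketch. It does not use the Vertica SDK: StringSource, SourceStatus, and drain are hypothetical names invented for this illustration, and the real UDSource methods and signatures may differ. The sketch only shows the general idea of acquiring data and handing it downstream in buffer-sized pieces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical status codes for this sketch; not the Vertica SDK's types.
enum class SourceStatus { OutputNeeded, Done };

// Hypothetical stand-in for a user-defined source. It "acquires" data from
// an in-memory string and hands it out in buffer-sized pieces.
class StringSource {
public:
    explicit StringSource(std::string data) : data_(std::move(data)) {}

    // Copies up to `capacity` bytes into `out` and reports the byte count in
    // `written`. Returns Done once all of the data has been produced.
    SourceStatus produce(char* out, std::size_t capacity, std::size_t& written) {
        written = std::min(capacity, data_.size() - offset_);
        std::memcpy(out, data_.data() + offset_, written);
        offset_ += written;
        return offset_ == data_.size() ? SourceStatus::Done
                                       : SourceStatus::OutputNeeded;
    }

private:
    std::string data_;
    std::size_t offset_ = 0;
};

// Drains a source through a small fixed-size buffer, the way a caller that
// feeds filters and parsers might. `bufferSize` must be greater than zero.
std::string drain(StringSource& source, std::size_t bufferSize) {
    assert(bufferSize > 0);
    std::string result;
    std::string buffer(bufferSize, '\0');
    std::size_t written = 0;
    SourceStatus status;
    do {
        status = source.produce(&buffer[0], buffer.size(), written);
        result.append(buffer.data(), written);
    } while (status == SourceStatus::OutputNeeded);
    return result;
}
```

Draining a StringSource through a 4-byte buffer returns the original string unchanged; the buffer size only affects how many calls to produce are needed.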
You implement a filter or sequence of filters by subclassing UDFilter. The UDFilter class reads raw input data from a source and prepares it to load into Vertica or to be processed by a parser.
See the filter function example on GitHub: https://github.com/vertica/UDx-Examples/blob/master/Java-and-C%2B%2B/FilterFunctions/GZip.cpp
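As a rough illustration of a sequence of filters, the following standalone sketch chains two byte-level transformations. ByteFilter, StripCarriageReturns, TabsToCommas, and runFilters are hypothetical names for this example only; they are not part of the Vertica SDK, and a real UDFilter works with SDK-provided buffers rather than std::string.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical filter interface for this sketch; not the Vertica SDK's UDFilter.
class ByteFilter {
public:
    virtual ~ByteFilter() = default;
    // Transforms one chunk of raw bytes and returns the filtered bytes.
    virtual std::string process(const std::string& chunk) = 0;
};

// Drops every carriage return, so CRLF line endings become LF.
class StripCarriageReturns : public ByteFilter {
public:
    std::string process(const std::string& chunk) override {
        std::string out;
        out.reserve(chunk.size());
        for (char c : chunk) {
            if (c != '\r') out.push_back(c);
        }
        return out;
    }
};

// Replaces tab delimiters with commas.
class TabsToCommas : public ByteFilter {
public:
    std::string process(const std::string& chunk) override {
        std::string out = chunk;
        for (char& c : out) {
            if (c == '\t') c = ',';
        }
        return out;
    }
};

// Applies each filter in order; the output of one is the input of the next.
std::string runFilters(const std::vector<ByteFilter*>& filters, std::string data) {
    for (ByteFilter* filter : filters) {
        assert(filter != nullptr);
        data = filter->process(data);
    }
    return data;
}
```

Both filters here work byte by byte, so they give the same result no matter how the input is split into chunks. A filter that needs context across chunk boundaries, such as a decompressor, would have to keep state between calls.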
You implement a parser by subclassing UDParser. The UDParser class parses an input stream into rows (tuples) for insertion into a Vertica table. Use the UDParser class when your data is in a format that the COPY statement’s native parser cannot handle.
See the parser examples on GitHub: https://github.com/vertica/UDx-Examples/tree/master/Java-and-C%2B%2B/ParserFunctions
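The core job of a parser, turning a byte stream into rows, can be sketched without the SDK. DelimitedParser and Row below are hypothetical names for illustration; they are not the Vertica UDParser API. The sketch assumes newline-terminated records and a single-character field delimiter, and it buffers a partial trailing record until the rest of it arrives.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical stand-in for a user-defined parser; not the Vertica SDK's UDParser.
class DelimitedParser {
public:
    explicit DelimitedParser(char delimiter) : delimiter_(delimiter) {
        assert(delimiter != '\n');  // '\n' is reserved as the record terminator
    }

    // Accepts the next chunk of input and appends every complete row to
    // `rows`. An incomplete trailing record is kept for the next call.
    void feed(const std::string& chunk, std::vector<Row>& rows) {
        pending_ += chunk;
        std::size_t start = 0;
        std::size_t newline;
        while ((newline = pending_.find('\n', start)) != std::string::npos) {
            rows.push_back(split(pending_.substr(start, newline - start)));
            start = newline + 1;
        }
        pending_.erase(0, start);
    }

    // Call once at end of input to emit a final record with no terminator.
    void finish(std::vector<Row>& rows) {
        if (!pending_.empty()) {
            rows.push_back(split(pending_));
            pending_.clear();
        }
    }

private:
    // Splits one record into fields; an empty record yields one empty field.
    Row split(const std::string& line) const {
        Row fields;
        std::size_t start = 0;
        std::size_t pos;
        while ((pos = line.find(delimiter_, start)) != std::string::npos) {
            fields.push_back(line.substr(start, pos - start));
            start = pos + 1;
        }
        fields.push_back(line.substr(start));
        return fields;
    }

    char delimiter_;
    std::string pending_;
};
```

This sketch produces string fields only. A real parser would also convert each field to the type of its target column and decide how to handle rows that cannot be converted.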
You can use factory classes with your user-defined sources, filters, and parsers to perform initial validation and query planning.
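The idea of validating once, up front, before any data is read can be sketched as follows. SketchParserFactory, SketchParser, ParserSettings, and the "delimiter" parameter are all hypothetical and exist only for this illustration; the real Vertica factory classes have their own methods and parameter handling.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical validated settings that a factory passes to each parser.
struct ParserSettings {
    char delimiter;
};

// Hypothetical parser that trusts its settings were already validated.
class SketchParser {
public:
    explicit SketchParser(ParserSettings settings) : settings_(settings) {}
    char delimiter() const { return settings_.delimiter; }

private:
    ParserSettings settings_;
};

// Hypothetical factory; not the Vertica SDK's factory API.
class SketchParserFactory {
public:
    // Validation step: checks user-supplied parameters before any data is
    // read, so that a bad load fails early. Throws std::invalid_argument.
    static ParserSettings validate(const std::map<std::string, std::string>& params) {
        auto it = params.find("delimiter");
        if (it == params.end()) {
            return ParserSettings{','};  // assumed default for this sketch
        }
        if (it->second.size() != 1) {
            throw std::invalid_argument("delimiter must be exactly one character");
        }
        if (it->second[0] == '\n') {
            throw std::invalid_argument("delimiter must not be the record terminator");
        }
        return ParserSettings{it->second[0]};
    }

    // Creation step: builds a parser instance from validated settings.
    static std::unique_ptr<SketchParser> create(const ParserSettings& settings) {
        assert(settings.delimiter != '\n');  // validate() must have run first
        return std::unique_ptr<SketchParser>(new SketchParser(settings));
    }
};
```

Separating validation from creation means the parameter checks run once, while create can be called as many times as needed without repeating them.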
In C++, you can also use the UDChunker class, which works together with a UDParser. The chunker divides the input into pieces (chunks) that the parser can parse individually.
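A minimal sketch of the chunking idea follows. It assumes newline-terminated records and is not the UDChunker API: chunkAtRecordBoundaries is a hypothetical helper written for this illustration. It splits the input only at record terminators, so each chunk holds whole records and can be parsed on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of the Vertica SDK. Splits `input` into
// chunks of at least `targetSize` bytes, each ending on a record terminator.
// Only the last chunk may be shorter or lack a terminator.
std::vector<std::string> chunkAtRecordBoundaries(const std::string& input,
                                                 std::size_t targetSize,
                                                 char terminator = '\n') {
    assert(targetSize > 0);
    std::vector<std::string> chunks;
    std::size_t start = 0;
    while (start < input.size()) {
        std::size_t remaining = input.size() - start;
        std::size_t end = std::string::npos;
        if (remaining >= targetSize) {
            // First terminator at or after the point where the chunk
            // reaches the target size.
            end = input.find(terminator, start + targetSize - 1);
        }
        if (end == std::string::npos) {
            chunks.push_back(input.substr(start));  // final chunk
            break;
        }
        chunks.push_back(input.substr(start, end - start + 1));
        start = end + 1;
    }
    return chunks;
}
```

For example, splitting "1,a\n2,b\n3,c\n" with a target size of 5 yields two chunks, "1,a\n2,b\n" and "3,c\n". No record is cut in half, so the two chunks could be handed to separate parser instances.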
For more information, see the following: