Saving an Apache Spark DataFrame to a Vertica Table

Before you save an Apache Spark DataFrame to a Vertica table, make sure that the following are set up:

• Vertica cluster
• Spark cluster
• HDFS cluster. The Vertica Spark connector uses HDFS as intermediate storage before it writes the DataFrame to Vertica.
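
As a quick sanity check of the setup listed above, the following minimal sketch tests whether each service accepts TCP connections. The host names are hypothetical placeholders, and the HDFS and Spark ports are assumptions about a typical installation (5433 is the default Vertica client port); substitute the values from your own environment.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


# Hypothetical endpoints; replace with the hosts and ports in your environment.
ENDPOINTS = {
    "Vertica": ("vertica.example.com", 5433),         # default Vertica client port
    "HDFS NameNode": ("namenode.example.com", 8020),  # assumed RPC port; yours may differ
    "Spark master": ("spark.example.com", 7077),      # assumed standalone master port
}


def check_endpoints(endpoints: dict, timeout: float = 3.0) -> dict:
    """Map each service name to True (reachable) or False (not reachable)."""
    return {
        name: is_reachable(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }
```

Calling check_endpoints(ENDPOINTS) returns a dictionary that maps each service name to True or False. A successful connection shows only that the port is open; it does not confirm that the service is configured correctly.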

The following checklist identifies problems that you might encounter when using the Vertica Spark connector, along with their solutions.

Problem: You have a bad Vertica and Hadoop configuration.
Solution: Verify that you have configured Vertica correctly to talk to HDFS. To configure Vertica nodes for HDFS access, follow the Vertica and Hadoop configuration instructions found in Configuring the hdfs Scheme.

Problem: You are using a connector that is not compatible with the Spark and Scala version combination in your environment.
Solution: If you see one of the following errors, your Vertica Spark connector is not compatible with the Spark and Scala version combination in your environment:
• java.lang.ClassNotFoundException
• java.lang.AbstractMethodError

Verify that you are using the right connector for your specific Spark and Scala combination. As of Vertica 8.1.1, there are five connectors that support the following environments:
• Apache Spark 1.6/Scala 2.10
• Apache Spark 2.0/Scala 2.10
• Apache Spark 2.0/Scala 2.11
• Apache Spark 2.1/Scala 2.10
• Apache Spark 2.1/Scala 2.11

These connectors are available at https://my.vertica.com.
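
The compatibility check described above can be sketched as a simple lookup. This is an illustration only: the function names are hypothetical, and the set of supported pairs is copied from the list above, so it reflects Vertica 8.1.1 and not any later release.

```python
# Spark/Scala pairs listed above as supported as of Vertica 8.1.1.
SUPPORTED_COMBINATIONS = {
    ("1.6", "2.10"),
    ("2.0", "2.10"),
    ("2.0", "2.11"),
    ("2.1", "2.10"),
    ("2.1", "2.11"),
}


def major_minor(version: str) -> str:
    """Reduce a version string such as '2.1.0' or '2.11.8' to 'major.minor'."""
    parts = version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"unrecognized version string: {version!r}")
    return ".".join(parts[:2])


def has_matching_connector(spark_version: str, scala_version: str) -> bool:
    """Return True if the Spark/Scala pair appears in the supported list."""
    pair = (major_minor(spark_version), major_minor(scala_version))
    return pair in SUPPORTED_COMBINATIONS
```

For example, has_matching_connector("2.1.0", "2.11.8") returns True, while has_matching_connector("1.6.3", "2.11.8") returns False because no Spark 1.6/Scala 2.11 connector is listed. Running spark-submit --version prints both the Spark version and the Scala version it was built with.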

Problem: When loading Vertica data into Spark, your Spark script fails with a java.lang.IllegalArgumentException error.
Solution: Vertica can store numeric values with a higher precision than the column definition. When you create a DataFrame for a table that has NUMERIC columns, every NUMERIC column in the DataFrame is assigned the maximum precision supported in Spark.

If your script tries to load a value that exceeds the Spark maximum numeric precision into a DataFrame column, the script fails with the following error:

java.lang.IllegalArgumentException: requirement failed: Decimal precision 41 exceeds max precision 38

There is no workaround for this. For more information, see Loading Vertica Data into a Spark DataFrame or RDD in the Vertica documentation.
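
To illustrate the limit quoted in the error message, the following minimal sketch counts the digits in a decimal value and compares the count with 38. It is not a workaround and does not involve the connector; the function names are hypothetical, and the digit-counting rule is an approximation of how a fixed-point precision is determined.

```python
from decimal import Decimal

SPARK_MAX_PRECISION = 38  # the limit quoted in the error message above


def decimal_precision(value: Decimal) -> int:
    """Count the digits needed to represent value as a fixed-point decimal."""
    _sign, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        raise ValueError("NaN and Infinity have no precision")
    if exponent >= 0:
        # Trailing zeros implied by a non-negative exponent count as digits.
        return len(digits) + exponent
    # With a fractional part, precision is at least the number of fraction digits.
    return max(len(digits), -exponent)


def exceeds_spark_precision(value: Decimal) -> bool:
    """Return True if value needs more digits than SPARK_MAX_PRECISION."""
    return decimal_precision(value) > SPARK_MAX_PRECISION
```

A 41-digit integer such as Decimal("1" * 41) has a precision of 41 under this rule, so exceeds_spark_precision returns True for it, which matches the "Decimal precision 41 exceeds max precision 38" case in the error message.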

Learn More

For complete details about integrating Vertica with Spark, see Integrating with Spark in the Vertica documentation.