Saving an Apache Spark DataFrame to a Vertica Table

Before you save an Apache Spark DataFrame to a Vertica table, make sure that you have set up the following:

• Vertica cluster
• Spark cluster
• HDFS cluster. The Vertica Spark connector uses HDFS as intermediate storage before it writes the DataFrame to Vertica.
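
To make the role of the three clusters concrete, here is a minimal PySpark sketch of a save. It is an illustration under assumptions, not the connector's documented API: the data source name `com.vertica.spark.datasource.DefaultSource` and the option keys (`host`, `db`, `user`, `password`, `table`, `hdfs_url`) reflect the legacy Vertica Spark connector as best recalled, and the helper names `build_vertica_options` and `save_to_vertica` are hypothetical. Check the names against the documentation for your connector version.

```python
# Sketch only: the data source name and option keys are assumptions based on
# the legacy Vertica Spark connector; verify them for your connector version.
VERTICA_SOURCE = "com.vertica.spark.datasource.DefaultSource"


def build_vertica_options(host, db, user, password, table, hdfs_url):
    """Collect the settings for one save (hypothetical helper).

    hdfs_url names the HDFS directory used as intermediate storage
    before the data is written to the Vertica table.
    """
    return {
        "host": host,
        "db": db,
        "user": user,
        "password": password,
        "table": table,
        "hdfs_url": hdfs_url,
    }


def save_to_vertica(df, options, mode="append"):
    """Write a Spark DataFrame to Vertica through the connector.

    df is an existing pyspark.sql.DataFrame; mode is a standard Spark
    save mode such as "append" or "overwrite".
    """
    df.write.format(VERTICA_SOURCE).options(**options).mode(mode).save()
```

With a live Spark session, a call might look like `save_to_vertica(df, build_vertica_options(...), mode="overwrite")`, where every connection value is specific to your environment.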

This checklist identifies potential problems you might encounter when using the Vertica Spark connector.

Problem: Vertica and Hadoop are not configured correctly.
Solution: Verify that Vertica is configured to communicate with HDFS. To configure Vertica nodes for HDFS access, follow the Vertica and Hadoop configuration instructions in Configuring the hdfs Scheme.

Problem: Your connector is not compatible with the Spark and Scala version combination in your environment.
Solution: Either of the following errors indicates that your Vertica Spark connector is not compatible with the Spark and Scala version combination in your environment:
• java.lang.ClassNotFoundException
• java.lang.AbstractMethodError
Verify that you are using the right connector for your specific Spark and Scala combination. As of Vertica 8.1.1, there are five connectors, which support the following environments:
• Apache Spark 1.6/Scala 2.10
• Apache Spark 2.0/Scala 2.10
• Apache Spark 2.0/Scala 2.11
• Apache Spark 2.1/Scala 2.10
• Apache Spark 2.1/Scala 2.11
These connectors are available at https://my.vertica.com.

Problem: When loading Vertica data into Spark, your Spark script fails with a java.lang.IllegalArgumentException error.
Solution: Vertica can store numeric values with a higher precision than the column definition. When you create a DataFrame for a table that has NUMERIC columns, every NUMERIC column in the DataFrame is assigned the maximum precision supported in Spark. If your script tries to load a value into a DataFrame column that exceeds the Spark maximum numeric precision, the script fails with the following error:
`java.lang.IllegalArgumentException: requirement failed: Decimal precision 41 exceeds max precision 38`
There is no workaround for this. For more information, see Loading Vertica Data into a Spark DataFrame or RDD in the Vertica documentation.
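
Two of the checks in this list can be run before a job starts. The sketch below encodes the Spark/Scala pairs listed for the Vertica 8.1.1 connectors and the precision limit of 38 quoted in the error message. The function names are hypothetical, and `decimal_precision` is only an approximation of how Spark counts digits, assuming finite values. It is a diagnostic aid and does not work around the precision limit.

```python
from decimal import Decimal

# Spark/Scala pairs listed in this checklist for the Vertica 8.1.1 connectors.
SUPPORTED_PAIRS = {
    ("1.6", "2.10"),
    ("2.0", "2.10"),
    ("2.0", "2.11"),
    ("2.1", "2.10"),
    ("2.1", "2.11"),
}

# Limit quoted in the IllegalArgumentException message.
SPARK_MAX_PRECISION = 38


def connector_supported(spark_version, scala_version):
    """Return True if the major.minor Spark/Scala pair is in the 8.1.1 list."""
    spark = ".".join(spark_version.split(".")[:2])
    scala = ".".join(scala_version.split(".")[:2])
    return (spark, scala) in SUPPORTED_PAIRS


def decimal_precision(value):
    """Approximate number of digits needed to hold a finite decimal value."""
    _, digits, exponent = Decimal(value).as_tuple()
    if exponent >= 0:
        # Trailing zeros implied by a positive exponent still need digits.
        return len(digits) + exponent
    # A purely fractional value needs at least as many digits as its scale.
    return max(len(digits), -exponent)


def exceeds_spark_precision(value):
    """Return True if the value would not fit in Spark's widest decimal."""
    return decimal_precision(value) > SPARK_MAX_PRECISION
```

For example, `connector_supported("2.1.0", "2.11.8")` is true, while a 41-digit integer passed to `exceeds_spark_precision` is flagged, matching the "precision 41 exceeds max precision 38" case in the error message.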

Learn More

For complete details about integrating Vertica with Spark, see Integrating with Spark in the Vertica documentation.
