Kerberos Authentication for Hadoop Integration: Keeping your data safe

Posted August 27, 2015 by Sarah Lemaire, Manager, Vertica Documentation

Like all Vertica releases, 7.1.2 includes a bunch of new features. One specific feature can help make your HDFS data safer: Kerberos integration for Hadoop. This blog post gives you an overview of using Kerberos authentication with Vertica for SQL on Hadoop. For information about using Kerberos with Community Edition or Enterprise Edition, see the documentation.

Why Use Kerberos

Kerberos authentication helps keep your data secure. With Kerberos authentication, you can restrict access to individual users, ensuring only appropriate users view the respective files. In some industries, like finance and medicine, such restrictions are essential. You may already be using this type of authentication on your Hadoop Distributed File System (HDFS).

Until now, you were limited in how you could use Vertica on a Kerberos-protected HDFS cluster. You could read HDFS data using the HDFS Connector, but if you wanted the performance benefits of having the data in Vertica, you could only do so in an unsecured way. In other words, if you had data in Vertica that you wanted to store in HDFS, you could not use Kerberos authentication.

With Vertica 7.1.2, this is no longer the case. You can create a secure storage location in HDFS – the only type of storage location permitted for SQL on Hadoop – or you can read Kerberos-protected data in place using the HDFS or HCatalog connectors or by reading native ORC files. Because all access methods into Hadoop data support Kerberos (HCatalog Connector, HDFS Connector, and the new ORCReader), Vertica fits into your security model.

This table gives you a quick overview of which access methods support Kerberos:

[Table: Kerberos support by access method]

This new feature will help you protect and secure sensitive data when using Vertica for SQL on Hadoop.

How Does This Work?

So how does Vertica use Kerberos with Hadoop? Vertica authenticates with Hadoop in two ways that require different configurations:

  • User Authentication—On behalf of the user, by passing along the user’s existing Kerberos credentials.
  • Vertica Authentication—On behalf of system processes, by using a special Kerberos credential.

User Authentication

When you use Vertica with Kerberos and Hadoop, user authentication proceeds as follows:

  1. The client user first authenticates using whatever method the Hadoop administrator provides for clients to authenticate with Kerberos. This method might be by logging in to Active Directory, for example.
  2. A user who authenticates to a Kerberos server receives a Kerberos ticket.
  3. At the beginning of a client session, Vertica automatically retrieves this ticket.
  4. The database then uses this ticket to get a Hadoop token, which Hadoop uses to grant access.
  5. Vertica uses this token to access HDFS, such as when executing a query on behalf of the user.
  6. When the token expires, the database automatically renews it, also renewing the Kerberos ticket if necessary.
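
The steps above can be pictured with a small toy model. This is only a sketch of the sequence (ticket first, then token, then renewal on expiry); the class names, lifetimes, and integer clock are all hypothetical and do not correspond to any real Vertica, Hadoop, or Kerberos API.

```python
from dataclasses import dataclass


@dataclass
class Credential:
    """A ticket or token with an owner and an expiry time (toy model)."""
    owner: str
    expires_at: int

    def is_valid(self, now: int) -> bool:
        return now < self.expires_at


class ToySession:
    """Hypothetical stand-in for a database session acting for one user."""

    TICKET_LIFETIME = 600  # assumed value, in arbitrary time units
    TOKEN_LIFETIME = 100   # assumed value, in arbitrary time units

    def __init__(self, user: str, now: int):
        # Steps 1-3: the user has already authenticated with Kerberos,
        # and the session picks up the resulting ticket.
        self.user = user
        self.ticket = Credential(user, now + self.TICKET_LIFETIME)
        self.token = None
        self.log = []

    def _renew_ticket(self, now: int) -> None:
        self.ticket = Credential(self.user, now + self.TICKET_LIFETIME)
        self.log.append("ticket renewed")

    def _get_token(self, now: int) -> None:
        # Steps 4 and 6: a token is obtained with a valid ticket,
        # so an expired ticket is renewed first.
        if not self.ticket.is_valid(now):
            self._renew_ticket(now)
        self.token = Credential(self.user, now + self.TOKEN_LIFETIME)
        self.log.append("token issued")

    def read_hdfs(self, path: str, now: int) -> str:
        # Step 5: each HDFS access uses a valid token.
        if self.token is None or not self.token.is_valid(now):
            self._get_token(now)
        return f"{self.user} read {path}"
```

For example, a session created at time 0 gets a token on its first read, reuses it while it is valid, and only renews the ticket once the ticket itself has expired. The user never has to log in again during this sequence, which is the point of step 6.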

The following figure shows how the user, Vertica, Hadoop, and Kerberos interact in user authentication:

[Figure: User authentication flow among the user, Vertica, Hadoop, and Kerberos]

When using the HCatalog Connector, Vertica uses the client identity as the preceding figure shows. If you are using the HCatalog Connector to read data from HDFS, you must verify that all users who need access to Hive have also been granted access to HDFS.

Vertica Authentication

Automatic processes, such as the Tuple Mover, do not log in the way users do. Instead, Vertica uses special identities (principals) stored in encrypted keytab files on Vertica nodes. Each keytab file contains the principal only for that node. Initial setup is more involved than creating one keytab file containing all principals, but the method described here makes it easier to maintain nodes and add nodes later.
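
To make the one-keytab-per-node idea concrete, here is a minimal sketch that builds a per-node plan of principals and keytab files. It assumes an MIT Kerberos KDC, where `addprinc -randkey` and `ktadd -k` are kadmin commands; other KDCs, such as Active Directory, use different tools. The service name `vertica`, the host names, the realm, and the keytab directory are all placeholders, and the script only builds command strings without running anything.

```python
def node_principal(host: str, realm: str, service: str = "vertica") -> str:
    """Build a service/host@REALM principal name (service name is assumed)."""
    return f"{service}/{host}@{realm}"


def keytab_plan(hosts, realm, keytab_dir="/tmp/keytabs"):
    """Return, per host, the kadmin commands for that node's own keytab.

    Each node gets a separate keytab file holding only its own principal.
    The directory and file naming here are hypothetical.
    """
    plan = {}
    for host in hosts:
        principal = node_principal(host, realm)
        keytab = f"{keytab_dir}/{host}.keytab"
        plan[host] = [
            f"addprinc -randkey {principal}",   # create the principal
            f"ktadd -k {keytab} {principal}",   # export its key to a keytab
        ]
    return plan


if __name__ == "__main__":
    hosts = ["node01.example.com", "node02.example.com"]
    for host, commands in keytab_plan(hosts, "EXAMPLE.COM").items():
        print(host)
        for command in commands:
            print("  kadmin:", command)
```

Adding a node later then means generating one more principal and one more keytab, without touching the files on existing nodes.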

After you configure the keytab files, Vertica uses the principal residing there to automatically obtain and maintain a Kerberos ticket, much as in the client scenario. In this case, the client does not interact with Kerberos.

Configure: Setting Up Users and the Keytab File

If you have not already configured Kerberos authentication for Vertica, follow the instructions in Implementing Security in the documentation.

Use Case

Clearly this feature can benefit any type of organization. Here is just one example:

Say you run a health care business and use HDFS storage locations to store confidential patient data. Before 7.1.2, you couldn’t use Vertica for SQL on Hadoop without putting patient information at risk because the read process was not secure. Now, since Vertica can use the same authentication method as HDFS, you can safely read the files.

The possibilities for using Vertica for SQL on Hadoop are endless. And with the new security feature, you won’t have to worry about compromising all that data. How will you use it?