Some boring stuff first, to put things into context, then the solution.
- Kerberos: it's complicated by nature (think cryptography plus networking), even without considering that Microsoft has its own implementation and extensions
- Java and Kerberos: it's even more complicated (only partial support, subtle changes in Java versions, etc.)
- Hadoop and Java and Kerberos: it's complicated and ugly (read the GitBook "Hadoop and Kerberos, the Madness beyond the Gate" if you really want to lose your sanity) and it's even worse on Windows, because there is no official build of the required Hadoop "native libs"
- Hive and JDBC and Kerberos: the good news is that you don't need the Hadoop "ugly" part unless you are using the Apache JDBC driver on Windows (hint: ditch it and opt for the Cloudera JDBC driver!); the bad news is that you may need raw JAAS configuration and specific Java system properties
- R and Java/JDBC: it works quite well, except that sometimes you want to pass specific Java system properties to the JVM -- either at launch time or at run time -- but .jinit does not support that AFAIK, so you must resort to a workaround
There is one Java system property (javax.security.auth.useSubjectCredsOnly) that must be set for Kerberos auth to work in JDBC, and it's not always set by default.
But you can't set that Java property from R directly; you have to set an environment variable (either before starting R, or from R code but before .jinit).
Option 1: from a Linux shell script, before starting R...
export JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false"
Option 2: from your R code...
Sys.setenv(JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false")
.jinit(...)
Now, that may not be sufficient in all cases. Maybe you need to use a specific Kerberos config because your Hadoop cluster uses its own KDC. Maybe you don't want to use the default Kerberos ticket, but instead authenticate as a service account, using the key stored in a keytab file.
And maybe you need some debugging information because, well, shit happens (and security libraries are quite secretive by default, not to make things too easy for hackers, I suppose...).
Please refer to that post for more information about advanced Java configuration for Hive/Impala JDBC with Kerberos.
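For a rough idea of what such an advanced setup can look like, here is a minimal sketch that combines several properties in one environment variable. The property names are standard JDK ones; the shell variables KRB5_CONF and JAAS_CONF and their default paths are assumptions for illustration only, and whether you need a JAAS file at all depends on your driver and on how you authenticate.

```shell
#!/bin/sh
# Hypothetical locations: adjust to your environment (paths must not contain spaces).
KRB5_CONF="${KRB5_CONF:-/etc/krb5.conf}"
JAAS_CONF="${JAAS_CONF:-$HOME/jaas.conf}"

# The property that must be set for Kerberos auth to work in JDBC.
JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false"

# Point the JVM at a specific Kerberos config, e.g. for a cluster-specific KDC.
JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Djava.security.krb5.conf=$KRB5_CONF"

# Only reference a custom JAAS file (e.g. for keytab login) if it actually exists.
if [ -f "$JAAS_CONF" ]; then
  JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Djava.security.auth.login.config=$JAAS_CONF"
fi

# Ask the JDK Kerberos implementation for verbose traces.
JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Dsun.security.krb5.debug=true"

export JAVA_TOOL_OPTIONS
```

Run this before starting R, so that the JVM launched by .jinit picks up all the properties at once.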
And be careful when setting the environment variable: its value must look like the arguments of a Java command line, i.e. -Dsome.key=value -Dsome.other.key=blahblah. In a shell script, wrap the whole value in quotes (because of the separating space); in R code, pass a single string, not a character vector.
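The quoting rule can be checked with a small sketch like the one below, which reuses the placeholder keys from the sentence above. The word count is only an approximation of how the JVM splits the variable on whitespace.

```shell
#!/bin/sh
# Quotes are required: without them the shell would treat the second -D option
# as a separate argument to "export" instead of part of the value.
export JAVA_TOOL_OPTIONS="-Dsome.key=value -Dsome.other.key=blahblah"

# Split the value on whitespace, roughly as the JVM does, and count the options.
# (Unquoted expansion is intentional here.)
set -- $JAVA_TOOL_OPTIONS
echo "JVM will see $# options"
```

If the count does not match the number of properties you meant to pass, the quoting went wrong somewhere.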