
I am trying to read data from one Hive (Hive n°1) and write the result into another Hive (Hive n°2); they belong to two different clusters. I can't use a single Spark session to connect to both Hive instances, so I will use JDBC to read the data and the Spark Hive context to write it.

Both clusters use Kerberos, Knox, and Ranger.

The Spark batch job will run on the cluster of Hive n°2.

This is my main error:

GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

What I have already tried:

  • I can connect to the Hive server using the beeline command, both with ZooKeeper discovery mode and in direct HTTP mode:
beeline -u "jdbc:hive2://<hiveServer2Host>:<Port>/<db>;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=truststore.jks;trustStorePassword=<psw>;principal=<hive server 2 principal>;"

My code:

String url = "jdbc:hive2://<hiveServer2Host>:<Port>/<db>;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=truststore.jks;trustStorePassword=<psw>;principal=<hive server 2 principal>";
String table = "test";
Properties connectionProperties = new Properties();
Dataset<Row> testDS = spark.read().jdbc(url, table, connectionProperties);
testDS.show();
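For the write side of the plan described above (read over JDBC, write through the Spark Hive context), a minimal sketch could look like the following. The target table name `result_table` and the save mode are assumptions for illustration, not taken from the original code:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

// Assumes `spark` was built with .enableHiveSupport() so that saveAsTable()
// goes through the local metastore of Hive n°2 (the cluster the job runs on).
Dataset<Row> testDS = spark.read().jdbc(url, table, connectionProperties);
testDS.write()
      .mode(SaveMode.Overwrite)       // or SaveMode.Append, depending on the batch semantics
      .saveAsTable("result_table");   // hypothetical target table in Hive n°2
```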

I tried adding a manual Kerberos login before the JDBC call:

org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
conf.set("fs.hdfs.impl", DistributedFileSystem.class.getName());
conf.set("hadoop.security.authentication", "kerberos");
conf.set("hadoop.rpc.protection", "privacy");
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab(args[0], args[1]);
System.out.println("login with: " + UserGroupInformation.getLoginUser());
System.out.println("Current User:" + UserGroupInformation.getCurrentUser());
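One possible cause of the "Failed to find any Kerberos tgt" error is that the JDBC driver's GSS layer does not automatically pick up the subject logged in via `UserGroupInformation`. A commonly suggested workaround is to open the connection inside a `doAs` block, so the GSS layer runs under the authenticated subject. This is a hedged sketch under that assumption, not a confirmed fix for this setup:

```java
import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.hadoop.security.UserGroupInformation;

// After loginUserFromKeytab(...) above, run the JDBC connect inside the
// Kerberos-authenticated subject so GSS can find the TGT.
UserGroupInformation ugi = UserGroupInformation.getLoginUser();
Connection conn = ugi.doAs((PrivilegedExceptionAction<Connection>) () ->
        DriverManager.getConnection(url));  // url: same JDBC string as above
```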

I am correctly identified (in the Spark logs):

login with: A@XXX.LOCAL (auth:KERBEROS)

Current User: A (auth:SIMPLE)

I also run kinit before spark-submit:

kinit -kt <A_keytabs> <A principal> 

The klist output is fine.

I tried adding --keytab and --principal to spark-submit, but that changes nothing.

Only the JDBC connection has a problem; otherwise I am properly identified to access HDFS resources, Kafka topics, etc.

I also tried to connect over JDBC with ZooKeeper discovery from the Java code, but that doesn't work either:

java.sql.SQLException: Could not open client transport for any of the Server URI's in ZooKeeper: Unable to read HiveServer2 configs from ZooKeeper

I use HDP 2.6.4, Java 8, and Spark 2.2.1.

UPDATE, after Samson's reply:

I added the JAAS and Kerberos properties to the system:

System.setProperty("java.security.auth.login.config", "spark_jaas.conf");
System.setProperty("sun.security.jgss.debug", "true");
System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
System.setProperty("java.security.krb5.realm", <realm from krb5.conf>);
System.setProperty("java.security.krb5.kdc", <kdc from krb5.conf>);
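For reference, a spark_jaas.conf along the lines discussed in the comments below would look like this; the keytab and principal are placeholders in the same style as the rest of this post, and `debug=true` is only there to get tracing output:

```
com.sun.security.jgss.initiate {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=false
  doNotPrompt=true
  useKeyTab=true
  keyTab="<A_keytabs>"
  principal="<A principal>"
  debug=true;
};
```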

My app doesn't crash, but I get this error:

GSSException: No valid credentials provided (Mechanism level: Attempt to obtain new INITIATE credentials failed! (null))

Search Subject for Kerberos V5 INIT cred (<>, sun.security.jgss.krb5.Krb5InitCredential)

maxime G
  • Never tried with Spark, but with plain Java, what works is (1) connect to target Metastore / HDFS with standard Hadoop conf files ; (2) set the Java system property that defines the JAAS conf file to use for further Kerberos auth _(don't use ticket cache, use this principal and this keytab)_ ; (3) launch JDBC connection to source. The trick is to propagate the JAAS conf file and system prop to the executors - and avoid dynamic allocation else the order would probably be random – Samson Scharfrichter May 22 '19 at 15:36
  • updated my post – maxime G May 27 '19 at 08:09
  • What's the content of "spark_jaas.conf"? Cf. https://stackoverflow.com/a/42506620/5162372 – Samson Scharfrichter May 27 '19 at 17:00
  • com.sun.security.jgss.initiate { com.sun.security.auth.module.Krb5LoginModule required useTicketCache=false doNotPrompt=true useKeyTab=true keyTab="" principal="" debug=false; }; I was able to connect from cluster 2 to cluster 2, but not from cluster 2 to cluster 1 – maxime G Jun 18 '19 at 15:08
  • Side note : props `java.security.krb5.*` are useless (and counter-productive) if you have a proper "/etc/krb5.conf" – Samson Scharfrichter Jun 18 '19 at 18:48
  • Side note: I think `com.sun.security.jgss.initiate` default entry name is deprecated in favor of `com.sun.security.jgss.krb5.initiate` – Samson Scharfrichter Jun 18 '19 at 19:00
  • Cf. https://stackoverflow.com/a/42506620/5162372 for extra details - and the flags to enable debugging traces about the way JAAS interprets (or silently ignores) its conf – Samson Scharfrichter Jun 18 '19 at 19:03
  • BTW if you already have a ticket in the Kerberos cache, for Spark, you can use it in JAAS and don't need to create another (process-private and volatile) ticket from keytab. Unless you need to use a different identity, of course. – Samson Scharfrichter Jun 18 '19 at 19:06

1 Answer


Try using this. I am assuming you are passing the principal and the keytab in spark-submit:

val principal: String = sparkSession.sparkContext.getConf.get("spark.yarn.principal")
val keytab: String = sparkSession.sparkContext.getConf.get("spark.yarn.keytab")
UserGroupInformation.loginUserFromKeytab(principal, keytab)
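The snippet above reads `spark.yarn.principal` and `spark.yarn.keytab`, which are the conf keys populated when the principal and keytab are passed on the command line. A sketch of such an invocation (paths, class name, and jar are placeholders; `--files` ships the JAAS file to the executors as suggested in the comments):

```
spark-submit \
  --master yarn \
  --principal <A principal> \
  --keytab <A_keytabs> \
  --files spark_jaas.conf \
  --class <main class> <application jar>
```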

Let me know if it works.

Pinaki