
I am trying to read data from one Hive (Hive n°1) and write the result into another Hive (Hive n°2); they belong to two different clusters. I can't use a single Spark session to connect to both Hive instances, so I will use JDBC to read the data and the Spark Hive context to write it.

Both clusters use Kerberos, Knox, and Ranger.

The Spark batch job will run on the cluster of Hive n°2.

This is my main error:

GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

What I have already tried:

  • I can connect to the Hive server using the beeline command, both with ZooKeeper discovery mode and in direct HTTP mode:
beeline -u "jdbc:hive2://<hiveServer2Host>:<Port>/<db>;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=truststore.jks;trustStorePassword=<psw>;principal=<hive server 2 principal>;"

My code:

String url = "jdbc:hive2://<hiveServer2Host>:<Port>/<db>;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=truststore.jks;trustStorePassword=<psw>;principal=<hive server 2 principal>";
String table = "test";
Properties connectionProperties = new Properties();
Dataset<Row> testDS = spark.read().jdbc(url, table, connectionProperties);
testDS.show();
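For the write side of the plan described above (read over JDBC, write through the Spark Hive context), a minimal sketch could look like the following. The target table name `result_table` and the save mode are assumptions for illustration, not taken from the original code:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

// Assumes `spark` was built with .enableHiveSupport() so that saveAsTable()
// goes through the local metastore of Hive n°2 (the cluster the job runs on).
Dataset<Row> testDS = spark.read().jdbc(url, table, connectionProperties);
testDS.write()
      .mode(SaveMode.Overwrite)       // or SaveMode.Append, depending on the batch semantics
      .saveAsTable("result_table");   // hypothetical target table in Hive n°2
```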

I tried adding a manual Kerberos login before the JDBC call:

org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
conf.set("fs.hdfs.impl", DistributedFileSystem.class.getName());
conf.set("hadoop.security.authentication", "kerberos");
conf.set("hadoop.rpc.protection", "privacy");
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab(args[0], args[1]);
System.out.println("login with: " + UserGroupInformation.getLoginUser());
System.out.println("Current User:" + UserGroupInformation.getCurrentUser());
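One possible cause of the "Failed to find any Kerberos tgt" error is that the JDBC driver's GSS layer does not automatically pick up the subject logged in via `UserGroupInformation`. A commonly suggested workaround is to open the connection inside a `doAs` block, so the GSS layer runs under the authenticated subject. This is a hedged sketch under that assumption, not a confirmed fix for this setup:

```java
import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.hadoop.security.UserGroupInformation;

// After loginUserFromKeytab(...) above, run the JDBC connect inside the
// Kerberos-authenticated subject so GSS can find the TGT.
UserGroupInformation ugi = UserGroupInformation.getLoginUser();
Connection conn = ugi.doAs((PrivilegedExceptionAction<Connection>) () ->
        DriverManager.getConnection(url));  // url: same JDBC string as above
```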

I am correctly identified (in the Spark logs):

login with: A@XXX.LOCAL (auth:KERBEROS)

Current User: A (auth:SIMPLE)

I also run kinit before spark-submit:

kinit -kt <A_keytabs> <A principal> 

The klist output is fine.

I tried adding --keytab and --principal to spark-submit, but that changes nothing.

Only the JDBC connection has a problem; otherwise I am properly identified to access HDFS resources, Kafka topics, etc.

I also tried to connect over JDBC with ZooKeeper discovery from the Java code, but that doesn't work either:

java.sql.SQLException: Could not open client transport for any of the Server URI's in ZooKeeper: Unable to read HiveServer2 configs from ZooKeeper

I use HDP 2.6.4, Java 8, and Spark 2.2.1.

UPDATE, after Samson's reply:

I added the JAAS and Kerberos properties to the system:

System.setProperty("java.security.auth.login.config", "spark_jaas.conf");
System.setProperty("sun.security.jgss.debug", "true");
System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
System.setProperty("java.security.krb5.realm", <realm from krb5.conf>);
System.setProperty("java.security.krb5.kdc", <kdc from krb5.conf>);
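For reference, a spark_jaas.conf along the lines discussed in the comments below would look like this; the keytab and principal are placeholders in the same style as the rest of this post, and `debug=true` is only there to get tracing output:

```
com.sun.security.jgss.initiate {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=false
  doNotPrompt=true
  useKeyTab=true
  keyTab="<A_keytabs>"
  principal="<A principal>"
  debug=true;
};
```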

My app doesn't crash, but I get this error:

GSSException: No valid credentials provided (Mechanism level: Attempt to obtain new INITIATE credentials failed! (null))

Search Subject for Kerberos V5 INIT cred (<>, sun.security.jgss.krb5.Krb5InitCredential)

maxime G
  • Never tried with Spark, but with plain Java, what works is (1) connect to target Metastore / HDFS with standard Hadoop conf files ; (2) set the Java system property that defines the JAAS conf file to use for further Kerberos auth _(don't use ticket cache, use this principal and this keytab)_ ; (3) launch JDBC connection to source. The trick is to propagate the JAAS conf file and system prop to the executors - and avoid dynamic allocation else the order would probably be random – Samson Scharfrichter May 22 '19 at 15:36
  • updated my post – maxime G May 27 '19 at 08:09
  • What's the content of "spark_jaas.conf"? Cf. https://stackoverflow.com/a/42506620/5162372 – Samson Scharfrichter May 27 '19 at 17:00
  • com.sun.security.jgss.initiate { com.sun.security.auth.module.Krb5LoginModule required useTicketCache=false doNotPrompt=true useKeyTab=true keyTab="" principal="" debug=false; }; I was able to connect from cluster 2 to cluster 2, but not from cluster 2 to cluster 1 – maxime G Jun 18 '19 at 15:08
  • Side note : props `java.security.krb5.*` are useless (and counter-productive) if you have a proper "/etc/krb5.conf" – Samson Scharfrichter Jun 18 '19 at 18:48
  • Side note: I think `com.sun.security.jgss.initiate` default entry name is deprecated in favor of `com.sun.security.jgss.krb5.initiate` – Samson Scharfrichter Jun 18 '19 at 19:00
  • Cf. https://stackoverflow.com/a/42506620/5162372 for extra details - and the flags to enable debugging traces about the way JAAS interprets (or silently ignores) its conf – Samson Scharfrichter Jun 18 '19 at 19:03
  • BTW if you already have a ticket in the Kerberos cache, for Spark, you can use it in JAAS and don't need to create another (process-private and volatile) ticket from keytab. Unless you need to use a different identity, of course. – Samson Scharfrichter Jun 18 '19 at 19:06

1 Answer


Try using this. I am assuming you are passing the principal and the keytab in spark-submit:

val principal: String = sparkSession.sparkContext.getConf.get("spark.yarn.principal")
val keytab: String = sparkSession.sparkContext.getConf.get("spark.yarn.keytab")
UserGroupInformation.loginUserFromKeytab(principal, keytab)
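The snippet above reads `spark.yarn.principal` and `spark.yarn.keytab`, which are the conf keys populated when the principal and keytab are passed on the command line. A sketch of such an invocation (paths, class name, and jar are placeholders; `--files` ships the JAAS file to the executors as suggested in the comments):

```
spark-submit \
  --master yarn \
  --principal <A principal> \
  --keytab <A_keytabs> \
  --files spark_jaas.conf \
  --class <main class> <application jar>
```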

Let me know if it works.

Pinaki