I am trying to read data from one Hive (cluster n°1) and write the result into another Hive (cluster n°2); they belong to two different clusters. I can't use a single Spark session to connect to both Hive instances, so I will use JDBC to read the data and the Spark Hive context to write it.
Both clusters use Kerberos, Knox, and Ranger.
The Spark batch job will run on cluster n°2.
This is my main error:
GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
What I have already tried:
- I can connect to the Hive server using the beeline command, both with ZooKeeper discovery mode and in direct HTTP mode:
beeline -u "jdbc:hive2://<hiveServer2Host>:<Port>/<db>;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=truststore.jks;trustStorePassword=<psw>;principal=<hive server 2 principal>;"
My code:
String url = "jdbc:hive2://<hiveServer2Host>:<Port>/<db>;transportMode=http;httpPath=cliservice;ssl=true;sslTrustStore=truststore.jks;trustStorePassword=<psw>;principal=<hive server 2 principal>";
String table = "test";
Properties connectionProperties = new Properties();
Dataset<Row> testDS = spark.read().jdbc(url, table, connectionProperties);
testDS.show();
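One detail that may matter here: connectionProperties is empty, so Spark is left to guess which JDBC driver class to load for the hive2 URL. Spark's JDBC source accepts a "driver" entry naming the driver class explicitly; a minimal sketch (the Hive driver class name below is the usual one shipped in the hive-jdbc jar, but verify it against the jar on your classpath):

```java
import java.util.Properties;

public class JdbcDriverProps {
    public static void main(String[] args) {
        Properties connectionProperties = new Properties();
        // Tell Spark's JDBC reader which driver class to register before
        // opening the connection, instead of relying on URL-based lookup.
        connectionProperties.setProperty("driver", "org.apache.hive.jdbc.HiveDriver");
        System.out.println(connectionProperties.getProperty("driver"));
    }
}
```

This Properties object would then be passed to spark.read().jdbc(url, table, connectionProperties) as in the code above.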
I tried to add a manual Kerberos login before the JDBC call:
org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
conf.set("fs.hdfs.impl", DistributedFileSystem.class.getName());
conf.set("hadoop.security.authentication", "kerberos");
conf.set("hadoop.rpc.protection", "privacy");
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab(args[0], args[1]);
System.out.println("login with: " + UserGroupInformation.getLoginUser());
System.out.println("Current User:" + UserGroupInformation.getCurrentUser());
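Note that loginUserFromKeytab populates the login user, but the Hive JDBC driver's GSSAPI layer only sees those credentials if the connection is opened inside the subject's doAs context (in the real app, that would be UserGroupInformation.getLoginUser().doAs(...) wrapped around the spark.read().jdbc(...) call). A minimal sketch of the pattern, using an empty dummy Subject so it stays self-contained:

```java
import java.security.PrivilegedAction;
import javax.security.auth.Subject;

public class DoAsSketch {
    public static void main(String[] args) {
        // Hypothetical stand-in: in the real app the Subject carries the
        // Kerberos credentials from the keytab login, not new Subject().
        Subject subject = new Subject();
        String result = Subject.doAs(subject, (PrivilegedAction<String>) () -> {
            // The JDBC connection would be opened here, so the driver's
            // GSSAPI negotiation runs as the authenticated subject.
            return "ran inside doAs";
        });
        System.out.println(result);
    }
}
```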
I am correctly identified (in the Spark logs):
login with: A@XXX.LOCAL (auth:KERBEROS)
Current User: A (auth:SIMPLE)
I also do a kinit before spark-submit:
kinit -kt <A_keytabs> <A principal>
klist shows a valid ticket.
I tried adding --keytab and --principal to spark-submit, but that changed nothing.
Only the JDBC connection has this problem; otherwise I am correctly authenticated to access HDFS resources, Kafka topics, etc.
I also tried to connect over JDBC with ZooKeeper discovery from Java, but that doesn't work either:
java.sql.SQLException: Could not open client transport for any of the Server URI's in ZooKeeper: Unable to read HiveServer2 configs from ZooKeeper
I am using HDP 2.6.4, Java 8, and Spark 2.2.1.
UPDATE (after Samson's reply):
I added the JAAS and Kerberos properties to the system:
System.setProperty("java.security.auth.login.config", "spark_jaas.conf");
System.setProperty("sun.security.jgss.debug", "true");
System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
System.setProperty("java.security.krb5.realm", <realm from krb5.conf>);
System.setProperty("java.security.krb5.kdc", <kdc from krb5.conf>);
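For reference, since useSubjectCredsOnly=false makes JGSS fall back to the JAAS entry named com.sun.security.jgss.initiate, a minimal sketch of what spark_jaas.conf could contain for a keytab-based login (the keytab path and principal are placeholders to adapt):

```
com.sun.security.jgss.initiate {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/path/to/A.keytab"
  principal="A@XXX.LOCAL"
  storeKey=true
  doNotPrompt=true;
};
```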
My app doesn't crash, but I get this error:
GSSException: No valid credentials provided (Mechanism level: Attempt to obtain new INITIATE credentials failed! (null))
Search Subject for Kerberos V5 INIT cred (<>, sun.security.jgss.krb5.Krb5InitCredential)