
I recently set up a Spark cluster, and today I want to run a WordCount program on it.

My environment: jdk1.8.0_121 + scala2.10.4 + hadoop2.6.5 + spark1.6.2
Cluster: master + slave01 + slave02
Client: client
Additional notes: master, slave01, slave02, and client are all in the same LAN (master, slave01, and slave02 can SSH into each other without a password), and the logged-in user on every node is root.

The demo code is as follows:

    package com.hx.test

    import org.apache.spark.{SparkConf, SparkContext}

    object Test01WordCount {

      def main(args: Array[String]): Unit = {

        val inputPath = "hdfs://master/970655147/input/01WordCount/"

        // 1. local mode
        // val conf = new SparkConf().setMaster("local").setAppName("WordCount")
        // 2. standalone mode
        val conf = new SparkConf().setMaster("spark://master:7077").setAppName("WordCount")
          .set("spark.executor.memory", "64M")
          .set("spark.executor.cores", "1")

        val sc = new SparkContext(conf)

        // read the input files from HDFS and print each line (on the executors)
        val line = sc.textFile(inputPath)
        line.foreach(println)

        sc.stop()
      }

    }
  1. First, I ran it in local mode, and the program ran normally.
  2. Second, I ran it on the cluster [deployed from IDEA], but it failed. There seems to be some blocked interaction: according to the logs, the Master allocates executors for the app and the executors start, but they appear to block at "UserGroupInformation.doAs(UserGroupInformation.java:1643); SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)", so the executors never register with the driver, and the driver gets no resources.
  3. Next I tried spark-submit [spark-submit --master spark://master:7077 --class com.hx.test.Test01WordCount HelloSpark.jar] on the client and on slave02, but the result was the same. Please give me some advice, thanks. (A configuration sketch follows this list.)
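
As a side note, a common variant (not my original code) is to leave the cluster-specific settings out of SparkConf and supply them through spark-submit options such as --master and --executor-memory, so the same jar works both locally and on the cluster; a minimal sketch under that assumption:

    import org.apache.spark.{SparkConf, SparkContext}

    // sketch only: hard-code nothing but the app name; the master URL, executor
    // memory and cores are expected to come from spark-submit flags instead
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)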

Some log info follows:


  1. ExecutorBackend's bootstrap cmd

    root@slave02:~# jps 
    7984 CoarseGrainedExecutorBackend 
    6468 NodeManager 
    8037 Jps 
    955 Worker 
    7981 CoarseGrainedExecutorBackend 
    7982 CoarseGrainedExecutorBackend 
    6366 DataNode 
    7983 CoarseGrainedExecutorBackend 
    root@slave02:~# ps -ef | grep 7983 
    root 7983 955 14 06:21 ? 00:00:03 /usr/local/ProgramFiles/jdk1.8.0_121/bin/java -cp /usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/conf/:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/ProgramFiles/hadoop-2.6.5/etc/hadoop/ -Xms64M -Xmx64M -Dspark.driver.port=37230 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.0.191:37230 --executor-id 1 --hostname 192.168.0.182 --cores 1 --app-id app-20170408062155-0015 --worker-url spark://Worker@192.168.0.182:46466 
    root 8050 4249 4 06:22 pts/1 00:00:00 grep --color=auto 7983 
    root@slave02:~# 
    
  2. executor's error log

    root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# cat work/app-20170408062155-0015/0/stderr 
    17/04/08 06:22:20 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT] 
    17/04/08 06:22:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
    17/04/08 06:22:28 INFO spark.SecurityManager: Changing view acls to: root 
    17/04/08 06:22:28 INFO spark.SecurityManager: Changing modify acls to: root 
    17/04/08 06:22:28 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
    17/04/08 06:23:06 INFO spark.SecurityManager: Changing view acls to: root 
    17/04/08 06:23:06 INFO spark.SecurityManager: Changing modify acls to: root 
    17/04/08 06:23:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
    17/04/08 06:23:24 INFO slf4j.Slf4jLogger: Slf4jLogger started 
    17/04/08 06:23:29 INFO Remoting: Starting remoting 
    Exception in thread "main" 17/04/08 06:23:46 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 
    17/04/08 06:23:47 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 
    java.lang.reflect.UndeclaredThrowableException 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643) 
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:151) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:253) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) 
    Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds] 
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) 
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) 
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) 
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) 
    at scala.concurrent.Await$.result(package.scala:107) 
    at akka.remote.Remoting.start(Remoting.scala:179) 
    at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184) 
    at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:620) 
    at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:617) 
    at akka.actor.ActorSystemImpl._start(ActorSystem.scala:617) 
    at akka.actor.ActorSystemImpl.start(ActorSystem.scala:634) 
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:142) 
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:119) 
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121) 
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53) 
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:52) 
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2024) 
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) 
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2015) 
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:55) 
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:266) 
    at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:217) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:186) 
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69) 
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) 
    ... 4 more 
    root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# cat work/app-20170408062155-0015/0/stdout 
    root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# 
    
  3. driver's log

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
    17/04/08 21:21:44 INFO SparkContext: Running Spark version 1.6.2 
    17/04/08 21:21:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
    17/04/08 21:21:45 INFO SecurityManager: Changing view acls to: root 
    17/04/08 21:21:45 INFO SecurityManager: Changing modify acls to: root 
    17/04/08 21:21:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
    17/04/08 21:21:46 INFO Utils: Successfully started service 'sparkDriver' on port 37230. 
    17/04/08 21:21:47 INFO Slf4jLogger: Slf4jLogger started 
    17/04/08 21:21:47 INFO Remoting: Starting remoting 
    17/04/08 21:21:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.191:43974] 
    17/04/08 21:21:48 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 43974. 
    17/04/08 21:21:48 INFO SparkEnv: Registering MapOutputTracker 
    17/04/08 21:21:48 INFO SparkEnv: Registering BlockManagerMaster 
    17/04/08 21:21:48 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-ef79b656-b7f4-4cb3-be3e-0f8bb61baa9d 
    17/04/08 21:21:48 INFO MemoryStore: MemoryStore started with capacity 431.3 MB 
    17/04/08 21:21:48 INFO SparkEnv: Registering OutputCommitCoordinator 
    17/04/08 21:21:54 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
    17/04/08 21:21:54 INFO SparkUI: Started SparkUI at http://192.168.0.191:4040 
    17/04/08 21:21:54 INFO AppClient$ClientEndpoint: Connecting to master spark://master:7077... 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20170408062155-0015 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/0 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/0 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/1 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/1 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/2 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/2 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/3 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/3 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/4 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/4 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/5 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/5 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/6 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/6 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/7 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/7 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42255. 
    17/04/08 21:21:55 INFO NettyBlockTransferService: Server created on 42255 
    17/04/08 21:21:56 INFO BlockManagerMaster: Trying to register BlockManager 
    17/04/08 21:21:57 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.191:42255 with 431.3 MB RAM, BlockManagerId(driver, 192.168.0.191, 42255) 
    17/04/08 21:21:57 INFO BlockManagerMaster: Registered BlockManager 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/0 is now RUNNING 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/1 is now RUNNING 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/2 is now RUNNING 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/3 is now RUNNING 
    17/04/08 21:22:00 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/4 is now RUNNING 
    17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/5 is now RUNNING 
    17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/6 is now RUNNING 
    17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/7 is now RUNNING 
    17/04/08 21:22:03 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 
    17/04/08 21:22:05 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 107.7 KB, free 107.7 KB) 
    17/04/08 21:22:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 9.8 KB, free 117.5 KB) 
    17/04/08 21:22:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.191:42255 (size: 9.8 KB, free: 431.2 MB) 
    17/04/08 21:22:06 INFO SparkContext: Created broadcast 0 from textFile at Test01WordCount.scala:30 
    17/04/08 21:22:21 INFO FileInputFormat: Total input paths to process : 1 
    17/04/08 21:22:21 INFO SparkContext: Starting job: foreach at Test01WordCount.scala:33 
    17/04/08 21:22:21 INFO DAGScheduler: Got job 0 (foreach at Test01WordCount.scala:33) with 2 output partitions 
    17/04/08 21:22:21 INFO DAGScheduler: Final stage: ResultStage 0 (foreach at Test01WordCount.scala:33) 
    17/04/08 21:22:21 INFO DAGScheduler: Parents of final stage: List() 
    17/04/08 21:22:21 INFO DAGScheduler: Missing parents: List() 
    17/04/08 21:22:21 INFO DAGScheduler: Submitting ResultStage 0 (hdfs://master/970655147/input/01WordCount/ MapPartitionsRDD[1] at textFile at Test01WordCount.scala:30), which has no missing parents 
    17/04/08 21:22:21 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.0 KB, free 120.5 KB) 
    17/04/08 21:22:21 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1842.0 B, free 122.3 KB) 
    17/04/08 21:22:21 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.0.191:42255 (size: 1842.0 B, free: 431.2 MB) 
    17/04/08 21:22:21 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006 
    17/04/08 21:22:21 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (hdfs://master/970655147/input/01WordCount/ MapPartitionsRDD[1] at textFile at Test01WordCount.scala:30) 
    17/04/08 21:22:21 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 
    17/04/08 21:22:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:04 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:02 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/1 is now EXITED (Command exited with code 1) 
    17/04/08 21:24:02 INFO SparkDeploySchedulerBackend: Executor app-20170408062155-0015/1 removed: Command exited with code 1 
    17/04/08 21:24:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:26:02 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Command exited with code 1)] in 1 attempts 
    org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout 
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48) 
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63) 
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) 
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) 
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) 
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) 
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:370) 
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.executorRemoved(SparkDeploySchedulerBackend.scala:144) 
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:184) 
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116) 
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) 
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) 
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
    Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] 
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) 
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) 
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) 
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) 
    at scala.concurrent.Await$.result(package.scala:107) 
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) 
    ... 12 more 
    

Some referenced links follow:


  1. How can I run Spark job programmatically
  2. Intermittent Timeout Exception using Spark
  3. http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-job-on-Unix-cluster-from-dev-environment-Windows-td16989.html
  4. https://issues.streamsets.com/browse/SDC-4249
  5. https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Unable-to-create-SparkContext-to-Spark-1-3-Standalone-service-in/td-p/29176

I had already found these posts before, but they did not fix the problem. Below are some test logs from my cluster; maybe my reasoning is wrong somewhere, so please help me check them.

  1. ufw status and ssh connectivity

    root@master:/usr/local/ProgramFiles# ufw status
    Status: inactive
    root@master:/usr/local/ProgramFiles# ssh slave01
    Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-62-generic x86_64)
    
     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/advantage
    Last login: Sat Apr  8 21:33:44 2017 from 192.168.0.119
    root@slave01:~# ufw status
    Status: inactive
    root@slave01:~# ssh slave02
    Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-62-generic x86_64)
    
     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/advantage
    Last login: Sat Apr  8 21:10:33 2017 from 192.168.0.119
    root@slave02:~# ufw status
    Status: inactive
    root@slave02:~# 
    
  2. network connectivity by IP or FQDN

    2.1. nc in master
    root@master:/usr/local/ProgramFiles# netcat -l 12306
    root@master:/usr/local/ProgramFiles# nc -l 12306
    root@master:/usr/local/ProgramFiles# nc -l 12306
    root@master:/usr/local/ProgramFiles# nc -l 12306
    
    2.2. nc in slave01
    root@slave01:~# nc -vz 192.168.0.180 12306
    Connection to 192.168.0.180 12306 port [tcp/*] succeeded!
    root@slave01:~# nc -vz master 12306
    Connection to master 12306 port [tcp/*] succeeded!
    
    2.3. nc in slave02
    root@slave02:/usr/local/ProgramFiles# nc -vz 192.168.0.180 12306
    Connection to 192.168.0.180 12306 port [tcp/*] succeeded!
    root@slave02:/usr/local/ProgramFiles# nc -vz master 12306
    Connection to master 12306 port [tcp/*] succeeded!
    root@slave02:/usr/local/ProgramFiles# 
    

Reinstalling another version of Spark:


I reinstalled another version of Spark, but the same problem is still there, so there may be some problem in my environment ...

Please give me some advice, thanks.

  • `WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources` means your cluster doesn't have free resources to start your application. Open your cluster UI and check whether any other application is running and using the cluster's full resources. – Kaushal Apr 10 '17 at 14:18
  • @Kaushal thanks for your reply. My cluster has only one app [I'm just testing], so that is only the most apparent symptom; digging deeper, it goes through the following steps: 1. the master schedules and allocates executors for the driver, 2. the workers launch the executors, 3. the executors register with the driver [at this phase the executor is blocked at 'CoarseGrainedExecutorBackend$.main'], 4. the driver schedules the executors to run the driver program, and so on .. – Jerry.X.He Apr 10 '17 at 15:44
  • Are you sure your HDFS URL `"hdfs://master/970655147/input/01WordCount/"` is correct, or do you need to specify the HDFS port? – Kaushal Apr 10 '17 at 15:55
  • Yes, I use the default port; run locally the path is found by the Spark program, and by a Hadoop program too – Jerry.X.He Apr 11 '17 at 02:57
  • @Kaushal hi, thanks; this problem got solved in a rather muddled way, please see the answer below – Jerry.X.He Apr 11 '17 at 09:28

1 Answer


Today I wanted to change the Java and Scala environment, and I found a post whose author had set up Spark with jdk1.7.0_80 & scala2.11.8, so I downloaded jdk1.7.0_40 & scala2.11.8 and installed them on my cluster [master, slave01, slave02].

I updated the environment variables, hadoop's hadoop-env.sh, and spark's spark-env.sh, then shut down Spark and Hadoop, started Hadoop and Spark again, and ran spark-shell with "/bin/spark-shell --master spark://master:7077 --executor-memory 64M".

spark-shell still did not run normally, but the log looked different from before, so I checked the executor's log and saw this:

    Exception in thread "main" java.lang.IllegalArgumentException: System memory 64880640 must be at least 4.718592E8. Please use a larger heap size.
            at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:198)
            at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:180)
            at org.apache.spark.SparkEnv$.create(SparkEnv.scala:354)
            at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:217)
            at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:186)
            at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
            at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:415)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
            at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
            at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:151)
            at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:253)
            at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)

This time the failure was at "CoarseGrainedExecutorBackend.scala:186", so it no longer seemed blocked at "UserGroupInformation.doAs(UserGroupInformation.java:1643)". According to the error message, the executor needs at least 4.718592E8 bytes (about 450 MB), so I changed "--executor-memory" and specified it as "512M".

Then I re-ran spark-shell with the larger executor memory, and this time I got in successfully, so I tried to run 'WordCount' on the cluster.

First I updated the `.set("spark.executor.memory", "64M")` line accordingly, then built the jar, put it on the cluster, and it ran normally.
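
For reference, the updated configuration looks roughly like this (512M is used here only as an example value that clears the ~450 MB minimum reported above):

    import org.apache.spark.{SparkConf, SparkContext}

    // rough sketch of the updated configuration; 512M is an assumed value that
    // satisfies the ~450 MB minimum reported by UnifiedMemoryManager
    val conf = new SparkConf().setMaster("spark://master:7077").setAppName("WordCount")
      .set("spark.executor.memory", "512M")   // was "64M", which is below the minimum
      .set("spark.executor.cores", "1")
    val sc = new SparkContext(conf)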


Got it, so it seems this problem has been fixed??

Then I wanted to find out why the problem occurred: was it the JDK or Scala??

So I tried switching the environment variables, spark's spark-env.sh, and hadoop's hadoop-env.sh back and forth between 'jdk1.8.0_121 & scala2.10.4' and 'jdk1.7.0_40 & scala2.11.8', but this time both environments were fine for spark-shell and for 'WordCount'.

Even when I removed 'jdk1.7.0_40 & scala2.11.8' and reverted all the configuration to what it was before [when I first hit this problem], it was still OK.

Oh my god, this is a mysterious problem ... even though I didn't find the original cause, I'm still satisfied; at least my cluster is OK now.

thank you, @Kaushal