
I recently set up a Spark cluster, and today I want to run a WordCount program on it.

My environment: jdk1.8.0_121 + scala2.10.4 + hadoop2.6.5 + spark1.6.2
Cluster: master + slave01 + slave02
Client: client
Additional notes: master, slave01, slave02, and client are all in the same LAN (master, slave01, and slave02 can SSH into each other without a password), and the logged-in user on every node is root.

The demo code is as follows:

    package com.hx.test

    import org.apache.spark.{SparkConf, SparkContext}

    object Test01WordCount {

      def main(args: Array[String]): Unit = {

        val inputPath = "hdfs://master/970655147/input/01WordCount/"

        // 1. local mode
        // val conf = new SparkConf().setMaster("local").setAppName("WordCount")
        // 2. standalone mode
        val conf = new SparkConf().setMaster("spark://master:7077").setAppName("WordCount")
          .set("spark.executor.memory", "64M")
          .set("spark.executor.cores", "1")

        val sc = new SparkContext(conf)

        // read the input files from HDFS and print each line (on the executors)
        val line = sc.textFile(inputPath)
        line.foreach(println)

        sc.stop()
      }

    }
  1. First, I ran it in local mode, and the program ran normally.
  2. Second, I ran it on the cluster [deployed from IDEA], but it failed. There seems to be some blocked interaction: according to the logs, the Master allocates executors for the app and the executors start, but they appear to block at "UserGroupInformation.doAs(UserGroupInformation.java:1643); SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)", so the executors never register with the driver, and the driver gets no resources.
  3. Next I tried spark-submit [spark-submit --master spark://master:7077 --class com.hx.test.Test01WordCount HelloSpark.jar] on the client and on slave02, but the result was the same. Please give me some advice, thanks. (A configuration sketch follows this list.)
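
As a side note, a common variant (not my original code) is to leave the cluster-specific settings out of SparkConf and supply them through spark-submit options such as --master and --executor-memory, so the same jar works both locally and on the cluster; a minimal sketch under that assumption:

    import org.apache.spark.{SparkConf, SparkContext}

    // sketch only: hard-code nothing but the app name; the master URL, executor
    // memory and cores are expected to come from spark-submit flags instead
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)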

Some log info follows:


  1. ExecutorBackend's bootstrap cmd

    root@slave02:~# jps 
    7984 CoarseGrainedExecutorBackend 
    6468 NodeManager 
    8037 Jps 
    955 Worker 
    7981 CoarseGrainedExecutorBackend 
    7982 CoarseGrainedExecutorBackend 
    6366 DataNode 
    7983 CoarseGrainedExecutorBackend 
    root@slave02:~# ps -ef | grep 7983 
    root 7983 955 14 06:21 ? 00:00:03 /usr/local/ProgramFiles/jdk1.8.0_121/bin/java -cp /usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/conf/:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/ProgramFiles/hadoop-2.6.5/etc/hadoop/ -Xms64M -Xmx64M -Dspark.driver.port=37230 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.0.191:37230 --executor-id 1 --hostname 192.168.0.182 --cores 1 --app-id app-20170408062155-0015 --worker-url spark://Worker@192.168.0.182:46466 
    root 8050 4249 4 06:22 pts/1 00:00:00 grep --color=auto 7983 
    root@slave02:~# 
    
  2. executor's error log

    root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# cat work/app-20170408062155-0015/0/stderr 
    17/04/08 06:22:20 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT] 
    17/04/08 06:22:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
    17/04/08 06:22:28 INFO spark.SecurityManager: Changing view acls to: root 
    17/04/08 06:22:28 INFO spark.SecurityManager: Changing modify acls to: root 
    17/04/08 06:22:28 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
    17/04/08 06:23:06 INFO spark.SecurityManager: Changing view acls to: root 
    17/04/08 06:23:06 INFO spark.SecurityManager: Changing modify acls to: root 
    17/04/08 06:23:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
    17/04/08 06:23:24 INFO slf4j.Slf4jLogger: Slf4jLogger started 
    17/04/08 06:23:29 INFO Remoting: Starting remoting 
    Exception in thread "main" 17/04/08 06:23:46 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 
    17/04/08 06:23:47 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 
    java.lang.reflect.UndeclaredThrowableException 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643) 
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:151) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:253) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) 
    Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds] 
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) 
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) 
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) 
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) 
    at scala.concurrent.Await$.result(package.scala:107) 
    at akka.remote.Remoting.start(Remoting.scala:179) 
    at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184) 
    at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:620) 
    at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:617) 
    at akka.actor.ActorSystemImpl._start(ActorSystem.scala:617) 
    at akka.actor.ActorSystemImpl.start(ActorSystem.scala:634) 
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:142) 
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:119) 
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121) 
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53) 
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:52) 
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2024) 
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) 
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2015) 
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:55) 
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:266) 
    at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:217) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:186) 
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69) 
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) 
    ... 4 more 
    root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# cat work/app-20170408062155-0015/0/stdout 
    root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# 
    
  3. driver's log

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
    17/04/08 21:21:44 INFO SparkContext: Running Spark version 1.6.2 
    17/04/08 21:21:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
    17/04/08 21:21:45 INFO SecurityManager: Changing view acls to: root 
    17/04/08 21:21:45 INFO SecurityManager: Changing modify acls to: root 
    17/04/08 21:21:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
    17/04/08 21:21:46 INFO Utils: Successfully started service 'sparkDriver' on port 37230. 
    17/04/08 21:21:47 INFO Slf4jLogger: Slf4jLogger started 
    17/04/08 21:21:47 INFO Remoting: Starting remoting 
    17/04/08 21:21:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.191:43974] 
    17/04/08 21:21:48 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 43974. 
    17/04/08 21:21:48 INFO SparkEnv: Registering MapOutputTracker 
    17/04/08 21:21:48 INFO SparkEnv: Registering BlockManagerMaster 
    17/04/08 21:21:48 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-ef79b656-b7f4-4cb3-be3e-0f8bb61baa9d 
    17/04/08 21:21:48 INFO MemoryStore: MemoryStore started with capacity 431.3 MB 
    17/04/08 21:21:48 INFO SparkEnv: Registering OutputCommitCoordinator 
    17/04/08 21:21:54 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
    17/04/08 21:21:54 INFO SparkUI: Started SparkUI at http://192.168.0.191:4040 
    17/04/08 21:21:54 INFO AppClient$ClientEndpoint: Connecting to master spark://master:7077... 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20170408062155-0015 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/0 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/0 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/1 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/1 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/2 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/2 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/3 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/3 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/4 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/4 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/5 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/5 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/6 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/6 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/7 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 
    17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/7 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 
    17/04/08 21:21:55 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42255. 
    17/04/08 21:21:55 INFO NettyBlockTransferService: Server created on 42255 
    17/04/08 21:21:56 INFO BlockManagerMaster: Trying to register BlockManager 
    17/04/08 21:21:57 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.191:42255 with 431.3 MB RAM, BlockManagerId(driver, 192.168.0.191, 42255) 
    17/04/08 21:21:57 INFO BlockManagerMaster: Registered BlockManager 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/0 is now RUNNING 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/1 is now RUNNING 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/2 is now RUNNING 
    17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/3 is now RUNNING 
    17/04/08 21:22:00 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/4 is now RUNNING 
    17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/5 is now RUNNING 
    17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/6 is now RUNNING 
    17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/7 is now RUNNING 
    17/04/08 21:22:03 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 
    17/04/08 21:22:05 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 107.7 KB, free 107.7 KB) 
    17/04/08 21:22:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 9.8 KB, free 117.5 KB) 
    17/04/08 21:22:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.191:42255 (size: 9.8 KB, free: 431.2 MB) 
    17/04/08 21:22:06 INFO SparkContext: Created broadcast 0 from textFile at Test01WordCount.scala:30 
    17/04/08 21:22:21 INFO FileInputFormat: Total input paths to process : 1 
    17/04/08 21:22:21 INFO SparkContext: Starting job: foreach at Test01WordCount.scala:33 
    17/04/08 21:22:21 INFO DAGScheduler: Got job 0 (foreach at Test01WordCount.scala:33) with 2 output partitions 
    17/04/08 21:22:21 INFO DAGScheduler: Final stage: ResultStage 0 (foreach at Test01WordCount.scala:33) 
    17/04/08 21:22:21 INFO DAGScheduler: Parents of final stage: List() 
    17/04/08 21:22:21 INFO DAGScheduler: Missing parents: List() 
    17/04/08 21:22:21 INFO DAGScheduler: Submitting ResultStage 0 (hdfs://master/970655147/input/01WordCount/ MapPartitionsRDD[1] at textFile at Test01WordCount.scala:30), which has no missing parents 
    17/04/08 21:22:21 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.0 KB, free 120.5 KB) 
    17/04/08 21:22:21 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1842.0 B, free 122.3 KB) 
    17/04/08 21:22:21 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.0.191:42255 (size: 1842.0 B, free: 431.2 MB) 
    17/04/08 21:22:21 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006 
    17/04/08 21:22:21 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (hdfs://master/970655147/input/01WordCount/ MapPartitionsRDD[1] at textFile at Test01WordCount.scala:30) 
    17/04/08 21:22:21 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 
    17/04/08 21:22:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:04 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:23:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:02 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/1 is now EXITED (Command exited with code 1) 
    17/04/08 21:24:02 INFO SparkDeploySchedulerBackend: Executor app-20170408062155-0015/1 removed: Command exited with code 1 
    17/04/08 21:24:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:24:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:25:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 
    17/04/08 21:26:02 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Command exited with code 1)] in 1 attempts 
    org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout 
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48) 
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63) 
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) 
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) 
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) 
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) 
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:370) 
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.executorRemoved(SparkDeploySchedulerBackend.scala:144) 
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:184) 
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116) 
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) 
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) 
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
    Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] 
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) 
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) 
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) 
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) 
    at scala.concurrent.Await$.result(package.scala:107) 
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) 
    ... 12 more 
    

Some referenced links follow:


  1. How can I run Spark job programmatically
  2. Intermittent Timeout Exception using Spark
  3. http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-job-on-Unix-cluster-from-dev-environment-Windows-td16989.html
  4. https://issues.streamsets.com/browse/SDC-4249
  5. https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Unable-to-create-SparkContext-to-Spark-1-3-Standalone-service-in/td-p/29176

I had already found these posts before, but they did not fix the problem. Below are some test logs from my cluster; maybe my reasoning is wrong somewhere, so please help me check them.

  1. ufw status and ssh connectivity

    root@master:/usr/local/ProgramFiles# ufw status
    Status: inactive
    root@master:/usr/local/ProgramFiles# ssh slave01
    Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-62-generic x86_64)
    
     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/advantage
    Last login: Sat Apr  8 21:33:44 2017 from 192.168.0.119
    root@slave01:~# ufw status
    Status: inactive
    root@slave01:~# ssh slave02
    Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-62-generic x86_64)
    
     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/advantage
    Last login: Sat Apr  8 21:10:33 2017 from 192.168.0.119
    root@slave02:~# ufw status
    Status: inactive
    root@slave02:~# 
    
  2. network connectivity by IP or FQDN

    2.1. nc in master
    root@master:/usr/local/ProgramFiles# netcat -l 12306
    root@master:/usr/local/ProgramFiles# nc -l 12306
    root@master:/usr/local/ProgramFiles# nc -l 12306
    root@master:/usr/local/ProgramFiles# nc -l 12306
    
    2.2. nc in slave01
    root@slave01:~# nc -vz 192.168.0.180 12306
    Connection to 192.168.0.180 12306 port [tcp/*] succeeded!
    root@slave01:~# nc -vz master 12306
    Connection to master 12306 port [tcp/*] succeeded!
    
    2.3. nc in slave02
    root@slave02:/usr/local/ProgramFiles# nc -vz 192.168.0.180 12306
    Connection to 192.168.0.180 12306 port [tcp/*] succeeded!
    root@slave02:/usr/local/ProgramFiles# nc -vz master 12306
    Connection to master 12306 port [tcp/*] succeeded!
    root@slave02:/usr/local/ProgramFiles# 
    

Reinstalling another version of Spark:


I reinstalled another version of Spark, but the same problem is still there, so there may be some problem in my environment ...

Please give me some advice, thanks.

  • `WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources` means your cluster doesn't have free resources to start your application. Open your cluster UI and check whether any other application is running and using the cluster's full resources. – Kaushal Apr 10 '17 at 14:18
  • @Kaushal thanks for your reply. My cluster has only one app [I'm just testing], so that is only the most apparent symptom; digging deeper, it goes through the following steps: 1. the master schedules and allocates executors for the driver, 2. the workers launch the executors, 3. the executors register with the driver [at this phase the executor is blocked at 'CoarseGrainedExecutorBackend$.main'], 4. the driver schedules the executors to run the driver program, and so on .. – Jerry.X.He Apr 10 '17 at 15:44
  • Are you sure your HDFS URL `"hdfs://master/970655147/input/01WordCount/"` is correct, or do you need to specify the HDFS port? – Kaushal Apr 10 '17 at 15:55
  • Yes, I use the default port; run locally the path is found by the Spark program, and by a Hadoop program too – Jerry.X.He Apr 11 '17 at 02:57
  • @Kaushal hi, thanks; this problem got solved in a rather muddled way, please see the answer below – Jerry.X.He Apr 11 '17 at 09:28

1 Answer


Today I wanted to change the Java and Scala environment, and I found a post whose author had set up Spark with jdk1.7.0_80 & scala2.11.8, so I downloaded jdk1.7.0_40 & scala2.11.8 and installed them on my cluster [master, slave01, slave02].

I updated the environment variables, hadoop's hadoop-env.sh, and spark's spark-env.sh, then shut down Spark and Hadoop, started Hadoop and Spark again, and ran spark-shell with "/bin/spark-shell --master spark://master:7077 --executor-memory 64M".

spark-shell still did not run normally, but the log looked different from before, so I checked the executor's log and saw this:

    Exception in thread "main" java.lang.IllegalArgumentException: System memory 64880640 must be at least 4.718592E8. Please use a larger heap size.
            at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:198)
            at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:180)
            at org.apache.spark.SparkEnv$.create(SparkEnv.scala:354)
            at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:217)
            at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:186)
            at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
            at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:415)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
            at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
            at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:151)
            at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:253)
            at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)

This time the failure was at "CoarseGrainedExecutorBackend.scala:186", so it no longer seemed blocked at "UserGroupInformation.doAs(UserGroupInformation.java:1643)". According to the error message, the executor needs at least 4.718592E8 bytes (about 450 MB), so I changed "--executor-memory" and specified it as "512M".

Then I re-ran spark-shell with the larger executor memory, and this time I got in successfully, so I tried to run 'WordCount' on the cluster.

First I updated the `.set("spark.executor.memory", "64M")` line accordingly, then built the jar, put it on the cluster, and it ran normally.
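
For reference, the updated configuration looks roughly like this (512M is used here only as an example value that clears the ~450 MB minimum reported above):

    import org.apache.spark.{SparkConf, SparkContext}

    // rough sketch of the updated configuration; 512M is an assumed value that
    // satisfies the ~450 MB minimum reported by UnifiedMemoryManager
    val conf = new SparkConf().setMaster("spark://master:7077").setAppName("WordCount")
      .set("spark.executor.memory", "512M")   // was "64M", which is below the minimum
      .set("spark.executor.cores", "1")
    val sc = new SparkContext(conf)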


Got it, so it seems this problem has been fixed??

Then I wanted to find out why the problem occurred: was it the JDK or Scala??

So I tried switching the environment variables, spark's spark-env.sh, and hadoop's hadoop-env.sh back and forth between 'jdk1.8.0_121 & scala2.10.4' and 'jdk1.7.0_40 & scala2.11.8', but this time both environments were fine for spark-shell and for 'WordCount'.

Even when I removed 'jdk1.7.0_40 & scala2.11.8' and reverted all the configuration to what it was before [when I first hit this problem], it was still OK.

Oh my god, this is a mysterious problem ... even though I didn't find the original cause, I'm still satisfied; at least my cluster is OK now.

thank you, @Kaushal