
I am trying to use a Spark cluster remotely. I have three nodes, one master and two workers, 'far' from my laptop, and I want to use them for computation. I can connect to the master easily by typing:

./spark-shell --master spark://xxx.xxx.xxx.xxx:7077

I see my application in the Spark web UI on the master, but actions are never executed. I think this is because the workers, which need to connect back to my driver, have network problems. So in spark-env.sh I set my public IP:

SPARK_LOCAL_IP=XX.XX.15.215
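For reference, this is the only change I made in conf/spark-env.sh on my laptop (the machine where I run spark-shell; the IP is masked):

```shell
# conf/spark-env.sh on the laptop running spark-shell
# Bind Spark services to my public IP instead of the default local address
export SPARK_LOCAL_IP=XX.XX.15.215
```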

but doing that I get:

16/02/11 16:34:04 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.

(the same warning repeats many times, then:)

16/02/11 16:34:04 ERROR SparkContext: Error initializing SparkContext.

Enrico D' Urso
  • What do you see on your spark master web UI? Is it waiting for resources? Do you see the workers connected to the master? Can you post a screenshot of the master web UI? – Saket Feb 11 '16 at 16:39
  • The warning you have posted should not cause a problem for you as it will attempt to bind to the next port. – Saket Feb 11 '16 at 16:39
  • @Saket as you can see, it stops trying at some point – Enrico D' Urso Feb 11 '16 at 16:58

1 Answer


It is quite important for the DNS entries to be correct and for the workers and the master to be able to connect to the driver. If the driver is not directly reachable by hostname, you can specify spark.driver.host and spark.driver.port as part of the spark-submit command.
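For example, a sketch of how you could pass these settings when launching the shell (the IP and port below are placeholders taken from this thread; substitute your driver's publicly reachable address and a port that is open in your firewall):

```shell
# Tell the executors how to reach back to the driver:
# XX.XX.15.215 is the driver's public address (placeholder),
# 10001 is any open port on the driver machine.
./spark-shell --master spark://xxx.xxx.xxx.xxx:7077 \
  --conf spark.driver.host=XX.XX.15.215 \
  --conf spark.driver.port=10001
```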

Refer to this for a similar problem I had.

Saket
  • Do you think it is not possible to solve this using spark-shell? – Enrico D' Urso Feb 11 '16 at 17:01
  • You can specify these in `spark-env.sh`. – Saket Feb 11 '16 at 17:30
  • Like this: SPARK_DRIVER_PORT=10002? – Enrico D' Urso Feb 11 '16 at 17:35
  • Sorry, you have to use conf/spark-defaults.conf as specified at https://spark.apache.org/docs/latest/submitting-applications.html#loading-configuration-from-a-file and https://spark.apache.org/docs/latest/configuration.html – Saket Feb 11 '16 at 17:39
  • So I need to set those values on the workers and on the master, not on my PC? – Enrico D' Urso Feb 11 '16 at 17:56
  • I think it should be sufficient to set it on the driver. However you will have to test it out to be sure. – Saket Feb 11 '16 at 18:00
  • I passed those parameters to spark-submit but it still does not work; it is not able to bind – Enrico D' Urso Feb 11 '16 at 18:26
  • Do you have permissions to bind a port? – Saket Feb 11 '16 at 18:36
  • above 1024 yes, but it looks like I cannot bind on public IP – Enrico D' Urso Feb 11 '16 at 18:44
  • it looks like workers are still using private IP: 16/02/11 18:50:37 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT] 16/02/11 18:50:41 ERROR UserGroupInformation: PriviledgedActionException as:edge7 (auth:SIMPLE) cause:java.io.IOException: Failed to connect to /10.1.4.225:10001 Exception in thread "main" java.io.IOException: Failed to connect to /10.1.4.225:10001 – Enrico D' Urso Feb 11 '16 at 18:52
  • I had a very similar error on my machine and the problem was that my DNS was not setup correct for the host I was running the spark job on. – Saket Feb 15 '16 at 11:01
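As mentioned in the comments, the same settings can also be placed in conf/spark-defaults.conf on the driver machine instead of being passed on the command line. A sketch, using the placeholder IP and port from this thread:

```
# conf/spark-defaults.conf on the driver machine
spark.driver.host   XX.XX.15.215
spark.driver.port   10001
```

Properties set here are picked up automatically by spark-shell and spark-submit, so you do not need to repeat the --conf flags on every invocation.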