1
import com.github.plokhotnyuk.rtree2d.core._
import EuclideanPlane._

val box1 = entry(1.0f, 1.0f, 2.0f, 2.0f, "Box 1")
val box2 = entry(2.0f, 2.0f, 3.0f, 3.0f, "Box 2")
val entries = Seq(box1, box2)

val rtree = RTree(entries)
val broadcastedIndex = spark.sparkContext.broadcast(rtree)

fails when forcing kryo serialization, i.e. when starting spark with a configuration of:

.set("spark.serializer", classOf[KryoSerializer].getCanonicalName)
.set("spark.kryo.registrationRequired", "true")

As the RTreeNode class is not registered. So far, so good.

The problem arises when trying to: - use KryoSerializer but not force kryo: spark gets stuck / does not continue operation - trying to register the classes:

Class is not registered: com.github.plokhotnyuk.rtree2d.core.RTree

can be fixed with:

kryo.register(Class.forName("com.github.plokhotnyuk.rtree2d.core.RTreeNode"))

However, Class is not registered: com.github.plokhotnyuk.rtree2d.core.RTree[]

kryo.register(classOf[scala.Array[com.github.plokhotnyuk.rtree2d.core.RTreeNode[A]]])

fails to compile as I cannot register a generic RTreeNode to spark when trying to create a custom kryo registrator within the com.github.plokhotnyuk.rtree2d.core namespace to access the private classes.

How can I get generic classes registered without specifying a concrete implementation or alternatively prevent spark from getting stuck when falling back to java serialization?

NOTE when not using kryo at all it works just fine.

edit

Spark Kryo register for array class

kryo.register(Array.newInstance(Class.forName("com.github.plokhotnyuk.rtree2d.core.RTreeNode"), 0).getClass())

in java, unfortunately, I fail to get this to compile in scala.

Georg Heiler
  • 16,916
  • 36
  • 162
  • 292

1 Answers1

0

Please try combination of lazy and transient.

Scala lazy val denotes a field that will only be calculated once it is accessed for the first time and is then stored for future reference.

With @transient on the other hand one can denote a field that shall not be serialized.

reference: https://stackoverflow.com/questions/34769220/difference-when-serializing-a-lazy-val-with-or-without-transient
maogautam
  • 318
  • 1
  • 13
  • No this is not going to work. For a broadcast variable I do need to also serialize its contents. – Georg Heiler Aug 22 '19 at 04:32
  • In that case you might have to write a custom serializer. https://stackoverflow.com/questions/36144618/spark-kryo-register-a-custom-serializer – maogautam Aug 23 '19 at 18:38