I have a Scala case class defined as:
final case class TestStruct(
  num_1: Long,
  num_2: Long
)
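For reference, the schema I expect the UDF column to come back with is just the usual Spark mapping of that case class; written out explicitly (this is my assumption about what df.printSchema() should eventually show):

import org.apache.spark.sql.types.{LongType, StructField, StructType}

// schema implied by TestStruct: two non-nullable Long fields
val expectedSchema = StructType(Seq(
  StructField("num_1", LongType, nullable = false),
  StructField("num_2", LongType, nullable = false)
))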
I have a method that converts a string to this struct, and I want to use it from PySpark, so I define the following class:
package com.path.test

// other imports including Gson, etc.
import org.apache.spark.sql.api.java.UDF1

class TestJob extends UDF1[String, TestStruct] {
  def call(someString: String): TestStruct = {
    // code to get TestStruct from someString
  }
}
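The body of call is not the interesting part; it is roughly a one-line Gson deserialization (a sketch, assuming the incoming string is JSON shaped like {"num_1": 1, "num_2": 2}):

import com.google.gson.Gson

def call(someString: String): TestStruct = {
  // deserialize the JSON payload directly into the case class
  new Gson().fromJson(someString, classOf[TestStruct])
}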
Then I register it in PySpark using
spark.udf.registerJavaFunction("get_struct", "com.path.test.TestJob")
But when I run df = spark.sql("select get_struct(string_col) from db.tb"), it returns an empty struct. Even df.printSchema() only shows struct (nullable = true) rather than its fields (num_1 and num_2).
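Putting the PySpark side together, this is essentially the whole session (the table and column names are placeholders, and the jar containing TestJob is on the Spark classpath):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# register the Scala UDF under its fully qualified class name
spark.udf.registerJavaFunction("get_struct", "com.path.test.TestJob")

df = spark.sql("select get_struct(string_col) from db.tb")
df.printSchema()
# the output column is reported only as: struct (nullable = true),
# with no num_1 / num_2 fields underneath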
Other people have shown this working for integer return types (links below), but I have not found an example with a struct / case class return type:
spark-how-to-map-python-with-scala-or-java-user-defined-functions
using-scala-classes-as-udf-with-pyspark
Any help would be appreciated.