
I need to create a Spark column in Java with the following requirements:

- The column should be created independently, i.e. without any knowledge of the other columns in the dataset.
- It should hold a custom unique value per row returned from an API, e.g. UUID.randomUUID().toString() or MYCLASS.customMethod().

I have tried the following:

1) Using a UDF - it works, but Spark treats the UDF as deterministic and may re-evaluate it, so the generated value is not stable (see "Spark Dataframe Random UUID changes after every transformation/action").

2) Using expr("uuid()") - but this uses Spark's internal implementation, and I want to call my own API here.
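
For reference, a minimal sketch of the UDF attempt in 1), assuming Spark 2.3 or later (where UDF0 and asNondeterministic() are available). The class name RowIdColumn, the helper customId() (a stand-in for MYCLASS.customMethod()), and the column name row_id are hypothetical. Marking the UDF as non-deterministic only stops the optimizer from assuming repeated calls agree; it does not by itself freeze the values, so the sketch also persists the result, which keeps them stable only as long as the cached partitions are not recomputed.

```java
import java.util.UUID;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF0;
import org.apache.spark.sql.expressions.UserDefinedFunction;
import org.apache.spark.sql.types.DataTypes;

import static org.apache.spark.sql.functions.udf;

public class RowIdColumn {

    // Hypothetical stand-in for the custom API (MYCLASS.customMethod() in the question).
    static String customId() {
        return UUID.randomUUID().toString();
    }

    // Adds a "row_id" column that does not reference any other column.
    static Dataset<Row> withRowId(Dataset<Row> df) {
        // Zero-argument UDF; asNondeterministic() tells the optimizer not to
        // assume that repeated evaluations return the same value.
        UserDefinedFunction rowId =
                udf((UDF0<String>) RowIdColumn::customId, DataTypes.StringType)
                        .asNondeterministic();

        // persist() caches the generated values so later actions can reuse them.
        // Evicted partitions may still be recomputed with new values; writing the
        // result out and reading it back is the more robust way to pin them.
        return df.withColumn("row_id", rowId.apply()).persist();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("row-id-sketch")
                .getOrCreate();

        Dataset<Row> df = withRowId(spark.range(5).toDF());
        df.show(false);

        spark.stop();
    }
}
```

This does not answer whether a custom Expression would be a better fit; it only shows the UDF route with the determinism flag set explicitly.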

Any leads on how to create a column independently with a value from an API (in Java) are appreciated. Assume the upstream process is deterministic. I am also looking at how the SQL uuid() function is implemented.

Thoughts for now: can I create a custom Expression in Java and pass it to the Column constructor?

DennisLi
Malaka
