I need to create a Spark column in Java with the following requirements:
- The column should be created independently, i.e. without any knowledge of the other columns in the dataset.
- It should hold a custom unique value per row, returned from an API, e.g. UUID.randomUUID().toString() or MYCLASS.customMethod().
I have tried the following:
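1) Using a UDF - it works, but the values are not stable: Spark assumes the UDF is deterministic and may re-evaluate it, so the generated value changes after every transformation/action (see "Spark Dataframe Random UUID changes after every transformation/action").
2) Using expr("uuid()") - but this uses Spark's internal implementation, and I want to use my own API here.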
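To illustrate the behaviour described in 1), here is a plain-Java analogy of lazy re-evaluation. It does not use Spark at all, and the class and method names (LazyUuidDemo, lazyColumn, materialize) are made up for illustration. It is only a sketch of why a generated value changes when the generator is re-run, and stays stable once the result is materialized and reused.

```java
import java.util.List;
import java.util.UUID;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class: a plain-Java stand-in, not Spark's API.
class LazyUuidDemo {

    // Stand-in for a lazily evaluated column: nothing is generated until get() is called,
    // and every call to get() runs the generator again.
    static Supplier<List<String>> lazyColumn(int rows) {
        return () -> Stream.generate(() -> UUID.randomUUID().toString())
                .limit(rows)
                .collect(Collectors.toList());
    }

    // Evaluates the lazy value once and returns a supplier that always hands back that result.
    static <T> Supplier<T> materialize(Supplier<T> lazy) {
        T value = lazy.get();
        return () -> value;
    }

    public static void main(String[] args) {
        Supplier<List<String>> column = lazyColumn(3);

        // Two "actions" on the lazy column re-run the generator, so the values differ.
        System.out.println(column.get().equals(column.get())); // false

        // After materializing once, repeated reads see the same values.
        Supplier<List<String>> stable = materialize(column);
        System.out.println(stable.get().equals(stable.get())); // true
    }
}
```

What I am after is the second behaviour, but for a Spark column whose values come from my own API.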
Any leads on how to create a column independently, populated with the value from an API, are appreciated (in Java). Assume the upstream process is deterministic. Additionally, I am looking at how the SQL uuid() function is implemented.
Thoughts for now: can I create a custom Expression in Java and pass it to the Column constructor?