I am generating some test data using UDFs in Spark SQL. I have one field, field_b, that uses random number generation in combination with another field, field_a. A third field, field_c, is the value of field_b divided by 100.
i.e.
select
  field_a,
  randomUDF(field_a) as field_b
from
  my_table
I run this first, then use a second select (as I can't refer to the generated field within the same select) to form the third field, like so:
select
  field_a,
  field_b,
  divisionUDF(field_b) as field_c
from
  my_table
My problem is that it doesn't calculate the value of field_b once; it keeps the reference to the function. This means the random part is re-evaluated when computing field_c, so field_c is not field_b / 100.
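In other words, as far as I can tell the second query behaves as if field_b were inlined, so it is effectively evaluated as something like this (with randomUDF invoked a second time for field_c):

select
  field_a,
  randomUDF(field_a) as field_b,
  divisionUDF(randomUDF(field_a)) as field_c
from
  my_table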
Is there a way I can force it to evaluate field_b once and hold that value (short of writing to disk)? Even better, if this could be done in a single select statement (I know I could use a sub-query, as sketched below), that would be great to know.
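For clarity, the sub-query form I'm referring to would be something like this (same field and UDF names as above):

select
  field_a,
  field_b,
  divisionUDF(field_b) as field_c
from (
  select
    field_a,
    randomUDF(field_a) as field_b
  from
    my_table
) t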