After a series of validations over a DataFrame, I obtain a List of String values like this:

List[String]=(lvalue1, lvalue2, lvalue3, ...)

I also have a DataFrame with n values:

dfield 1  | dfield 2  | dfield 3
___________________________
dvalue1   | dvalue2   | dvalue3
dvalue1   | dvalue2   | dvalue3

I want to add the values of the List at the beginning of my DataFrame, as new leading columns, to get a new DataFrame like this:

dfield 1  | dfield 2  | dfield 3  | dfield 4  | dfield 5  | dfield 6
___________________________________________________________________
lvalue1   | lvalue2   | lvalue3   | dvalue1   | dvalue2   | dvalue3
lvalue1   | lvalue2   | lvalue3   | dvalue1   | dvalue2   | dvalue3

I have found an approach that uses a UDF. Would that be correct for my purpose?

Regards.

Jacek Laskowski
Erik Barajas

1 Answer

TL;DR Use select or withColumn with the lit function.

I'd use the lit function with the select operator (or withColumn).

lit(literal: Any): Column
Creates a Column of literal value.

A solution could be as follows.

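import org.apache.spark.sql.functions.{col, lit}
// toDF needs spark.implicits._, which spark-shell imports automatically

val values = List("lvalue1", "lvalue2", "lvalue3")
// Names for the new leading columns: dfield 1, dfield 2, dfield 3
val dfields = values.indices.map(idx => s"dfield ${idx + 1}")

val dataset = Seq(
  ("dvalue1", "dvalue2", "dvalue3"),
  ("dvalue1", "dvalue2", "dvalue3")
).toDF("dfield 1", "dfield 2", "dfield 3")

// Shift the numbering of the existing columns past the new ones: 4, 5, 6
val offsets = dataset.
  columns.
  indices.
  map { idx => idx + values.size + 1 }

// Rename the existing columns to dfield 4, dfield 5, dfield 6
val offsetDF = offsets.zip(dataset.columns).
  foldLeft(dataset) { case (df, (off, name)) => df.withColumnRenamed(name, s"dfield $off") }

// Literal columns first, followed by all the (renamed) existing columns
val newcols = values.zip(dfields).
  map { case (v, dfield) => lit(v) as dfield } :+ col("*")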

scala> offsetDF.select(newcols: _*).show
+--------+--------+--------+--------+--------+--------+
|dfield 1|dfield 2|dfield 3|dfield 4|dfield 5|dfield 6|
+--------+--------+--------+--------+--------+--------+
| lvalue1| lvalue2| lvalue3| dvalue1| dvalue2| dvalue3|
| lvalue1| lvalue2| lvalue3| dvalue1| dvalue2| dvalue3|
+--------+--------+--------+--------+--------+--------+
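
For comparison, here is a minimal sketch of the withColumn variant mentioned in the TL;DR. It assumes a local SparkSession (in spark-shell, `spark` already exists), and the names `shifted`, `renamed`, `literalNames`, `appended` and `result` are only illustrative. Since withColumn appends each new column at the end, a final select is still needed to move the literal columns to the front.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

// Assumption: a local session for experimenting; spark-shell provides `spark` already
val spark = SparkSession.builder().master("local[*]").appName("prepend-literals").getOrCreate()
import spark.implicits._

val values = List("lvalue1", "lvalue2", "lvalue3")

val dataset = Seq(
  ("dvalue1", "dvalue2", "dvalue3"),
  ("dvalue1", "dvalue2", "dvalue3")
).toDF("dfield 1", "dfield 2", "dfield 3")

// Rename the existing columns positionally to dfield 4, dfield 5, dfield 6
val shifted = dataset.columns.indices.map(idx => s"dfield ${idx + values.size + 1}")
val renamed = dataset.toDF(shifted: _*)

// Add one literal column per list value; withColumn puts each one at the end
val literalNames = values.indices.map(idx => s"dfield ${idx + 1}")
val appended = values.zip(literalNames).foldLeft(renamed) {
  case (df, (v, name)) => df.withColumn(name, lit(v))
}

// Reorder so that the literal columns come first
val result = appended.select((literalNames ++ shifted).map(col): _*)
result.show()
```

With select the literal columns can be placed first in a single step, which is why the select version above is a little shorter.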
Jacek Laskowski
  • I have used the **select** function as you suggested. It was easy this way: **dfNew= df.select(lit(array(x)), col(x), col(y), ...)** – Erik Barajas Jul 13 '17 at 14:48
  • Interesting. What are `array(x)`, `x` and `y`? I'd like to learn from your experience and would appreciate more info on how you did that. Thanks. – Jacek Laskowski Jul 13 '17 at 23:52