val tvalues: Array[Double] = Array(1.866393526974307, 2.864048126935307, 4.032486069215076, 7.876169953355888, 4.875333799256043, 14.316322626848278)
val pvalues: Array[Double] = Array(0.064020056478447, 0.004808399479386827, 8.914865448939047E-5, 7.489564524121306E-13, 2.8363794106756046E-6, 0.0)
I have two arrays as above, and I need to build a DataFrame from them like the following:
Tvalues Pvalues
1.866393526974307 0.064020056478447
2.864048126935307 0.004808399479386827
...... .....
As of now I am trying with StringBuilder in Scala, which doesn't work as expected. Please help me with this.
1 solution
#1
7
Try, for instance,
// In spark-shell the implicits needed by toDF are already imported.
val df = sc.parallelize(tvalues zip pvalues).toDF("Tvalues","Pvalues")
and thus
scala> df.show
+------------------+--------------------+
|           Tvalues|             Pvalues|
+------------------+--------------------+
| 1.866393526974307|   0.064020056478447|
| 2.864048126935307|0.004808399479386827|
| 4.032486069215076|8.914865448939047E-5|
| 7.876169953355888|7.489564524121306...|
| 4.875333799256043|2.836379410675604...|
|14.316322626848278|                 0.0|
+------------------+--------------------+
Using parallelize we obtain an RDD of tuples, where the first element of each tuple comes from the first array and the second element from the other array. toDF then transforms this RDD into a DataFrame with one row per tuple.
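To see what ends up in that RDD, here is a minimal sketch of the zip step on its own, in plain Scala without Spark. The object name ZipSketch and the shortened sample arrays are only for illustration; they are not part of the original answer.

```scala
object ZipSketch {
  // First two values of each array from the question (shortened for illustration)
  val tvalues: Array[Double] = Array(1.866393526974307, 2.864048126935307)
  val pvalues: Array[Double] = Array(0.064020056478447, 0.004808399479386827)

  // zip pairs elements by index: (tvalues(0), pvalues(0)), (tvalues(1), pvalues(1)), ...
  val pairs: Array[(Double, Double)] = tvalues zip pvalues

  // If the arrays differ in length, zip stops at the shorter one
  val truncated: Array[(Double, Double)] = Array(1.0, 2.0, 3.0) zip Array(10.0)
}
```

Each tuple in pairs is what becomes one row of the DataFrame once it is passed through sc.parallelize and toDF. Since zip silently drops the unmatched tail, it may be worth checking that both arrays have the same length first.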
Update
To build a DataFrame from several arrays (all of the same size), for instance 4 arrays, consider
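// arr1..arr4 are Array[Double] values of equal length
case class Row(i: Double, j: Double, k: Double, m: Double)
// transpose turns the 4 column arrays into one 4-element array per row
val xs = Array(arr1, arr2, arr3, arr4).transpose
val rdd = sc.parallelize(xs).map(ys => Row(ys(0), ys(1), ys(2), ys(3)))
val df = rdd.toDF("i","j","k","m")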
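The transpose step can be checked without Spark. The following is a rough sketch in plain Scala; the sample values for arr1 to arr4, the object name TransposeSketch and the case class name Record (used here instead of Row to avoid confusion with Spark's own Row type) are assumptions made for illustration.

```scala
object TransposeSketch {
  // Hypothetical sample data standing in for arr1..arr4 (all of the same size)
  val arr1: Array[Double] = Array(1.0, 2.0, 3.0)
  val arr2: Array[Double] = Array(10.0, 20.0, 30.0)
  val arr3: Array[Double] = Array(100.0, 200.0, 300.0)
  val arr4: Array[Double] = Array(1000.0, 2000.0, 3000.0)

  // One field per input array
  case class Record(i: Double, j: Double, k: Double, m: Double)

  // Four arrays of three values become three arrays of four values
  val xs: Array[Array[Double]] = Array(arr1, arr2, arr3, arr4).transpose

  // Each inner array holds the values for one row, in array order
  val records: Array[Record] = xs.map(ys => Record(ys(0), ys(1), ys(2), ys(3)))
}
```

With Spark available, records (or xs followed by the same map) would be the input to sc.parallelize, giving one DataFrame row per Record.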