
Date: 2022-02-04 23:45:59

Creating a DataFrame/DataSet

  • File-reading interface
import org.apache.spark.sql.SparkSession
val spark = SparkSession
.builder()
.appName("Spark SQL basic example")
.config("spark.some.config.option", "some-value")
.getOrCreate()
// For implicit conversions like converting RDDs to DataFrames
import spark.implicits._

spark.read returns a DataFrameReader; see the DataFrame/DataSet data source documentation.

spark.readStream returns a DataStreamReader.

From there, reading files works the same way in both cases; see the author's Structured Streaming article for details.
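
The two entry points above can be sketched side by side. This is a minimal sketch under assumptions: the `local[*]` master, the temporary directory, the sample JSON record, and the object name `ReaderSketch` are illustrative and not taken from the original article.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

object ReaderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("reader sketch")
      .master("local[*]") // assumption: run locally for illustration
      .getOrCreate()

    // Hypothetical input: a temporary directory holding one JSON-lines file
    val dir = Files.createTempDirectory("people-json")
    Files.write(dir.resolve("people.json"),
      "{\"name\":\"Andy\",\"age\":30}\n".getBytes("UTF-8"))

    // Batch: spark.read is a DataFrameReader
    val batchDF = spark.read.json(dir.toString)
    batchDF.show()

    // Streaming: spark.readStream is a DataStreamReader.
    // File sources need an explicit schema unless schema inference is enabled.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", LongType, nullable = true)))
    val streamDF = spark.readStream.schema(schema).json(dir.toString)
    println(streamDF.isStreaming) // true for a streaming DataFrame

    spark.stop()
  }
}
```

The reader calls (`json`, `schema`, `option`, and so on) look the same on both readers; the streaming DataFrame only starts consuming data once a query is started with `writeStream`.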

  • Converting an RDD to a DataFrame/DataSet
    • Approach 1: schema known in advance
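      // Assumes case class Person(name: String, age: Int) and an illustrative input path
      val peopleDF = spark.sparkContext.textFile("people.txt").map(_.split(","))
      .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
      .toDF()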
    • Approach 2: schema not known in advance
      val schemaString = "name age"
      // Generate the schema based on the string of schema
      val fields = schemaString.split(" ")
      .map(fieldName => StructField(fieldName, StringType, nullable = true))
      val schema = StructType(fields)
      // Convert records of the RDD (people) to Rows
      // peopleRDD is assumed to be an RDD[String] of comma-separated lines
      val rowRDD = peopleRDD
      .map(_.split(","))
      .map(attributes => Row(attributes(0), attributes(1).trim))
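
The two conversion approaches above can be sketched end to end as follows. This is a minimal sketch under assumptions: the input is an in-memory RDD built with `parallelize` rather than a text file, and the `Person` case class, the sample records, the `local[*]` master, and the object name `RddToDataFrameSketch` are illustrative. The closing `createDataFrame(rowRDD, schema)` call is the standard way to attach a `StructType` to an `RDD[Row]`.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumed record type for approach 1; defined at top level so toDF() can derive an encoder
case class Person(name: String, age: Int)

object RddToDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd to dataframe sketch")
      .master("local[*]") // assumption: run locally for illustration
      .getOrCreate()
    // For implicit conversions like converting RDDs to DataFrames
    import spark.implicits._

    // Hypothetical sample data standing in for the lines of a text file
    val peopleRDD = spark.sparkContext.parallelize(Seq("Michael, 29", "Andy, 30"))

    // Approach 1: schema known in advance, taken from the case class fields
    val peopleDF = peopleRDD
      .map(_.split(","))
      .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
      .toDF()
    peopleDF.printSchema()

    // Approach 2: schema built at runtime from a string of column names
    val schemaString = "name age"
    val fields = schemaString.split(" ")
      .map(fieldName => StructField(fieldName, StringType, nullable = true))
    val schema = StructType(fields)
    val rowRDD = peopleRDD
      .map(_.split(","))
      .map(attributes => Row(attributes(0), attributes(1).trim))
    val peopleDF2 = spark.createDataFrame(rowRDD, schema)
    peopleDF2.show()

    spark.stop()
  }
}
```

In approach 1 the column types come from the case class (`age` is an integer); in approach 2 every column is a `StringType` because the schema is generated from names alone.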