Spark SQL adds the DataFrame: essentially an RDD with schema information attached.
Creating a DataFrame
Start pyspark (launched in local mode here because memory is limited):
pyspark --master local
The pyspark shell automatically creates sc (SparkContext) and spark (SparkSession).
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession
spark = SparkSession.builder.config(conf=SparkConf()).getOrCreate()

# Read a JSON file into a DataFrame; the schema is inferred automatically
df = spark.read.json("file:///usr/local/spark/examples/src/main/resources/people.json")
df.show()
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+
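Because the DataFrame carries schema information, it can be inspected and queried by column right away. A minimal sketch of some common follow-up operations on the df created above (the column names age and name come from the output shown; the specific queries are illustrative, not part of the original example):

# Print the schema inferred from the JSON file
df.printSchema()

# Column-level operations use the schema directly
df.select(df["name"], df["age"] + 1).show()   # project columns, add 1 to age
df.filter(df["age"] > 20).show()              # keep rows with age > 20
df.groupBy("age").count().show()              # count rows per distinct age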