Spark Programming--Fundamental operation

Date: 2023-03-10 05:40:47
max

max(key=None)

Find the maximum item in this RDD.

Parameters: key, an optional function used to generate the key for comparing elements

Example:

mean

mean()

Compute the mean of this RDD’s elements.

min

min(key=None)

Find the minimum item in this RDD.

Parameters: key, an optional function used to generate the key for comparing elements

name/setName

name()

setName(name)

setName(name) assigns a name to the RDD; name() returns the RDD's name.

Example:

others

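sc.parallelize(): creates an RDD from a local collection. Using xrange is recommended when the input is a range (in Python 3, range plays this role)

getNumPartitions(): returns the number of partitions

sc.emptyRDD(): returns an empty RDD

glom(): returns the elements of each partition as a list, one list per partition

collect(): returns all elements as a list (normally to the driver program)

Example: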

sc.textFile(path): reads a file and returns an RDD (see Actions II for details)

Official signature: textFile(name, minPartitions=None, use_unicode=True)

Supported sources: reads a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and returns it as an RDD of Strings.

Example (reading a local file):
