001 Spark File Analysis Test

The tests were run with spark-1.4.1-bin-hadoop2.6 on a 3 GB test file.

Test results:

1: Count the lines in a file that contain a given string ("ok")

scala> sc.textFile("/home/y/my_temp/1.txt").filter(line=>line.contains("ok")).count()

scala> sc.textFile("/home/y/my_temp/1.txt").

Time taken (Duration): 13 s

Record count: res5: Long = 101824020
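
Note that `filter(line => line.contains("ok")).count()` counts lines that contain "ok" at least once, not the total number of occurrences. If the total is what is wanted, a minimal sketch could look like the following. `countOccurrences` is a hypothetical helper that is not part of the original test, and it assumes non-overlapping matches.

```scala
// Hypothetical helper (not from the original test): counts non-overlapping
// occurrences of `needle` within a single line.
def countOccurrences(line: String, needle: String): Long = {
  require(needle.nonEmpty, "needle must not be empty")
  var count = 0L
  var from = line.indexOf(needle)
  while (from >= 0) {
    count += 1
    // Continue searching after the end of the current match.
    from = line.indexOf(needle, from + needle.length)
  }
  count
}

// Assumed usage in spark-shell, where `sc` is the predefined SparkContext:
//   sc.textFile("/home/y/my_temp/1.txt")
//     .map(line => countOccurrences(line, "ok"))
//     .fold(0L)(_ + _)
```

The per-line counts are summed with `fold(0L)(_ + _)` rather than `reduce`, so an empty file gives 0 instead of an error.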

2: Count the number of lines

scala> sc.textFile("/home/y/my_temp/1.txt").count()

Time taken (Duration): 12 s

Record count: res2: Long = 10
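
The post does not say how the durations above were measured. To take a comparable wall-clock measurement directly in the shell, a small helper along these lines could be used. `timed` is a hypothetical name, not part of Spark or the original test, and the figure it prints includes driver-side overhead, so it may differ slightly from a job duration reported elsewhere.

```scala
// Hypothetical helper (not from the original test): runs `block` once and
// returns its result together with the elapsed wall-clock time in seconds.
def timed[A](label: String)(block: => A): (A, Double) = {
  val start = System.nanoTime()
  val result = block
  val seconds = (System.nanoTime() - start) / 1e9
  println(f"$label: $seconds%.2f s")
  (result, seconds)
}

// Assumed usage in spark-shell, where `sc` is the predefined SparkContext:
//   val (lines, secs) = timed("line count") {
//     sc.textFile("/home/y/my_temp/1.txt").count()
//   }
```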
