learning-pyspark.pdf

Time: 2022-06-16 12:09:09
[File attributes]:

File name: learning-pyspark.pdf

File size: 9.38MB

File format: PDF

Updated: 2022-06-16 12:09:09

pyspark

Learning PySpark

It is estimated that in 2013 the whole world produced around 4.4 zettabytes of data; that is, 4.4 billion terabytes! By 2020, we (as the human race) are expected to produce ten times that. With data growing literally by the second, and with a growing appetite for making sense of it, the ideas behind large-scale distributed processing have become essential. In 2004, Google employees Jeffrey Dean and Sanjay Ghemawat published the seminal paper "MapReduce: Simplified Data Processing on Large Clusters". Since then, technologies leveraging the concept have grown very quickly, with Apache Hadoop initially being the most popular. Hadoop ultimately gave rise to an ecosystem that included abstraction layers such as Pig, Hive, and Mahout, all leveraging this simple concept of map and reduce.
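
The map and reduce concept can be illustrated with a minimal word-count sketch in plain Python. It assumes whitespace-separated words and simulates the map, shuffle (group-by-key), and reduce phases inside a single process; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and are not part of Hadoop's or Spark's API.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    # Map: emit a (word, 1) pair for every whitespace-separated word.
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    # Shuffle: group all values by key, as a MapReduce framework would
    # do between the map and reduce phases (here, in one process).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: combine the values for each key by summing them.
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(documents):
    # Run the map phase over every document, then shuffle and reduce.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["map and reduce", "map reduce map"]
    print(word_count(docs))  # {'map': 3, 'and': 1, 'reduce': 2}
```

In a real cluster the map and reduce phases run in parallel across many machines; in PySpark the same pipeline is commonly expressed with the RDD methods flatMap, map, and reduceByKey.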

