SparkR-Install

Date: 2017-03-30 23:05:18


1. Download R

https://cran.r-project.org/src/base/R-3/

1.2 Configure the environment variables:
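A typical configuration adds R's bin folder to PATH so that R and Rscript can be started from any command prompt. As a minimal sketch, the hypothetical helper `dir_on_path` below checks whether a folder appears in a PATH-style string; `C:/Program Files/R/R-3.3.2/bin` is an assumed install location, not one stated in this post.

```r
# Hypothetical helper: TRUE if `dir` appears as an entry of a PATH-style string
dir_on_path <- function(dir, path = Sys.getenv("PATH")) {
  entries <- strsplit(path, .Platform$path.sep, fixed = TRUE)[[1]]
  # Use forward slashes and drop trailing separators so "C:\R\bin\" matches "C:/R/bin"
  norm <- function(p) sub("/+$", "", gsub("\\", "/", p, fixed = TRUE))
  dir <- norm(dir)
  entries <- norm(entries)
  # Windows paths are case-insensitive
  if (.Platform$OS.type == "windows") {
    dir <- tolower(dir)
    entries <- tolower(entries)
  }
  dir %in% entries
}

# Assumed install location; adjust to the folder chosen during installation
dir_on_path("C:/Program Files/R/R-3.3.2/bin")
```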

1.3 Test the installation:
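The installation can also be checked from inside R. This is a sketch that assumes 3.3.0 as the minimum version, because this post pairs R-3.3.2 with Rtools33; `check_r_version` is a hypothetical helper name.

```r
# Hypothetical helper: report the running R version and compare it with a minimum
check_r_version <- function(min_version = "3.3.0") {
  message("Running ", R.version.string, " from ", R.home())
  getRversion() >= min_version
}

check_r_version()
```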

2. Download Rtools33

https://cran.r-project.org/bin/windows/Rtools/

2.1 Configure the environment variables

2.2 Test:
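From R, `Sys.which()` shows whether the Rtools executables are reachable through PATH. A sketch: `tools_on_path` is a hypothetical helper, and checking `make` and `gcc` assumes the usual Rtools33 layout, with `make` in Rtools\bin and `gcc` in a gcc-*\bin subfolder.

```r
# Hypothetical helper: locate each executable on PATH and report which were found
tools_on_path <- function(tools = c("make", "gcc")) {
  found <- Sys.which(tools)  # full path, or "" when the tool is not on PATH
  print(found)
  ok <- nzchar(found)
  names(ok) <- tools
  ok
}

tools_on_path()
```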

3. Install RStudio

Download it from https://www.rstudio.com/products/rstudio/download/ and click Next through the installer.

4. Install the JDK and set the environment variables

4.1 Configure the environment variables:
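Spark's launch scripts use JAVA_HOME when it is set, so it should point at the JDK root (the folder that contains bin), not at bin itself. A minimal check, where `check_java_home` is a hypothetical helper that looks for bin/java or bin/java.exe:

```r
# Hypothetical helper: verify that JAVA_HOME is set and contains a java executable
check_java_home <- function(java_home = Sys.getenv("JAVA_HOME")) {
  if (!nzchar(java_home)) {
    warning("JAVA_HOME is not set")
    return(FALSE)
  }
  candidates <- file.path(java_home, "bin", c("java", "java.exe"))
  any(file.exists(candidates))
}

check_java_home()
```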

4.2 Test:
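The usual `java -version` check can be run from R as well. Because `java -version` prints to stderr, the sketch below captures stderr together with stdout; `java_version` is a hypothetical helper that returns NA when no java executable is on PATH.

```r
# Hypothetical helper: first line of `java -version`, or NA if java is not on PATH
java_version <- function() {
  if (!nzchar(Sys.which("java"))) return(NA_character_)
  # stderr = TRUE merges stderr into the captured character vector
  out <- system2("java", "-version", stdout = TRUE, stderr = TRUE)
  if (length(out) == 0) NA_character_ else out[[1]]
}

java_version()
```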

5. Download the Spark package

5.1 URL: http://spark.apache.org/downloads.html

5.2 Extract it to a directory on the local disk

6. Install Spark and set the environment variables
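A sketch that sets SPARK_HOME for the current R session only and sanity-checks the extracted folder. The path `D:/spark-2.1.0-bin-hadoop2.7` and the helper `check_spark_home` are hypothetical; the two checked entries, bin and R/lib/SparkR, are the folders later steps of this post rely on.

```r
# Hypothetical helper: check that a folder looks like an unpacked Spark distribution
check_spark_home <- function(spark_home = Sys.getenv("SPARK_HOME")) {
  if (!nzchar(spark_home)) stop("SPARK_HOME is not set")
  required <- c("bin", file.path("R", "lib", "SparkR"))
  ok <- dir.exists(file.path(spark_home, required))
  names(ok) <- required
  ok
}

# Session-only setting; the permanent one goes in the Windows environment-variable dialog
if (!nzchar(Sys.getenv("SPARK_HOME"))) {
  Sys.setenv(SPARK_HOME = "D:/spark-2.1.0-bin-hadoop2.7")  # hypothetical install path
}
check_spark_home()
```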

7. Test SparkR

Note: if you see the warning "WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable", you need to install the native Hadoop library.

8. Download and install the Hadoop libraries

http://hadoop.apache.org/releases.html

9. Set the Hadoop environment variable
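On Windows, the Hadoop client code that Spark uses looks for bin\winutils.exe under HADOOP_HOME and typically logs an error about locating winutils.exe when it is missing. A small check, assuming HADOOP_HOME points at the Hadoop folder from step 8; `check_hadoop_home` is a hypothetical helper and only reports whether the file is present.

```r
# Hypothetical helper: TRUE if HADOOP_HOME is set and contains bin/winutils.exe
check_hadoop_home <- function(hadoop_home = Sys.getenv("HADOOP_HOME")) {
  if (!nzchar(hadoop_home)) {
    warning("HADOOP_HOME is not set")
    return(FALSE)
  }
  file.exists(file.path(hadoop_home, "bin", "winutils.exe"))
}

check_hadoop_home()
```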

10. Test SparkR again

10.1 If the test output is flooded with INFO log messages, change INFO to WARN in the log4j file under \spark\conf.

10.2 Edit the log4j file in conf:
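The same edit can be scripted. This sketch assumes a Spark 1.x/2.x layout, where conf ships log4j.properties.template and the line to change is `log4j.rootCategory=INFO, console`; the template is copied to log4j.properties first if that file does not exist yet. `quiet_spark_logs` is a hypothetical helper name.

```r
# Hypothetical helper: switch Spark's root log level from INFO to WARN
quiet_spark_logs <- function(spark_home = Sys.getenv("SPARK_HOME")) {
  conf_dir <- file.path(spark_home, "conf")
  target <- file.path(conf_dir, "log4j.properties")
  template <- file.path(conf_dir, "log4j.properties.template")
  # Spark picks up conf/log4j.properties, not the .template file
  if (!file.exists(target)) {
    if (!file.copy(template, target)) stop("could not create ", target)
  }
  lines <- readLines(target)
  lines <- sub("^log4j\\.rootCategory=INFO", "log4j.rootCategory=WARN", lines)
  writeLines(lines, target)
  invisible(target)
}
```

Calling `quiet_spark_logs()` once, with SPARK_HOME set, rewrites the file in place; the next sparkR start then only shows WARN and above.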

10.3 Run SparkR again

11. Run SparkR code

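Spark 2.0 adds an RSparkSql example for running SQL queries. The other R examples are:

dataframe: DataFrame operations

data-manipulation: data transformation

ml: machine learning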

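11.1 Open a command prompt in the folder that holds the example scripts (the author used Ctrl+Alt+left-click; on a standard Windows setup, Shift+right-click inside the folder and choose "Open command window here" or the PowerShell equivalent).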

11.2 Run spark-submit xxx.R, replacing xxx.R with the script to execute.
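To launch a script from an R session instead of a command prompt, spark-submit can be called through `system2()`. A sketch: `spark_submit_command` is a hypothetical helper, it assumes SPARK_HOME is set, and it picks spark-submit.cmd on Windows (the batch wrapper in Spark's bin folder) and spark-submit elsewhere.

```r
# Hypothetical helper: full path of the spark-submit launcher for this platform
spark_submit_command <- function(spark_home = Sys.getenv("SPARK_HOME")) {
  exe <- if (.Platform$OS.type == "windows") "spark-submit.cmd" else "spark-submit"
  file.path(spark_home, "bin", exe)
}

# Example (not run here): submit the dataframe.R example from Spark's examples folder
# script <- file.path(Sys.getenv("SPARK_HOME"), "examples", "src", "main", "r", "dataframe.R")
# system2(spark_submit_command(), shQuote(script))
```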

12. Install the SparkR package

12.1 Copy the SparkR folder from R\lib in the Spark installation directory into ..\R-3.3.2\library. Copy the entire SparkR folder, not the individual files inside it.

Source folder: <Spark installation directory>\R\lib\SparkR

Destination folder: ..\R-3.3.2\library
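The copy can also be done from R. A sketch, assuming SPARK_HOME is set and that the first entry of `.libPaths()` is the target library (for example ..\R-3.3.2\library as in this post); `copy_sparkr` is a hypothetical helper. With `recursive = TRUE` and an existing folder as the destination, `file.copy` copies the SparkR folder itself, which matches the whole-folder requirement in 12.1.

```r
# Hypothetical helper: copy <spark_home>/R/lib/SparkR into an R library folder
copy_sparkr <- function(spark_home = Sys.getenv("SPARK_HOME"), lib = .libPaths()[1]) {
  src <- file.path(spark_home, "R", "lib", "SparkR")
  if (!dir.exists(src)) stop("SparkR package folder not found: ", src)
  # Copies the SparkR folder (and everything in it) into `lib`
  ok <- file.copy(src, lib, recursive = TRUE)
  if (!all(ok)) stop("copy failed; check write permissions on ", lib)
  invisible(file.path(lib, "SparkR"))
}
```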

12.2 In RStudio, open the SparkR example file dataframe.R and run it line by line with Ctrl+Enter.
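If copying the folder is not wanted, the SparkR documentation instead loads the package straight from Spark's R/lib folder with `library(SparkR, lib.loc = ...)`. A sketch of that path lookup; `sparkr_lib_path` is a hypothetical helper.

```r
# Hypothetical helper: folder under SPARK_HOME that holds the bundled SparkR package
sparkr_lib_path <- function(spark_home = Sys.getenv("SPARK_HOME")) {
  if (!nzchar(spark_home)) stop("SPARK_HOME is not set")
  file.path(spark_home, "R", "lib")
}

# In RStudio, with a local Spark installation:
# library(SparkR, lib.loc = sparkr_lib_path())
```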

The source code of the SparkR example dataframe.R:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

library(SparkR)

# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName="SparkR-DataFrame-example")
sqlContext <- sparkRSQL.init(sc)

# Create a simple local data.frame
localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18))

# Convert local data frame to a SparkR DataFrame
df <- createDataFrame(sqlContext, localDF)

# Print its schema
printSchema(df)
# root
#  |-- name: string (nullable = true)
#  |-- age: double (nullable = true)

# Create a DataFrame from a JSON file
path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
peopleDF <- read.json(sqlContext, path)
printSchema(peopleDF)

# Register this DataFrame as a table.
registerTempTable(peopleDF, "people")

# SQL statements can be run by using the sql methods provided by sqlContext
teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")

# Call collect to get a local data.frame
teenagersLocalDF <- collect(teenagers)

# Print the teenagers in our dataset
print(teenagersLocalDF)

# Stop the SparkContext now
sparkR.stop()
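The script above uses the Spark 1.x entry points (sparkR.init, sparkRSQL.init, registerTempTable), which are deprecated in Spark 2.x; sparkR.init and sparkRSQL.init are no longer available in Spark 3.x. Below is a sketch of the same steps with the SparkSession API available from Spark 2.0, wrapped in a function so nothing starts until it is called. `run_dataframe_example` is a hypothetical name, and it assumes SPARK_HOME points at a Spark 2.x or later installation.

```r
# Hypothetical wrapper: dataframe.R rewritten for the Spark 2.x+ SparkSession API
run_dataframe_example <- function(spark_home = Sys.getenv("SPARK_HOME")) {
  library(SparkR, lib.loc = file.path(spark_home, "R", "lib"))
  sparkR.session(appName = "SparkR-DataFrame-example", sparkHome = spark_home)
  on.exit(sparkR.session.stop(), add = TRUE)

  # Local data.frame -> SparkDataFrame (no sqlContext argument in 2.x)
  localDF <- data.frame(name = c("John", "Smith", "Sarah"), age = c(19, 23, 18))
  df <- createDataFrame(localDF)
  printSchema(df)

  # JSON file shipped with the Spark examples
  path <- file.path(spark_home, "examples/src/main/resources/people.json")
  peopleDF <- read.json(path)
  printSchema(peopleDF)

  # createOrReplaceTempView replaces the deprecated registerTempTable
  createOrReplaceTempView(peopleDF, "people")
  teenagers <- sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
  collect(teenagers)  # local data.frame, returned before the session is stopped
}
```

In RStudio, `teenagersLocalDF <- run_dataframe_example()` returns the collected query result as a local data.frame, and the session is stopped on exit.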

13. Results in RStudio

END~