Keywords: Linux, CentOS, Spark, Scala, Java
Versions: CentOS 7, Spark 2.1.1, Scala 2.12.2, JDK 1.8
Note: a single-machine Spark installation only needs Scala and the JDK on the box; Hadoop, Zookeeper, and the like can all be skipped. If you want to build a Hadoop-based Spark cluster instead, see this post:
http://blog.csdn.net/pucao_cug/article/details/72353701
1 Install Scala, which Spark depends on
1.1 Download and unpack Scala
Downloading and unpacking Scala are covered in the corresponding sections of the post below; the steps and method are exactly the same:
http://blog.csdn.net/pucao_cug/article/details/72353701
1.2 Configure environment variables for Scala
Edit the file /etc/profile and add this line:
export SCALA_HOME=/opt/scala/scala-2.12.2
Also add the following to the PATH variable in that file:
${SCALA_HOME}/bin
With those additions, my /etc/profile looks like this:
export JAVA_HOME=/opt/java/jdk1.8.0_121
export SCALA_HOME=/opt/scala/scala-2.12.2
export CLASS_PATH=.:${JAVA_HOME}/lib:$CLASS_PATH
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:$PATH
Once the environment variables are configured, make them take effect:
source /etc/profile
1.3 Verify Scala
Run the command:
scala -version
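If everything is in place, you should see a banner like the line below (a sketch of the expected output; the exact copyright text may differ by release):
Scala code runner version 2.12.2 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.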
2 Download and unpack Spark
Downloading and unpacking Spark are covered in the download-and-unpack section of the post below; the steps and method are exactly the same:
http://blog.csdn.net/pucao_cug/article/details/72353701
A single-machine Spark box only needs the JDK, Scala, and Spark installed.
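For reference, a minimal download-and-unpack sketch; I'm assuming the Apache archive URL and the /opt/spark target directory here, so substitute whichever mirror and location you actually use:
mkdir -p /opt/spark
cd /opt/spark
wget https://archive.apache.org/dist/spark/spark-2.1.1/spark-2.1.1-bin-hadoop2.7.tgz
tar -zxvf spark-2.1.1-bin-hadoop2.7.tgz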
3 Spark-related configuration
Note: in a Hadoop-based Spark cluster, Spark is installed on every Hadoop node and each node is configured with the steps below; Spark itself is then started only on the cluster's Master machine (hserver1 in my case). For the single-machine setup in this post, simply apply the same steps on your one machine.
3.1 Configure environment variables
Edit the file /etc/profile and add:
export SPARK_HOME=/opt/spark/spark-2.1.1-bin-hadoop2.7
Also add the following to the PATH variable in that file:
${SPARK_HOME}/bin
After the edits, my /etc/profile contains:
export JAVA_HOME=/opt/java/jdk1.8.0_121
export ZK_HOME=/opt/zookeeper/zookeeper-3.4.10
export SCALA_HOME=/opt/scala/scala-2.12.2
export SPARK_HOME=/opt/spark/spark-2.1.1-bin-hadoop2.7
export CLASS_PATH=.:${JAVA_HOME}/lib:$CLASS_PATH
export PATH=.:${JAVA_HOME}/bin:${SPARK_HOME}/bin:${ZK_HOME}/bin:${SCALA_HOME}/bin:$PATH
When the edits are done, run:
source /etc/profile
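To confirm the variables took effect, a quick check (the expected output reflects my paths and versions and may differ on your machine):
echo $SPARK_HOME          # should print /opt/spark/spark-2.1.1-bin-hadoop2.7
spark-submit --version    # should report Spark version 2.1.1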
3.2 Configure the files in the conf directory
Configure the files under the /opt/spark/spark-2.1.1-bin-hadoop2.7/conf directory.
3.2.1 Create the spark-env.sh file
Change into the /opt/spark/spark-2.1.1-bin-hadoop2.7/conf directory:
cd /opt/spark/spark-2.1.1-bin-hadoop2.7/conf
Create a spark-env.sh file from the template that Spark ships with:
cp spark-env.sh.template spark-env.sh
Edit spark-env.sh and add the following configuration (use your own paths):
export SCALA_HOME=/opt/scala/scala-2.12.2
export JAVA_HOME=/opt/java/jdk1.8.0_121
export SPARK_HOME=/opt/spark/spark-2.1.1-bin-hadoop2.7
export SPARK_MASTER_IP=hserver1
export SPARK_EXECUTOR_MEMORY=1G
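One caveat, as far as I can tell from the spark-env.sh.template shipped with Spark 2.x: the setting is documented there as SPARK_MASTER_HOST, with SPARK_MASTER_IP being the older name. If the master does not bind to the expected address, try the newer name instead:
export SPARK_MASTER_HOST=hserver1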
3.2.2 Create the slaves file
Change into the /opt/spark/spark-2.1.1-bin-hadoop2.7/conf directory:
cd /opt/spark/spark-2.1.1-bin-hadoop2.7/conf
Create a slaves file from the template that Spark ships with:
cp slaves.template slaves
Edit the slaves file so that its content is:
localhost
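For the single-machine setup, localhost is all that's needed. In a cluster deployment you would instead list one worker hostname per line, for example (hypothetical node names):
hserver2
hserver3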
4 Test Spark in single-machine mode
4.1 Run a Spark example program in single-machine mode
With the configuration above complete, nothing needs to be started; just run the commands below.
Change into the Spark home directory:
cd /opt/spark/spark-2.1.1-bin-hadoop2.7
Run the demo program that estimates pi:
./bin/run-example SparkPi 10
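run-example is a convenience wrapper; if you prefer spark-submit, the equivalent invocation should be the one below (the class and jar names match those that appear in the log further down):
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[*] examples/jars/spark-examples_2.11-2.1.1.jar 10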
After a few seconds the run completes. The full console output is:
[root@hserver1 ~]# cd /opt/spark/spark-2.1.1-bin-hadoop2.7
[root@hserver1 spark-2.1.1-bin-hadoop2.7]# ./bin/run-example SparkPi 10
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/05/17 11:43:21 INFO SparkContext: Running Spark version 2.1.1
17/05/17 11:43:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/17 11:43:25 INFO SecurityManager: Changing view acls to: root
17/05/17 11:43:25 INFO SecurityManager: Changing modify acls to: root
17/05/17 11:43:25 INFO SecurityManager: Changing view acls groups to:
17/05/17 11:43:25 INFO SecurityManager: Changing modify acls groups to:
17/05/17 11:43:25 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
17/05/17 11:43:25 INFO Utils: Successfully started service 'sparkDriver' on port 42970.
17/05/17 11:43:26 INFO SparkEnv: Registering MapOutputTracker
17/05/17 11:43:26 INFO SparkEnv: Registering BlockManagerMaster
17/05/17 11:43:26 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/05/17 11:43:26 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/05/17 11:43:26 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-fa083902-0f9c-4e44-9712-93fe301a4895
17/05/17 11:43:26 INFO MemoryStore: MemoryStore started with capacity 413.9 MB
17/05/17 11:43:26 INFO SparkEnv: Registering OutputCommitCoordinator
17/05/17 11:43:27 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/05/17 11:43:27 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.27.144:4040
17/05/17 11:43:27 INFO SparkContext: Added JAR file:/opt/spark/spark-2.1.1-bin-hadoop2.7/examples/jars/scopt_2.11-3.3.0.jar at spark://192.168.27.144:42970/jars/scopt_2.11-3.3.0.jar with timestamp 1494992607195
17/05/17 11:43:27 INFO SparkContext: Added JAR file:/opt/spark/spark-2.1.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.1.jar at spark://192.168.27.144:42970/jars/spark-examples_2.11-2.1.1.jar with timestamp 1494992607196
17/05/17 11:43:27 INFO Executor: Starting executor ID driver on host localhost
17/05/17 11:43:27 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43732.
17/05/17 11:43:27 INFO NettyBlockTransferService: Server created on 192.168.27.144:43732
17/05/17 11:43:27 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/05/17 11:43:27 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.27.144, 43732, None)
17/05/17 11:43:27 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.27.144:43732 with 413.9 MB RAM, BlockManagerId(driver, 192.168.27.144, 43732, None)
17/05/17 11:43:27 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.27.144, 43732, None)
17/05/17 11:43:27 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.27.144, 43732, None)
17/05/17 11:43:28 INFO SharedState: Warehouse path is 'file:/opt/spark/spark-2.1.1-bin-hadoop2.7/spark-warehouse/'.
17/05/17 11:43:29 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
17/05/17 11:43:29 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
17/05/17 11:43:29 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
17/05/17 11:43:29 INFO DAGScheduler: Parents of final stage: List()
17/05/17 11:43:29 INFO DAGScheduler: Missing parents: List()
17/05/17 11:43:29 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
17/05/17 11:43:29 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 413.9 MB)
17/05/17 11:43:30 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1167.0 B, free 413.9 MB)
17/05/17 11:43:30 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.27.144:43732 (size: 1167.0 B, free: 413.9 MB)
17/05/17 11:43:30 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:996
17/05/17 11:43:30 INFO DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34)
17/05/17 11:43:30 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
17/05/17 11:43:30 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 6090 bytes)
17/05/17 11:43:30 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/05/17 11:43:30 INFO Executor: Fetching spark://192.168.27.144:42970/jars/scopt_2.11-3.3.0.jar with timestamp 1494992607195
17/05/17 11:43:30 INFO TransportClientFactory: Successfully created connection to /192.168.27.144:42970 after 129 ms (0 ms spent in bootstraps)
17/05/17 11:43:30 INFO Utils: Fetching spark://192.168.27.144:42970/jars/scopt_2.11-3.3.0.jar to /tmp/spark-c05c16db-967b-4f7c-91bd-61358c6e8fd7/userFiles-475afa39-559a-43f1-9b42-42e4c68c0562/fetchFileTemp3940062650819619408.tmp
17/05/17 11:43:31 INFO Executor: Adding file:/tmp/spark-c05c16db-967b-4f7c-91bd-61358c6e8fd7/userFiles-475afa39-559a-43f1-9b42-42e4c68c0562/scopt_2.11-3.3.0.jar to class loader
17/05/17 11:43:31 INFO Executor: Fetching spark://192.168.27.144:42970/jars/spark-examples_2.11-2.1.1.jar with timestamp 1494992607196
17/05/17 11:43:31 INFO Utils: Fetching spark://192.168.27.144:42970/jars/spark-examples_2.11-2.1.1.jar to /tmp/spark-c05c16db-967b-4f7c-91bd-61358c6e8fd7/userFiles-475afa39-559a-43f1-9b42-42e4c68c0562/fetchFileTemp2400538401087766507.tmp
17/05/17 11:43:31 INFO Executor: Adding file:/tmp/spark-c05c16db-967b-4f7c-91bd-61358c6e8fd7/userFiles-475afa39-559a-43f1-9b42-42e4c68c0562/spark-examples_2.11-2.1.1.jar to class loader
17/05/17 11:43:31 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1114 bytes result sent to driver
17/05/17 11:43:31 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 6090 bytes)
17/05/17 11:43:31 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
17/05/17 11:43:31 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1594 ms on localhost (executor driver) (1/10)
17/05/17 11:43:31 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1114 bytes result sent to driver
17/05/17 11:43:31 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 6090 bytes)
17/05/17 11:43:31 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
17/05/17 11:43:31 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 239 ms on localhost (executor driver) (2/10)
17/05/17 11:43:32 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 1041 bytes result sent to driver
17/05/17 11:43:32 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 6090 bytes)
17/05/17 11:43:32 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
17/05/17 11:43:32 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 135 ms on localhost (executor driver) (3/10)
17/05/17 11:43:32 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 1041 bytes result sent to driver
17/05/17 11:43:32 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 6090 bytes)
17/05/17 11:43:32 INFO Executor: Running task 4.0 in stage 0.0 (TID 4)
17/05/17 11:43:32 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 133 ms on localhost (executor driver) (4/10)
17/05/17 11:43:32 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4). 1041 bytes result sent to driver
17/05/17 11:43:32 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 6090 bytes)
17/05/17 11:43:32 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 102 ms on localhost (executor driver) (5/10)
17/05/17 11:43:32 INFO Executor: Running task 5.0 in stage 0.0 (TID 5)
17/05/17 11:43:32 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 1041 bytes result sent to driver
17/05/17 11:43:32 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, localhost, executor driver, partition 6, PROCESS_LOCAL, 6090 bytes)
17/05/17 11:43:32 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 114 ms on localhost (executor driver) (6/10)
17/05/17 11:43:32 INFO Executor: Running task 6.0 in stage 0.0 (TID 6)
17/05/17 11:43:32 INFO Executor: Finished task 6.0 in stage 0.0 (TID 6). 1114 bytes result sent to driver
17/05/17 11:43:32 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, localhost, executor driver, partition 7, PROCESS_LOCAL, 6090 bytes)
17/05/17 11:43:32 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 95 ms on localhost (executor driver) (7/10)
17/05/17 11:43:32 INFO Executor: Running task 7.0 in stage 0.0 (TID 7)
17/05/17 11:43:32 INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 1041 bytes result sent to driver
17/05/17 11:43:32 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, localhost, executor driver, partition 8, PROCESS_LOCAL, 6090 bytes)
17/05/17 11:43:32 INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 117 ms on localhost (executor driver) (8/10)
17/05/17 11:43:32 INFO Executor: Running task 8.0 in stage 0.0 (TID 8)
17/05/17 11:43:32 INFO Executor: Finished task 8.0 in stage 0.0 (TID 8). 1041 bytes result sent to driver
17/05/17 11:43:32 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, localhost, executor driver, partition 9, PROCESS_LOCAL, 6090 bytes)
17/05/17 11:43:32 INFO TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 107 ms on localhost (executor driver) (9/10)
17/05/17 11:43:32 INFO Executor: Running task 9.0 in stage 0.0 (TID 9)
17/05/17 11:43:32 INFO Executor: Finished task 9.0 in stage 0.0 (TID 9). 1041 bytes result sent to driver
17/05/17 11:43:32 INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 88 ms on localhost (executor driver) (10/10)
17/05/17 11:43:32 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/05/17 11:43:32 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 2.589 s
17/05/17 11:43:32 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 3.388028 s
Pi is roughly 3.1393111393111393
17/05/17 11:43:32 INFO SparkUI: Stopped Spark web UI at http://192.168.27.144:4040
17/05/17 11:43:32 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/05/17 11:43:32 INFO MemoryStore: MemoryStore cleared
17/05/17 11:43:32 INFO BlockManager: BlockManager stopped
17/05/17 11:43:32 INFO BlockManagerMaster: BlockManagerMaster stopped
17/05/17 11:43:32 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/05/17 11:43:32 INFO SparkContext: Successfully stopped SparkContext
17/05/17 11:43:33 INFO ShutdownHookManager: Shutdown hook called
17/05/17 11:43:33 INFO ShutdownHookManager: Deleting directory /tmp/spark-c05c16db-967b-4f7c-91bd-61358c6e8fd7
[root@hserver1 spark-2.1.1-bin-hadoop2.7]#
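The line to look for is "Pi is roughly 3.1393111393111393". For the curious, here is a minimal Scala sketch of the Monte Carlo estimate the example performs; it is not the exact SparkPi source, but it can be pasted into spark-shell (which provides sc):
// Sample n random points in the square [-1,1] x [-1,1]; the fraction
// that lands inside the unit circle approximates pi/4.
val slices = 10
val n = 100000 * slices
val count = sc.parallelize(1 to n, slices).map { _ =>
  val x = math.random * 2 - 1
  val y = math.random * 2 - 1
  if (x * x + y * y < 1) 1 else 0
}.reduce(_ + _)
println(s"Pi is roughly ${4.0 * count / n}")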
4.2 Start the Spark shell command-line window
Change into the Spark home directory:
cd /opt/spark/spark-2.1.1-bin-hadoop2.7
Run the startup script:
./bin/spark-shell
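Once the scala> prompt appears (the shell also serves a web UI, on port 4040 as seen in the log above), you can run a quick sanity check; sc and spark are created for you by the shell, and the commented results are what I would expect on this setup:
val rdd = sc.parallelize(1 to 100)
println(rdd.sum())     // 5050.0
println(spark.version) // 2.1.1
Type :quit to leave the shell.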