1. Environment
hadoop-2.2.0
spark-1.1.0
maven-3.3.9
2. The Spark installation problem
Hadoop 2.2.x supports Spark 1.1.0 and earlier. However, Spark 1.1.0 can no longer be downloaded from the official Spark website (Spark 2.0 has since been released). To get Spark onto this machine, I downloaded the Spark 1.1.0 source from CSDN and compiled it myself.
There are two ways to build Spark: sbt and Maven. The bundled sbt build ran into problems, so I used Maven instead.
3. Installing and configuring Maven
1. Download Maven from the official site:
http://apache.fayea.com/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
2. Upload the archive with the rz command in SecureCRT, then extract it in the current directory:
tar -zxvf apache-maven-3.3.9-bin.tar.gz
3. Configure the environment variables:
vim /etc/profile
Append at the end of the file:
export MAVEN_HOME=/root/shumi/app/apache-maven-3.3.9
export PATH=$PATH:$MAVEN_HOME/bin
4. Reload the profile so the changes take effect:
source /etc/profile
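After sourcing /etc/profile, a quick sanity check confirms the variables took effect. This is a sketch: the MAVEN_HOME path matches the one used above, so adjust it for your own machine.

```shell
# Sanity check after sourcing /etc/profile.
# MAVEN_HOME matches the install path used above; adjust for your machine.
MAVEN_HOME=/root/shumi/app/apache-maven-3.3.9
PATH="$PATH:$MAVEN_HOME/bin"

# Confirm the Maven bin directory is actually on PATH.
case ":$PATH:" in
  *":$MAVEN_HOME/bin:"*) echo "PATH ok" ;;
  *)                     echo "PATH is missing $MAVEN_HOME/bin" ;;
esac

# Print the Maven version if the binary is really there.
command -v mvn >/dev/null 2>&1 && mvn -v || echo "mvn not found (check the path)"
```

If `mvn -v` reports Apache Maven 3.3.9, the configuration is correct.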
4. Building Spark 1.1.0 with Maven
There is an English document that walks through these steps in detail. The method:
1. Raise Maven's JVM memory limits, so the build does not fail with an out-of-memory error (the default heap and PermGen sizes are too small for Spark):
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
2. Change into the extracted Spark source directory and run the Maven build. If it fails, simply run it again. Note: the build takes a long time and downloads many files; in my case it succeeded on the second attempt and took over two hours.
mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
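If the build succeeds, the runnable assembly jar is produced under assembly/target/. A quick way to locate it (the exact filename depends on the Spark and Hadoop versions, so the pattern below is my assumption based on Spark 1.1.0's default layout):

```shell
# Locate the assembly jar produced by the mvn build above.
# Run from the Spark source root; the exact jar name varies with the
# Spark/Hadoop versions, so the pattern is kept loose.
find assembly/target -name 'spark-assembly-*.jar' 2>/dev/null || true
```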
When the build completes, output like the following indicates success:
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [31:01 min]
[INFO] Spark Project Core ................................. SUCCESS [34:54 min]
[INFO] Spark Project Bagel ................................ SUCCESS [ 24.186 s]
[INFO] Spark Project GraphX ............................... SUCCESS [05:28 min]
[INFO] Spark Project Streaming ............................ SUCCESS [01:23 min]
[INFO] Spark Project ML Library ........................... SUCCESS [07:17 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 14.561 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [01:53 min]
[INFO] Spark Project SQL .................................. SUCCESS [03:42 min]
[INFO] Spark Project Hive ................................. SUCCESS [10:55 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 47.498 s]
[INFO] Spark Project YARN Parent POM ...................... SUCCESS [ 54.200 s]
[INFO] Spark Project YARN Stable API ...................... SUCCESS [ 38.566 s]
[INFO] Spark Project Assembly ............................. SUCCESS [01:49 min]
[INFO] Spark Project External Twitter ..................... SUCCESS [ 38.634 s]
[INFO] Spark Project External Kafka ....................... SUCCESS [01:56 min]
[INFO] Spark Project External Flume Sink .................. SUCCESS [02:59 min]
[INFO] Spark Project External Flume ....................... SUCCESS [ 33.139 s]
[INFO] Spark Project External ZeroMQ ...................... SUCCESS [01:03 min]
[INFO] Spark Project External MQTT ........................ SUCCESS [ 33.530 s]
[INFO] Spark Project Examples ............................. SUCCESS [06:50 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:56 h
[INFO] Finished at: 2016-08-16T12:13:28+08:00
[INFO] Final Memory: 67M/1052M
[INFO] ------------------------------------------------------------------------
3. Testing
Configure Spark's environment variables in /etc/profile, in the same way as Maven's above. Then run the following from any directory:
spark-shell --master local[2]
This produces:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/16 12:23:12 INFO SecurityManager: Changing view acls to: root,
16/08/16 12:23:12 INFO SecurityManager: Changing modify acls to: root,
16/08/16 12:23:12 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, )
16/08/16 12:23:12 INFO HttpServer: Starting HTTP Server
16/08/16 12:23:12 INFO Utils: Successfully started service 'HTTP class server' on port 54751.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.1.0
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_79)
Type in expressions to have them evaluated.
Type :help for more information.
16/08/16 12:23:17 INFO SecurityManager: Changing view acls to: root,
16/08/16 12:23:17 INFO SecurityManager: Changing modify acls to: root,
16/08/16 12:23:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, )
16/08/16 12:23:17 INFO Slf4jLogger: Slf4jLogger started
16/08/16 12:23:17 INFO Remoting: Starting remoting
16/08/16 12:23:17 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@ClouderaManager:34053]
16/08/16 12:23:17 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@ClouderaManager:34053]
16/08/16 12:23:17 INFO Utils: Successfully started service 'sparkDriver' on port 34053.
16/08/16 12:23:17 INFO SparkEnv: Registering MapOutputTracker
16/08/16 12:23:17 INFO SparkEnv: Registering BlockManagerMaster
16/08/16 12:23:17 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20160816122317-8f76
16/08/16 12:23:17 INFO Utils: Successfully started service 'Connection manager for block manager' on port 36934.
16/08/16 12:23:17 INFO ConnectionManager: Bound socket to port 36934 with id = ConnectionManagerId(ClouderaManager,36934)
16/08/16 12:23:17 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
16/08/16 12:23:17 INFO BlockManagerMaster: Trying to register BlockManager
16/08/16 12:23:17 INFO BlockManagerMasterActor: Registering block manager ClouderaManager:36934 with 265.4 MB RAM
16/08/16 12:23:17 INFO BlockManagerMaster: Registered BlockManager
16/08/16 12:23:18 INFO HttpFileServer: HTTP File server directory is /tmp/spark-323bacc0-6447-4a52-9f4f-52fa271ea365
16/08/16 12:23:18 INFO HttpServer: Starting HTTP Server
16/08/16 12:23:18 INFO Utils: Successfully started service 'HTTP file server' on port 56937.
16/08/16 12:23:18 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/08/16 12:23:18 INFO SparkUI: Started SparkUI at http://ClouderaManager:4040
16/08/16 12:23:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/16 12:23:18 INFO Executor: Using REPL class URI: http://192.168.10.206:54751
16/08/16 12:23:18 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@ClouderaManager:34053/user/HeartbeatReceiver
16/08/16 12:23:18 INFO SparkILoop: Created spark context..
Spark context available as sc.
scala>
Reaching the scala> prompt indicates that the installation succeeded.
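Beyond reaching the prompt, a tiny job verifies that the build can actually run work. One way is to pipe an expression into spark-shell non-interactively (it reads stdin like the standard Scala REPL; sc is the SparkContext the shell creates automatically):

```shell
# Pipe a small job into spark-shell: sum the integers 1..100
# across the two local cores. `sc` is the shell's SparkContext.
echo 'println("sum = " + sc.parallelize(1 to 100).reduce(_ + _))' \
  | spark-shell --master local[2]
# Among the log output you should see: sum = 5050
```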