Setting up Spark with Docker using the sequenceiq/spark:1.6 image

Date: 2021-06-14 21:40:22

This uses sequenceiq/spark:1.6.0, the top-ranked Spark image on Docker Hub.

Steps:

Pull the image:

[root@localhost home]# docker pull sequenceiq/spark:1.6.0
Trying to pull repository docker.io/sequenceiq/spark ...

Start the container:

[root@localhost home]# docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/sequenceiq/spark 1.6. 40a687b3cdcc years ago 2.88 GB
docker.io/sequenceiq/hadoop-docker 2.6. 140b265bd62a years ago 1.62 GB
[root@localhost home]# docker run -dit -p : -p : -p : -h sandbox sequenceiq/spark:1.6.0 bash

Enter the container:

[root@localhost home]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
75e3d67806bc sequenceiq/spark:1.6. "/etc/bootstrap.sh..." seconds ago Up seconds /tcp, -/tcp, 0.0.0.0:->/tcp, 0.0.0.0:->/tcp, /tcp, /tcp, /tcp, /tcp, /tcp, /tcp, /tcp, 0.0.0.0:->/tcp thirsty_gates
[root@localhost home]# docker exec -it 75e3d67806bc /bin/bash
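
The pull, run and exec steps above can be collected into one small script. This is only a sketch: the port numbers (8088, 8042, 4040), the container name spark-sandbox and the DRY_RUN switch are assumptions made for illustration and do not come from the transcript.

```shell
# Sketch: pull the image, start a container, and open a shell in it.
# Ports and container name are assumptions, not values from the transcript.
set -eu

IMAGE="sequenceiq/spark:1.6.0"
NAME="spark-sandbox"        # hypothetical container name
PORTS="8088 8042 4040"      # assumed: YARN ResourceManager UI, NodeManager UI, Spark UI

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Turn the port list into "-p 8088:8088 -p 8042:8042 ..." flags.
port_flags() {
  for p in $PORTS; do
    printf '%s %s:%s ' "-p" "$p" "$p"
  done
}

start_spark() {
  run docker pull "$IMAGE"
  # Word splitting of the port flags is intended here.
  run docker run -dit $(port_flags) -h sandbox --name "$NAME" "$IMAGE" bash
  run docker exec -it "$NAME" /bin/bash
}
```

Setting DRY_RUN=1 before calling start_spark prints the three docker commands without running them, which makes it easy to check the port mappings first. Naming the container avoids having to look up its ID with docker ps.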

Spark:

YARN-client (single-node) mode

In YARN-client mode, the driver runs inside the client process, and the application master is used only to request resources from YARN.

bash-4.1# spark-shell --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores
// :: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
// :: INFO spark.SecurityManager: Changing view acls to: root
// :: INFO spark.SecurityManager: Changing modify acls to: root
// :: INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
// :: INFO spark.HttpServer: Starting HTTP Server
// :: INFO server.Server: jetty-.y.z-SNAPSHOT
// :: INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:
// :: INFO util.Utils: Successfully started service 'HTTP class server' on port .
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.
/_/ Using Scala version 2.10. (Java HotSpot(TM) -Bit Server VM, Java 1.7.0_51)
Type in expressions to have them evaluated.
Type :help for more information.
// :: INFO spark.SparkContext: Running Spark version 1.6.
// :: INFO spark.SecurityManager: Changing view acls to: root
// :: INFO spark.SecurityManager: Changing modify acls to: root
// :: INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
// :: INFO util.Utils: Successfully started service 'sparkDriver' on port .
// :: INFO slf4j.Slf4jLogger: Slf4jLogger started
// :: INFO Remoting: Starting remoting
// :: INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@172.17.0.2:32811]
// :: INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port .
// :: INFO spark.SparkEnv: Registering MapOutputTracker
// :: INFO spark.SparkEnv: Registering BlockManagerMaster
// :: INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-8c30cc1c-dfea-4ebf-94b9-c45ff3a1b849
// :: INFO storage.MemoryStore: MemoryStore started with capacity 517.4 MB
// :: INFO spark.SparkEnv: Registering OutputCommitCoordinator
// :: INFO server.Server: jetty-.y.z-SNAPSHOT
// :: INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:
// :: INFO util.Utils: Successfully started service 'SparkUI' on port .
// :: INFO ui.SparkUI: Started SparkUI at http://172.17.0.2:4040
// :: INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:
// :: INFO yarn.Client: Requesting a new application from cluster with NodeManagers
// :: INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster ( MB per container)
// :: INFO yarn.Client: Will allocate AM container, with MB memory including MB overhead
// :: INFO yarn.Client: Setting up container launch context for our AM
// :: INFO yarn.Client: Setting up the launch environment for our AM container
// :: INFO yarn.Client: Preparing resources for our AM container
// :: WARN yarn.Client: Failed to cleanup staging dir .sparkStaging/application_1534228565880_0001
java.net.ConnectException: Call From sandbox/172.17.0.2 to sandbox: failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:)
at java.lang.reflect.Constructor.newInstance(Constructor.java:)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:)
at org.apache.hadoop.ipc.Client.call(Client.java:)
at org.apache.hadoop.ipc.Client.call(Client.java:)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:)
at com.sun.proxy.$Proxy21.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:)
at java.lang.reflect.Method.invoke(Method.java:)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:)
at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:)
at org.apache.hadoop.hdfs.DistributedFileSystem$.doCall(DistributedFileSystem.java:)
at org.apache.hadoop.hdfs.DistributedFileSystem$.doCall(DistributedFileSystem.java:)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:)
at org.apache.spark.deploy.yarn.Client.cleanupStagingDir(Client.scala:)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:)
at $line3.$read$$iwC$$iwC.<init>(<console>:)
at $line3.$read$$iwC.<init>(<console>:)
at $line3.$read.<init>(<console>:)
at $line3.$read$.<init>(<console>:)
at $line3.$read$.<clinit>(<console>)
at $line3.$eval$.<init>(<console>:)
at $line3.$eval$.<clinit>(<console>)
at $line3.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:)
at java.lang.reflect.Method.invoke(Method.java:)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$(SparkIMain.scala:)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:)
at org.apache.spark.repl.SparkILoop.reallyInterpret$(SparkILoop.scala:)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$.apply(SparkILoopInit.scala:)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$.apply(SparkILoopInit.scala:)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$$$anonfun$apply$mcZ$sp$.apply$mcV$sp(SparkILoop.scala:)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$.apply$mcZ$sp(SparkILoop.scala:)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$.apply(SparkILoop.scala:)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$.apply(SparkILoop.scala:)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:)
at org.apache.spark.repl.Main$.main(Main.scala:)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:)
at java.lang.reflect.Method.invoke(Method.java:)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$(SparkSubmit.scala:)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:)
at org.apache.hadoop.ipc.Client$Connection.access$(Client.java:)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:)
at org.apache.hadoop.ipc.Client.call(Client.java:)
... more
// :: ERROR spark.SparkContext: Error initializing SparkContext.

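Damn, it actually failed. I tried several times with no luck, and I have no idea how other people online got it working with exactly the same steps.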
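
In a trace this long, the lines that point at the cause are the WARN and ERROR records, the exception header and the "Caused by:" line. Here the failing call is getFileInfo on HDFS, reached from cleanupStagingDir, so the client could not connect to the NameNode. The filter below is a rough sketch for pulling those lines out of a saved log; the function name summarize_failure and the file name spark-shell.log are hypothetical.

```shell
# Sketch: reduce a saved spark-shell log to the lines that usually matter.
# Reads the named files, or standard input when no file is given.
summarize_failure() {
  # Keep WARN/ERROR records, exception headers and "Caused by:" lines;
  # the "at ..." stack frames are dropped.
  grep -E ' (WARN|ERROR) |^Caused by:|^[a-z][A-Za-z0-9_.]*Exception:' "$@"
}

# Hypothetical usage:
#   summarize_failure spark-shell.log
```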

I then cloned the Docker project from Git and ran docker build again, but it still failed. The error message:

DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
safemode: Call From 70b4a57bb473/172.17.0.2 to 70b4a57bb473: failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

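This looks like Hadoop's safe mode was never turned off, yet the Dockerfile already runs the command that leaves safe mode. The "Connection refused" part, though, says the safemode command could not reach the NameNode at all, so the NameNode may not have been up when the command ran. I have not found the actual cause.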
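
One possible explanation, not confirmed here, is that the safemode command runs before the NameNode accepts connections. If that is the case, retrying the command for a while may help. The retry helper below is a generic sketch; the attempt count, the delay and the idea of running it during the build are all assumptions.

```shell
# Sketch: retry a command until it succeeds or the attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Assumed usage, after the bootstrap script has started HDFS:
#   retry 10 3 hdfs dfsadmin -safemode get
#   hdfs dfsadmin -safemode leave
```

The first call only asks for the safe mode status, so it succeeds as soon as the NameNode answers. If it still fails after all attempts, the NameNode is probably not running at all, and its log is the next place to look.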

References:

https://www.jianshu.com/p/4801bb7ab9e0

https://www.cnblogs.com/ybst/p/9050660.html

https://github.com/sequenceiq/docker-spark

https://blog.csdn.net/farawayzheng_necas/article/details/54341036

https://blog.csdn.net/yeasy/article/details/48654965

https://blog.csdn.net/hanss2/article/details/78505446

http://wgliang.github.io/pages/spark-on-docker.html