I am running a Spark job on EMR and using the DataStax connector to connect to a Cassandra cluster. I am facing issues with the Guava jar; details are below. I am using the following Cassandra version:
cqlsh 5.0.1 | Cassandra 3.0.1 | CQL spec 3.3.1
I am running the Spark job on EMR 4.4 with the Maven dependencies below:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.5.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.0</version>
</dependency>
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.10</artifactId>
    <version>1.5.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kinesis-asl_2.10</artifactId>
    <version>1.5.0</version>
</dependency>
I am facing the issue below when I submit the Spark job:
java.lang.ExceptionInInitializerError
at com.datastax.spark.connector.cql.DefaultConnectionFactory$.clusterBuilder(CassandraConnectionFactory.scala:35)
at com.datastax.spark.connector.cql.DefaultConnectionFactory$.createCluster(CassandraConnectionFactory.scala:87)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:153)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
at ampush.event.process.core.CassandraServiceManagerImpl.getAdMetaInfo(CassandraServiceManagerImpl.java:158)
at ampush.event.config.metric.processor.ScheduledEventAggregator$4.call(ScheduledEventAggregator.java:308)
at ampush.event.config.metric.processor.ScheduledEventAggregator$4.call(ScheduledEventAggregator.java:290)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:222)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:222)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Detected Guava issue #1635 which indicates that a version of Guava less than 16.01 is in use. This introduces codec resolution issues and potentially other incompatibility issues in the driver. Please upgrade to Guava 16.01 or later.
at com.datastax.driver.core.SanityChecks.checkGuava(SanityChecks.java:62)
at com.datastax.driver.core.SanityChecks.check(SanityChecks.java:36)
at com.datastax.driver.core.Cluster.<clinit>(Cluster.java:67)
... 23 more
Please let me know how to manage the Guava deps here?
Thanks
6 Answers
#1
6
Another solution: go to the spark/jars directory, rename guava-14.0.1.jar, and copy in guava-19.0.jar in its place.
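If you prefer doing this from a shell, a minimal sketch of the same swap could look like the following; the Spark home location and the source of the newer jar are assumptions, so adjust both for your installation:
# Assumed Spark installation; adjust SPARK_HOME for your environment.
cd "$SPARK_HOME/jars"
# Keep the old jar around in case you need to roll back.
sudo mv guava-14.0.1.jar guava-14.0.1.jar.bak
# Drop in a newer Guava, e.g. from your local Maven repository (assumed path).
sudo cp ~/.m2/repository/com/google/guava/guava/19.0/guava-19.0.jar .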
#2
2
Just add something like this to your POM's <dependencies> block:
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>19.0</version>
</dependency>
(or any version > 16.0.1 that you prefer)
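To confirm which Guava version Maven actually resolves after this change, you can check the dependency tree (a quick sanity check, assuming a standard Maven build):
# Print only the Guava entries of the resolved dependency tree
mvn dependency:tree -Dincludes=com.google.guava:guava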
#3
2
I've had the same problem and resolved it by using the Maven Shade plugin to shade the Guava version that the Cassandra connector brings in.
I needed to exclude the Optional, Present, and Absent classes explicitly because I was running into issues with Spark trying to cast from the non-shaded Guava Present type to the shaded Optional type. I'm not sure whether this will cause any problems later on, but it is working for me for now.
You can add this to the <plugins> section in your pom.xml:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <minimizeJar>true</minimizeJar>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>fat</shadedClassifierName>
        <relocations>
            <relocation>
                <pattern>com.google</pattern>
                <shadedPattern>shaded.guava</shadedPattern>
                <includes>
                    <include>com.google.**</include>
                </includes>
                <excludes>
                    <exclude>com.google.common.base.Optional</exclude>
                    <exclude>com.google.common.base.Absent</exclude>
                    <exclude>com.google.common.base.Present</exclude>
                </excludes>
            </relocation>
        </relocations>
        <filters>
            <filter>
                <artifact>*:*</artifact>
                <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                </excludes>
            </filter>
        </filters>
    </configuration>
</plugin>
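After running the build, you can sanity-check that the Guava classes really were relocated; the jar name below is just a placeholder for whatever your build produces with the fat classifier:
# Classes should now appear under shaded/guava/... instead of com/google/...
jar tf target/my-app-1.0-fat.jar | grep shaded/guava | head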
#4
0
I was able to get around this by adding the guava 16.0.1 jar externally and then specifying the classpath on spark-submit with the help of the configuration values below:
--conf "spark.driver.extraClassPath=/guava-16.0.1.jar" --conf "spark.executor.extraClassPath=/guava-16.0.1.jar"
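For reference, a hedged end-to-end spark-submit sketch with those settings might look like this; the jar location, the use of --jars to ship Guava to the executors, and the class and jar names are assumptions about your setup:
# guava-16.0.1.jar is assumed to be in the submit directory; --jars distributes
# it to the executors so the extraClassPath entries can resolve it there.
spark-submit \
  --conf "spark.driver.extraClassPath=./guava-16.0.1.jar" \
  --conf "spark.executor.extraClassPath=./guava-16.0.1.jar" \
  --jars guava-16.0.1.jar \
  --class com.example.MySparkJob \
  my-spark-job.jar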
Hope this helps someone with a similar error!
#5
0
Thanks Adrian for your response.
I am on a slightly different architecture than everyone else on the thread, but the Guava problem is the same. I am using Spark 2.2 with Mesosphere. In our development environment we use sbt-native-packager to produce the Docker images we pass to Mesos.
It turns out we needed a different Guava for the spark-submit executors than for the code we run on the driver. This worked for me.
build.sbt
....
libraryDependencies ++= Seq(
  "com.google.guava" % "guava" % "19.0" force(),
  "org.apache.hadoop" % "hadoop-aws" % "2.7.3" excludeAll (
    ExclusionRule(organization = "org.apache.hadoop", name = "hadoop-common"), // this is for s3a
    ExclusionRule(organization = "com.google.guava", name = "guava")),
  "org.apache.spark" %% "spark-core" % "2.1.0" excludeAll (
    ExclusionRule("org.glassfish.jersey.bundles.repackaged", name = "jersey-guava"),
    ExclusionRule(organization = "com.google.guava", name = "guava")),
  "com.github.scopt" %% "scopt" % "3.7.0" excludeAll (
    ExclusionRule("org.glassfish.jersey.bundles.repackaged", name = "jersey-guava"),
    ExclusionRule(organization = "com.google.guava", name = "guava")),
  "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.6",
...
dockerCommands ++= Seq(
  ...
  Cmd("RUN rm /opt/spark/dist/jars/guava-14.0.1.jar"),
  Cmd("RUN wget -q http://central.maven.org/maven2/com/google/guava/guava/23.0/guava-23.0.jar -O /opt/spark/dist/jars/guava-23.0.jar")
  ...
When I tried to replace Guava 14 on the executors with Guava 16.0.1 or 19, it still wouldn't work; spark-submit just died. The fat jar, which carries the Guava actually used by my application on the driver, I forced to 19, but the copy for the spark-submit executors I had to replace with 23. I did try 16 and 19 there too, but Spark just died.
Sorry for diverting, but this question came up in every one of my Google searches. I hope this helps other SBT/Mesos folks too.
#6
0
I was facing the same issue while retrieving records from a Cassandra table using Spark (Java) on spark-submit.
Please check the Guava jar version used by Hadoop and Spark in the cluster with the find command and change it accordingly:
find / -name "guav*.jar"
Otherwise, add the Guava jar externally during spark-submit for the driver and executor via spark.driver.extraClassPath and spark.executor.extraClassPath respectively:
spark-submit --class com.my.spark.MySparkJob --master local --conf 'spark.yarn.executor.memoryOverhead=2048' --conf 'spark.cassandra.input.consistency.level=ONE' --conf 'spark.cassandra.output.consistency.level=ONE' --conf 'spark.dynamicAllocation.enabled=false' --conf "spark.driver.extraClassPath=lib/guava-19.0.jar" --conf "spark.executor.extraClassPath=lib/guava-19.0.jar" --total-executor-cores 15 --executor-memory 15g --jars $(echo lib/*.jar | tr ' ' ',') target/my-sparkapp.jar
It's working for me; I hope it works for you too.