I want to create my first Scala program using the Scala example HBaseTest2.scala provided with Spark 1.4.1. The goal is to connect to HBase and do some basic operations, such as counting or scanning rows. However, when I tried to run the program, I got an error: it seems that Spark couldn't find the class HBaseConfiguration. Assume we're located at the root of my project HBaseTest2, /usr/local/Cellar/spark/programs/HBaseTest2. Here are some details about the exception:
src/main/scala/com/orange/spark/examples/HBaseTest2.scala
package com.orange.spark.examples

import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark._

object HBaseTest2 {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HBaseTest2")
    val sc = new SparkContext(sparkConf)
    val tableName = "personal-cloud-test"

    // please ensure HBASE_CONF_DIR is on classpath of spark driver
    // e.g: set it through spark.driver.extraClassPath property
    // in spark-defaults.conf or through --driver-class-path
    // command line option of spark-submit
    val conf = HBaseConfiguration.create()

    // Other options for configuring scan behavior are available. More information available at
    // http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html
    conf.set(TableInputFormat.INPUT_TABLE, tableName)

    // Initialize hBase table if necessary
    val admin = new HBaseAdmin(conf)
    if (!admin.isTableAvailable(tableName)) {
      val tableDesc = new HTableDescriptor(TableName.valueOf(tableName))
    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
I've added the dependencies to my sbt build file to ensure that all the classes the program references are included in the jar file.
name := "HBaseTest2"

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1"

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-core" % "1.2.1",
  "org.apache.hbase" % "hbase" % "",
  "org.apache.hbase" % "hbase-client" % "",
  "org.apache.hbase" % "hbase-common" % "",
  "org.apache.hbase" % "hbase-server" % ""
)
Run application
MacBook-Pro-de-Mincong:spark-1.4.1 minconghuang$ bin/spark-submit \
--class "com.orange.spark.examples.HBaseTest2" \
--master local[4] \
15/08/18 12:06:17 INFO storage.BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at com.orange.spark.examples.HBaseTest2$.main(HBaseTest2.scala:21)
at com.orange.spark.examples.HBaseTest2.main(HBaseTest2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 11 more
15/08/18 12:06:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
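A NoClassDefFoundError like the one above generally means the class was available when the program was compiled, but the driver JVM could not load it at run time. One way to confirm this from inside the driver is a small check like the sketch below. It uses only the standard library; the object name ClasspathCheck and the helper isOnClasspath are hypothetical names chosen for illustration and are not part of Spark or HBase.

```scala
import java.io.File
import scala.util.Try

object ClasspathCheck {
  /** Returns true if `className` can be loaded by the current thread's class loader.
    * The class is looked up without being initialized, so static initializers do not run. */
  def isOnClasspath(className: String): Boolean = {
    // Fall back to this object's loader if no context class loader is set.
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    Try(Class.forName(className, false, loader)).isSuccess
  }

  def main(args: Array[String]): Unit = {
    val name = "org.apache.hadoop.hbase.HBaseConfiguration"
    println(s"$name loadable: ${isOnClasspath(name)}")

    // Print the JVM classpath entries, one per line, to see which jars the driver was started with.
    sys.props.getOrElse("java.class.path", "")
      .split(File.pathSeparator)
      .foreach(println)
  }
}
```

If the check reports false when run the same way as the application, the HBase jars are not reaching the driver, whatever the build file declares.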
The problem might come from the HBase configuration, as mentioned in HBaseTest2.scala, line 16:
// please ensure HBASE_CONF_DIR is on classpath of spark driver
// e.g: set it through spark.driver.extraClassPath property
// in spark-defaults.conf or through --driver-class-path
// command line option of spark-submit
But I don't know how to configure it. I've added HBASE_CONF_DIR to the CLASSPATH on my command line; the CLASSPATH is now /usr/local/Cellar/hadoop/hbase- but nothing changed. What should I do to get this fixed? I can add or remove details if needed. Thanks a lot!
2 solutions
Have you tried this?
sparkConf.set("spark.driver.extraClassPath", "/usr/local/Cellar/hadoop/hbase-")
The problem came from the classpath setting, as mentioned in HBaseTest2.scala, line 33:
// please ensure HBASE_CONF_DIR is on classpath of spark driver
// e.g: set it through spark.driver.extraClassPath property
// in spark-defaults.conf or through --driver-class-path
// command line option of spark-submit
I'm using Mac OS X, where the setup differs from Linux. When I ran echo $CLASSPATH, it returned nothing. It seems that the CLASSPATH environment variable is not used for the driver on Mac, so I had to add all the jar files through spark.driver.extraClassPath in the spark-defaults.conf file. My colleague did the same on Linux. I think there is a more elegant way to handle this, but we haven't found it. Please share if you know one. Thanks.
Mac / Linux
Add all external jars to conf/spark-defaults.conf:
spark.driver.extraClassPath /path/to/a.jar:/path/to/b.jar:/path/to/c.jar
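Writing that line by hand gets tedious when there are many jars. As a rough sketch, the value can be generated from a directory of jar files. The object name ExtraClassPath, the helper jarClassPath, and the idea of keeping all external jars in a single directory are assumptions made for illustration, not something Spark requires.

```scala
import java.io.File

object ExtraClassPath {
  /** Joins the absolute paths of all *.jar files directly inside `libDir`
    * using the platform path separator (":" on Mac and Linux).
    * Returns an empty string if the directory is missing or holds no jars. */
  def jarClassPath(libDir: File): String = {
    // listFiles() returns null when libDir is not a readable directory.
    val entries = Option(libDir.listFiles()).getOrElse(Array.empty[File])
    entries
      .filter(f => f.isFile && f.getName.endsWith(".jar"))
      .map(_.getAbsolutePath)
      .sorted
      .mkString(File.pathSeparator)
  }

  def main(args: Array[String]): Unit = {
    // The jar directory is taken from the first argument; defaults to the current directory.
    val libDir = new File(args.headOption.getOrElse("."))
    println(s"spark.driver.extraClassPath ${jarClassPath(libDir)}")
  }
}
```

The printed line can then be pasted into conf/spark-defaults.conf, or the joined path can be passed to the --driver-class-path option mentioned in the code comment above.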