I've been trying to develop a Spark program using the Apache Spark framework, and I want to instantiate a HiveContext without any cluster. Is it possible to use HiveContext and run it locally via the Eclipse Scala IDE, without using any cluster?
1 Solution
#1
Simply is it possible? Sure... (emphasis added)
To use a HiveContext, you do not need to have an existing Hive setup, and all of the data sources available to a SQLContext are still available.
However, you need to compile some additional code.
HiveContext is only packaged separately to avoid including all of Hive's dependencies in the default Spark build. If these dependencies are not a problem for your application then using HiveContext is recommended.
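As a rough sketch of what that looks like in a Spark 1.x project (the spark-hive artifact, version numbers, and object name below are placeholders, not something from the question), a HiveContext can be created against a purely local master:

// Hypothetical build.sbt additions (versions are placeholders):
//   libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.3"
//   libraryDependencies += "org.apache.spark" %% "spark-sql"  % "1.6.3"
//   libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.6.3"

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object LocalHiveContextExample {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark in-process on all available cores -- no cluster required.
    val conf = new SparkConf().setAppName("local-hive-context").setMaster("local[*]")
    val sc   = new SparkContext(conf)
    val hive = new HiveContext(sc)

    // With no external Hive installation, Spark falls back to an embedded local
    // metastore and a warehouse directory under the working directory.
    hive.sql("CREATE TABLE IF NOT EXISTS demo (key INT, value STRING)")
    hive.sql("SHOW TABLES").show()

    sc.stop()
  }
}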
But if you are just writing Spark without any cluster, there is nothing holding you to Spark 1.x, and you should instead be using Spark 2.x, which has a SparkSession as the entry point for SQL-related things.
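In Spark 2.x the same local setup is just a builder call. A minimal sketch (the object and app names are placeholders; enableHiveSupport() still requires the spark-hive 2.x artifact on the classpath and can be dropped if you don't need Hive features):

import org.apache.spark.sql.SparkSession

object LocalSparkSessionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("local-spark-session")
      .master("local[*]")      // in-process execution, no cluster
      .enableHiveSupport()     // only needed if you actually use Hive features
      .getOrCreate()

    spark.sql("SELECT 1 AS answer").show()

    spark.stop()
  }
}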
The Eclipse IDE shouldn't matter. You could also use IntelliJ... or no IDE at all and spark-submit any JAR file containing some Spark code...