SAP Vora 1.2 - Reading Vora tables from HANA

Time: 2021-04-16 16:53:04

!!! UPDATE !!!

Finally, after hours of looking into the documentation, I found the issue. It turns out that I was missing some parameters in the Yarn configuration.

This is what I did:

  1. Open the yarn-site.xml file in an editor, or log in to the Ambari web UI and select Yarn > Config. Locate the property "yarn.nodemanager.aux-services" and add "spark_shuffle" to its current value. The new value should be "mapreduce_shuffle,spark_shuffle".
  2. Add or edit the property "yarn.nodemanager.aux-services.spark_shuffle.class", and set it to "org.apache.spark.network.yarn.YarnShuffleService" (see the sketch after this list).
  3. Copy the spark--yarn-shuffle.jar file (downloaded in the step "Install Spark Assembly Files and Dependent Libraries") from Spark to the Hadoop-Yarn class path on all node manager hosts. Typically this folder is located in /usr/hdp//hadoop-yarn/lib.
  4. Restart Yarn and the node managers.
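
For reference, the two resulting yarn-site.xml entries could look like the minimal sketch below. It assumes "mapreduce_shuffle" was the only auxiliary service configured before; if your cluster already lists other aux-services, append "spark_shuffle" to the existing value instead.

<!-- sketch: register Spark's external shuffle service with the Yarn node managers -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

This external shuffle service is what spark.shuffle.service.enabled (and dynamic allocation) in the Spark configuration relies on.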

!!!!!!!!!!!

I'm using the SAP Vora 1.2 Developer Edition with the newest Spark Controller (HANASPARKCTRL00P_5-70001262.RPM). I loaded a table into Vora in spark-shell. I can see the table in SAP HANA Studio in the "spark_velocity" folder, and I can add it as a virtual table. The problem is that I cannot select or preview the data in the table because of this error:

Error: SAP DBTech JDBC: [403]: internal error: Error opening the cursor for the remote database for query "SELECT "SPARK_testtable"."a1", "SPARK_testtable"."a2", "SPARK_testtable"."a3" FROM "spark_velocity"."testtable" "SPARK_testtable" LIMIT 200 "

Here is my hanaes-site.xml file:

<configuration>
    <!--  You can either copy the assembly jar into HDFS or to lib/external directory.
    Please maintain appropriate value here-->
    <property>
        <name>sap.hana.es.spark.yarn.jar</name>
        <value>file:///usr/sap/spark/controller/lib/external/spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar</value>
        <final>true</final>
    </property>
    <property>
        <name>sap.hana.es.server.port</name>
        <value>7860</value>
        <final>true</final>
    </property>
    <!--  Required if you are copying your files into HDFS-->
     <property>
         <name>sap.hana.es.lib.location</name>
         <value>hdfs:///sap/hana/spark/libs/thirdparty/</value>
         <final>true</final>
     </property>
     -->
    <!--Required property if using controller for DLM scenarios-->
    <!--
    <property>
        <name>sap.hana.es.warehouse.dir</name>
        <value>/sap/hana/hanaes/warehouse</value>
        <final>true</final>
    </property>
-->
    <property>
        <name>sap.hana.es.driver.host</name>
        <value>ip-10-0-0-[censored].ec2.internal</value>
        <final>true</final>
    </property>
    <!-- Change this value to vora when connecting to Vora store -->
    <property>
        <name>sap.hana.hadoop.datastore</name>
        <value>vora</value>
        <final>true</final>
    </property>

    <!-- // When running against a kerberos protected cluster, please maintain appropriate values
    <property>
        <name>spark.yarn.keytab</name>
        <value>/usr/sap/spark/controller/conf/hanaes.keytab</value>
        <final>true</final>
    </property>
    <property>
        <name>spark.yarn.principal</name>
        <value>hanaes@PAL.SAP.CORP</value>
        <final>true</final>
    </property>
-->
    <!-- To enable Secure Socket communication, please maintain appropriate values in the follwing section-->
    <property>
        <name>sap.hana.es.ssl.keystore</name>
        <value></value>
        <final>false</final>
    </property>
    <property>
        <name>sap.hana.es.ssl.clientauth.required</name>
        <value>true</value>
        <final>true</final>
    </property>
    <property>
        <name>sap.hana.es.ssl.verify.hostname</name>
        <value>true</value>
        <final>true</final>
    </property>
    <property>
        <name>sap.hana.es.ssl.keystore.password</name>
        <value></value>
        <final>true</final>
    </property>
    <property>
        <name>sap.hana.es.ssl.truststore</name>
        <value></value>
        <final>true</final>
    </property>
    <property>
        <name>sap.hana.es.ssl.truststore.password</name>
        <value></value>
        <final>true</final>
    </property>
    <property>
        <name>sap.hana.es.ssl.enabled</name>
        <value>false</value>
        <final>true</final>
    </property>

    <property>
        <name>spark.executor.instances</name>
        <value>10</value>
        <final>true</final>
    </property>
    <property>
        <name>spark.executor.memory</name>
        <value>5g</value>
        <final>true</final>
    </property>
    <!-- Enable the following section if you want to enable dynamic allocation-->
    <!--
    <property>
        <name>spark.dynamicAllocation.enabled</name>
        <value>true</value>
        <final>true</final>
    </property>

    <property>
        <name>spark.dynamicAllocation.minExecutors</name>
        <value>10</value>
        <final>true</final>
    </property>
    <property>
        <name>spark.dynamicAllocation.maxExecutors</name>
        <value>20</value>
        <final>true</final>
    </property>
    <property>
    <name>spark.shuffle.service.enabled</name>
    <value>true</value>
    <final>true</final>
   </property>
<property>
         <name>sap.hana.ar.provider</name>
         <value>com.sap.hana.aws.extensions.AWSResolver</value>
         <final>true</final>
     </property>
<property>
        <name>spark.vora.hosts</name>
        <value>ip-10-0-0-[censored].ec2.internal:2022,ip-10-0-0-[censored].ec2.internal:2022,ip-10-0-0-[censored].ec2.internal:2022</value>
        <final>true</final>
     </property>
     <property>
        <name>spark.vora.zkurls</name>
        <value>ip-10-0-0-[censored].ec2.internal:2181,ip-10-0-0-[censored].ec2.internal:2181,ip-10-0-0-[censored].ec2.internal:2181</value>
        <final>true</final>
     </property>
</configuration>

ls /usr/sap/spark/controller/lib/external/

spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar

hdfs dfs -ls /sap/hana/spark/libs/thirdparty

Found 4 items
-rwxrwxrwx   3 hdfs hdfs     366565 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/datanucleus-api-jdo-4.2.1.jar
-rwxrwxrwx   3 hdfs hdfs    2006182 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/datanucleus-core-4.1.2.jar
-rwxrwxrwx   3 hdfs hdfs    1863315 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/datanucleus-rdbms-4.1.2.jar
-rwxrwxrwx   3 hdfs hdfs     627814 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/joda-time-2.9.3.jar

ls /usr/hdp/

2.3.4.0-3485  2.3.4.7-4  current

vi /var/log/hanaes/hana_controller.log

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/sap/spark/controller/lib/spark-sap-datasources-1.2.33-assembly.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/sap/spark/controller/lib/external/spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.4.0-3485/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/05/12 07:02:38 INFO HanaESConfig: Loaded HANA Extended Store Configuration
Found Spark Libraries. Proceeding with Current Class Path
16/05/12 07:02:39 INFO Server: Starting Spark Controller
16/05/12 07:03:11 INFO CommandRouter: Connecting to Vora Engine
16/05/12 07:03:11 INFO CommandRouter: Initialized Router
16/05/12 07:03:11 INFO CommandRouter: Server started
16/05/12 07:03:43 INFO CommandHandler: Getting BROWSE data/user/17401406272892502037-4985062628452729323_f17e36cf-0003-0015-452e-800c700001ee
16/05/12 07:03:48 INFO CommandHandler: Getting BROWSE data/user/17401406272892502037-4985062628452729329_f17e36cf-0003-0015-452e-800c700001f4
16/05/12 07:03:48 INFO VoraClientFactory: returning a Vora catalog client of this Vora catalog server: master.i-14371789.cluster:2204
16/05/12 07:03:48 INFO CBinder: searching for compat-sap-c++.so at /opt/rh/SAP/lib64/compat-sap-c++.so
16/05/12 07:03:48 WARN CBinder: could not find compat-sap-c++.so
16/05/12 07:03:48 INFO CBinder: searching for libpam.so.0 at /lib64/libpam.so.0
16/05/12 07:03:48 INFO CBinder: loading libpam.so.0 from /lib64/libpam.so.0
16/05/12 07:03:48 INFO CBinder: loading library libprotobuf.so
16/05/12 07:03:48 INFO CBinder: loading library libprotoc.so
16/05/12 07:03:48 INFO CBinder: loading library libtbbmalloc.so
16/05/12 07:03:48 INFO CBinder: loading library libtbb.so
16/05/12 07:03:48 INFO CBinder: loading library libv2runtime.so
16/05/12 07:03:48 INFO CBinder: loading library libv2net.so
16/05/12 07:03:48 INFO CBinder: loading library libv2catalog_connector.so
16/05/12 07:03:48 INFO CatalogFactory: returning a Vora catalog client of this Vora catalog server: master.i-14371789.cluster:2204
16/05/12 07:11:56 INFO CommandHandler: Getting BROWSE data/user/17401406272892502037-4985062628452729335_f17e36cf-0003-0015-452e-800c700001fa
16/05/12 07:11:56 INFO Utils: freeing the buffer
16/05/12 07:11:56 INFO Utils: freeing the buffer
16/05/12 07:12:02 INFO Utils: freeing the buffer
16/05/12 07:12:02 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/12 07:12:02 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/12 07:12:02 INFO CatalogFactory: returning a Vora catalog client of this Vora catalog server: master.i-14371789.cluster:2204
16/05/12 07:12:02 INFO Utils: freeing the buffer
16/05/12 07:12:02 INFO DefaultSource: Creating VoraRelation testtable using an existing catalog table
16/05/12 07:12:02 INFO Utils: freeing the buffer
16/05/12 07:12:11 INFO Utils: freeing the buffer
16/05/12 07:14:15 ERROR RequestOrchestrator: Result set was not fetched by connected Client. Hence cancelled the execution
16/05/12 07:14:15 ERROR RequestOrchestrator: org.apache.spark.SparkException: Job 0 cancelled part of cancelled job group f17e36cf-0003-0015-452e-800c70000216
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
        at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1229)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply$mcVI$sp(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:681)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
        at org.apache.spark.scheduler.DAGScheduler.handleJobGroupCancelled(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1475)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:902)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:900)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
        at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:900)
        at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$2$$anonfun$applyOrElse$7.apply(CommandRouter.scala:383)
        at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$2$$anonfun$applyOrElse$7.apply(CommandRouter.scala:362)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$2.applyOrElse(CommandRouter.scala:362)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
        at com.sap.hana.spark.network.CommandHandler.aroundReceive(CommandRouter.scala:204)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

What is also strange is this error:

16/05/12 07:03:48 INFO CBinder: searching for compat-sap-c++.so at /opt/rh/SAP/lib64/compat-sap-c++.so
16/05/12 07:03:48 WARN CBinder: could not find compat-sap-c++.so

Because I do have this file in that location:

ls /opt/rh/SAP/lib64/

compat-sap-c++.so

After changing com.sap.hana.aws.extensions.AWSResolver to com.sap.hana.spark.aws.extensions.AWSResolver, the log file now looks different:

16/05/17 10:04:08 INFO CommandHandler: Getting BROWSE data/user/9110494231822270485-5373255807276155190_7e6efa3c-0003-0015-4a91-a3b020000139
16/05/17 10:04:13 INFO CommandHandler: Getting BROWSE data/user/9110494231822270485-5373255807276155196_7e6efa3c-0003-0015-4a91-a3b02000013f
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/17 10:04:29 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO DefaultSource: Creating VoraRelation testtable using an existing catalog table
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO ConfigurableHostMapper: Load Strategy: RELAXEDLOCAL (default)
16/05/17 10:04:29 INFO HdfsBlockRetriever: Length of HDFS file (/user/vora/test.csv): 10 bytes.
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO ConfigurableHostMapper: Load Strategy: RELAXEDLOCAL (default)
16/05/17 10:04:29 INFO TableLoader: Loading table [testtable]
16/05/17 10:04:29 INFO ConfigurableHostMapper: Load Strategy: RELAXEDLOCAL (default)
16/05/17 10:04:29 INFO TableLoader: Initialized 1 loading threads. Waiting until finished... -- 0.00 s
16/05/17 10:04:29 INFO TableLoader: [secondary2.i-a5361638.cluster:2202] Host mapping (Ranges: 1/1 Size: 0.00 MB)
16/05/17 10:04:29 INFO VoraJdbcClient: [secondary2.i-a5361638.cluster:2202] MultiLoad: MULTIFILE
16/05/17 10:04:29 INFO TableLoader: [secondary2.i-a5361638.cluster:2202] Host finished:
    Raw ranges: 1/1
    Size:       0.00 MB
    Time:       0.29 s
    Throughput: 0.00 MB/s
16/05/17 10:04:29 INFO TableLoader: Finished 1 loading threads. -- 0.29 s
16/05/17 10:04:29 INFO TableLoader: Updated catalog -- 0.01 s
16/05/17 10:04:29 INFO TableLoader: Table load statistics:
    Name: testtable
    Size: 0.00 MB
    Hosts: 1
    Time: 0.30 s
    Cluster throughput: 0.00 MB/s
    Avg throughput per host: 0.00 MB/s
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO TableLoader: Loaded table [testtable] -- 0.37 s
16/05/17 10:04:38 INFO Utils: freeing the buffer
16/05/17 10:06:43 ERROR RequestOrchestrator: Result set was not fetched by connected Client. Hence cancelled the execution
16/05/17 10:06:43 ERROR RequestOrchestrator: org.apache.spark.SparkException: Job 1 cancelled part of cancelled job group 7e6efa3c-0003-0015-4a91-a3b02000015b
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
        at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1229)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply$mcVI$sp(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled$1.apply(DAGScheduler.scala:681)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
        at org.apache.spark.scheduler.DAGScheduler.handleJobGroupCancelled(DAGScheduler.scala:681)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1475)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:902)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:900)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
        at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:900)
        at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$2$$anonfun$applyOrElse$7.apply(CommandRouter.scala:383)
        at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$2$$anonfun$applyOrElse$7.apply(CommandRouter.scala:362)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$2.applyOrElse(CommandRouter.scala:362)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
        at com.sap.hana.spark.network.CommandHandler.aroundReceive(CommandRouter.scala:204)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

It still says "not fetched by the client", but now it looks like Vora loaded the table.

Does anyone have an idea how to fix this? The same error appears when I try to read Hive tables instead of Vora tables.

Error: SAP DBTech JDBC: [403]: internal error: Error opening the cursor for the remote database for query "SELECT "vora_conn_testtable"."a1", "vora_conn_testtable"."a2", "vora_conn_testtable"."a3" FROM "spark_velocity"."testtable" "vora_conn_testtable" LIMIT 200 "

4 Answers

#1


1  

I've faced the same issue and have just solved it! The cause is that HANA cannot resolve the worker nodes' host names. The Spark Controller sends HANA the names of the worker nodes that hold the Spark RDDs; if HANA cannot resolve those host names, it cannot fetch the result and the error occurs.

Please check the hosts file on the HANA host.

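As an illustration (hypothetical addresses; substitute the actual private IPs and fully qualified names of your own worker nodes), the /etc/hosts file on the HANA host might need entries such as:

# hypothetical example: make the Spark worker host names resolvable from HANA
10.0.0.11   ip-10-0-0-11.ec2.internal
10.0.0.12   ip-10-0-0-12.ec2.internal
10.0.0.13   ip-10-0-0-13.ec2.internal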

#2


0  

The log shows the error "Result set was not fetched by connected Client. Hence cancelled the execution." The client in this context is HANA trying to fetch from Vora.

The error could be caused by a connection problem between HANA and Vora.

  1. The hanaes-site.xml shows sap.hana.ar.provider=com.sap.hana.aws.extensions.AWSResolver. This looks like a typo. Assuming you use the aws.resolver-1.5.8.jar that is included in the lib directory after deploying HANASPARKCTRL00P_5-70001262.RPM, the correct value should be com.sap.hana.spark.aws.extensions.AWSResolver (see the sketch after this list). See the PDF document attached to SAP Note 2273047 - SAP HANA Spark Controller SPS 11 (Compatible with Spark 1.5.2).
  2. Ensure that the necessary ports are open: see the HANA Admin Guide -> 9.2.3.3 Spark Controller Configuration Parameters -> ports 56000-58000 on all Spark executor nodes.
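
If that is the case, the corrected entry in hanaes-site.xml would be a one-line change (value taken from the class name above):

<property>
    <name>sap.hana.ar.provider</name>
    <value>com.sap.hana.spark.aws.extensions.AWSResolver</value>
    <final>true</final>
</property>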

If issues still occur, you can check the Spark executor logs for problems:

  1. Start the Spark Controller and reproduce the issue/error.
  2. Navigate to the Yarn ResourceManager UI at http://<resource-manager-host>:8088 (Ambari provides a quick link via Ambari -> Yarn -> Quick Links -> Resource Manager UI).
  3. In the Yarn ResourceManager UI, click the 'ApplicationMaster' link in the 'Tracking UI' column of your running Spark Controller application.
  4. On the Spark UI, click the 'Executors' tab. Then, for each executor, click 'stdout' and 'stderr' and check for errors.

Unrelated: Those parameters are deprecated with Vora 1.2 and you can remove them from hanaes-site.xml: spark.vora.hosts, spark.vora.zkurls

#3


0  

Finally, after hours of looking into the documentation, I found the issue. It turns out that I was missing some parameters in the Yarn configuration (I don't know why this affected the HANA-Vora connection).

This is what I did:

  1. Open the yarn-site.xml file in an editor, or log in to the Ambari web UI and select Yarn > Config. Locate the property "yarn.nodemanager.aux-services" and add "spark_shuffle" to its current value. The new value should be "mapreduce_shuffle,spark_shuffle".
  2. Add or edit the property "yarn.nodemanager.aux-services.spark_shuffle.class", and set it to "org.apache.spark.network.yarn.YarnShuffleService".
  3. Copy the spark--yarn-shuffle.jar file from Spark to the Hadoop-Yarn class path on all node manager hosts. Typically this folder is located in /usr/hdp//hadoop-yarn/lib.
  4. Restart Yarn and the node managers.

#4


0  

I struggled with this issue for a couple of days; it turned out to be caused by ports being blocked on the Spark Controller host. We are running this environment on AWS, and I was able to resolve the error by updating the Spark host's security groups and opening ports 7800-7899, after which HANA was able to see the Hive tables in SDA.

Hope this helps someone, someday :)
