RImpala: query fails on large data

Time: 2022-07-27 15:26:29
check1<-rimpala.query("select * from sum2")
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
  java.sql.SQLException: Method not supported

dim(sum2) is 49501 rows and 18 columns.

check1<-rimpala.query("select * from sum3")

dim(sum3) is 102 rows and 6 columns.

The query worked on the smaller table.

Sorry that I can't provide a reproducible example for this. Has anyone encountered the same problem with larger data? Any idea how to solve it? Thanks.

3 Answers

#1


1  

As noted elsewhere on this site, RImpala does not implement executeUpdate and so cannot run any query that modifies state. I suspect you hit the error not by running a larger SELECT query but because you tried to insert, update, or delete some data.

If you'd like to use Impala from R, I'd recommend using dplyrimpaladb.

#2


0  

The RImpala build (v0.1.6) has been updated with support for executing DDL queries using executeUpdate.

The latest build contains the following fixes / additions:

  1. Support for DDL query execution.
  2. A fetchSize parameter in the query function that sets the number of records retrieved in one round trip from Impala.
  3. A fix for queries failing when NULL values are returned.
  4. Compatibility with CDH 5.x.x.

You can run DDL queries using the query function as illustrated below:

rimpala.query(Q="drop table sample_table",isDDL="true")

You can also specify the fetchSize in the query function to aid reading large data efficiently.

rimpala.query(Q="select * from sample_table",fetchSize="10000")

The latest build is on CRAN: http://cran.r-project.org/web/packages/RImpala/index.html

Source code: https://github.com/Mu-Sigma/RImpala

#3


0  

I had the same problem with the RImpala package and recommend using the RJDBC package instead:

library(RJDBC)
# Load the Hive JDBC driver from the downloaded jar files
drv <- JDBC(driverClass = "org.apache.hive.jdbc.HiveDriver",
            classPath = list.files("path_to_jars", pattern = "jar$", full.names = TRUE),
            identifier.quote = "`")
# Connect over the HiveServer2 protocol without SASL
conn <- dbConnect(drv, "jdbc:hive2://localhost:21050/;auth=noSasl")
check1 <- dbGetQuery(conn, "select * from sum3")

I used these jar files and everything works as expected: https://downloads.cloudera.com/impala-jdbc/impala-jdbc-0.5-2.zip

For more information and a speed comparison, see this blog post: http://datascience.la/r-and-impala-its-better-to-kiss-than-using-java/
