Loading XML data into a Hive table: org.apache.hadoop.hive.ql.metadata.HiveException

Date: 2022-08-24 16:19:46

I'm trying to load XML data into Hive, but I'm getting an error:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":""}

The XML file I have used is:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>

The Hive queries I have used are:

1) CREATE TABLE xmltable(xmldata string) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/user/xmlfile.xml' OVERWRITE INTO TABLE xmltable;

2) CREATE VIEW xmlview (id,genre,price)
AS SELECT
xpath(xmldata, '/catalog[1]/book[1]/id'),
xpath(xmldata, '/catalog[1]/book[1]/genre'),
xpath(xmldata, '/catalog[1]/book[1]/price')
FROM xmltable;

3) CREATE TABLE xmlfinal AS SELECT * FROM xmlview;

4) SELECT * FROM xmlfinal WHERE id = '11';

Everything was fine up to the 2nd query, but when I executed the 3rd query it gave me an error.

The error is as below:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error    while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675)
    at org.apache.hadoop.hive.ql.exec

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

So where is it going wrong? I am using a valid XML file.

Thanks, Shree

6 Answers

#1



Reason for the error:

1) Case 1 (your case): the XML content is being fed to Hive line by line.

Input XML:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>  

Check in Hive:

select count(*) from xmltable;  -- returns 13 rows, i.e. each line becomes an individual row in column xmldata

Why this fails:

The XML is read as 13 separate pieces rather than one unified document, so each row holds invalid XML.

2) Case 2: the XML content should be fed to Hive as a single string so the xpath UDFs work. Syntax: all of the functions follow the form xpath_*(xml_string, xpath_expression_string).

input.xml

<?xml version="1.0" encoding="UTF-8"?><catalog><book><id>11</id><genre>Computer</genre><price>44</price></book><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog>

Check in Hive:

select count(*) from xmltable; -- returns 1 row: the XML is properly read as one complete document

This means:

xmldata   = <?xml version="1.0" encoding="UTF-8"?><catalog><book> ...... </catalog>

Then apply your xpath UDF like this:

select xpath(xmldata, 'xpath_expression_string') from xmltable;
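One way to produce that single-string file is a small preprocessing pass before the LOAD. This is an illustrative Python sketch, not part of the answer's Hive workflow; the sample text mirrors the multi-line file above:

```python
# Collapse a multi-line XML file into one line so Hive loads it
# as a single row instead of one row per line.
def collapse_xml(text: str) -> str:
    # Strip each line and join with no separator, producing the
    # single-line layout the xpath UDFs need.
    return "".join(line.strip() for line in text.splitlines())

sample = """<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
</catalog>"""

print(collapse_xml(sample))
```

Write the collapsed output to a new file and point LOAD DATA at that file instead of the original.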

#2



Find the jar here: Brickhouse

Sample example here: Example

Similar example on *: here

Solution:

--Load xml data to table
DROP TABLE xmltable;
CREATE TABLE xmltable(xmldata string) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/vijay/data-input.xml' OVERWRITE INTO TABLE xmltable;

-- check contents
SELECT * from xmltable;

-- create view
DROP VIEW MyxmlView;
CREATE VIEW MyxmlView(id, genre, price) AS
SELECT
 xpath(xmldata, 'catalog/book/id/text()'),
 xpath(xmldata, 'catalog/book/genre/text()'),
 xpath(xmldata, 'catalog/book/price/text()')
FROM xmltable;

-- check view
SELECT id, genre,price FROM MyxmlView;


ADD jar /home/vijay/brickhouse-0.7.0-SNAPSHOT.jar;  --Add brickhouse jar 

CREATE TEMPORARY FUNCTION array_index AS 'brickhouse.udf.collect.ArrayIndexUDF';
CREATE TEMPORARY FUNCTION numeric_range AS 'brickhouse.udf.collect.NumericRange';

SELECT 
   array_index( id, n ) as my_id,
   array_index( genre, n ) as my_genre,
   array_index( price, n ) as my_price
from MyxmlView
lateral view numeric_range( size( id )) MyxmlView as n;

Output:

hive > SELECT
     >    array_index( id, n ) as my_id,
     >    array_index( genre, n ) as my_genre,
     >    array_index( price, n ) as my_price
     > from MyxmlView
     > lateral view numeric_range( size( id )) MyxmlView as n;
Automatically selecting local only mode for query
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/vijay/.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2014-07-09 05:36:45,220 null map = 0%,  reduce = 0%
2014-07-09 05:36:48,226 null map = 100%,  reduce = 0%
Ended Job = job_local_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
my_id      my_genre      my_price
11      Computer        44
44      Fantasy 5

Time taken: 8.541 seconds, Fetched: 2 row(s)
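For intuition, the numeric_range/array_index lateral view above is just a row-wise zip of the three parallel arrays that xpath returned. A rough Python equivalent of that logic (illustrative only, not Hive code; the array values are taken from the sample data):

```python
# The xpath UDF returns one array per column; the Brickhouse
# lateral-view query zips them back into one row per book.
ids = ["11", "44"]            # xpath(xmldata, 'catalog/book/id/text()')
genres = ["Computer", "Fantasy"]
prices = ["44", "5"]

rows = [
    {"my_id": i, "my_genre": g, "my_price": p}
    for i, g, p in zip(ids, genres, prices)
]
for row in rows:
    print(row["my_id"], row["my_genre"], row["my_price"])
```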

Adding more info, as requested by the question owner.

#3



First try loading the file with ADD FILE path-to-file; that solved the problem in my case.

#4



Oracle XML Extensions for Hive can be used to create Hive tables over XML like this: https://docs.oracle.com/cd/E54130_01/doc.26/e54142/oxh_hive.htm#BDCUG691

#5



Follow the steps below to get the result you want; just change the source data to this (one complete document per line):

 <catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog>
<catalog><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog> 

Now try the steps below:

SELECT xpath(xmldata, '/catalog/book/id/text()') AS id,
       xpath(xmldata, '/catalog/book/genre/text()') AS genre,
       xpath(xmldata, '/catalog/book/price/text()') AS price
FROM xmltable;

Now you will get answers like this:

["11"] ["Computer"] ["44"]

["44"] ["Fantasy"] ["5"]

If you apply the xpath_string and xpath_int UDFs, you will get answers like:

11 Computer 44

44 Fantasy 5
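With one complete document per line, each row can also be parsed with any ordinary XML library, which is a handy way to sanity-check the data outside Hive. A sketch using Python's standard library (the lines mirror the sample data above):

```python
import xml.etree.ElementTree as ET

# Each line is a complete <catalog> document, matching the
# one-document-per-line layout shown above.
lines = [
    "<catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog>",
    "<catalog><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog>",
]

rows = []
for line in lines:
    book = ET.fromstring(line).find("book")
    rows.append((book.findtext("id"), book.findtext("genre"), book.findtext("price")))

for r in rows:
    print(*r)
```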

Thanks

#6



Also ensure that the XML file doesn't contain any empty lines or trailing spaces after the last closing tag. In my case the source file had one; whenever I loaded the file into Hive, the resulting table contained NULLs, so whenever I applied an xpath function the result included a few empty arrays: [] [] [] [] [] []
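A small cleanup pass can strip those trailing blank lines before loading. This is an illustrative Python sketch (run it on the file, then LOAD the cleaned output):

```python
# Drop blank lines so Hive doesn't create rows whose xpath results
# are empty arrays, which trip up xpath_int/xpath_double.
def drop_blank_lines(text: str) -> str:
    return "\n".join(line for line in text.splitlines() if line.strip())

sample = "<catalog><book><id>11</id></book></catalog>\n\n   \n"
print(drop_blank_lines(sample))
```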

Although the xpath_string function worked, the xpath_double and xpath_int functions never did. They kept throwing this exception:

Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"line":""}
