How do I read an XML file in R (with UTF-8 encoding)?

Date: 2021-12-30 22:51:43

I would like to read an XML file with encoding="utf-8" into R (it contains text in Hebrew).

I know about the XML package, but I haven't found any encoding option in xmlToDataFrame.

I've tried:

library(XML)
data <- xmlToDataFrame("G:/G_RBT/Alexey/DB/kupot.xml")

but the Hebrew text comes out garbled and I can't read it. I also tried:

data <- xmlParse("G:/G_RBT/Alexey/DB/kupot.xml",encoding="UTF-8")

and the encoding argument still doesn't help.

1 Answer

#1



Sometimes you need some manual elbow grease:

library(XML)
library(httr)

# found this XML with hebrew
tmp <- GET("https://tiktickets.googlecode.com/svn-history/r102/trunk/war/ShowHalls.xml")
doc <- content(tmp, as="text", encoding="UTF-8")
doc <- substr(doc, 2, nchar(doc)) # skip the first character (the encoding bits, likely a byte-order mark)

doc_x <- xmlParse(doc, encoding="UTF-8")

# do data frame conversion by hand

data.frame(name=xpathSApply(doc_x, "//ShowHall/name", xmlValue, encoding="UTF-8"),
           address=xpathSApply(doc_x, "//ShowHall/address", xmlValue, encoding="UTF-8"),
           phone1=xpathSApply(doc_x, "//ShowHall/phone1", xmlValue, encoding="UTF-8"),
           longitude=xpathSApply(doc_x, "//ShowHall/longitude", xmlValue, encoding="UTF-8"),
           latitude=xpathSApply(doc_x, "//ShowHall/latitude", xmlValue, encoding="UTF-8"))
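
For a local file like the one in the question, the same idea might look roughly like the sketch below. It is only an illustration under a few assumptions: the sample XML, its two fields, and the temporary file are made up to stand in for kupot.xml, and the leading encoding bytes are dropped by checking the raw bytes for a UTF-8 byte-order mark rather than with the substr() call above.

```r
library(XML)

# Hypothetical sample standing in for the real file: two records with Hebrew names
xml_text <- paste0(
  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
  "<ShowHalls>",
  "<ShowHall><name>\u05e9\u05dc\u05d5\u05dd</name><phone1>03-1234567</phone1></ShowHall>",
  "<ShowHall><name>\u05d4\u05d9\u05db\u05dc</name><phone1>04-7654321</phone1></ShowHall>",
  "</ShowHalls>"
)

# Write it to a temporary file with a UTF-8 byte-order mark in front
path <- tempfile(fileext = ".xml")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
con <- file(path, open = "wb")
writeBin(c(bom, charToRaw(enc2utf8(xml_text))), con)
close(con)

# Read the file as raw bytes so no locale conversion happens on the way in
raw_bytes <- readBin(path, what = "raw", n = file.info(path)$size)

# Drop the byte-order mark only if it is actually there
if (length(raw_bytes) >= 3 && identical(raw_bytes[1:3], bom)) {
  raw_bytes <- raw_bytes[-(1:3)]
}

# Turn the bytes back into a string and mark it as UTF-8
doc_text <- rawToChar(raw_bytes)
Encoding(doc_text) <- "UTF-8"

# Parse from the string, then build the data frame by hand
doc_x <- xmlParse(doc_text, asText = TRUE, encoding = "UTF-8")
halls <- data.frame(
  name   = xpathSApply(doc_x, "//ShowHall/name", xmlValue, encoding = "UTF-8"),
  phone1 = xpathSApply(doc_x, "//ShowHall/phone1", xmlValue, encoding = "UTF-8"),
  stringsAsFactors = FALSE
)

unlink(path)
```

To try it on a real file, replace `path` with the file's location and adjust the XPath expressions to its element names. Whether the Hebrew also prints correctly in the console still depends on the locale and the console font, which this sketch does not change.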
