How to correctly handle escaped Unicode characters with R's RJSONIO library when reading JSON from a file

Time: 2020-11-30 00:27:48

I am using R's RJSONIO to read JSON from a file. The JSON contains escaped Unicode characters, which are read incorrectly.

The code works when the JSON is passed as a string, as shown by the author of the R package in the Stack Overflow question "How to correctly deal with escaped Unicode Characters in R e.g. the em dash (—)".

However, when the JSON is read from a file, the escape is not converted to the correct Unicode character, as seen below:

fromJSON(content="~/MTS/temp")
$query
$query$categorymembers
$query$categorymembers[[1]]
$query$categorymembers[[1]]$ns
[1] 0
$query$categorymembers[[1]]$title
[1] "Banach\023Tarski paradox"

Where ~/MTS/temp contains:

{"query":{"categorymembers":[{"ns":0,"title":"Banach\u2013Tarski paradox"}]}}

1 solution

#1

An alternative package called jsonlite works the way you would expect on my system (OS X), and I verified that RJSONIO does not. This is after I saved your JSON snippet to a file called utext.txt:

file.show("utext.txt")
## {"query":{"categorymembers":[{"ns":0,"title":"Banach\u2013Tarski paradox"}]}}
jsonlite::fromJSON("~/temp/utext.txt")
## $query
## $query$categorymembers
##   ns                 title
## 1  0 Banach–Tarski paradox

Here is another solution that is a bit more platform-dependent: convert the Unicode escapes in your files to real characters before reading them. (I do not know whether your platform has the utility used below, but you can probably find it even for Windows.)

My system locale encoding is UTF-8 (the OS X standard), so when I run the command-line utility native2ascii, I can convert the file to UTF-8 and then read it into R, where my locale is set to en_GB.UTF-8.

From a Terminal/shell:

native2ascii -reverse ~/temp/utext.txt ~/temp/utextUTF8.txt

Then in R:

RJSONIO::fromJSON("~/temp/utextUTF8.txt")
## $query
## $query$categorymembers
## $query$categorymembers[[1]]
## $query$categorymembers[[1]]$ns
## [1] 0
## 
## $query$categorymembers[[1]]$title
## [1] "Banach–Tarski paradox"

Voilà, problem solved.
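
If native2ascii is not available (it is a JDK tool and is not installed everywhere), the same pre-conversion step can probably be sketched in base R. The helper name decode_unicode_escapes below is hypothetical and not part of RJSONIO or jsonlite. This is a minimal sketch under some assumptions: every escape is a plain four-digit \uXXXX sequence, there are no surrogate pairs, no escapes for characters that must stay escaped inside JSON strings (such as \u0022), and no escaped backslash directly in front of a "u".

```r
# Hypothetical helper (not part of RJSONIO or jsonlite): replace every
# \uXXXX escape in a character vector with the character it encodes.
decode_unicode_escapes <- function(x) {
  # Regex for a literal backslash, "u", and exactly four hex digits
  pattern <- "\\\\u[0-9a-fA-F]{4}"
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(escapes) {
    vapply(escapes, function(e) {
      # Drop the leading "\u", parse the hex digits, build the character
      intToUtf8(strtoi(substring(e, 3), base = 16L))
    }, character(1), USE.NAMES = FALSE)
  })
  x
}

# The JSON snippet from the question, with the escape kept literal
json_line <- '{"query":{"categorymembers":[{"ns":0,"title":"Banach\\u2013Tarski paradox"}]}}'
decoded <- decode_unicode_escapes(json_line)

# Assumed usage on the file from the answer (paths as in the answer):
# lines <- readLines("~/temp/utext.txt", warn = FALSE)
# writeLines(decode_unicode_escapes(lines), "~/temp/utextUTF8.txt", useBytes = TRUE)
```

The converted file could then be passed to RJSONIO::fromJSON in the same way as the native2ascii output above. I have not checked this against every RJSONIO version, so treat it as a starting point rather than a verified fix.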
