Encoding in R: how can I convert this string to UTF-8?

Time: 2022-02-17 20:14:40

I am using R to read data from an old FAME database. This works fine in general, but the descriptions come back with an unexpected encoding. E.g.:

a <- "\U3e34653c"
# is supposed to be 
"ä"

I tried to work around this problem with iconv, but despite trying numerous combinations I was not able to get the string displayed properly. My locale is en_US.UTF-8. Is there a way to do this other than replacing (sub) such strings?

2 solutions

#1


Try opening the files with a different encoding string. As Ricardo suggests, perhaps Latin-1? If that does not work, maybe one of the more exotic encodings:

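# "latin1" is the encoding name R documents for ISO-8859-1
f <- file("myfile.db", encoding = "latin1")
dat <- readLines(f)
close(f)  # release the connection once the lines are read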

Can you link to some data?
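
If the data has already been read, the strings can also be re-encoded afterwards. The following is a minimal sketch, assuming the description really contains the single Latin-1 byte 0xE4. That is a guess rather than a confirmed diagnosis: the hex digits 3c 65 34 3e in the question spell the ASCII text `<e4>`, which is how R prints a byte it cannot interpret in a UTF-8 locale, and 0xE4 is "ä" in Latin-1. The variable names `broken` and `fixed` are illustrative.

```r
# Assumption: the database returned the single Latin-1 byte 0xE4 for "ä".
broken <- rawToChar(as.raw(0xe4))

# A lone byte 0xE4 is not valid UTF-8, so a UTF-8 locale cannot display it.
validUTF8(broken)

# Re-encode from Latin-1 to UTF-8 with base R's iconv().
fixed <- iconv(broken, from = "latin1", to = "UTF-8")

# Inspect the resulting bytes; "ä" in UTF-8 is the two bytes c3 a4.
charToRaw(fixed)
```

iconv() is vectorised, so the same call should work on a whole character vector of descriptions, provided they are all Latin-1 to begin with.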

#2


I had an identical problem when extracting data from SQL Server (via ODBC and the RODBC package). I solved it by changing the settings on the ODBC driver to treat all strings as Unicode.

More specifically, I'm using the Actual Technologies ODBC driver for SQL Server. Under 'Advanced Language Settings' you can enable 'Treat text types as Unicode' and set the 'Multi-byte text encoding' option to UTF-8.
