I am using R to read data from an old FAME database. This generally works, but the descriptions come back with an unexpected encoding. For example:
a <- "\U3e34653c"
# is supposed to be
"ä"
I tried to iconv my way around this problem, but despite trying numerous possibilities I was not able to get it to display properly. My locale is en_US.UTF-8. Is there a solution that avoids replacing such strings with sub()?
2 answers
#1
Try opening the file with a different encoding. As Ricardo suggests, perhaps Latin-1? If not, maybe one of the more exotic flavours:
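# "latin1" is the name R's own documentation uses; other spellings depend on the platform's iconv
f <- file("myfile.db", encoding = "latin1")
dat <- readLines(f)
close(f)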
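If the descriptions are already in memory, the same idea can be tried with iconv() instead of re-reading the file. The sketch below is only an illustration under an assumption: that the "ä" arrives as the single Latin-1 byte 0xE4. The variable names are made up for the example. It also shows an observation, not a confirmed diagnosis: the four bytes of the reported value 0x3e34653c, read low byte first, spell the text "<e4>", which is the "<xx>" form that iconv(sub = "byte") writes for a byte it cannot convert.

```r
# Assumption: the description holds the single Latin-1 byte 0xE4 ("ä").
raw_desc <- rawToChar(as.raw(0xe4))

# Re-encode from Latin-1 to UTF-8.
fixed <- iconv(raw_desc, from = "latin1", to = "UTF-8")
utf8ToInt(fixed)       # 228, the code point of "ä"

# Split the reported value 0x3e34653c into its four bytes, low byte first.
code <- 0x3e34653c
low_first <- as.raw((code %/% 256^(0:3)) %% 256)
rawToChar(low_first)   # "<e4>"
```

If that reading applies to your data, the strings were probably escaped somewhere upstream, and fixing the encoding where the data is read is likely to work better than patching the strings afterwards.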
Can you link to some data?
#2
I had an identical problem when extracting data from SQL Server (via ODBC and the RODBC package). I solved it by changing the ODBC driver settings to treat all strings as Unicode.
More specifically, I'm using the Actual Technologies ODBC driver for SQL Server. Under 'Advanced Language Settings' you can enable 'Treat text types as Unicode' and set the 'Multi-byte text encoding' option to UTF-8.