I am reading data from an old proprietary database. Unfortunately, for some strings, Encoding(mychar_vector) ends up returning "unknown". Because I am using a wrapper around a closed-source C HLI (host language interface), there is probably not much I can do about that. If so, I am glad to be proven wrong here...
However, apart from a few replacements I had to make using gsub (see my related question), the strings in the vector look OK. I would love to regain control of the encoding. Is there a way to forcefully set the encoding to UTF-8? I tried:
Encoding(mychar_vector) <- "UTF-8"
# or
mychar_vector <- enc2utf8(mychar_vector)
But none of this worked: checking immediately afterwards still returned "unknown". I also looked into iconv, but there is obviously no way to convert from "unknown" to UTF-8, as there is no mapping.
Is there a way to tell R that only UTF-8 characters are involved, so that the encoding can be set to UTF-8? Note that some elements of the vector are already UTF-8.
2 Answers
#1
When I have dealt with files that are not properly UTF-8 encoded, I have used iconv with great success to forcefully convert the file, simply by running a bash command in my R Markdown notebook:
iconv -c -t UTF-8 myfile.txt > Ratebeer-myfile.txt
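A minimal sketch of what -c actually does, assuming a POSIX shell with GNU or BSD iconv; the file names sample.txt and sample-clean.txt are hypothetical. The command above has no -f, so iconv assumes the input is in the current locale's encoding; here the input encoding is stated explicitly. Bytes that cannot be decoded are dropped rather than repaired, so -c can silently lose characters.

```shell
#!/bin/sh
# Hypothetical sample file: "caf", a stray 0xE9 byte (é in ISO-8859-1, invalid as UTF-8), then " ok"
printf 'caf\351 ok\n' > sample.txt

# -f UTF-8: treat the input as UTF-8; -c: omit anything that cannot be converted.
# GNU iconv may still exit non-zero after omitting bytes, hence "|| true".
iconv -c -f UTF-8 -t UTF-8 sample.txt > sample-clean.txt || true

# The output is now valid UTF-8, but the é has been discarded, not fixed.
cat sample-clean.txt
```

If the original characters matter, naming the real source encoding with -f (as in the next command) is usually preferable to -c.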
You could also try the following, where file.txt is your original file and file-iconv.txt is the converted file:
iconv -f iso-8859-1 -t UTF-8 file.txt > file-iconv.txt
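The -f iso-8859-1 variant is appropriate when the source really is Latin-1. A small sketch under that assumption (the file names are hypothetical): an é stored as the single byte 0xE9 comes out as the UTF-8 pair 0xC3 0xA9, which od makes visible. If the source actually uses another code page, -f has to name that encoding instead.

```shell
#!/bin/sh
# Hypothetical sample: "café" stored as ISO-8859-1, where é is the single byte 0xE9
printf 'caf\351\n' > latin1.txt

# Re-encode from ISO-8859-1 to UTF-8; é becomes the two-byte sequence 0xC3 0xA9
iconv -f ISO-8859-1 -t UTF-8 latin1.txt > latin1-utf8.txt

# Dump the converted bytes as hex to confirm the conversion
od -An -tx1 latin1-utf8.txt
```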
Verify the encoding with the following (-I is the macOS flag; GNU file on Linux uses -i instead):
file -I file-iconv.txt
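file only guesses the encoding from the content. For a stricter yes/no answer, one option is to ask iconv to decode the file as UTF-8 and check its exit status, since iconv without -c stops with an error on the first invalid sequence. A sketch; the helper is_valid_utf8 and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the whole file decodes as UTF-8.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

printf 'caf\303\251\n' > good.txt   # "café" encoded as UTF-8
printf 'caf\351\n'     > bad.txt    # "café" encoded as ISO-8859-1

for f in good.txt bad.txt; do
  if is_valid_utf8 "$f"; then
    echo "$f: valid UTF-8"
  else
    echo "$f: NOT valid UTF-8"
  fi
done
```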
Let me know if this helps or not.
#2
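If you can query the data source so that it returns delimited, table-like input instead of a single string, you can use read.table, which accepts an explicit encoding parameter. This common usage works well (note that encoding = "UTF-8" only marks the strings as UTF-8; to re-encode input that is in another encoding, use the fileEncoding argument instead):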
read.table(filesource, header = TRUE, stringsAsFactors = FALSE, encoding = "UTF-8")