Convert accented characters to ASCII characters

Time: 2021-06-05 20:19:52

What is the optimal way to remove German (or French) accents from a vector of 16 million strings?

For example, 'Sjögren's syndrome' should become 'Sjogren's syndrome'.

Converting each accented character into a single unaccented character is preferable to transliteration such as

ä => ae, ö => oe, ü => ue.

Regular expressions are one option (see the example below), but is there something better, such as an R package for this?

例如,使用正则表达式将是一个选项,但有更好的东西(R包为此)?

gsub('ü','u',gsub('ö','o',"Sjögren's syndrome ( über) "))
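
Because the question asks for a one-to-one character mapping, base R's chartr() may be a simpler alternative to nested gsub() calls: it replaces every character of old with the character at the same position in new, in a single pass. This is a minimal sketch assuming UTF-8 input; the helper name strip_accents_chartr and the particular list of accented letters are illustrative choices, not a complete mapping.

```r
# Hypothetical helper (not from the original question or answers):
# map each listed accented letter to one unaccented letter with chartr().
# `old` and `new` must contain the same number of characters; the character
# at position i in `old` is replaced by the character at position i in `new`.
strip_accents_chartr <- function(x) {
  old <- "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00e9\u00e8\u00ea\u00e0\u00e7"  # äöüÄÖÜéèêàç
  new <- "aouAOUeeeac"
  chartr(old, new, x)
}

x <- c("Sj\u00f6gren's syndrome", "\u00fcber", "caf\u00e9")
strip_accents_chartr(x)
# expected: "Sjogren's syndrome" "uber" "cafe"
```

Characters that are not listed in old (for example ß) pass through unchanged, so the two strings would need to be extended to cover every accent present in the data.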

There are Stack Overflow solutions for non-R platforms, but no good one for R.

2 solutions

#1

Use iconv to convert to ASCII with transliteration (if supported; how //TRANSLIT maps each character depends on the platform's iconv implementation):

iconv(c("über","Sjögren's"),to="ASCII//TRANSLIT")
[1] "uber"      "Sjogren's"
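
For a vector as large as the one in the question, one possible refinement (an untested assumption, not benchmarked here) is to call iconv() only on elements that actually contain non-ASCII characters. The wrapper name to_ascii_iconv is hypothetical. iconv() returns NA for any element it cannot convert, so the sketch also reports those positions.

```r
# Hypothetical wrapper around the iconv() call from answer #1.
to_ascii_iconv <- function(x) {
  # TRUE for elements containing at least one non-ASCII character
  non_ascii <- grepl("[^[:ascii:]]", x, perl = TRUE)
  # Transliterate only those elements; pure-ASCII strings are left untouched.
  # Elements that iconv() cannot convert come back as NA.
  x[non_ascii] <- iconv(x[non_ascii], from = "UTF-8", to = "ASCII//TRANSLIT")
  x
}

out <- to_ascii_iconv(c("plain text", "\u00fcber", "Sj\u00f6gren's"))
out
# Positions of elements that failed to convert, if any
which(is.na(out))
```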

#2

One of the linked answers suggests:

library(stringi)
stri_trans_general("Zażółć gęślą jaźń", "Latin-ASCII")

[1] "Zazolc gesla jazn"
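
stri_trans_general() is vectorised, so the same call can be applied to the whole vector at once. The sketch below, which assumes stringi is installed, applies the "Latin-ASCII" transform from the answer above and, as an alternative, the compound ICU transform ID "NFD; [:Nonspacing Mark:] Remove; NFC" (standard ICU transform syntax, not taken from the original answers). The second transform decomposes each character, drops the combining accent marks, and recomposes, so it only ever strips diacritics; letters without a canonical decomposition (such as ß, ø or ł) are left unchanged.

```r
library(stringi)

x <- c("Sj\u00f6gren's syndrome", "\u00fcber", "caf\u00e9")

# ICU's built-in Latin-to-ASCII transliterator, as in answer #2
stri_trans_general(x, "Latin-ASCII")

# Alternative compound transform: decompose (NFD), remove nonspacing
# combining marks (the accents), then recompose (NFC).
stri_trans_general(x, "NFD; [:Nonspacing Mark:] Remove; NFC")
```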