This question already has an answer here:
- Replace multiple letters with accents with gsub (10 answers)
I have some UTF-8 encoded strings in R that contain accents, e.g. string="Hølmer" or string="Elizalde-González".
Is there a nice function in R that replaces the accented characters in these strings with their unaccented counterparts? I saw some solutions in PHP here, but how do I do this in R?
For example, the PHP code
$unwanted_array = array( 'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y' );
$str = strtr( $str, $unwanted_array );
seemed quite nice, but how would I do this in R?
2 answers
#1
The answers below are largely taken from elsewhere. The key is getting your unwanted_array into the right format. You might want it as a list:
unwanted_array = list( 'Š'='S', 'š'='s', 'Ž'='Z', 'ž'='z', 'À'='A', 'Á'='A', 'Â'='A', 'Ã'='A', 'Ä'='A', 'Å'='A', 'Æ'='A', 'Ç'='C', 'È'='E', 'É'='E',
'Ê'='E', 'Ë'='E', 'Ì'='I', 'Í'='I', 'Î'='I', 'Ï'='I', 'Ñ'='N', 'Ò'='O', 'Ó'='O', 'Ô'='O', 'Õ'='O', 'Ö'='O', 'Ø'='O', 'Ù'='U',
'Ú'='U', 'Û'='U', 'Ü'='U', 'Ý'='Y', 'Þ'='B', 'ß'='Ss', 'à'='a', 'á'='a', 'â'='a', 'ã'='a', 'ä'='a', 'å'='a', 'æ'='a', 'ç'='c',
'è'='e', 'é'='e', 'ê'='e', 'ë'='e', 'ì'='i', 'í'='i', 'î'='i', 'ï'='i', 'ð'='o', 'ñ'='n', 'ò'='o', 'ó'='o', 'ô'='o', 'õ'='o',
'ö'='o', 'ø'='o', 'ù'='u', 'ú'='u', 'û'='u', 'ü'='u', 'ý'='y', 'þ'='b', 'ÿ'='y' )
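You can do this easily with iconv or chartr. Note that what iconv produces for //TRANSLIT depends on the platform's iconv implementation, so its output may differ from the one shown here: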
> iconv(string, from='UTF-8', to='ASCII//TRANSLIT')
[1] "Holmer"
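> one_char <- nchar(unlist(unwanted_array)) == 1  # chartr() maps 1 char to 1 char, so skip 'ß'='Ss'
> chartr(paste(names(unwanted_array)[one_char], collapse=''),
paste(unwanted_array[one_char], collapse=''),
string)
[1] "Holmer"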
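chartr() translates character by character, so the old and new strings have to line up position by position. A two-character replacement such as 'ß'='Ss' shifts every later pair by one, and chartr() only complains when old is longer than new, so the shift goes unnoticed. A minimal sketch of one way around this, assuming a hypothetical helper transliterate() and a shortened table held in two parallel vectors (from/to are illustrative names, and the \u escapes just keep the example independent of the source-file encoding): one-to-one pairs go through a single chartr() call, and anything longer goes through literal gsub() calls.

```r
# Shortened, illustrative table (not the full unwanted_array)
from <- c("\u00e1", "\u00e9", "\u00f1", "\u00f8", "\u00fc", "\u00df")  # á é ñ ø ü ß
to   <- c("a",      "e",      "n",      "o",      "u",      "Ss")

# Hypothetical helper: chartr() for 1-to-1 pairs, gsub() for the rest
transliterate <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  one_to_one <- nchar(from) == 1 & nchar(to) == 1
  if (any(one_to_one)) {
    # old and new now have the same number of characters, so they line up
    x <- chartr(paste(from[one_to_one], collapse = ""),
                paste(to[one_to_one], collapse = ""),
                x)
  }
  # Multi-character replacements (e.g. "ß" -> "Ss"), matched literally
  for (i in which(!one_to_one)) {
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

transliterate(c("H\u00f8lmer", "Elizalde-Gonz\u00e1lez", "Stra\u00dfe"), from, to)
# returns c("Holmer", "Elizalde-Gonzalez", "StraSse")
```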
Otherwise, you have to loop through all of the replacements, because mapply or similar functions would not account for symbols already replaced by previous gsub operations:
# the loop (fixed = TRUE matches each character literally, not as a regex):
out <- string
for (i in seq_along(unwanted_array))
  out <- gsub(names(unwanted_array)[i], unwanted_array[[i]], out, fixed = TRUE)
The result:
> out
[1] "Holmer"
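The loop threads out through every gsub() call, so each replacement sees the result of the previous one. The same fold can be written with base R's Reduce(), which also works on a whole character vector at once. A minimal sketch, assuming strip_accents() as a hypothetical name and a shortened table in two parallel vectors (\u escapes used to avoid source-encoding issues):

```r
# Shortened, illustrative table (not the full unwanted_array)
from <- c("\u00e1", "\u00e9", "\u00f8", "\u00fc", "\u00df")  # á é ø ü ß
to   <- c("a",      "e",      "o",      "u",      "Ss")

# Hypothetical helper: fold gsub() over the table, like the for loop above
strip_accents <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  Reduce(function(s, i) gsub(from[i], to[i], s, fixed = TRUE),  # literal match
         seq_along(from),
         x)  # x is the starting value of the fold
}

strip_accents(c("H\u00f8lmer", "Elizalde-Gonz\u00e1lez", "M\u00fcller"), from, to)
# returns c("Holmer", "Elizalde-Gonzalez", "Muller")
```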
#2
Another option is to use the gsubfn package:
library(gsubfn)
string="Hølmer"
gsubfn(paste(names(unwanted_array),collapse='|'), unwanted_array,string)
[1] "Holmer"
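The gsubfn() call builds one alternation pattern and replaces each match with its entry from the list in a single pass. If you would rather avoid the extra dependency, a rough base-R analogue combines gregexpr() with the regmatches<- replacement form. A sketch under these assumptions: replace_all_once() is a hypothetical name, the table is shortened, and since the pattern is a regex, any table entry containing metacharacters (e.g. ".") would need escaping.

```r
# Shortened, illustrative table (not the full unwanted_array)
from <- c("\u00e1", "\u00f8", "\u00df")  # á ø ß
to   <- c("a",      "o",      "Ss")

# Hypothetical helper: single-pass lookup replacement in base R
replace_all_once <- function(x, from, to) {
  # One alternation pattern, like paste(names(unwanted_array), collapse='|')
  m <- gregexpr(paste(from, collapse = "|"), x)
  # Replace each matched character by the entry at the same position in `to`
  regmatches(x, m) <- lapply(regmatches(x, m),
                             function(hits) to[match(hits, from)])
  x
}

replace_all_once(c("H\u00f8lmer", "Stra\u00dfe", "plain"), from, to)
# returns c("Holmer", "StraSse", "plain")
```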