Matching accented and non-accented characters with unique() and ==

Date: 2021-01-01 20:23:27

I'm putting together some tables that look almost the same, except that some characters appear accented in some tables and unaccented in others. For instance, "André" sometimes reads "Andre", "Flávio" sometimes reads "Flavio", etc. I need to treat all variations as equal, but unique() treats them as different. I thought about converting all accented characters to their unaccented forms and then using unique(), but maybe there is another, faster option.

Later I need to make the same accent-insensitive comparison using ==, so I'm thinking about removing all accents from a copy of each table and doing the comparison on the copies. Please tell me if there's a different, better approach.

1 answer

#1

Removing accents before the comparison seems appropriate for your purposes. Note that iconv provides such a facility through the TRANSLIT flag:

iconv(c("André", "Flávio"), to = "ASCII//TRANSLIT")
#> [1] "Andre"  "Flavio"
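
To tie this back to the question, here is a minimal sketch of how the same call could feed both unique() and ==. The helper name strip_accents and the sample vectors names_a and names_b are made up for illustration and are not part of the original answer. The sketch also assumes the input is UTF-8. What //TRANSLIT produces depends on the platform's iconv implementation, so it is worth checking the result on your own system before relying on it.

```r
# Hypothetical helper (not from the original answer): transliterate
# accented characters to plain ASCII. The exact output of //TRANSLIT
# depends on the platform's iconv implementation.
strip_accents <- function(x) {
  iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
}

# Illustrative data; \u00e9 is "é" and \u00e1 is "á".
names_a <- c("Andr\u00e9", "Andre", "Fl\u00e1vio", "Flavio")
names_b <- c("Andre", "Andr\u00e9", "Flavio", "Fl\u00e1vio")

# Accent-insensitive unique(): deduplicate the normalized copy.
unique(strip_accents(names_a))

# Accent-insensitive ==: compare normalized copies element by element,
# leaving the original vectors untouched.
strip_accents(names_a) == strip_accents(names_b)
```

Normalizing each table once and reusing the copies keeps the originals intact and avoids repeating the conversion for every comparison.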
