How can I ignore accents when comparing strings in Perl?

Date: 2021-05-03 00:10:29

I have a quiz application where I match what people type against the right answer. For now, what I do is basically this:

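```perl
# \Q...\E quotes regex metacharacters, so an answer like "C++" is matched literally
if ($input =~ /\Q$answer\E/i) {
    print "you won\n";
}
```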

It works well: if the answer is "fish", the user can type "a fish" and still be counted as correct.

The problem is that my users, like me, are French, and I'd like to accept, say, a user typing "taton" when the answer is "tâton".

So one thing I could do is:

```perl
use POSIX qw(locale_h);
use locale;
# LC_CTYPE (there is no LC_TYPE) governs character classes and lc()/uc()
setlocale(LC_CTYPE, "fr_FR.ISO8859-15");
setlocale(LC_COLLATE, "fr_FR.ISO8859-15");
```

Then, in my check routine, do:

```perl
# Fold case, then map each accented letter to its unaccented base letter
$input = lc($input);
$input =~ tr/àáâãäåçèéêëìíîïñòóôõöùúûüýÿ/aaaaaaceeeeiiiinooooouuuuyy/;
```

and do the same with the answer.

I don't like this, because I have to hard-code the list of characters, and the day I decide to leave the ISO-8859-15 world for UTF-8, I'm doomed.

So I'm looking for a way to compare strings that makes "tâton" eq "taton", "maçon" eq "macon", or "macon" =~ /maçon/ true.

2 solutions

#1


Try the Text::Unaccent module from CPAN (or Text::Unaccent::PurePerl).
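
The answer does not show Text::Unaccent's own interface, so the following is a minimal sketch of the same idea using only core Perl modules, not Text::Unaccent itself: decompose each string with Unicode::Normalize's NFD, then delete the combining marks (\p{Mn}). It assumes the strings are already decoded Perl character strings; `strip_accents` and `is_correct` are hypothetical helper names.

```perl
use strict;
use warnings;
use utf8;                          # the literals below are UTF-8 source text
use Unicode::Normalize qw(NFD);

# Hypothetical helper: remove accents from an already-decoded string.
sub strip_accents {
    my ($s) = @_;
    my $d = NFD($s);               # "â" becomes "a" followed by U+0302 COMBINING CIRCUMFLEX ACCENT
    $d =~ s/\p{Mn}//g;             # delete every nonspacing (combining) mark
    return $d;
}

# Hypothetical check: accent- and case-insensitive substring match,
# the accent-blind counterpart of $input =~ /\Q$answer\E/i.
sub is_correct {
    my ($input, $answer) = @_;
    my $in  = lc(strip_accents($input));
    my $ans = lc(strip_accents($answer));
    return index($in, $ans) >= 0;
}

print strip_accents("tâton"), "\n";                                   # taton
print is_correct("un macon", "maçon") ? "you won\n" : "try again\n";  # you won
```

Because it works on characters rather than bytes, the same code handles input that arrived as ISO-8859-15 or as UTF-8, provided it is decoded first (for example with Encode's `decode('ISO-8859-15', $bytes)` or `decode('UTF-8', $bytes)`). Ligatures such as "œ" have no decomposition, so they pass through unchanged.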

#2


This does not seem like a proper occasion for regular expressions; you should simply have a list of acceptable answers, plus some filtering to remove nonessential words like "a", "the", and their language-specific equivalents.
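
A minimal sketch of the list-of-acceptable-answers approach, under stated assumptions: the stop-word list, the sample answers, and the names `normalize_answer` and `is_accepted` are hypothetical. Accents are removed with core Unicode::Normalize (NFD, then drop combining marks) so that "tâton" and "taton" normalize to the same form, and the final comparison is an exact `eq` on the normalized strings rather than a regex.

```perl
use strict;
use warnings;
use utf8;                          # the literals below are UTF-8 source text
use Unicode::Normalize qw(NFD);

# Hypothetical nonessential words: English and French articles
# ("l" covers the elided article in "l'arbre").
my %NONESSENTIAL = map { $_ => 1 } qw(a an the un une le la les l des du);

# Reduce an answer to a canonical form: no accents, lowercase,
# nonessential words removed, remaining words joined by single spaces.
sub normalize_answer {
    my ($s) = @_;
    my $d = NFD($s);
    $d =~ s/\p{Mn}//g;             # drop combining accents
    my @words = grep { length($_) && !$NONESSENTIAL{$_} } split /\W+/, lc($d);
    return join ' ', @words;
}

# True if the input equals any acceptable answer once both are normalized.
sub is_accepted {
    my ($input, @acceptable) = @_;
    my $got = normalize_answer($input);
    for my $ok (@acceptable) {
        return 1 if normalize_answer($ok) eq $got;
    }
    return 0;
}

print is_accepted("a fish", "fish")    ? "you won\n" : "try again\n";  # you won
print is_accepted("le maçon", "macon") ? "you won\n" : "try again\n";  # you won
print is_accepted("selfish", "fish")   ? "you won\n" : "try again\n";  # try again
```

Comparing whole normalized strings also avoids a false positive the regex version has: /fish/i matches "selfish", while `is_accepted("selfish", "fish")` is false.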

Whatever you do, it seems obvious to me that it must be character-encoding-aware and language-aware. Regular expressions are typically neither.
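
One way to get an encoding- and language-aware comparison is the core Unicode::Collate module: at level 1 (primary strength) it compares base letters only, so accents and case are ignored. This is a sketch on decoded character strings; `same_answer` and `contains_answer` are hypothetical helper names. For per-language tailoring there is also Unicode::Collate::Locale, which adds a `locale` parameter.

```perl
use strict;
use warnings;
use utf8;                          # the literals below are UTF-8 source text
use Unicode::Collate;

# Level 1 compares base letters only, ignoring accents and case.
# normalization => undef is required before the substring methods
# (index, match, ...) may be used.
my $coll = Unicode::Collate->new(level => 1, normalization => undef);

# Whole-string comparison: accent- and case-insensitive equality.
sub same_answer {
    my ($input, $answer) = @_;
    return $coll->eq($input, $answer);
}

# Literal substring search under the same rules, standing in for
# "macon" =~ /maçon/; index() returns an empty list when nothing matches.
sub contains_answer {
    my ($input, $answer) = @_;
    my ($pos, $len) = $coll->index($input, $answer);
    return defined $pos;
}

print same_answer("taton", "tâton")        ? "you won\n" : "try again\n";
print contains_answer("un macon", "maçon") ? "you won\n" : "try again\n";
```

`index` searches for a literal substring rather than a pattern, which is what the quiz check needs anyway.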
