I want to match a string to make sure it contains only letters.
I've got this and it works just fine:
var onlyLetters = /^[a-zA-Z]*$/.test(myString);
But since I speak another language too, I need to allow all letters, not just A-Z. For example:
é ü ö ê å ø
Does anyone know whether there is a global 'alpha' term that includes all letters, for use in a regular expression? Or, even better, does anyone have some other kind of solution?
Thanks a lot
EDIT: I just realized that you might also want to allow '-' and ' ' in case of a double name like 'Mary-Ann' or 'Mary Ann'.
12 answers
#1
26
I don’t know the actual reason for doing this, but if you want to use it as a pre-check for, say, login names or user nicknames, I’d suggest you list the allowed characters yourself rather than accept every ‘alpha’ character you’ll find in Unicode, because you probably won’t see any visual difference between the following letters:
А ≠ A ≠ Α # cyrillic, latin, greek
In such cases it’s better to specify the allowed letters manually if you want to minimise account faking and such.
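To make the lookalike problem concrete, here is a minimal sketch. The helper name isLatinOnly is hypothetical, and the \p{Script=Latin} property escape assumes an engine with ES2018 Unicode property escapes, which the original answer did not have available.

```javascript
// Three visually similar capital letters from different scripts.
const lookalikes = ['\u0410', 'A', '\u0391']; // Cyrillic А, Latin A, Greek Α

// Each one has its own code point, so string comparison treats them as different.
const codePoints = lookalikes.map(ch => ch.codePointAt(0).toString(16).toUpperCase());
console.log(codePoints); // [ '410', '41', '391' ]

// Hypothetical helper: accept only letters that belong to the Latin script.
// Assumes support for Unicode property escapes (the "u" flag with \p{...}).
function isLatinOnly(text) {
  return /^\p{Script=Latin}+$/u.test(text);
}

console.log(isLatinOnly('Anna'));       // true
console.log(isLatinOnly('\u0410nna'));  // false, starts with Cyrillic А
```

A script check like this still accepts every Latin letter, so it narrows the problem rather than replacing a hand-picked list.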
Addition
Well, if it’s for a field which is supposed to be non-unique, I would allow Greek as well. I wouldn’t feel good about forcing users to change their name to a latinised version.
But for unique fields like nicknames, the other visitors of the site need some assurance that a nickname really is the one they think it is. It is bad enough that people already fake accounts by swapping I and l. Of course, it depends on your users, but to be safe I think it’s better to allow basic Latin + diacritics only. (Maybe have a look at this list: Latin-derived_alphabet)
As an untested suggestion (with ‘-’, ‘_’ and ‘ ’):
/^[a-zA-Z\-_ ’'‘ÆÐƎƏƐƔIJŊŒẞÞǷȜæðǝəɛɣijŋœĸſßþƿȝĄƁÇĐƊĘĦĮƘŁØƠŞȘŢȚŦŲƯY̨Ƴąɓçđɗęħįƙłøơşșţțŧųưy̨ƴÁÀÂÄǍĂĀÃÅǺĄÆǼǢƁĆĊĈČÇĎḌĐƊÐÉÈĖÊËĚĔĒĘẸƎƏƐĠĜǦĞĢƔáàâäǎăāãåǻąæǽǣɓćċĉčçďḍđɗðéèėêëěĕēęẹǝəɛġĝǧğģɣĤḤĦIÍÌİÎÏǏĬĪĨĮỊIJĴĶƘĹĻŁĽĿʼNŃN̈ŇÑŅŊÓÒÔÖǑŎŌÕŐỌØǾƠŒĥḥħıíìiîïǐĭīĩįịijĵķƙĸĺļłľŀʼnńn̈ňñņŋóòôöǒŏōõőọøǿơœŔŘŖŚŜŠŞȘṢẞŤŢṬŦÞÚÙÛÜǓŬŪŨŰŮŲỤƯẂẀŴẄǷÝỲŶŸȲỸƳŹŻŽẒŕřŗſśŝšşșṣßťţṭŧþúùûüǔŭūũűůųụưẃẁŵẅƿýỳŷÿȳỹƴźżžẓ]$/.test(myString)
Another edit: I have added the apostrophe for people with names like O’Neill or O’Reilly. (And the straight and the reversed apostrophe for people who can’t enter the curly one correctly.)
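As a rough sketch of the same idea with separators handled explicitly: the pattern below accepts runs of letters joined by a single space, hyphen, or apostrophe. The names LETTERS, NAME_PATTERN and isPlausibleName are made up for illustration, and the letter ranges (Latin-1 letters plus Latin Extended-A) are an assumption that is much narrower than the hand-picked list above.

```javascript
// Assumed letter set: ASCII letters, Latin-1 letters (skipping × and ÷),
// and the Latin Extended-A block. This is NOT the full list from the answer.
const LETTERS = 'A-Za-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u017F';

// One or more letter runs, separated by exactly one space, hyphen,
// straight apostrophe, or curly apostrophe (U+2019).
const NAME_PATTERN = new RegExp(`^[${LETTERS}]+(?:[ '\\u2019-][${LETTERS}]+)*$`);

// Hypothetical helper built on the pattern above.
function isPlausibleName(text) {
  return NAME_PATTERN.test(text);
}

console.log(isPlausibleName('Mary-Ann'));  // true
console.log(isPlausibleName("O'Neill"));   // true
console.log(isPlausibleName('Mary--Ann')); // false, two separators in a row
```

Requiring a letter on both sides of each separator rejects inputs such as a lone hyphen or a trailing space, which a single character class would let through.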
#2
12
var onlyLetters = /^[a-zA-Z\u00C0-\u00ff]+$/.test(myString)
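One thing worth knowing about this range: U+00C0 to U+00FF also contains the multiplication sign (U+00D7) and the division sign (U+00F7), and it stops before letters such as ł (U+0142). The sketch below compares the range as written with a variant that skips the two symbols; the variable names are only for illustration.

```javascript
// The range from the answer above.
const broad = /^[a-zA-Z\u00C0-\u00FF]+$/;

// Same range with U+00D7 (×) and U+00F7 (÷) left out.
const lettersOnly = /^[a-zA-Z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+$/;

console.log(broad.test('éüöêåø'));       // true
console.log(broad.test('\u00D7'));       // true, × slips through
console.log(lettersOnly.test('\u00D7')); // false
console.log(lettersOnly.test('\u0142')); // false, ł lies outside Latin-1
```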
#3
9
You can't do this in JS. It has very limited regex and normalizer support. You would need to construct a lengthy and unmaintainable character array with all possible Latin characters with diacritical marks (I guess there are around 500 different ones). Rather, delegate the validation task to the server side, which can use another language with more regex capabilities, if necessary with the help of Ajax.
In a full-fledged regex environment you could just test whether the string matches \p{L}+. Here's a Java example:
boolean valid = string.matches("\\p{L}+");
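For comparison, JavaScript engines that implement ES2018 Unicode property escapes accept the same \p{L} class when the regex carries the "u" flag. The sketch below assumes such an engine; the function name isAllLetters is hypothetical.

```javascript
// Hypothetical helper: true when the text is one or more Unicode letters.
// Requires the "u" flag; without it, \p is not treated as a property escape.
function isAllLetters(text) {
  return /^\p{L}+$/u.test(text);
}

console.log(isAllLetters('éüöêåø'));  // true
console.log(isAllLetters('abc1'));    // false, contains a digit
console.log(isAllLetters('e\u0301')); // false, "e" plus a combining accent (a mark, not a letter)
```

The last case shows that decomposed input fails unless combining marks are allowed as well or the string is normalized to NFC first.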
Alternatively, you could normalize the text to get rid of the diacritical marks and check whether it contains only [A-Za-z]+. Here's another Java example:
string = Normalizer.normalize(string, Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
boolean valid = string.matches("[A-Za-z]+");
PHP supports similar functions.
#4
8
When I tried to implement @Debilski's solution, JavaScript didn't like the extended Latin characters, so I had to code them as JavaScript escapes:
// The huge unicode escape string is equal to ÆÐƎƏƐƔIJŊŒẞÞǷȜæðǝəɛɣijŋœĸſßþƿȝĄƁÇĐƊĘĦ
// ĮƘŁØƠŞȘŢȚŦŲƯY̨Ƴąɓçđɗęħįƙłøơşșţțŧųưy̨ƴÁÀÂÄǍĂĀÃÅǺĄÆǼǢƁĆĊĈČÇĎḌĐƊÐÉÈĖÊËĚĔĒĘẸƎ
// ƏƐĠĜǦĞĢƔáàâäǎăāãåǻąæǽǣɓćċĉčçďḍđɗðéèėêëěĕēęẹǝəɛġĝǧğģɣĤḤĦIÍÌİÎÏǏĬĪĨĮỊ
// IJĴĶƘĹĻŁĽĿʼNŃN̈ŇÑŅŊÓÒÔÖǑŎŌÕŐỌØǾƠŒĥḥħıíìiîïǐĭīĩįịijĵķƙĸĺļłľŀʼnńn̈ňñ
// ņŋóòôöǒŏōõőọøǿơœŔŘŖŚŜŠŞȘṢẞŤŢṬŦÞÚÙÛÜǓŬŪŨŰŮŲỤƯẂẀŴẄǷÝỲŶŸȲỸƳŹŻŽẒŕřŗſśŝšşșṣßťţṭ
// ŧþúùûüǔŭūũűůųụưẃẁŵẅƿýỳŷÿȳỹƴźżžẓ
function isAlpha(string) {
var patt = /^[a-zA-Z\u00C6\u00D0\u018E\u018F\u0190\u0194\u0132\u014A\u0152\u1E9E\u00DE\u01F7\u021C\u00E6\u00F0\u01DD\u0259\u025B\u0263\u0133\u014B\u0153\u0138\u017F\u00DF\u00FE\u01BF\u021D\u0104\u0181\u00C7\u0110\u018A\u0118\u0126\u012E\u0198\u0141\u00D8\u01A0\u015E\u0218\u0162\u021A\u0166\u0172\u01AFY\u0328\u01B3\u0105\u0253\u00E7\u0111\u0257\u0119\u0127\u012F\u0199\u0142\u00F8\u01A1\u015F\u0219\u0163\u021B\u0167\u0173\u01B0y\u0328\u01B4\u00C1\u00C0\u00C2\u00C4\u01CD\u0102\u0100\u00C3\u00C5\u01FA\u0104\u00C6\u01FC\u01E2\u0181\u0106\u010A\u0108\u010C\u00C7\u010E\u1E0C\u0110\u018A\u00D0\u00C9\u00C8\u0116\u00CA\u00CB\u011A\u0114\u0112\u0118\u1EB8\u018E\u018F\u0190\u0120\u011C\u01E6\u011E\u0122\u0194\u00E1\u00E0\u00E2\u00E4\u01CE\u0103\u0101\u00E3\u00E5\u01FB\u0105\u00E6\u01FD\u01E3\u0253\u0107\u010B\u0109\u010D\u00E7\u010F\u1E0D\u0111\u0257\u00F0\u00E9\u00E8\u0117\u00EA\u00EB\u011B\u0115\u0113\u0119\u1EB9\u01DD\u0259\u025B\u0121\u011D\u01E7\u011F\u0123\u0263\u0124\u1E24\u0126I\u00CD\u00CC\u0130\u00CE\u00CF\u01CF\u012C\u012A\u0128\u012E\u1ECA\u0132\u0134\u0136\u0198\u0139\u013B\u0141\u013D\u013F\u02BCN\u0143N\u0308\u0147\u00D1\u0145\u014A\u00D3\u00D2\u00D4\u00D6\u01D1\u014E\u014C\u00D5\u0150\u1ECC\u00D8\u01FE\u01A0\u0152\u0125\u1E25\u0127\u0131\u00ED\u00ECi\u00EE\u00EF\u01D0\u012D\u012B\u0129\u012F\u1ECB\u0133\u0135\u0137\u0199\u0138\u013A\u013C\u0142\u013E\u0140\u0149\u0144n\u0308\u0148\u00F1\u0146\u014B\u00F3\u00F2\u00F4\u00F6\u01D2\u014F\u014D\u00F5\u0151\u1ECD\u00F8\u01FF\u01A1\u0153\u0154\u0158\u0156\u015A\u015C\u0160\u015E\u0218\u1E62\u1E9E\u0164\u0162\u1E6C\u0166\u00DE\u00DA\u00D9\u00DB\u00DC\u01D3\u016C\u016A\u0168\u0170\u016E\u0172\u1EE4\u01AF\u1E82\u1E80\u0174\u1E84\u01F7\u00DD\u1EF2\u0176\u0178\u0232\u1EF8\u01B3\u0179\u017B\u017D\u1E92\u0155\u0159\u0157\u017F\u015B\u015D\u0161\u015F\u0219\u1E63\u00DF\u0165\u0163\u1E6D\u0167\u00FE\u00FA\u00F9\u00FB\u00FC\u01D4\u016D\u016B\u0169\u0171\u016F\u0173\u1EE5\u01B0\u1E83\u1E81\u0175\u1E85\u01BF\u00FD\u1EF3\u0177\u00FF\u0233\u1EF9\u01B4\u017A\u017C\u017E\u1E93]+$/;
return patt.test(string);
}
#5
7
This can be tricky; unfortunately, JavaScript has pretty poor support for internationalization. To do this check you'll have to create your own character class, because \w, for instance, is the same as [0-9A-Z_a-z], which won't help you much, and there isn't anything like [[:alpha:]] in JavaScript. But since it sounds like you're only going to use one other language, you can probably just add those other characters to your character class.
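A quick way to see the limitation described above is to run \w against an accented letter and compare it with a hand-written class. This is only an illustration; the variable names are made up, and the extra characters are the ones from the question.

```javascript
// \w covers ASCII letters, digits and the underscore only.
const asciiWord = /^\w+$/;
console.log(asciiWord.test('abc_123')); // true
console.log(asciiWord.test('é'));       // false

// A custom class extended with the letters from the question.
const customLetters = /^[a-zA-Zéüöêåø]+$/;
console.log(customLetters.test('søren')); // true
console.log(customLetters.test('abc_'));  // false, underscore is not in the class
```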
By the way, I think you'll need a + or * in your regexp if myString can be longer than one character.
The full example:
/^[a-zA-Zéüöêåø]*$/.test(myString);
#6
6
There should be, but the regex will be localization dependent. Thus, é ü ö ê å ø won't be filtered if you're on a US localization, for example. To ensure your website does what you want across all localizations, you should explicitly write out the characters in a form similar to what you are already doing.
The only standard one I am aware of, though, is \w, which would match all alphanumeric characters. You could do it the "standard" way by running two regexes, one to verify that \w matches and another to verify that \d (all digits) does not match, which would result in a guaranteed alpha-only string. Again, I'd strongly urge you not to use this technique, as there's no guarantee what \w will represent in a given localization, but this does answer your question.
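Here is what that two-regex check looks like as a sketch, with a hypothetical function name. As far as I know, JavaScript fixes \w to [A-Za-z0-9_] regardless of locale, so in practice the check lets the underscore through and rejects accented letters.

```javascript
// Hypothetical helper: every character matches \w and none matches \d.
function wordCharsWithoutDigits(text) {
  return /^\w+$/.test(text) && !/\d/.test(text);
}

console.log(wordCharsWithoutDigits('abc'));  // true
console.log(wordCharsWithoutDigits('abc1')); // false, contains a digit
console.log(wordCharsWithoutDigits('a_b'));  // true, underscore counts as \w
console.log(wordCharsWithoutDigits('é'));    // false, not matched by \w
```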
#7
5
I don't know anything about JavaScript, but if it has proper Unicode support, convert your string to a decomposed form, then remove the diacritics from it ([\u0300-\u036f\u1dc0-\u1dff]). Then your letters will only be ASCII ones.
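In engines that provide String.prototype.normalize (ES2015 and later), the decompose-then-strip idea can be sketched as follows. The two function names are hypothetical; the character ranges are the ones given above.

```javascript
// Decompose to NFD, then drop the combining diacritical marks.
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f\u1dc0-\u1dff]/g, '');
}

// Hypothetical helper: ASCII letters only once the diacritics are removed.
function isAsciiLettersAfterStripping(text) {
  return /^[A-Za-z]+$/.test(stripDiacritics(text));
}

console.log(stripDiacritics('éüöêå'));              // "euoea"
console.log(isAsciiLettersAfterStripping('José'));  // true
console.log(isAsciiLettersAfterStripping('Søren')); // false
```

Letters such as ø and ł have no canonical decomposition, so they survive normalization and still fail the ASCII check; they would need to be handled separately.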
#8
5
You could always use a blacklist instead of a whitelist. That way you only remove the characters you do not need.
#9
3
You could use a blacklist - a list of characters to exclude.
Also, it is important to verify the input on server-side, not only on client-side! Client-side can be bypassed easily.
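A blacklist check might look like the sketch below. The set of forbidden characters (digits plus common ASCII punctuation, with space, hyphen and apostrophe left out on purpose) is an assumption for illustration, as are the names FORBIDDEN and passesBlacklist.

```javascript
// Assumed blacklist: digits and a selection of ASCII punctuation and symbols.
const FORBIDDEN = /[0-9!"#$%&()*+,.\/:;<=>?@\[\\\]^_`{|}~]/;

// Hypothetical helper: true when no blacklisted character occurs in the text.
function passesBlacklist(text) {
  return !FORBIDDEN.test(text);
}

console.log(passesBlacklist('Mary-Ann')); // true
console.log(passesBlacklist('Mary1'));    // false
console.log(passesBlacklist('a@b'));      // false
console.log(passesBlacklist('\u{1F600}')); // true, an emoji is not on the list
```

The last line shows the trade-off: anything the list does not name is accepted, so a blacklist is only as good as its coverage.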
#10
1
There are some shortcuts to achieve this in other regular expression dialects - see this page. But I don't believe there are any standardised ones in JavaScript - certainly none that would be supported by all browsers.
#11
1
I'm using a converter before checking, but it's still not friendly for all languages. I'm not sure that's possible.
function noExtendedChars( input_name ){
  // Each row maps accented variants (index 1 onward) to a base letter (index 0).
  var replacements = [
    ['a', 'à','á','â','ä','æ','ã','å','ā'],
    ['c', 'ç', 'ć', 'č'],
    ['e', 'è','é','ê','ë','ē','ė','ę'],
    ['i', 'ì','ï','í','ī','į','î'],
    ['l', 'ł'],
    ['n', 'ñ', 'ń'],
    ['o', 'ô', 'ö', 'ò', 'ó', 'œ', 'ø', 'ō', 'õ' ],
    ['s', 'ß', 'ś', 'š' ],
    ['u', 'û', 'ü', 'ù', 'ú', 'ū'],
    ['y', 'ÿ'],
    ['z', 'ž', 'ź', 'ż']
  ];
  for ( var b = 0; b < replacements.length; b++ ){
    var r = replacements[b];
    for ( var a = 1; a < r.length; a++ ){
      // The "i" flag also matches uppercase variants, which become the lowercase base letter.
      input_name = input_name.replace( new RegExp( r[a], "gi" ), r[0] );
    }
  }
  return input_name;
}
#12
0
var regexp = /\B\#[a-zA-Z\x7f-\xff]+/g;
var result = searchText.match(regexp);