Regular expression to allow one set of characters and disallow others

Time: 2022-09-28 23:09:11

I want to prevent users from entering the following special characters in a field:

œçşÇŞ
ğĞščřŠŘŇĚŽĎŤČňěž
ůŮ
İťı
—¿„”*@
Newline
Carriage return

A few more will be added to this list but I will have the complete restricted list eventually.

However, the user may still enter certain non-English characters such as äöüÄÖÜÿï, in addition to alphanumeric characters and the usual special characters.

Is there an easy way to build a regex for this? Putting this many characters in a negated list, like

[^œçşÇŞ ğĞščřŠŘŇĚŽĎŤČňěž ůŮ İ ť ı — ¿ „ ” * @]+

does not seem to work.

I also do not have the complete list of allowed characters. Even if I tried to compile it, it would be too long, and it would include characters like:

~`!#$%^&()[]{};':",.

along with certain foreign chars.

5 solutions

#1


You do not mention what "flavor" of regex you are using. Does the following work?

\A[^œçşÇŞ ğĞščřŠŘŇĚŽĎŤČňěž ůŮ İ ť ı — ¿ „ ” * @]+\z
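
As a rough illustration of this anchored, negated-class check, here is a minimal Python sketch. Python is only an assumption, since the question never names its regex flavor, and `DISALLOWED` and `is_valid` are names made up for the example. `re.fullmatch` stands in for the `\A...\z` anchors, and the spaces are left out of the class so that a space remains an allowed character.

```python
import re

# Characters from the question's restricted list, plus newline and
# carriage return. The name DISALLOWED is hypothetical.
DISALLOWED = "œçşÇŞğĞščřŠŘŇĚŽĎŤČňěžůŮİťı—¿„”*@\n\r"

# Negated class: one or more characters that are NOT in the list.
# re.escape protects any character that is special inside [...].
VALID_RE = re.compile("[^" + re.escape(DISALLOWED) + "]+")


def is_valid(text: str) -> bool:
    """Return True if text is non-empty and contains no disallowed character."""
    # fullmatch requires the whole string to match, like \A ... \z.
    return VALID_RE.fullmatch(text) is not None


print(is_valid("Hello world 123!"))  # True
print(is_valid("user@example"))      # False
```

Because the quantifier is `+`, an empty string is rejected; switching it to `*` would accept empty input.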

#2


A regular expression can be built to match the incorrect characters, e.g.:

[œçşÇŞ ğĞščřŠŘŇĚŽĎŤČňěž ůŮ İ ť ı]

(I didn't include all the characters; you get the idea!).

If any character matches, it's a fail.

Or, if you need a regular expression that matches valid input, add a caret just inside the opening bracket and apply the pattern to the whole string (anchored at both ends); unanchored, it also matches the empty string inside invalid input:

[^œçşÇŞ ğĞščřŠŘŇĚŽĎŤČňěž ůŮ İ ť ı]*
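
The first variant above, searching for any single bad character, might look like this in Python. This is a sketch under the assumption that Python's `re` module is acceptable; `DISALLOWED` and `first_bad_char` are hypothetical names, and the list is the partial one from this answer.

```python
import re

# Partial list copied from the answer, with the spaces removed.
DISALLOWED = "œçşÇŞğĞščřŠŘŇĚŽĎŤČňěžůŮİťı"

# Positive class: matches exactly one disallowed character.
BAD_CHAR_RE = re.compile("[" + re.escape(DISALLOWED) + "]")


def first_bad_char(text):
    """Return the first disallowed character in text, or None if there is none."""
    match = BAD_CHAR_RE.search(text)
    return match.group(0) if match else None


print(first_bad_char("plain text"))  # None
```

Returning the offending character, rather than a bare boolean, makes it easier to tell the user what to remove.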

#3


You could use a regular expression for this, but why not just check whether any of the disallowed characters are in your string with a built-in method? For example, in the .NET world you could use .Contains().

Personally, I would create a list of allowed characters, then just check that your string doesn't contain any character that isn't in the list. A whitelist also ensures that you cannot forget a "bad" character.
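
Both ideas can be tried without any regex. The sketch below uses Python sets as a stand-in for the built-in method mentioned above; it is an illustration only, and `DISALLOWED`, `ALLOWED`, `contains_disallowed` and `only_allowed` are hypothetical names. The contents of `ALLOWED` are a guess assembled from the characters the question mentions, not a real specification.

```python
import string

# Blacklist from the question, including newline and carriage return.
DISALLOWED = frozenset("œçşÇŞğĞščřŠŘŇĚŽĎŤČňěžůŮİťı—¿„”*@\n\r")

# Assumed whitelist: ASCII letters and digits, space, the umlauts and the
# punctuation that the question lists as acceptable.
ALLOWED = frozenset(
    string.ascii_letters + string.digits + " äöüÄÖÜÿï~`!#$%^&()[]{};':\",.<>"
)


def contains_disallowed(text: str) -> bool:
    """Blacklist check: True if any character of text is forbidden."""
    return any(ch in DISALLOWED for ch in text)


def only_allowed(text: str) -> bool:
    """Whitelist check: True if every character of text is permitted."""
    return all(ch in ALLOWED for ch in text)


print(contains_disallowed("a@b"))  # True
print(only_allowed("a*b"))         # False
```

With the whitelist, a character nobody thought about is rejected by default, which is the safety argument made above.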

#4


A few more will be added to this list but I will have the complete restricted list eventually.

And I do not have the complete list of allowed characters (It would be too long even if I try to get it and would include all chars like ~`!#$%^&()[]{};':",.<> along with certain foreign chars)

So you will eventually have the list of disallowed characters, but probably not the list of allowed characters? You must have either the list of all allowed characters or the list of all disallowed characters; otherwise you cannot tell whether the input is legal. Furthermore, if you have one of the lists, you implicitly have the other, provided the character set is known. Then just implement the shorter one.

Just guessing, but if you use Unicode, there will probably be many more characters you want to disallow than to allow: think of all the Chinese and Japanese characters. So I think you should really build a list of allowed characters and use ranges like a-z where possible.

If you really want to build the list of disallowed characters, you will have to build a regular expression like [^œçşÇŞ ğĞščřŠŘŇĚŽĎŤČňěž ůŮ İ ť ı — ¿ „ ” * @]*. Do not forget to escape the characters if required and use ranges if possible.

Adding so many chars in the not allowed list like [^œçşÇŞ ğĞščřŠŘŇĚŽĎŤČňěž ůŮ İ ť ı — ¿ „ ” *@]+ does not seem to work.

There are spaces in your list. Are they in your code, too? I am not sure, but this may be the problem: a space inside the brackets is treated as one more disallowed character.
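
The effect of those spaces, and of escaping, can be checked with a small Python experiment. This is a sketch that assumes Python's `re` module rather than whatever flavor the question actually uses; `RAW_LIST`, `WITH_SPACES`, `WITHOUT_SPACES` and `accepts` are names invented for the example.

```python
import re

# The list exactly as written in the question, spaces included.
RAW_LIST = "œçşÇŞ ğĞščřŠŘŇĚŽĎŤČňěž ůŮ İ ť ı — ¿ „ ” * @"

# Pattern as posted: the space is part of the negated class.
WITH_SPACES = "[^" + RAW_LIST + "]+"

# Cleaned pattern: whitespace stripped, remaining characters escaped.
CLEANED = "".join(RAW_LIST.split())
WITHOUT_SPACES = "[^" + re.escape(CLEANED) + "]+"


def accepts(pattern: str, text: str) -> bool:
    """True if the whole text matches the given pattern."""
    return re.fullmatch(pattern, text) is not None


print(accepts(WITH_SPACES, "hello world"))     # False
print(accepts(WITHOUT_SPACES, "hello world"))  # True
print(accepts(WITHOUT_SPACES, "hello@world"))  # False
```

With the spaces left in, any input containing a space is rejected, which could easily look like the pattern "not working".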

#5


It would be best to try and match any character that is not allowed by negating the allowed set. For example, if you only wanted to allow 'a' through 'z', you might do the following.

[^a-z]

You cannot possibly know all of the characters that are not allowed, but you presumably know the ones that are allowed. So, build a regular expression like the one above that matches only one character that is not in the allowed set. If you get a match, you'll know that the string contains an invalid character.

Where they are available, use built-in character class escapes.

For Perl regular expressions, look for "Character Classes and other Special Escapes" in the documentation. These escapes may let you write a shorter expression like this one.

[^\w\d  ..other individual chars..  ]
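
A possible Python version of this "negate the allowed set" idea is sketched below. The allowed set is a guess built from the characters the question mentions, and `ALLOWED_EXTRA` and `invalid_chars` are hypothetical names. Explicit ranges are used instead of `\w` because, for Python 3 `str` patterns, `\w` is Unicode-aware and would also accept letters such as ç or ş; note too that `\d` is already covered by `\w`.

```python
import re

# Assumed extra allowed characters beyond ASCII letters and digits:
# space, the umlauts and the punctuation named in the question.
ALLOWED_EXTRA = " äöüÄÖÜÿï~`!#$%^&()[]{};':\",.<>"

# Negated allowed set: matches one character that is NOT permitted.
INVALID_CHAR_RE = re.compile("[^a-zA-Z0-9" + re.escape(ALLOWED_EXTRA) + "]")


def invalid_chars(text: str) -> list:
    """Return every character of text that falls outside the allowed set."""
    return INVALID_CHAR_RE.findall(text)


print(invalid_chars("Hello (world) 2022!"))  # []
print(invalid_chars("a@b*c"))                # ['@', '*']
```

An empty result means the input is valid; anything else lists the characters to reject.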
