How to remove all non-letter characters from a string

Time: 2021-11-03 19:36:40

How can I remove all characters from a string that are not letters using a JavaScript RegEx?

3 solutions

#1


57  

You can use the replace method:

'Hey! The #123 sure is fun!'.replace(/[^A-Za-z]+/g, '');
>>> "HeyThesureisfun"

If you wanted to keep spaces:

'Hey! The #123 sure is fun!'.replace(/[^A-Za-z\s]+/g, '');
>>> "Hey The  sure is fun"

The regex /[^a-z\s]+/gi matches any run of characters that are neither the letters a-z nor whitespace (\s, which covers spaces, tabs and line breaks), does so globally (the g flag) and ignores case (the i flag). It is equivalent to the /[^A-Za-z\s]+/g used above.
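A minimal sketch of both variants, for illustration only; the helper names lettersOnly and lettersAndSpaces are made up here. Removing "#123" from between two spaces leaves both spaces behind, so the second helper collapses whitespace runs in an extra pass.

```javascript
// Hypothetical helper names, not part of any library.
function lettersOnly(text) {
  // Remove every run of characters that are not ASCII letters.
  return text.replace(/[^a-z]+/gi, '');
}

function lettersAndSpaces(text) {
  // Keep ASCII letters and whitespace, then collapse whitespace runs
  // to single spaces and trim the ends.
  return text
    .replace(/[^a-z\s]+/gi, '')
    .replace(/\s+/g, ' ')
    .trim();
}

console.log(lettersOnly('Hey! The #123 sure is fun!'));      // "HeyThesureisfun"
console.log(lettersAndSpaces('Hey! The #123 sure is fun!')); // "Hey The sure is fun"
```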

#2


10  

RegExp instance properties (flags) used: g, i

global : Whether to test the regular expression against all possible matches in a string, or only against the first.

ignoreCase : Whether to ignore case while attempting a match in a string.
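To see what each flag contributes, the same pattern can be run with every flag combination. This is only an illustrative sketch; the variable names are arbitrary.

```javascript
var text = 'Hey! The #123 sure is fun!';

// No flags: only the first match is replaced, and "H" is outside a-z.
var noFlags = text.replace(/[^a-z]+/, '');         // "ey! The #123 sure is fun!"

// g only: every run outside a-z is removed, capital letters included.
var globalOnly = text.replace(/[^a-z]+/g, '');     // "eyhesureisfun"

// i only: capitals now count as letters, but only the first run ("! ") is removed.
var ignoreCaseOnly = text.replace(/[^a-z]+/i, ''); // "HeyThe #123 sure is fun!"

// g and i together: all non-letter runs are removed.
var both = text.replace(/[^a-z]+/gi, '');          // "HeyThesureisfun"

// The flags are also readable as properties of the RegExp instance.
var re = /[^a-z]+/gi;
console.log(re.global, re.ignoreCase);             // true true
```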

RegExp special characters used: [^a-z], +

[^xyz] : A negated or complemented character set. That is, it matches anything that is not enclosed in the brackets. You can specify a range of characters by using a hyphen.

For example, the plain set [abcd] is the same as [a-d]; both match the 'b' in "brisket" and the 'c' in "chop". The negated set [^abcd] matches the 'r' in "brisket" and the 'h' in "chop" instead.

+ : Matches the preceding item 1 or more times. Equivalent to {1,}.
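The following sketch shows the sets and the + quantifier at work on small made-up inputs. For deletion the result is the same with or without +; the quantifier only means each run is handled as one match instead of character by character.

```javascript
// Plain and negated sets on the words from the description above.
console.log('brisket'.match(/[a-d]/)[0]);   // "b"
console.log('chop'.match(/[abcd]/)[0]);     // "c"
console.log('brisket'.match(/[^a-d]/)[0]);  // "r"

var sample = 'a1b22c333';

// Without +, every non-letter character is a separate match.
var single = sample.match(/[^a-z]/g);       // ["1", "2", "2", "3", "3", "3"]

// With +, each run of non-letters is one match.
var runs = sample.match(/[^a-z]+/g);        // ["1", "22", "333"]

// {1,} is the long-hand spelling of +.
var braces = sample.match(/[^a-z]{1,}/g);   // ["1", "22", "333"]

console.log(sample.replace(/[^a-z]+/g, '')); // "abc"
```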

JavaScript string replace method syntax

str.replace(regexp|substr, newSubStr|function[, Non-standard flags]);

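The g and i flags can be built into the regex itself. Older Firefox versions also accepted them as a non-standard third argument to replace; that argument was never standardized, has since been removed, and is simply ignored by current engines. Examples: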

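var re = /[^a-z]+/gi;
var str = "this is a string";
var newstr = str.replace(re, "");
console.log(newstr); // "thisisastring"

// Non-standard third argument (old Firefox only). Current engines ignore it,
// so only the first match is removed and matching is case-sensitive.
var str = "this is a string";
var newstr = str.replace(/[^a-z]+/, "", "gi");
console.log(newstr); // "thisis a string" in current engines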

To keep whitespace characters as well, add \s to the negated set: [^a-z\s]+.
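If the flags need to be supplied as a string, the standard route is the RegExp constructor rather than an extra argument to replace. A small sketch with an arbitrary sample string; note the doubled backslash inside the string literal.

```javascript
// Flags written as part of a regex literal.
var literal = /[^a-z\s]+/gi;

// The same pattern built with the RegExp constructor; the flags are passed
// as a string, and the backslash in \s must be escaped inside the string.
var constructed = new RegExp('[^a-z\\s]+', 'gi');

var str = 'Is this a "string"?';

console.log(str.replace(literal, ''));     // "Is this a string"
console.log(str.replace(constructed, '')); // "Is this a string"
```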

JavaScript Reference

#3


8  

Regular Expressions in ECMAScript implementations are IMHO best explained at the Mozilla Developer Network (formerly Mozilla Developer Center), in the RegExp article of the JavaScript Language Reference.

However, as noted, the previous answers do not take non-English letters into account, such as umlauts and accented letters. To avoid removing those letters from the string, you have to add them to the negated character class, so that they are excluded from the match, like so:

var s = "Victor 1 jagt 2 zwölf 3 Boxkämpfer 4 quer 5 über 6 den 7 Sylter 8 Deich";

s = s.replace(/[^a-zäöüß]+/gi, "");

This approach quickly becomes tedious and hard to maintain, especially if several natural languages need to be considered (and even in proper English there are foreign words like "déjà vu" and "fiancé").
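The maintenance problem is easy to reproduce. In this sketch (the function name stripNonGerman is invented for the example), the German-specific class keeps umlauts but still strips the accented letters of "déjà vu" and "fiancé":

```javascript
// Hypothetical helper wrapping the hand-maintained character class above.
function stripNonGerman(text) {
  return text.replace(/[^a-zäöüß]+/gi, '');
}

// Letters listed in the class survive.
console.log(stripNonGerman('zwölf 3 Boxkämpfer')); // "zwölfBoxkämpfer"

// Letters that were not listed are removed along with the punctuation.
console.log(stripNonGerman('déjà vu, fiancé'));    // "djvufianc"
```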

Therefore, among other PCRE features, JSX:regexp.js lets you write Regular Expressions that use Unicode property classes, backed by the Unicode Character Database (UCD).

You would then write¹

var s = "Victor 1 jagt 2 zwölf 3 Boxkämpfer 4 quer 5 über 6 den 7 Sylter 8 Deich";

var rxNotLetter = new jsx.regexp.RegExp("\\P{Ll}+", "gi");

s = s.replace(rxNotLetter, "");

or

var s = "El 1 veloz 2 murciélago 3 hindú 4 comía 5 feliz 6 cardillo 7 y 8 kiwi. La cigüeña tocaba el saxofón detrás del palenque de paja"
      + " – Съешь 1 же 2 ещё 3 этих 4 мягких 5 французских 6 булок, да 7 выпей 8 чаю.";

var rxNotLetterOrWhitespace = new jsx.regexp.RegExp("[^\\p{Ll}\\p{Lu}\\s]+", "g");

s = s.replace(rxNotLetterOrWhitespace, "");

to reduce dependency on the uppercase/lowercase quirks of implementations (and to be more extensible). Either pattern removes every character that is not a Unicode letter (the second also keeps white-space).
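As an alternative to the library, engines that implement ES2018 Unicode property escapes accept \p{...} natively when the u flag is set. The sketch below assumes such an engine; it is not part of JSX, and the function names are made up for the example.

```javascript
// Assumes an engine with ES2018 Unicode property escapes (u flag required).
// Hypothetical helper names.
function keepLetters(text) {
  // \P{L} matches any code point that is NOT a Unicode letter.
  return text.replace(/\P{L}+/gu, '');
}

function keepLettersAndWhitespace(text) {
  // Negated class: anything that is neither a letter nor whitespace.
  return text.replace(/[^\p{L}\s]+/gu, '');
}

console.log(keepLetters('Victor 1 jagt 2 zwölf 3 Boxkämpfer')); // "VictorjagtzwölfBoxkämpfer"
console.log(keepLetters('Съешь 1 же 2 ещё'));                   // "Съешьжеещё"
console.log(keepLettersAndWhitespace('hindú 4 comía'));         // "hindú  comía"
```

\p{L} covers letters only. If the input may contain decomposed characters (a base letter followed by a combining mark), adding \p{M} to the class keeps the marks as well.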

Testcase

Be sure to provide a version of the Unicode Character Database as well, because it is large, in flux, and therefore not built into regexp.js (JSX contains a verbose text and compacted script version of the UCD; both can be used, and the latter is preferred, by regexp.js). Note that a conforming ECMAScript implementation does not need to support characters beyond the Basic Multilingual Plane (U+0000 to U+FFFF), so jsx.regexp.RegExp currently cannot support those even though they are in the UCD. See the documentation in the source code for details.
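To illustrate what "beyond the Basic Multilingual Plane" means in practice: such characters occupy two UTF-16 code units in a JavaScript string, and a regex only treats them as one character when the u flag is set. The sketch assumes an engine with ES2015 code point escapes and ES2018 property escapes; it does not use regexp.js.

```javascript
// U+1D400 MATHEMATICAL BOLD CAPITAL A lies outside the BMP.
var astral = '\u{1D400}';

// Stored as a surrogate pair, so the string length is 2.
console.log(astral.length);          // 2

// Without the u flag, "." matches a single code unit only.
console.log(/^.$/.test(astral));     // false

// With the u flag, "." matches the whole code point.
console.log(/^.$/u.test(astral));    // true

// With the u flag, the character is recognized as a letter and kept.
var cleaned = ('a' + astral + '1').replace(/\P{L}+/gu, '');
console.log(cleaned === 'a' + astral); // true
```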

¹ Pangrams from Wikipedia, the free encyclopedia.
