C# regex to remove non-printable and control characters from text that mixes many languages and Unicode letters

Time: 2022-04-11 15:04:19

I would appreciate your help with this, since I do not know which range of characters to use, or whether C# has a character class like the [[:cntrl:]] that I found in Ruby.

By non-printable, I mean all characters that are not shown in the output when the input string is printed; those are the ones I want to delete. Please note that I am looking for a C# regex; I do not have a problem with my code.

2 answers

#1


13  

You can remove all control and other non-printable characters with:

s = Regex.Replace(s, @"\p{C}+", string.Empty);

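The \p{C} Unicode category class matches every character in the Unicode "Other" category: control (Cc), format (Cf), surrogate (Cs), private-use (Co) and unassigned (Cn) characters, including those outside the ASCII table. This works without any extra option because in .NET, Unicode category classes are Unicode-aware by default. The closest equivalent of Ruby's [[:cntrl:]] is the narrower \p{Cc}.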
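To make the behaviour of the category classes concrete, here is a minimal sketch rather than a definitive implementation. The class and method names (NonPrintableDemo, StripAllOther, StripControlAndFormat) are made up for illustration. One caveat worth checking against your own data: .NET regexes work on UTF-16 code units, and \p{C} includes surrogates (Cs), so to my understanding the broad pattern also removes characters outside the Basic Multilingual Plane such as emoji. The narrower [\p{Cc}\p{Cf}] class is shown as a possible alternative if that matters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonPrintableDemo
{
    // Removes every UTF-16 code unit in the Unicode "Other" category
    // (Cc, Cf, Cs, Co, Cn). Surrogate halves are included in that set.
    public static string StripAllOther(string s) =>
        Regex.Replace(s, @"\p{C}+", string.Empty);

    // Hypothetical narrower variant: removes only control (Cc) and
    // format (Cf) characters and leaves surrogate pairs alone.
    public static string StripControlAndFormat(string s) =>
        Regex.Replace(s, @"[\p{Cc}\p{Cf}]+", string.Empty);

    public static void Main()
    {
        // Mixed-language sample containing BEL (U+0007, Cc) and
        // LEFT-TO-RIGHT MARK (U+200E, Cf).
        string sample = "Tä\u0007kö 日本\u200E語";

        Console.WriteLine(StripAllOther(sample));
        Console.WriteLine(StripControlAndFormat(sample));

        // U+1F600 written as a surrogate pair; compare how the two
        // patterns treat it.
        string withEmoji = "a\uD83D\uDE00b";
        Console.WriteLine(StripAllOther(withEmoji).Length);
        Console.WriteLine(StripControlAndFormat(withEmoji).Length);
    }
}
```

Note that line breaks and tabs are also control characters (Cc), so both patterns remove them; exclude them from the class if the text should keep its line structure.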

#2


2  

You can try:

// Note: this removes every non-ASCII character, including letters such as ä, ö and å.
string s = "Täkörgåsmrgås";
s = Regex.Replace(s, @"[^\u0000-\u007F]+", string.Empty);


Updated answer after comments:

Documentation about non-printable characters: https://en.wikipedia.org/wiki/Control_character

Char.IsControl method:

https://msdn.microsoft.com/en-us/library/system.char.iscontrol.aspx

Maybe you can try:

// Requires: using System.Linq;
string input = "abc\u0007def"; // this is your input string
string output = new string(input.Where(c => !char.IsControl(c)).ToArray());
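One thing to keep in mind with this approach is that char.IsControl only covers the control category (Cc), so invisible format characters such as U+200E (LEFT-TO-RIGHT MARK) survive it. The following sketch compares it with a filter built on char.GetUnicodeCategory; the class and method names (CharFilterDemo, StripControl, StripControlAndFormat) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class CharFilterDemo
{
    // Same idea as the snippet above: drop control characters (Cc) only.
    public static string StripControl(string s) =>
        new string(s.Where(c => !char.IsControl(c)).ToArray());

    // Hypothetical extension: also drop format characters (Cf).
    public static string StripControlAndFormat(string s) =>
        new string(s.Where(c =>
        {
            UnicodeCategory cat = char.GetUnicodeCategory(c);
            return cat != UnicodeCategory.Control
                && cat != UnicodeCategory.Format;
        }).ToArray());

    public static void Main()
    {
        // BEL (U+0007) is a control character; U+200E is a format character.
        string input = "a\u0007b\u200Ec";

        Console.WriteLine(StripControl(input).Length);          // 4: U+200E is kept
        Console.WriteLine(StripControlAndFormat(input).Length); // 3: only "abc" remains
    }
}
```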
