How to use sed to remove special characters

Date: 2020-11-26 22:23:04

I would like to find out how to use sed to ONLY remove the space AND the bizarre characters from the following echo command:

echo -e "A \xd8\xa8"

So I tried:

echo -e "A \xd8\xa8" | sed -r "s/[^[:print:]]//g"

but that doesn't remove anything,

echo -e "A \xd8\xa8" | sed -r "s/[^[:alnum:]]//g"

only removes the space

echo -e "A \xd8\xa8" | sed -r "s/[^[:alpha:]]//g"

(same result),

echo -e "A \xd8\xa8" | sed -r "s/[^[:ascii:]]//g"

returns an error (invalid character class name), and

echo -e "A \xd8\xa8" | sed -r "s/[^\w ]//g"

removes everything...

Expected result: "A"

Any ideas?

Thanks!

3 answers

#1


2  

If you don't want sed to treat, for example, Arabic characters as alphabetic (which they are), you need to set a locale that does not classify them as such.

The "C" locale only considers the basic character set, i.e. only [A-Za-z] are alphabetic. I am assuming what you want is to delete everything that's not a character from that range (your question is fuzzy about what you really want):

echo -e "A \xd8\xa8" | LC_CTYPE=C sed -r "s/[^[:alpha:]]//g" | hexdump -C

Output:

00000000  41 0a
00000002
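
One caveat with the command above: if `LC_ALL` is set in the environment, it takes precedence over `LC_CTYPE`, so setting only `LC_CTYPE=C` would have no effect. Below is a minimal sketch that sidesteps this by setting `LC_ALL=C` instead. The function name `keep_ascii_letters` is hypothetical, and `printf` with octal escapes stands in for `echo -e`, which not every shell supports:

```shell
# Hypothetical helper: delete every byte that is not an ASCII letter.
# LC_ALL=C forces the C locale even when LC_ALL is already set elsewhere.
keep_ascii_letters() {
    LC_ALL=C sed 's/[^[:alpha:]]//g'
}

# \330\250 are the octal forms of the bytes \xd8\xa8 from the question.
printf 'A \330\250\n' | keep_ascii_letters
```

With the sample input this should print just `A`, matching the hexdump shown above.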

#2


2  

Raw text:

$ echo -e 'A \xd8\xa8' | od -c
0000000   A     330 250  \n
0000005

Remove non-ascii chars:

$ echo -e 'A \xd8\xa8' | sed 's/[^\x00-\x7F]//g' | od -c
0000000   A      \n
0000003

Remove spaces:

$ echo -e 'A \xd8\xa8' | sed 's/[[:space:]]//g' | od -c
0000000   A 330 250  \n
0000004

Remove non-ascii chars and spaces:

$ echo -e 'A \xd8\xa8' | sed 's/[^\x00-\x7F]//g; s/[[:space:]]//g' | od -c
0000000   A  \n
0000002

$ echo -e 'A \xd8\xa8' | sed -E 's/[^\x00-\x7F]|[[:space:]]//g' | od -c
0000000   A  \n
0000002
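
The `\xHH` escapes used above are a GNU sed extension, so they may not work with other sed implementations. The following is a sketch that avoids them, assuming the C locale, where the bracket range `!-~` spans the visible ASCII characters. The function name `strip_to_visible_ascii` is hypothetical. Note that it is slightly stricter than the commands above: it also deletes ASCII control characters, not only whitespace and non-ASCII bytes.

```shell
# Hypothetical helper: keep only visible ASCII characters (! through ~).
# In the C locale the range is evaluated byte by byte, so spaces, tabs,
# other control characters, and all bytes >= 0x80 are deleted.
strip_to_visible_ascii() {
    LC_ALL=C sed 's/[^!-~]//g'
}

# \330\250 are the octal forms of the bytes \xd8\xa8 from the question.
printf 'A \330\250\n' | strip_to_visible_ascii
```

For the sample input this should again leave only `A` followed by a newline.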

#3


-1  

Try with this:

$ echo -e "A \xd8\xa8 ña ñe ño áÄãç " | sed -r "s/[^a-zA-Z0-9]//g"
Aaeo

An alternative is to keep only the ASCII range that starts just above the space character, which drops the space and the control characters below it along with everything non-ASCII:

$ echo -e "A \xd8\xa8 ña ñe ño áÄãç " | sed -r "s/[^\x21-\x7F]//g"
Aaeo
