I would like to find out how to use sed to remove ONLY the space AND the bizarre (non-ASCII) characters from the output of the following echo command:
echo -e "A \xd8\xa8"
So I tried:
echo -e "A \xd8\xa8" | sed -r "s/[^[:print:]]//g"
but that doesn't remove anything,
echo -e "A \xd8\xa8" | sed -r "s/[^[:alnum:]]//g"
only removes the space
echo -e "A \xd8\xa8" | sed -r "s/[^[:alpha:]]//g"
(same result),
echo -e "A \xd8\xa8" | sed -r "s/[^[:ascii:]]//g"
returns an error (invalid character class name), and
echo -e "A \xd8\xa8" | sed -r "s/[^\w ]//g"
removes everything...
Expected result: "A"
Any ideas?
Thanks!
3 answers
#1
2
If you want sed to not treat e.g. Arabic characters as alphabetic (which they are), you need to set a locale that does not classify them that way.
The "C" locale only considers the basic character set, i.e. only [A-Za-z] are alphabetic. I am assuming that what you want is to delete everything that is not a character from that range (your question is fuzzy about what you really want):
echo -e "A \xd8\xa8" | LC_CTYPE=C sed -r "s/[^[:alpha:]]//g" | hexdump -C
Output:
00000000 41 0a
00000002
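Some context on why the attempts in the question behaved as they did: the bytes \xd8\xa8 are the UTF-8 encoding of U+0628 (Arabic letter beh), so in a UTF-8 locale they form one character that is both printable and alphabetic. Below is a minimal sketch of the locale approach. The function name keep_ascii_letters is made up for illustration, and printf with octal escapes stands in for echo -e so the input bytes do not depend on the shell's echo.

```shell
#!/bin/sh
# Hypothetical helper: delete everything that is not an ASCII letter.
# LC_ALL=C applies only to this sed invocation; under the C locale
# every byte >= 0x80 is a separate, non-alphabetic character.
keep_ascii_letters() {
    LC_ALL=C sed 's/[^[:alpha:]]//g'
}

# Same bytes as: echo -e "A \xd8\xa8"  (\330\250 is \xd8\xa8 in octal)
printf 'A \330\250\n' | keep_ascii_letters   # prints: A
```

LC_ALL is used here rather than LC_CTYPE because it takes precedence over LC_CTYPE and LANG, so the result should not change if the calling environment already exports LC_ALL.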
#2
2
Raw text:
$ echo -e 'A \xd8\xa8' | od -c
0000000 A 330 250 \n
0000005
Remove non-ascii chars:
$ echo -e 'A \xd8\xa8' | sed 's/[^\x00-\x7F]//g' | od -c
0000000 A \n
0000003
Remove spaces:
$ echo -e 'A \xd8\xa8' | sed 's/[[:space:]]//g' | od -c
0000000 A 330 250 \n
0000004
Remove non-ascii chars and spaces:
$ echo -e 'A \xd8\xa8' | sed 's/[^\x00-\x7F]//g; s/[[:space:]]//g' | od -c
0000000 A \n
0000002
$ echo -e 'A \xd8\xa8' | sed -E 's/[^\x00-\x7F]|[[:space:]]//g' | od -c
0000000 A \n
0000002
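The [^[:print:]] attempt from the question can probably be made to work along the same lines, without the \xHH escapes (those are a GNU sed extension). This is only a sketch: it assumes sed runs in the C locale, where only bytes 0x20 to 0x7E count as printable, and strip_non_printable_and_space is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: drop non-printable bytes (control characters and
# everything >= 0x80 under the C locale), then drop the remaining spaces.
strip_non_printable_and_space() {
    LC_ALL=C sed 's/[^[:print:]]//g; s/[[:space:]]//g'
}

# Input bytes: A 1 , space 0xd8 0xa8 tab b newline
printf 'A1, \330\250\tb\n' | strip_non_printable_and_space   # prints: A1,b
```

Unlike the [:alpha:] version, this keeps digits and punctuation, so which one fits depends on what should survive besides the letter A.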
#3
-1
Try with this:
$ echo -e "A \xd8\xa8 ña ñe ño áÄãç " | sed -r "s/[^a-zA-Z0-9]//g"
Aaeo
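An alternative would be to keep only the ASCII range \x21-\x7F, which drops the space character and the control characters below it (note that \x7F, DEL, is itself a control character, so use \x7E as the upper bound to exclude it as well):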
$ echo -e "A \xd8\xa8 ña ñe ño áÄãç " | sed -r "s/[^\x21-\x7F]//g"
Aaeo
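The same idea can be sketched without the GNU-specific \xHH escapes by writing the range with literal characters: in the C locale, ! is 0x21 and ~ is 0x7E, so [!-~] covers the visible ASCII characters and leaves out both the space and DEL. The function name keep_visible_ascii is an assumption for illustration, and LC_ALL=C is added so the range is interpreted by byte value.

```shell
#!/bin/sh
# Hypothetical helper: keep only visible ASCII (0x21 '!' through 0x7E '~').
# Forcing the C locale makes the bracket range follow byte order.
keep_visible_ascii() {
    LC_ALL=C sed 's/[^!-~]//g'
}

# Input bytes: A, space, 0xd8 0xa8, space, n-tilde (0xc3 0xb1), a, DEL, newline
printf 'A \330\250 \303\261a\177\n' | keep_visible_ascii   # prints: Aa
```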