This question already has an answer here:
- Remove non-ASCII characters from CSV 11 answers
I am trying to manipulate a text file and remove non-ASCII characters from the text. I don't want to remove the line. I only want to remove the offending characters. I am trying to get the following expression to work:
sed '/[\x80-\xFF]/d'
1 answer
#1
33
The suggested solutions may fail with specific versions of sed, e.g. GNU sed 4.2.1.
Using tr:
tr -cd '[:print:]' < yourfile.txt
This will remove any characters not in [\x20-\x7e], including line feeds.
If you want to keep e.g. line feeds, just add \n:
tr -cd '[:print:]\n' < yourfile.txt
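A quick way to see the difference between the two commands is to run both on a small sample. The sketch below builds a hypothetical two-line input with printf (the bytes \303\251 and \303\257 are UTF-8 "é" and "ï") and sets LC_ALL=C, an assumption added here so that [:print:] means exactly the printable ASCII range whatever the user's locale is:

```shell
# Hypothetical sample: "café" and "naïve" on separate lines, encoded as UTF-8.
sample=$(printf 'caf\303\251\nna\303\257ve\n')

# Without \n in the set, the line feeds are deleted too and the lines run together.
joined=$(printf '%s\n' "$sample" | LC_ALL=C tr -cd '[:print:]')

# With \n in the set, only the non-ASCII bytes go and the line structure survives.
kept=$(printf '%s\n' "$sample" | LC_ALL=C tr -cd '[:print:]\n')

printf '%s\n' "$joined"
printf '%s\n' "$kept"
```

The first printf shows everything on a single line, the second keeps the two lines apart.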
If you really want to keep all ASCII characters (even the control codes):
tr -cd '[:print:][:cntrl:]' < yourfile.txt
This will remove any characters not in [\x00-\x7f].