I want to remove all the non-ASCII characters from a file in place.
I found one solution with tr, but I guess I would then need to write the file back after modifying it.
I need to do it in place with relatively good performance.
Any suggestions?
11 solutions
#1
30
# -i edits in place; LC_ALL=C makes the range match raw bytes 128-255
LC_ALL=C sed -i 's/[\d128-\d255]//g' FILENAME
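A minimal sketch of this on a throwaway file, assuming GNU sed (the `\dNNN` decimal escape and `-i` without a backup suffix are GNU extensions). `LC_ALL=C` makes sed treat the input as raw bytes, so the bracket is a plain byte range; in a UTF-8 locale GNU sed may reject the range or fail to match multibyte characters. The file name `demo.txt` is hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical sample file: "café" and "naïve" encoded as UTF-8
# (é is the two bytes 0xC3 0xA9, ï is 0xC3 0xAF).
printf 'caf\303\251\nna\303\257ve\n' > demo.txt

# Delete every byte in the range 128-255, editing demo.txt in place.
# Under LC_ALL=C the range is compared byte by byte, not by locale rules.
LC_ALL=C sed -i 's/[\d128-\d255]//g' demo.txt

cat demo.txt
```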
#2
57
A Perl one-liner would do it: perl -i.bak -pe 's/[^[:ascii:]]//g' <your file>
-i says that the file is edited in place and that a backup is saved with the extension .bak.
#3
12
sed -i 's/[^[:print:]]//g' FILENAME
This also acts like dos2unix, since the carriage return (\r) is not printable and gets removed as well.
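A small check on a CRLF file, assuming GNU sed. `[:print:]` follows the locale: in a UTF-8 locale a printable character such as `é` counts as printable and survives, so this sketch sets `LC_ALL=C` to keep only printable ASCII. It also uses the `g` flag, without which only the first match on each line is removed. `demo.txt` is hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical CRLF file containing one non-ASCII character (é).
printf 'caf\303\251\r\nok\r\n' > demo.txt

# Under LC_ALL=C every byte >= 128 and the \r are non-printable, so both go.
# The g flag removes every match on the line, not just the first one.
LC_ALL=C sed -i 's/[^[:print:]]//g' demo.txt

od -c demo.txt
```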
#4
11
I found that the following solution works:
perl -i.bk -pe 's/[^[:ascii:]]//g;' filename
#5
4
I'm using a very minimal busybox system, in which there is no support for ranges in tr or for POSIX character classes, so I have to do it the crappy old-fashioned way. Here's the solution with sed, stripping ALL non-printable and non-ASCII characters from the file:
sed -i 's/[^]a-zA-Z 0-9`~!@#$%^&*()_+[\{}|;'\'':",.\/<>?=-]//g' FILE
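Inside a POSIX bracket expression a backslash is an ordinary character, so writing `\]` does not add `]` to the set; it closes the bracket. For `]` to be a member it has to come right after `[^`, and a literal `-` has to come last. The sketch below uses a whitelist ordered that way (with `=` and `-` included) and checks that all 95 printable ASCII characters survive while a tab and a non-ASCII character are removed. It was reasoned about against GNU sed under `LC_ALL=C`; behaviour on a particular busybox build is an assumption, and the helper name `keep_printable_ascii` is hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: keep only printable ASCII (space through ~) from stdin.
# ']' comes first so it is a member of the set, '-' comes last so it is literal,
# and a single '\' inside the brackets already stands for a literal backslash.
keep_printable_ascii() {
  LC_ALL=C sed 's/[^]a-zA-Z 0-9`~!@#$%^&*()_+[\{}|;'\'':",.\/<>?=-]//g'
}

# All 95 printable ASCII characters (codes 32-126), built with awk.
printable=$(LC_ALL=C awk 'BEGIN { for (i = 32; i < 127; i++) printf "%c", i }')

# Append a tab and a UTF-8 "é" (bytes 0xC3 0xA9); both should be removed.
input="${printable}"$'\tcaf\303\251'

result=$(printf '%s\n' "$input" | keep_printable_ascii)
[ "$result" = "${printable}caf" ] && echo "all printable ASCII kept"
```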
#6
3
As an alternative to sed or perl, you may consider using ed(1) and POSIX character classes.
Note: ed(1) reads the entire file into memory to edit it in place, so for really large files you should use sed -i ... or perl -i ... instead.
# see:
# - http://wiki.bash-hackers.org/doku.php?id=howto:edit-ed
# - http://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes
# test
echo $'aaa \177 bbb \200 \214 ccc \254 ddd\r\n' > testfile
ed -s testfile <<< $',l'
ed -s testfile <<< $'H\ng/[^[:graph:][:space:][:cntrl:]]/s///g\nwq'
ed -s testfile <<< $',l'
#7
3
This worked for me:
sed -i 's/[^[:print:]]//g' FILENAME
#8
2
awk '{ gsub(/[^]a-zA-Z0-9"!@#$%^&*|_[(){}]/, ""); print }' MYinputfile.txt > pipe_out_to_CONVERTED_FILE.txt
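`sub()` replaces only the first match on each line, whereas `gsub()` replaces every match. A minimal sketch of the awk route with `gsub()` and a byte range in place of a hand-written list, assuming a POSIX awk (gawk or mawk) under `LC_ALL=C`; it keeps tabs and printable ASCII. awk writes a new file, so for an in-place result the output still has to be copied back over the original. The file names are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical input: a tab, a UTF-8 euro sign (3 bytes) and a UTF-8 "é".
printf 'price:\t10\342\202\254 caf\303\251\n' > MYinputfile.txt

# gsub() removes every byte that is neither a tab nor in the range space..tilde.
LC_ALL=C awk '{ gsub(/[^\t -~]/, ""); print }' MYinputfile.txt > converted.txt

cat converted.txt
```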
#9
1
I tried all the solutions and nothing worked. The following, however, does:
tr -cd '\11\12\15\40-\176'
Which I found here:
https://alvinalexander.com/blog/post/linux-unix/how-remove-non-printable-ascii-characters-file-unix
My problem needed it in a series of piped programs, not directly from a file, so modify as needed.
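Because tr only reads standard input and writes standard output, getting the in-place edit the question asks for means going through a temporary file. A minimal sketch, assuming bash and `mktemp`; the helper name `strip_in_place` is hypothetical. Copying the result back with `cat` instead of `mv` keeps the original file's inode, permissions and hard links, at the cost of one extra copy.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: keep only tab (\11), LF (\12), CR (\15) and printable
# ASCII (\40-\176) in a file, rewriting that same file.
strip_in_place() {
  local file=$1 tmp status=0
  tmp=$(mktemp) || return 1
  # Filter into the temp file, then copy it back over the original contents.
  LC_ALL=C tr -cd '\11\12\15\40-\176' < "$file" > "$tmp" \
    && cat "$tmp" > "$file" || status=$?
  rm -f "$tmp"
  return "$status"
}

# Hypothetical sample: "naïve<TAB>café" with a CRLF line ending.
printf 'na\303\257ve\tcaf\303\251\r\n' > demo.txt
strip_in_place demo.txt
od -c demo.txt
```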
#10
1
Try tr instead of sed:
tr -cd '[:print:]\n' < file.txt
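Because newline is not in `[:print:]`, a bare `tr -cd '[:print:]'` also deletes the line breaks and joins everything into one line; keeping `\n` in the set preserves them. A quick comparison, assuming a POSIX tr and `LC_ALL=C` so that only ASCII counts as printable:

```shell
#!/usr/bin/env bash
set -eu

# "café" and "naïve" on two lines, encoded as UTF-8.
sample=$(printf 'caf\303\251\nna\303\257ve\n')

# Newlines are deleted too: this yields "cafnave" with no line break.
joined=$(printf '%s\n' "$sample" | LC_ALL=C tr -cd '[:print:]')

# Keeping \n in the set preserves the line structure.
kept=$(printf '%s\n' "$sample" | LC_ALL=C tr -cd '[:print:]\n')

printf '%s\n---\n%s\n' "$joined" "$kept"
```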
#11
-1
I appreciate the tips I found on this site.
But on my Windows 10 machine, I had to use double quotes for this to work:
sed -i "s/[\d128-\d255]//g" FILENAME
I noticed a couple of things:
- For FILENAME, the entire path\name needs to be quoted.
  This didn't work: %TEMP%\"FILENAME"
  This did: "%TEMP%\FILENAME"
- sed leaves temporary files named sed* behind in the current directory.