I have a UTF-8 encoded file with a BOM and want to remove the BOM. Are there any Linux command-line tools that can remove the BOM from the file?
$ file test.xml
test.xml: XML 1.0 document, UTF-8 Unicode (with BOM) text, with very long lines
3 solutions
#1
13
A BOM is the Unicode code point U+FEFF; its UTF-8 encoding consists of the three bytes 0xEF, 0xBB, 0xBF.
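Before removing anything, it can help to confirm that those three bytes are really at the start of the file. The following is a minimal sketch, not part of the original answer: `has_bom` is a hypothetical helper name, and it assumes `head -c`, `od` and `mktemp` are available, as they are on typical Linux systems.

```shell
#!/bin/sh
# has_bom: hypothetical helper; succeeds when the file's first three bytes are EF BB BF.
has_bom() {
    # od prints the bytes as hex; tr removes the whitespace so we compare one token.
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d '[:space:]')" = "efbbbf" ]
}

# Demo on two throwaway files (the file names are illustrative only).
demo_dir=$(mktemp -d)
printf '\357\273\277<a/>\n' > "$demo_dir/with_bom.xml"
printf '<a/>\n' > "$demo_dir/plain.xml"
for f in "$demo_dir/with_bom.xml" "$demo_dir/plain.xml"; do
    if has_bom "$f"; then
        echo "${f##*/}: BOM present"
    else
        echo "${f##*/}: no BOM"
    fi
done
rm -rf "$demo_dir"
```

The check reads only the first three bytes, so it stays cheap even on very large files.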
With bash, you can create a UTF-8 BOM with the $'' special quoting form, which implements Unicode escapes: $'\uFEFF' (supported since bash 4.2; the result is UTF-8 when the locale uses UTF-8). So with bash, a reliable way of removing a UTF-8 BOM from the beginning of a text file would be:
sed -i $'1s/^\uFEFF//' file.txt
This will leave the file unchanged if it does not start with a UTF-8 BOM, and otherwise remove the BOM.
If you are using some other shell, you might find that "$(printf '\ufeff')" produces the BOM character (that works with zsh, as well as with any shell without a printf builtin, provided that /usr/bin/printf is the GNU version), but if you want a POSIX-compatible version you could use:
sed "$(printf '1s/^\357\273\277//')" file.txt
(The -i in-place edit flag is also a GNU extension; this version writes the possibly-modified file to stdout.)
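Since the portable sed form writes to stdout, an in-place edit needs a temporary file. Here is a minimal sketch of one way to do that, not the answer author's method: `strip_bom` is a hypothetical helper name, and `mktemp` is assumed to be available (it is common on Linux but not required by POSIX).

```shell
#!/bin/sh
# strip_bom: hypothetical helper; removes a leading UTF-8 BOM from the file in place.
strip_bom() {
    tmp=$(mktemp) || return 1
    # LC_ALL=C makes sed treat the pattern as raw bytes rather than characters.
    # Writing back with cat keeps the original file's permissions and inode.
    LC_ALL=C sed "$(printf '1s/^\357\273\277//')" "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Demo on a throwaway file (the file name is illustrative only).
demo_file=$(mktemp)
printf '\357\273\277hello\n' > "$demo_file"
strip_bom "$demo_file"
od -c "$demo_file"    # the dump should now start at 'h'
rm -f "$demo_file"
```

As with the plain sed command, a file that does not start with a BOM passes through unchanged.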
#2
7
Using Vim:
- Open the file in Vim:
vi text.xml
- Remove the BOM:
:set nobomb
- Save and quit:
:wq
#3
5
It is possible to remove the BOM from a file with the tail command:
tail --bytes=+4 withBOM.txt > withoutBOM.txt
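Note that tail drops the first three bytes whether or not they are a BOM, so running it on a file without one would cut off real content. A guarded variant might look like the sketch below. `remove_bom_tail` is a hypothetical helper name, `tail -c +4` is the portable spelling of `--bytes=+4`, and `head -c` and `mktemp` are assumed to be available.

```shell
#!/bin/sh
# remove_bom_tail: hypothetical helper; prints the file to stdout,
# skipping the first three bytes only when they are EF BB BF.
remove_bom_tail() {
    if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d '[:space:]')" = "efbbbf" ]; then
        tail -c +4 "$1"    # start output at byte 4, i.e. just after the BOM
    else
        cat "$1"           # no BOM: pass the file through untouched
    fi
}

# Demo on a throwaway directory (the file names are illustrative only).
demo_dir=$(mktemp -d)
printf '\357\273\277<a/>\n' > "$demo_dir/withBOM.txt"
remove_bom_tail "$demo_dir/withBOM.txt" > "$demo_dir/withoutBOM.txt"
od -c "$demo_dir/withoutBOM.txt"    # the dump should now start at '<'
rm -rf "$demo_dir"
```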