If I run the following Perl program:
perl -e 'use utf8; print "鸡\n";'
I get this warning:
Wide character in print at -e line 1.
If I run this Perl program:
perl -e 'print "鸡\n";'
I do not get a warning.
I thought use utf8 was required to use UTF-8 characters in a Perl script. Why does this not work and how can I fix it? I'm using Perl 5.16.2. I have the same issue if this is in a file instead of being a one-liner on the command line.
6 Answers
#1
93
Without use utf8, Perl interprets your string as a sequence of single-byte characters. There are four bytes in your string, as you can see from this:
$ perl -E 'say join ":", map { ord } split //, "鸡\n";'
233:184:161:10
The first three bytes make up your character, the last one is the line-feed.
The call to print sends these four characters to STDOUT. Your console then works out how to display these characters. If your console is set to use UTF-8, then it will interpret those three bytes as your single character, and that is what is displayed.
If we add in the utf8 pragma, things are different. In this case, Perl interprets your string as just two characters.
$ perl -Mutf8 -E 'say join ":", map { ord } split //, "鸡\n";'
40481:10
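The same bytes-versus-characters difference can be reproduced without depending on how the source file is saved. This is only a sketch: it assumes the core Encode module and writes the three UTF-8 bytes as escapes; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 bytes for U+9E21 followed by a line feed, written as
# escapes so the result does not depend on the file's encoding.
my $bytes = "\xE9\xB8\xA1\n";

# decode() turns the byte string into a string of characters,
# which is roughly what use utf8 does for literals in the source.
my $chars = decode('UTF-8', $bytes);

my @byte_ords = map { ord } split //, $bytes;
my @char_ords = map { ord } split //, $chars;

print join(':', @byte_ords), "\n";   # 233:184:161:10
print join(':', @char_ords), "\n";   # 40481:10
```

The first line of output matches the run without use utf8, the second the run with it.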
By default, Perl's I/O layer assumes that it is working with single-byte characters. So when you try to print a character whose code point is above 255, Perl thinks that something is wrong and gives you a warning. As ever, you can get more explanation for this warning by including use diagnostics. It will say this:
(S utf8) Perl met a wide character (>255) when it wasn't expecting one. This warning is by default on for I/O (like print). The easiest way to quiet this warning is simply to add the :utf8 layer to the output, e.g. binmode STDOUT, ':utf8'. Another way to turn off the warning is to add no warnings 'utf8'; but that is often closer to cheating. In general, you are supposed to explicitly mark the filehandle with an encoding, see open and perlfunc/binmode.
As others have pointed out, you need to tell Perl to accept multi-byte output. There are many ways to do this (see the Perl Unicode Tutorial for some examples). One of the simplest ways is to use the -CS command-line flag, which tells the three standard filehandles (STDIN, STDOUT and STDERR) to deal with UTF-8.
$ perl -Mutf8 -e 'print "鸡\n";'
Wide character in print at -e line 1.
鸡
vs
$ perl -Mutf8 -CS -e 'print "鸡\n";'
鸡
Unicode is a big and complex area. As you've seen, many simple programs appear to do the right thing, but for the wrong reasons. When you start to fix part of the program, things will often get worse until you've fixed all of the program.
#2
62
All use utf8; does is tell Perl the source code is encoded using UTF-8. You need to tell Perl how to encode your text:
use open ':std', ':encoding(UTF-8)';
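To see what an :encoding(UTF-8) layer does to the output, here is a small sketch that writes through such a layer into an in-memory buffer instead of STDOUT. The buffer and the variable names are stand-ins chosen for illustration, so that the resulting bytes can be inspected.

```perl
use strict;
use warnings;

# One character (U+9E21) plus a line feed, as a character string.
my $text = "\x{9E21}\n";

# Write through an :encoding(UTF-8) layer into an in-memory buffer
# ($buffer stands in for STDOUT so the bytes can be inspected).
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer) or die "open failed: $!";
print {$out} $text;
close($out) or die "close failed: $!";

# The layer has encoded the character as three UTF-8 bytes,
# and no "Wide character" warning was issued.
my $byte_list = join ':', map { ord } split //, $buffer;
print "$byte_list\n";   # 233:184:161:10
```

With the layer in place, Perl knows how to turn the character into bytes, so there is nothing for it to warn about.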
#3
12
Encode all standard output as UTF-8:
binmode STDOUT, ":utf8";
#4
11
You can get close to "just do utf8 everywhere" by using the CPAN module utf8::all.
perl -Mutf8::all -e 'print "鸡\n";'
When print receives something that it can't print (a character larger than 255 when no :encoding layer is provided), it assumes you meant to encode it using UTF-8. It does so, after warning about the problem.
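That fallback can be observed directly. The sketch below is an illustration under a few assumptions: an in-memory handle opened with :raw stands in for a STDOUT with no encoding layer, and a $SIG{__WARN__} handler collects the warning rather than letting it reach STDERR.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# An in-memory handle standing in for STDOUT; ':raw' makes sure
# no encoding layer is applied to it.
my $buffer = '';
open(my $out, '>:raw', \$buffer) or die "open failed: $!";
print {$out} "\x{9E21}\n";     # a character above 255, no layer
close($out) or die "close failed: $!";

# Perl warned, then wrote the character's UTF-8 bytes anyway.
my $byte_list = join ':', map { ord } split //, $buffer;
print "warnings: ", scalar(@warnings), "\n";
print "bytes:    $byte_list\n";
```

This is why the one-liner in the question still displays the right character on a UTF-8 console even though it warns.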
#5
4
You can use this:
perl -CS filename
It will also eliminate that warning.
#6
1
With Spanish text you can run into this error when, besides using:
use utf8;
your editor saves the file in a different encoding. So what you see in the editor is not what Perl reads. To solve that error, just change the editor encoding to Unicode/UTF-8.