I'm modifying a mature CGI application written in Perl and the question of content encoding has come up. The browser reports that the content is iso-8859-1 encoded and the application is declaring iso-8859-1 as the charset in the HTTP headers but doesn't ever seem to actually do the encoding. None of the various encoding techniques described in the perldoc tutorials (Encode, Encoding, Open) are used in the code so I'm a little confused as to how the document is actually being encoded.
As mentioned, the application is quite mature and likely predates many of the current encoding methods. Does anyone know of any legacy or deprecated techniques I should be looking for? What encoding does Perl assume or default to when the developer provides no direction?
Thanks
4 answers
#1
8
By default Perl handles strings as being byte sequences, so if you read from a file, and print that to STDOUT, it will produce the same byte sequence. If your templates are Latin-1, your output will also be Latin-1.
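A minimal sketch of that pass-through, assuming in-memory filehandles as stand-ins for the real template file and STDOUT (the sample string and variable names are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical template content: "cafe" with e-acute as the Latin-1 byte 0xE9.
my $template = "caf\xE9\n";

# Read it through a plain filehandle with no encoding layer.
open my $in, '<', \$template or die "open for read failed: $!";
my $line = <$in>;
close $in;

# Write it through another plain filehandle (standing in for STDOUT).
my $output = '';
open my $out, '>', \$output or die "open for write failed: $!";
print {$out} $line;
close $out;

# No layer touched the data, so the bytes are identical.
print $output eq $template ? "bytes unchanged\n" : "bytes changed\n";
```

Nothing here encodes or decodes anything, which is why the output is Latin-1 only if the input already was.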
If you use a string in text string context (like with uc, lc and so on) perl assumes Latin-1 semantics, unless the string has been decoded before.
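One way to see the Latin-1 interpretation is to compare an undecoded byte with a properly decoded character. This is only a sketch using the core Encode module, and the variable names are illustrative. Whether uc changes an undecoded non-ASCII byte depends on the perl version and on pragmas such as `use feature 'unicode_strings'` or `use locale`, so the sketch only case-maps the decoded string.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $bytes     = "\xE9";        # one undecoded byte: e-acute in Latin-1
my $utf8_data = "\xC3\xA9";    # the same letter as UTF-8 encoded bytes
my $text      = decode('UTF-8', $utf8_data);   # decoded: one character, U+00E9

# Compared as text, the undecoded byte 0xE9 is taken to be U+00E9.
my $same = $bytes eq $text ? 1 : 0;

# Encoding the undecoded byte to UTF-8 also treats it as U+00E9.
my $as_utf8 = encode('UTF-8', $bytes);

# Case mapping on the decoded string follows Unicode rules (U+00C9),
# which encodes back to the single Latin-1 byte 0xC9.
my $upper = encode('ISO-8859-1', uc $text);

print join(' ', $same, unpack('H*', $as_utf8), unpack('H*', $upper)), "\n";
```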
More on Perl, charsets and encodings
#2
2
Perl does not assume anything; it is the browser that assumes an encoding, usually by guesswork. If none of the encoding techniques is used, the documents are output directly, byte for byte as they were written.
You can specify the charset in the HTTP Content-Type header.
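For a plain CGI script that prints its own headers, that could look roughly like the following. The helper name `content_type_header` is hypothetical, and the sketch assumes an HTML response.

```perl
use strict;
use warnings;

# Hypothetical helper: build the CGI header block declaring a charset.
# The blank line (second CRLF) ends the headers.
sub content_type_header {
    my ($charset) = @_;
    return "Content-Type: text/html; charset=$charset\r\n\r\n";
}

print content_type_header('ISO-8859-1');
print "<p>The body bytes must actually be ISO-8859-1.</p>\n";
```

The header is only a declaration. It does not convert the body, so it has to match whatever bytes the script really emits.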
#3
1
The first place I'd look is the server configuration. If you aren't setting the charset in the Content-Type header in the program, you're likely picking up the server's default.
Run the script separate from the server to see what its actual output is. When the server gets the output from a CGI program (that's not nph), the server fixes up the header for anything it thinks is missing before it sends it to the client.
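Once the script's raw output has been captured outside the server, a small check like this can show what the script itself declares before the server touches the headers. It is a sketch only: `declared_charset` is a hypothetical helper, and the sample output string stands in for whatever the real script prints.

```perl
use strict;
use warnings;

# Hypothetical helper: take raw CGI output (headers, blank line, body) and
# return the charset named in its Content-Type header, or undef if none.
sub declared_charset {
    my ($raw) = @_;
    my ($head) = split /\r?\n\r?\n/, $raw, 2;   # keep only the header block
    return unless defined $head;
    for my $line (split /\r?\n/, $head) {
        next unless $line =~ /^Content-Type\s*:/i;
        return $line =~ /charset\s*=\s*"?([^\s";]+)/i ? $1 : undef;
    }
    return;
}

# Stand-in for captured output. In practice it might come from something like
#   my $raw = `$^X ./app.cgi`;    # ./app.cgi is a hypothetical path
my $raw = "Content-Type: text/html; charset=iso-8859-1\r\n\r\n<html></html>\n";

my $charset = declared_charset($raw);
print defined $charset
    ? "script declares $charset\n"
    : "script declares no charset\n";
```

If the script declares no charset but the browser still sees one, the server configuration is the likely source.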
#4
0
If the browser reports the content as iso-8859-1, maybe your perl script didn't output the correct headers to specify the charset?