How do I open a file whose name contains Unicode characters in PHP?

Time: 2021-09-12 21:40:31

For example, I have a file named проба.xml and I am unable to open it from a PHP script.

If I save the PHP script as UTF-8, then all the text in the script is UTF-8, so when I pass this to file_get_contents:

$fname = "проба.xml";
file_get_contents($fname);

I get an error saying that the file does not exist. The reason is that on Windows (XP) file names with non-Latin characters are stored as Unicode (UTF-16). OK, so I tried this:

$fname = "проба.xml";
$res = mb_convert_encoding($fname,'UTF-8','UTF-16');
file_get_contents($res);

But the error persists, since file_get_contents cannot accept Unicode strings...

Any suggestions?

3 Solutions

#1


UPDATE (July 13 '17)

Although the docs do not seem to mention it, PHP 7.1 and above finally support Unicode filenames on Windows out of the box. PHP's filesystem APIs accept and return filenames according to default_charset, which is UTF-8 by default.

Refer to bug fix here: https://github.com/php/php-src/commit/3d3f11ede4cc7c83d64cc5edaae7c29ce9c6986f
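As a rough illustration of what this update means in practice, here is a minimal sketch. It assumes a PHP version with the fix on Windows (or any PHP on a filesystem that stores names as UTF-8) and a script saved as UTF-8; the temporary directory and the sample file content are only there to make the example self-contained.

```php
<?php
// Assumption: the script is saved as UTF-8 and default_charset is left at
// its default of UTF-8, so the literal below is a UTF-8 byte string.
$fname = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'проба.xml';

// Create a sample file so the example does not depend on existing data.
file_put_contents($fname, '<root/>');

// No iconv() step: the UTF-8 name is passed straight to the filesystem API.
$contents = file_get_contents($fname);

// Remove the sample file again.
unlink($fname);
```

On older PHP versions on Windows the same calls would be expected to fail or to create a file with a mangled name, which is what the original answer below works around.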


UPDATE (Jan 29 '15)

If you have access to the PHP extensions directory, you can try installing php-wfio.dll from https://github.com/kenjiuno/php-wfio, and refer to files via the wfio:// protocol.

file_get_contents("wfio://你好.xml");
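If the same script has to run on machines with and without the extension, one option is to check for the wrapper first. The sketch below is only an illustration: wfioPath is a hypothetical helper, and it assumes the extension registers its stream wrapper under the name wfio, as the wfio:// prefix suggests.

```php
<?php
// Hypothetical helper: prefix a UTF-8 path with wfio:// only when a stream
// wrapper named "wfio" is registered; otherwise return the path unchanged.
function wfioPath(string $utf8Path): string
{
    return in_array('wfio', stream_get_wrappers(), true)
        ? 'wfio://' . $utf8Path
        : $utf8Path;
}

$path = wfioPath('你好.xml');

// Read the file only if it can actually be opened; false otherwise.
$contents = is_readable($path) ? file_get_contents($path) : false;
```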

Original Answer

PHP on Windows uses the legacy "ANSI APIs" exclusively for local file access, which means PHP uses the System Locale instead of Unicode.

To access files whose filenames contain Unicode, you must convert the filename to the encoding specified by the current System Locale. If the filename contains characters that are not representable in that encoding, you're out of luck (Update: see the section above for a solution). scandir will return gibberish for these files, and passing that string back to fopen and equivalents will fail.

To find the right encoding to use, you can get the system locale by calling <?=setlocale(LC_CTYPE,0)?>, and looking up the Code Page Identifier (the number after the .) in the MSDN article https://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx.

For example, if the function returns Chinese (Traditional)_HKG.950, this means that code page 950 is in use and the filename should be converted to the Big-5 encoding. In that case, your code will have to be as follows if your script file is saved in UTF-8 (preferably without BOM):

$fname = iconv('UTF-8','big-5',"你好.xml");
file_get_contents($fname);

or as follows if you save the script file directly as Big-5:

$fname = "你好.xml";
file_get_contents($fname);
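The lookup and conversion steps described in this answer can be combined into a small sketch. Both function names are hypothetical, and the "CP" prefix used to build the iconv encoding name is an assumption about the local iconv build; check the names your installation accepts.

```php
<?php
// Extract the code page number from a Windows locale string such as
// "Chinese (Traditional)_HKG.950". Returns null when no number is present.
function codePageFromLocale(string $locale): ?string
{
    return preg_match('/\.(\d+)$/', $locale, $m) === 1 ? $m[1] : null;
}

// Convert a UTF-8 filename to a legacy encoding. Returns null when iconv()
// reports a failure, e.g. for characters the encoding cannot represent.
function toLegacyName(string $utf8Name, string $encoding): ?string
{
    $converted = @iconv('UTF-8', $encoding, $utf8Name);
    return $converted === false ? null : $converted;
}

// "0" asks setlocale() to return the current setting without changing it.
$codePage = codePageFromLocale((string) setlocale(LC_CTYPE, '0'));
if ($codePage !== null) {
    // Assumption: the iconv build accepts "CP<number>" names such as CP950.
    $fname = toLegacyName('你好.xml', 'CP' . $codePage);
    if ($fname !== null && is_file($fname)) {
        echo file_get_contents($fname);
    }
}
```

Returning null instead of a partially converted name makes the "out of luck" case explicit, so the caller can fall back to another approach rather than open the wrong file.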

#2


You could try:

  • getting the string for the filename from a directory listing using opendir and readdir
  • passing that string to file_get_contents to see if that will work, or
  • getting the content of the file using fopen, fread and fclose
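
The steps above might look like the following sketch. readFilesFromListing is a hypothetical helper name, and whether the names returned by readdir can be reopened on a given Windows setup is exactly what this approach is meant to find out.

```php
<?php
// Read every regular file in $dir, using the exact name bytes returned by
// readdir() when opening each file. Returns [filename => contents].
function readFilesFromListing(string $dir): array
{
    $result = [];
    $handle = opendir($dir);
    if ($handle === false) {
        return $result;
    }
    while (($entry = readdir($handle)) !== false) {
        $path = $dir . DIRECTORY_SEPARATOR . $entry;
        if (!is_file($path)) {
            continue; // skips ".", ".." and subdirectories
        }
        $fp = fopen($path, 'rb');
        if ($fp === false) {
            continue; // the listed name could not be reopened
        }
        $data = '';
        while (!feof($fp)) {
            $data .= fread($fp, 8192);
        }
        fclose($fp);
        $result[$entry] = $data;
    }
    closedir($handle);
    return $result;
}
```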

Hope this helps!

#3


These are conclusions so far:

  1. PHP 5 cannot open a file whose name contains Unicode characters unless the source filename is Unicode.
  2. PHP 5 (at least on Windows XP) is not able to process PHP source in Unicode.

Thus the conclusion is that this is not doable in PHP 5.
