I am trying to read a CSV file and echo its content, but the special characters are displayed incorrectly:
Mäx Müstermänn -> MÃ¤x MÃ¼stermÃ¤nn
The CSV file is encoded as UTF-8 without BOM (checked with Notepad++).
This is the content of the CSV file:
"Mäx";"Müstermänn"
My PHP script:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<?php
$handle = fopen("specialchars.csv", "r");
echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr><tr>';
while ($data = fgetcsv($handle, 1000, ";")) {
    $num = count($data);
    for ($c = 0; $c < $num; $c++) {
        // output data
        echo "<td>$data[$c]</td>";
    }
    echo "</tr><tr>";
}
?>
</body>
</html>
I tried setlocale(LC_ALL, 'de_DE.utf8'); as suggested here, without success. The content is still displayed incorrectly.
What am I missing?
Edit:
echo mb_detect_encoding($data[$c], 'UTF-8');
gives me UTF-8 UTF-8.
echo file_get_contents("specialchars.csv");
gives me "MÃ¤x";"MÃ¼stermÃ¤nn".
And
print_r(str_getcsv(reset(explode("\n", file_get_contents("specialchars.csv"))), ';'))
gives me
Array ( [0] => MÃ¤x [1] => MÃ¼stermÃ¤nn )
What does this mean?
6 answers
#1
39
Try this:
<?php
$handle = fopen("specialchars.csv", "r");
echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr><tr>';
while ($data = fgetcsv($handle, 1000, ";")) {
    $data = array_map("utf8_encode", $data); // added: converts each field from ISO-8859-1 to UTF-8
    $num = count($data);
    for ($c = 0; $c < $num; $c++) {
        // output data
        echo "<td>$data[$c]</td>";
    }
    echo "</tr><tr>";
}
?>
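Note that utf8_encode always treats its input as ISO-8859-1, and it is deprecated as of PHP 8.2. A minimal sketch of the same per-field conversion with mb_convert_encoding follows. It assumes the mbstring extension is loaded and that the fields really are ISO-8859-1; the helper name convertRowToUtf8 is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: convert every field of a CSV row to UTF-8.
// Assumes the fields are encoded in $sourceEncoding; converting data
// that is already UTF-8 would double-encode it.
function convertRowToUtf8(array $row, string $sourceEncoding = 'ISO-8859-1'): array
{
    return array_map(
        function ($field) use ($sourceEncoding) {
            return mb_convert_encoding($field, 'UTF-8', $sourceEncoding);
        },
        $row
    );
}

// "M\xE4x" is "Mäx" as ISO-8859-1 bytes (0xE4 = ä, 0xFC = ü).
$row = convertRowToUtf8(["M\xE4x", "M\xFCsterm\xE4nn"]);
echo implode(' ', $row), "\n";
```

Inside the loop above, this would replace the array_map("utf8_encode", $data) call. Naming the source encoding explicitly also makes it easier to notice when the file turns out to be UTF-8 already, in which case no conversion should be applied at all.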
#2
10
I encountered a similar problem: parsing a CSV file with special characters such as é, è, ö, etc.
The following worked fine for me:
To display the characters correctly on the HTML page, this header was needed:
header('Content-Type: text/html; charset=UTF-8');
In order to parse every character correctly, I used:
utf8_encode(fgets($file));
Don't forget to use the multibyte string functions in all subsequent string operations, for example:
mb_strtolower($value, 'UTF-8');
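To illustrate why the multibyte functions matter, here is a small sketch comparing byte-oriented and character-oriented calls on a UTF-8 string. It assumes the mbstring extension is loaded; the variable names are only for illustration.

```php
<?php
// "\u{e4}" is "ä" written as a Unicode escape (PHP 7+), so the example
// does not depend on the encoding this file is saved in.
$name = "M\u{e4}x M\u{fc}sterm\u{e4}nn";

$byteLength = strlen($name);                // counts bytes
$charLength = mb_strlen($name, 'UTF-8');    // counts characters
$upper      = mb_strtoupper($name, 'UTF-8'); // upper-cases ä and ü as well

echo $byteLength, ' ', $charLength, ' ', $upper, "\n";
```

Each umlaut takes two bytes in UTF-8, so the byte count is larger than the character count. Functions such as strlen or substr operate on bytes and may cut a character in half.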
#3
6
Try putting this into the top of your file (before any other output):
<?php
header('Content-Type: text/html; charset=UTF-8');
?>
#4
4
The problem is that the function reports the data as UTF-8 (you can check this with mb_detect_encoding) but does not actually convert it, so the bytes are simply treated as UTF-8. Therefore, you need to convert from the original encoding (Windows-1251, also known as CP1251) to UTF-8 using iconv. Since fgetcsv returns an array, I suggest writing a custom function. (Sorry for my English.)
function customfgetcsv(&$handle, $length, $separator = ';') {
    if (($buffer = fgets($handle, $length)) !== false) {
        return explode($separator, iconv("CP1251", "UTF-8", $buffer));
    }
    return false;
}
#5
2
I got it working now (after removing the header command). I think the problem was that the PHP file itself was saved as ISO-8859-1. I changed it to UTF-8 without BOM. I thought I had already done that, but perhaps I undid it by accident.
Furthermore, I used SET NAMES 'utf8' for the database. Now the data is also correct in the database.
#6
1
In my case the source file is encoded as Windows-1250, and iconv prints tons of notices about illegal characters in the input string...
So this solution helped me a lot:
/**
 * Get a CSV row as an array with UTF-8 encoding
 *
 * @param resource &$handle
 * @param integer $length
 * @param string $separator
 *
 * @return array|false
 */
private function fgetcsvUTF8(&$handle, $length, $separator = ';')
{
    if (($buffer = fgets($handle, $length)) !== false)
    {
        $buffer = $this->autoUTF($buffer);
        return str_getcsv($buffer, $separator);
    }
    return false;
}
/**
 * Automatic conversion of Windows-1250 and ISO-8859-2 strings into UTF-8
 *
 * @param string $s
 *
 * @return string
 */
private function autoUTF($s)
{
    // detect UTF-8
    if (preg_match('#[\x80-\x{1FF}\x{2000}-\x{3FFF}]#u', $s))
        return $s;
    // detect WINDOWS-1250
    if (preg_match('#[\x7F-\x9F\xBC]#', $s))
        return iconv('WINDOWS-1250', 'UTF-8', $s);
    // assume ISO-8859-2
    return iconv('ISO-8859-2', 'UTF-8', $s);
}
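If only the "is this already UTF-8?" check is needed, mb_check_encoding may be a simpler test than the regular expression. The following is a rough sketch, not a drop-in replacement for autoUTF: the function name toUtf8 and its fallback parameter are hypothetical, it assumes the mbstring and iconv extensions are loaded, and it does not attempt to tell Windows-1250 and ISO-8859-2 apart.

```php
<?php
// Hypothetical variant of autoUTF(): leave valid UTF-8 untouched,
// otherwise convert from an assumed fallback encoding.
function toUtf8(string $s, string $fallbackEncoding = 'WINDOWS-1250'): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid UTF-8 (pure ASCII also passes)
    }
    $converted = iconv($fallbackEncoding, 'UTF-8', $s);
    // iconv() returns false on failure; keep the raw input in that case.
    return $converted === false ? $s : $converted;
}

$fromUtf8 = toUtf8("M\u{e4}x"); // valid UTF-8, returned unchanged
$fromCp   = toUtf8("M\xE4x");   // byte 0xE4 is "ä" in Windows-1250

echo $fromUtf8, ' ', $fromCp, "\n";
```

A validity check like this can still be fooled: a byte sequence in a legacy encoding may happen to be valid UTF-8, so knowing the real source encoding is always preferable to guessing.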
In response to @manvel's answer: use str_getcsv instead of explode, because of cases like this:
some;nice;value;"and;here;comes;combinated;value";and;some;others
explode will split the string into these parts:
some
nice
value
"and
here
comes
combinated
value"
and
some
others
but str_getcsv will split the string into these parts:
some
nice
value
and;here;comes;combinated;value
and
some
others
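The difference can be reproduced with a short script. This is a minimal sketch using the sample line from above; the delimiter, enclosure and escape arguments are passed to str_getcsv explicitly rather than relying on defaults.

```php
<?php
$line = 'some;nice;value;"and;here;comes;combinated;value";and;some;others';

// explode() splits on every semicolon, including those inside quotes.
$exploded = explode(';', $line);

// str_getcsv() honours the enclosure character, so the quoted field
// stays in one piece and its surrounding quotes are removed.
$parsed = str_getcsv($line, ';', '"', '\\');

echo count($exploded), ' parts with explode, ', count($parsed), " parts with str_getcsv\n";
print_r($parsed);
```

The same reasoning applies to any hand-written CSV reader based on fgets: once a field may contain the separator, the line has to go through a real CSV parser.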