I'm facing a weird problem inserting Hebrew text into MySQL.
Basically, the problem is:
I have a PHP script which reads Hebrew text from a CSV file and then sends it to a MySQL database. The charset of the database and of every table field is set to utf8, and the collation to utf8_bin. But when the script inserts the text, garbage characters appear inside it, which renders it completely useless for output. NOTE: I can still see half of the words appear correctly.
Here is what I have checked so far, which might help in understanding the problem:
1. As I mentioned, the table charset and collation are utf8.
2. I send header('Content-Type: text/html; charset=utf-8').
3. If I echo the text, it appears perfectly. When I convert it using utf8_encode, it gets converted properly (e.g. שי יפת gets converted to ×©× ×פת).
4. When I use utf8_decode on the converted variable and echo it, it still displays perfectly.
5. I've used these after mysql_connect:
mysql_query("SET character_set_client = 'utf8';");
mysql_query("SET character_set_result = 'utf8';");
mysql_query("SET NAMES 'utf8'");
mysql_set_charset('utf8');
and even tried this:
mysql_query("SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'", $con)
- Added default_charset = "UTF-8" in my php.ini file.
- I don't know for certain which encoding the CSV file uses, but when I open it with Notepad++ the encoding is reported as UTF-8 without BOM.
- Here is a sample of the actual garbage :
original text : שי יפת
text after utf8_encode : ×©× ×פת
text after utf8_decode in same script : שי יפת (perfect)
text sent to mysql database : ש×? ×?פת (notice the ? in between)
text if we echo from mysql : ש�? �?פת (the output is close)
- Used addslashes and stripslashes before utf8_encode (also tried them after it, no luck).
- The server runs on Windows with XAMPP 1.7.4
- Apache 2.2.17
- MySQL 5.5.8 (Community Server)
- PHP 5.3.5 (VC6 X86 32bit)
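For reference, the ×© pattern in the sample above is what UTF-8 bytes look like after they have been encoded to UTF-8 a second time: utf8_encode assumes its input is ISO-8859-1 and re-encodes every byte. A minimal sketch, assuming the CSV really is UTF-8 as Notepad++ reports; latin1_to_utf8 is a hypothetical stand-in that spells out what utf8_encode does, so it also runs on PHP versions where utf8_encode is deprecated:

```php
<?php
// Hypothetical helper: mimics utf8_encode(), i.e. treats every input byte
// as one ISO-8859-1 character and re-encodes it as UTF-8.
function latin1_to_utf8($bytes)
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                   // ASCII stays a single byte
        } else {
            $out .= chr(0xC0 | ($b >> 6))      // lead byte (0xC2 or 0xC3)
                  . chr(0x80 | ($b & 0x3F));   // continuation byte
        }
    }
    return $out;
}

// The first two letters of the sample name as UTF-8 bytes, written with
// escapes so the encoding of this source file does not matter.
$name = "\xD7\xA9\xD7\x99";

$double = latin1_to_utf8($name);

echo bin2hex($name), "\n";    // d7a9d799
echo bin2hex($double), "\n";  // c397c2a9c397c299
```

Each Hebrew letter grows from two bytes to four (c397 is ×, c2a9 is ©), and the second byte of י turns into the invisible control character U+0099. That would be consistent with the ? that shows up once the value reaches MySQL, so inserting the text without utf8_encode looks like the first thing to try, though this is an inference from the sample rather than something verified on the original setup.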
EDIT 1: Just to clarify, I did search the site for similar questions and implemented the suggestions I found (SET NAMES utf8 and a lot of other options), but none of them worked. So please don't mark this question as a duplicate.
EDIT 2: Here is the full script:
<?php
header('Content-Type: text/html; charset=utf-8');
if (isset($_GET['filename'])==true)
{
    $databasehost = "localhost";
    $databasename = "what_csv";
    $databaseusername="root";
    $databasepassword="";
    $databasename= "csv";
    $fieldseparator = "\n";
    $lineseparator = "@contact\n";
    $csvfile = $_GET['filename'];
    /********************************/
    if(!file_exists($csvfile)) {
        echo "File not found. Make sure you specified the correct path.\n";
        exit;
    }
    $file = fopen($csvfile,"r");
    if(!$file) {
        echo "Error opening data file.\n";
        exit;
    }
    $size = filesize($csvfile);
    if(!$size) {
        echo "File is empty.\n";
        exit;
    }
    $csvcontent = fread($file,$size);
    fclose($file);
    $con = @mysql_connect($databasehost,$databaseusername,$databasepassword) or die(mysql_error());
    mysql_query( "SET NAMES utf8" );
    mysql_set_charset('utf8',$con);
    /*
    mysql_query("SET character_set_client = 'utf8';");
    mysql_query("SET character_set_result = 'utf8';");
    mysql_query("SET NAMES 'utf8'");
    mysql_set_charset('utf8');
    mysql_query("SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'", $con);
    */
    @mysql_select_db($databasename) or die(mysql_error());
    $lines = 0;
    $queries = "";
    $linearray = array();
    foreach(explode($lineseparator,$csvcontent) as $line) {
        $Name="";
        $Landline1="";
        $Landline2="";
        $Mobile="";
        $Address="";
        $Email="";
        $IMEI="temp";
        $got_imei=false;
        //echo $line.'<br>';
        $lines++;
        $line = trim($line," \t");
        $line = str_replace("\r","",$line);
        $linearray = explode($fieldseparator,$line);
        //check for values to insert
        foreach($linearray as $field)
        {
            if (is_numeric($field)){ $got_imei=true;$IMEI=trim($field);}
            if (stristr($field, 'Name:')) {$Name=trim(str_replace("Name:", "", $field));}
            if (stristr($field, 'Landline:')) {$Landline1=trim(str_replace("Landline:", "", $field));}
            if (stristr($field, 'Landline2:')) {$Landline2=trim(str_replace("Landline2:", "", $field));}
            if (stristr($field, 'Mobile:')) {$Mobile=trim(str_replace("Mobile:", "", $field));}
            if (stristr($field, 'Address:')) {$Address=trim(str_replace("Address:", "", $field));}
            if (stristr($field, 'Email:')) {$Email=trim(str_replace("Email:", "", $field));}
        }
        if ($got_imei==true)
        {
            $query = "UPDATE $databasetable SET imei=$IMEI where imei='temp'";
            mysql_query($query);
        }
        else if (($Name=="") && ($Landline1=="" ) && ($Landline2=="") && ($Mobile=="") && ($Address=="")) {echo "";}
        else
        {
            //$Name = utf8_encode("$Name");
            //$Name = addslashes("$Name");
            $Name = utf8_encode(mysql_real_escape_string("$Name"));
            echo"$Name,$Landline1,$Landline2,$Address,$IMEI<br>";
            $query = "insert into $databasetable (imei, name, landline1, landline2, mobile, address, email) values('$IMEI','$Name', '$Landline1','$Landline2','$Mobile', '$Address', '$Email');";
            mysql_query($query);
            $Name = utf8_decode(($Name));
            echo $Name."<br>";
        }
    }
    @mysql_close($con);
    echo "Found a total of $lines records in this csv file.\n";
}
?>
<form>
Enter file name <input type="text" name="filename" /><br />
<input type="submit" value="Submit" /><br>
NOTE : File must be present in same directory as this script. Please include full filename, for example filename.csv.
</form>
Here is a sample of the CSV file:
@contact
Name: שי יפת
Mobile: 0547939898
@IMEI
355310042074173
EDIT 3:
If I enter the string directly via cmd, I get this warning:
Warning Code : 1366
Incorrect string value: '\xD7\xA9\xD7\x99 \xD7...' for column 'name' at row 1
Here is something I found on the net that could be related. Any help? http://bugs.mysql.com/bug.php?id=30131
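The bytes quoted in the warning can be checked on their own. A small sketch, assuming only the four bytes before the space are used (the rest of the value is cut off in the message and is not reconstructed here); `$prefix`, `$truncated` and the result variables are illustrative names:

```php
<?php
// Bytes copied from the warning text. Everything after the "..." is unknown
// and is deliberately left out.
$prefix = "\xD7\xA9\xD7\x99";

// A lone lead byte, for comparison: this is NOT well-formed UTF-8.
$truncated = "\xD7";

// preg_match() with the /u modifier returns false on malformed UTF-8,
// so a result of exactly 1 means the subject is well-formed.
$prefixIsValid    = (preg_match('//u', $prefix) === 1);
$truncatedIsValid = (preg_match('//u', $truncated) === 1);

var_dump($prefixIsValid);     // bool(true)
var_dump($truncatedIsValid);  // bool(false)
```

The four bytes are the well-formed UTF-8 encoding of שי, so the value does not look damaged at the point where MySQL rejects it. That may point toward the character set the console or connection declares rather than toward the data itself, but this is a guess from the message alone.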
2 Solutions
#1
1
Use TEXT/LONGTEXT instead of VARCHAR. Also set the collation to utf8_general_ci.
Hope this helps, @Ajit.
#2
2
I had this problem too. These lines solved it:
mysql_query( "SET NAMES utf8" );
mysql_query( "SET CHARACTER SET utf8" );
Shana Tova
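On current PHP versions the mysql_* functions no longer exist (they were removed in PHP 7), so the same idea would have to be written against mysqli. A rough sketch, not the original poster's code: the function name insert_contact and the table name contacts are made up for illustration (the question never shows the table name), the column names come from the script in the question, and the credentials in the usage comment are the ones the question uses.

```php
<?php
// Hypothetical helper: inserts one contact over a UTF-8 connection.
// The table name "contacts" is an assumption.
function insert_contact(mysqli $db, $imei, $name, $mobile)
{
    // Same intent as "SET NAMES utf8", but it also tells the client
    // library which charset to assume when escaping strings.
    if (!$db->set_charset('utf8')) {
        return false;
    }

    // Bound parameters are sent as-is: no utf8_encode(), no manual escaping.
    $stmt = $db->prepare(
        'INSERT INTO contacts (imei, name, mobile) VALUES (?, ?, ?)'
    );
    if ($stmt === false) {
        return false;
    }
    $stmt->bind_param('sss', $imei, $name, $mobile);
    $ok = $stmt->execute();
    $stmt->close();
    return $ok;
}

// Usage (needs a running MySQL server, so it is left commented out):
// $db = new mysqli('localhost', 'root', '', 'csv');
// insert_contact($db, 'temp', "\xD7\xA9\xD7\x99 \xD7\x99\xD7\xA4\xD7\xAA", '0547939898');
```

The name in the usage comment is שי יפת written as UTF-8 byte escapes; it is passed straight through, which only works if the CSV data is already UTF-8.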