I have stored Arabic data in an MS SQL database, in a field with a Latin collation. I can read it correctly through ODBC using the old SqlDrv32.dll driver. Is there any way to convert this data to an Arabic collation while keeping the data correct?
1 answer
What you need to do is make sure that your column is stored using nvarchar rather than varchar. The difference is that:
- nvarchar: stores Unicode text as UTF-16 (i.e. 2 bytes per character for Arabic letters)
- varchar: stores "ANSI" text in the code page of the column's collation (i.e. 1 byte per character for single-byte code pages such as Windows-1252 or Windows-1256)
Say you wanted to store the text: "مرحبا"
If you were to (correctly) store it into an nvarchar column, the table would contain "مرحبا".
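To make the storage difference concrete, here is a minimal sketch that encodes the same five-letter word both ways and compares the byte counts. It assumes a runtime where `CodePagesEncodingProvider` is available (.NET Core / .NET 5+); on the classic .NET Framework the registration line should not be needed.

```csharp
using System;
using System.Text;

// Assumption: on .NET Core / .NET 5+ legacy code pages such as 1256
// are only available after registering this provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

string text = "مرحبا";

// nvarchar stores UTF-16: two bytes for each of these five letters.
byte[] utf16 = Encoding.Unicode.GetBytes(text);

// varchar stores one byte per letter in a single-byte code page
// such as Windows-1256.
byte[] cp1256 = Encoding.GetEncoding(1256).GetBytes(text);

Console.WriteLine(utf16.Length);                  // 10
Console.WriteLine(cp1256.Length);                 // 5
Console.WriteLine(BitConverter.ToString(cp1256)); // E3-D1-CD-C8-C7
```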
Workaround
But I assume you're not able to change the database to the correct column type, that you're stuck with varchar, and that you just need it to work with a varchar column in a Latin-1 collation.
In that case you will need to use your programming language to reinterpret the Windows-1256 bytes of "مرحبا" as Latin-1 (Windows-1252) directly:
| Character | Windows-1256 hex code | Windows-1252 character |
|-----------|-----------------------|------------------------|
| م | E3 | ã |
| ر | D1 | Ñ |
| ح | CD | Í |
| ب | C8 | È |
| ا | C7 | Ç |
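The table can be reproduced programmatically. The sketch below encodes each letter as Windows-1256 and then decodes that single byte as Windows-1252; the variable names are illustrative only, and the provider registration is an assumption that applies to .NET Core / .NET 5+.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Assumption: required on .NET Core / .NET 5+ to make code pages 1252 and 1256 available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding arabic = Encoding.GetEncoding(1256);
Encoding latin = Encoding.GetEncoding(1252);

var rows = new List<string>();
foreach (char c in "مرحبا")
{
    // One Arabic letter becomes exactly one byte in Windows-1256.
    byte[] b = arabic.GetBytes(new[] { c });

    // The same byte, read as Windows-1252, is a Latin letter.
    string shownAs = latin.GetString(b);

    rows.Add($"{c} | {b[0]:X2} | {shownAs}");
}

Console.WriteLine(string.Join(Environment.NewLine, rows));
```

Depending on the console's output encoding, the Arabic letters may not display correctly, but the hex codes and the Latin characters should match the table.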
So:
- in order to save the Windows-1256 text: مرحبا
- you must interpret it as: ãÑÍÈÇ
And when you save it to the database it will be stored in the varchar Latin-1 column as ãÑÍÈÇ.
Then reading it back
Now you also have to load it back. This means you have to take the string:
ãÑÍÈÇ
which is how the bytes are displayed when you assume they are Windows-1252 encoded, and instead assume they are Windows-1256 encoded.
But how do I actually convert it?
I don't know the programming language you use, but we can assume a modern language that provides the ability to convert text to and from encoded byte sequences.
In the end your goal is to write something like:
dbConnection.ExecuteNonQuery("INSERT INTO Users (Name) VALUES ('مرحبا')");
First you have to convert your string from "مرحبا" to the Windows-1256 encoded version:
String name = "مرحبا";
//Convert the string into a Windows-1256 encoded byte sequence
Encoding arabic = Encoding.GetEncoding(1256);
byte[] nameAsBytes = arabic.GetBytes(name); //e.g. E3 D1 CD C8 C7
//Convert the byte sequence back to a string, but assume it is Windows-1252
Encoding latin1 = Encoding.GetEncoding(1252);
String latinName = latin1.GetString(nameAsBytes); //e.g. "ãÑÍÈÇ"
Now you can write your SQL statement:
dbConnection.ExecuteNonQuery("INSERT INTO Users (Name) VALUES ('ãÑÍÈÇ')");
Now when you read the value back from the database, it will come back to you as a string:
SELECT Name FROM Users
Name
----------
ãÑÍÈÇ
1 row(s) affected
You then have to interpret that text (which is Windows-1252) as Windows-1256:
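String latinName = reader.GetString(reader.GetOrdinal("Name")); //"ãÑÍÈÇ"
//Convert the Latin string back into its underlying byte sequence
Encoding latin1 = Encoding.GetEncoding(1252);
byte[] nameAsBytes = latin1.GetBytes(latinName); //e.g. E3 D1 CD C8 C7
//Decode those same bytes as Windows-1256
Encoding arabic = Encoding.GetEncoding(1256);
String name = arabic.GetString(nameAsBytes);
//name is now "مرحبا"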
The magic is in pretending that your text is Windows-1252 encoded. We can write helper functions to force text to and from Latin-1:
String ToLatin1(String s)
{
    //"مرحبا" ==> "ãÑÍÈÇ"
    //(1256) ==> (1252)
    Encoding from = Encoding.GetEncoding(1256);
    byte[] bytes = from.GetBytes(s);
    Encoding to = Encoding.GetEncoding(1252);
    return to.GetString(bytes);
}
String FromLatin1(String s)
{
    //"ãÑÍÈÇ" ==> "مرحبا"
    //(1252) ==> (1256)
    Encoding from = Encoding.GetEncoding(1252);
    byte[] bytes = from.GetBytes(s);
    Encoding to = Encoding.GetEncoding(1256);
    return to.GetString(bytes);
}
So it's something like:
dbConnection.ExecuteNonQuery("INSERT INTO Users(Name) VALUES (@name)",
    ToLatin1("مرحبا")
);
and
IDataReader rdr = dbConnection.ExecuteReader("SELECT Name FROM Users");
if (rdr.Read())
{
    String name = FromLatin1(rdr.GetString(rdr.GetOrdinal("Name")));
}
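One caveat with this workaround is that the default encoders replace characters they cannot represent (typically with "?") instead of failing, so text outside Windows-1256 can be damaged silently. The sketch below is one possible guard, not part of the original helpers: `SurvivesRoundTrip` is a hypothetical name, and it uses exception fallbacks so that unmappable characters raise an error. The provider registration is again an assumption for .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

// Assumption: required on .NET Core / .NET 5+ to make code pages 1252 and 1256 available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical helper: true when s can make the 1256 -> 1252 -> 1256 trip unchanged.
static bool SurvivesRoundTrip(string s)
{
    // Exception fallbacks make unmappable characters throw
    // instead of being replaced silently.
    Encoding arabic = Encoding.GetEncoding(
        1256, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
    Encoding latin = Encoding.GetEncoding(
        1252, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
    try
    {
        // What would be written to the varchar column.
        string stored = latin.GetString(arabic.GetBytes(s));
        // What would be recovered after reading it back.
        string restored = arabic.GetString(latin.GetBytes(stored));
        return restored == s;
    }
    catch (EncoderFallbackException) { return false; }
    catch (DecoderFallbackException) { return false; }
}

Console.WriteLine(SurvivesRoundTrip("مرحبا")); // True
Console.WriteLine(SurvivesRoundTrip("日本"));   // False: not representable in Windows-1256
```

Running a check like this before each insert makes it possible to reject or log a value rather than store a lossy copy of it.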