MS SQL: Arabic letters in a Latin collation

Date: 2022-03-02 10:20:01

I have stored Arabic data in an MS SQL database, in a field with a Latin collation. I can read it correctly using the old SqlDrv32.dll driver with ODBC. Is there any way to convert this data to an Arabic collation while keeping the data correct?

1 solution

#1


What you need to do is make sure that your column is stored as nvarchar rather than varchar. The difference is:

  • nvarchar: stores Unicode text as UTF-16 (2 bytes per character for Arabic and most other scripts)
  • varchar: stores "ANSI" text in the code page of the column's collation (1 byte per character for single-byte code pages such as Windows-1252)

Say you wanted to store the text: "مرحبا"

If you were to (correctly) store it into an nvarchar column, the table would contain "مرحبا".
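
To make the size difference concrete, here is a minimal C# sketch that counts the bytes "مرحبا" occupies in UTF-16 (what nvarchar stores) and in Windows-1256 (a single-byte Arabic code page). The class and method names are made up for illustration. It assumes .NET Core or .NET 5+, where the Windows code pages have to be registered before use; on .NET Framework they are built in and the registration line is not needed.

```csharp
using System;
using System.Text;

public static class StorageSize
{
    // Bytes needed to hold the text as UTF-16, the encoding nvarchar uses.
    public static int Utf16Bytes(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }

    // Bytes needed to hold the text in the single-byte Arabic code page.
    public static int Windows1256Bytes(string text)
    {
        // Assumption: running on .NET Core / .NET 5+, where code page 1256
        // is only available after registering the code pages provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1256).GetByteCount(text);
    }
}
```

For the five-letter word "مرحبا", `StorageSize.Utf16Bytes` gives 10 and `StorageSize.Windows1256Bytes` gives 5.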

Workaround

But I assume you're not able to change the database to the correct column type, that you're stuck with varchar, and you just need it to work with a varchar column in a Latin-1 collation.

In that case you will need to use your programming language to encode "مرحبا" as Windows-1256 bytes and then interpret those bytes as Latin-1 (Windows-1252) directly:

| Character | Windows 1256 Hex Code | Windows-1252 |
|-----------|-----------------------|--------------|
| م         | E3                    | ã            |
| ر         | D1                    | Ñ            |
| ح         | CD                    | Í            |
| ب         | C8                    | È            |
| ا         | C7                    | Ç            |
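
The table above can be reproduced in code. The following is a small sketch, not part of the original answer: `CodePageTable.Describe` is a hypothetical helper that returns, for each character of the input, its Windows-1256 byte in hex followed by the character that the same byte represents in Windows-1252. It again assumes .NET Core or .NET 5+ for the provider registration.

```csharp
using System;
using System.Text;

public static class CodePageTable
{
    // One entry per input character: "<hex byte> <Windows-1252 character>".
    public static string[] Describe(string text)
    {
        // Assumption: .NET Core / .NET 5+; on .NET Framework the Windows
        // code pages are built in and this registration is not needed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding arabic = Encoding.GetEncoding(1256);
        Encoding latin1 = Encoding.GetEncoding(1252);

        string[] rows = new string[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            // Encode a single character to its Windows-1256 byte...
            byte[] b = arabic.GetBytes(text.Substring(i, 1));
            // ...and show what that byte looks like when read as Windows-1252.
            rows[i] = $"{b[0]:X2} {latin1.GetString(b)}";
        }
        return rows;
    }
}
```

Calling `CodePageTable.Describe("مرحبا")` yields the five entries `E3 ã`, `D1 Ñ`, `CD Í`, `C8 È`, `C7 Ç`, matching the rows of the table.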

So:

  • in order to save the Windows-1256 text: مرحبا
  • you must interpret it as: ãÑÍÈÇ

And when you save it to the database, it will be stored in the varchar Latin-1 column as ãÑÍÈÇ.

Then reading it back

Now you also have to load it back. This means you have to take the string:

ãÑÍÈÇ

which is how the stored bytes display when they are interpreted as Windows-1252, and reinterpret those same bytes as Windows-1256.

But how do I actually convert it?

I don't know the programming language you use, but we can assume a modern language that provides the ability to convert text to encoded byte sequences.

In the end your goal is to write something like:

dbConnection.ExecuteNonQuery("INSERT INTO Users (Name) VALUES ('مرحبا')");

First you have to convert your string from "مرحبا" to the Windows-1256 encoded version:

String name = "مرحبا";

//Convert the string into a Windows-1256 encoded byte sequence
Encoding arabic = Encoding.GetEncoding(1256);
byte[] nameAsBytes = arabic.GetBytes(name); //e.g. E3 D1 CD C8 C7

//Convert the byte sequence back to a string, but assume it is Windows-1252
Encoding latin1 = Encoding.GetEncoding(1252);
String latinName = latin1.GetString(nameAsBytes); //e.g. "ãÑÍÈÇ"

Now you can write your SQL statement:

dbConnection.ExecuteNonQuery("INSERT INTO Users (Name) VALUES ('ãÑÍÈÇ')");

Now when you read the value back from the database, it will come back to you as a string:

SELECT Name FROM Users

Name
----------
ãÑÍÈÇ

1 row(s) affected

You then have to interpret that text (which is Windows-1252) as Windows-1256:

String latinName = reader.GetString(reader.GetOrdinal("Name")); //"ãÑÍÈÇ"

Encoding latin1 = Encoding.GetEncoding(1252);
byte[] nameAsBytes = latin1.GetBytes(latinName);
Encoding arabic = Encoding.GetEncoding(1256);
String name = arabic.GetString(nameAsBytes);

//name is now مرحبا

The magic is in assuming your text is Windows-1252 encoded. We can write helper functions to force text to and from Latin-1:

String ToLatin1(String s)
{
   //"مرحبا" ==> "ãÑÍÈÇ"
   //(1256)  ==> (1252)

   //Encode the text as Windows-1256 bytes
   Encoding from = Encoding.GetEncoding(1256);
   byte[] bytes = from.GetBytes(s);

   //Reinterpret those bytes as Windows-1252 text
   Encoding to = Encoding.GetEncoding(1252);
   String result = to.GetString(bytes);

   return result;
}

String FromLatin1(String s)
{
   //"ãÑÍÈÇ" ==> "مرحبا"
   //(1252)  ==> (1256)

   //Recover the raw bytes from the Windows-1252 text
   Encoding from = Encoding.GetEncoding(1252);
   byte[] bytes = from.GetBytes(s);

   //Decode those bytes as Windows-1256 text
   Encoding to = Encoding.GetEncoding(1256);
   String result = to.GetString(bytes);

   return result;
}

So it's something like:

dbConnection.ExecuteNonQuery("INSERT INTO Users(Name) VALUES (@name)",
      ToLatin1("مرحبا")
);

and

IDataReader rdr = dbConnection.ExecuteReader("SELECT Name FROM Users");
if (rdr.Read())
{
   String name = FromLatin1(rdr.GetString(rdr.GetOrdinal("Name")));
}
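
One caveat with this workaround: text containing characters that Windows-1256 cannot represent will not survive the trip, because by default .NET substitutes a replacement (typically `?`) rather than failing. The sketch below is an assumption-laden variant of `ToLatin1`, not something from the original answer: `StrictLatin1.TryToLatin1` is a hypothetical helper that asks for exception fallbacks so that unconvertible input is reported instead of being silently altered. It assumes .NET Core or .NET 5+ for the provider registration.

```csharp
using System;
using System.Text;

public static class StrictLatin1
{
    // Builds an encoding that throws on unmappable input instead of
    // substituting a replacement character.
    private static Encoding Strict(int codePage)
    {
        // Assumption: .NET Core / .NET 5+; not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
    }

    // Same conversion as ToLatin1, but returns false when the text cannot
    // be represented as Windows-1256 bytes read back as Windows-1252.
    public static bool TryToLatin1(string text, out string latin)
    {
        try
        {
            byte[] bytes = Strict(1256).GetBytes(text);
            latin = Strict(1252).GetString(bytes);
            return true;
        }
        catch (EncoderFallbackException)
        {
            latin = string.Empty;
            return false;
        }
        catch (DecoderFallbackException)
        {
            latin = string.Empty;
            return false;
        }
    }
}
```

With this, `StrictLatin1.TryToLatin1("مرحبا", out var latin)` succeeds and sets `latin` to "ãÑÍÈÇ", while input such as "中" is rejected before it reaches the database.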
