Handling Unicode characters with Delphi 6

Time: 2021-11-28 20:18:15

I have a polling application developed in Delphi 6. It reads a file, parses it according to a specification, validates it and uploads the data into a database (SQL Server 2008 Express Edition).

We had to support operating systems that use double-byte character sets (DBCS), e.g. a Japanese OS, so we changed the database fields in SQL Server from varchar to nvarchar.

Polling works fine on operating systems with DBCS. It also works on non-DBCS operating systems if the system locale is set to Japanese/Chinese/Korean and the respective language pack is installed. But if the locale is set to English, the database contains junk characters in place of the double-byte characters.

I ran a few tests but could not find a solution.

For example, if I read a UTF-8 file into a TStringList and save it to another file, the Unicode data is preserved. But if I use the file's contents to run an UPDATE query through a TADOQuery component, junk characters appear, and the database contains the junk characters too.

Please find the sample code below:

```pascal
var
    stlTemp : TStringList;
    qry : TADOQuery;
    stQuery : string;
begin
    stlTemp := TStringList.Create;
    qry := TADOQuery.Create(nil);
    try
        stlTemp.LoadFromFile('D:\DelphiUnicode\unicode.txt');
        //stlTemp.SaveToFile('D:\DelphiUnicode\1.txt'); // This works, even though
        //stlTemp.Strings[0] shows junk characters in the watch window

        stQuery := 'UPDATE dbo.receivers SET company = ' + QuotedStr(stlTemp.Strings[0]) +
            ' WHERE receiver_cd = N' + QuotedStr('Receiver');
        //company is an nvarchar field in the database
        qry.Connection := ADOConnection1;
        with qry do
        begin
            Close;
            SQL.Clear;
            SQL.Add(stQuery);
            ExecSQL;
        end;
    finally
        qry.Free;
        stlTemp.Free;
    end;
end;
```

The above code works fine on a DBCS operating system.

I have tried playing with string, WideString and UTF8String, but it still does not work on an English OS when the locale is set to English.

Any pointers on this issue would be appreciated.

2 solutions

#1


3  

In non-Unicode Delphi versions, the basic rule is that you need to work with WideStrings (Unicode) instead of strings (Ansi).

Forget about TADOQuery.SQL (TStrings) and work with TADODataSet.CommandText or TADOCommand.CommandText (WideString), or typecast your TADOQuery to TADODataSet, e.g.:

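```pascal
var
  stlTemp: TWideStringList; // <- Unicode strings - TNT or other Unicode lib
  qry: TADOQuery;
  stQuery: WideString;      // <- Unicode string
  RowsAffected: Integer;
begin
  // ... build stQuery from the WideString lines in stlTemp ...
  TADODataSet(qry).CommandText := stQuery; // WideString property, unlike the Ansi SQL TStrings
  RowsAffected := qry.ExecSQL;
```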

You can also use TADOConnection.Execute(stQuery) to execute queries directly.

Be extra careful with parameterized queries: ADODB.TParameters.ParseSQL is Ansi. If ParamCheck is True (the default), TADOCommand.SetCommandText -> AssignCommandText will cause problems if your query text is Unicode (InitParameters is Ansi).

(Note that you can use the ADO Command.Parameters directly, with ? characters as parameter placeholders instead of Delphi's :param_name convention.)
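
A minimal sketch of that approach, applied to the question's UPDATE. It assumes Delphi 6's ADODB unit and an already-open TADOConnection; the unit name ReceiverUpdate, the procedure UpdateCompany and the parameter sizes (100 and 20) are hypothetical, so use the real nvarchar lengths. ParamCheck is switched off before CommandText is set, so Delphi's Ansi parameter parsing never runs over the statement, and because ? placeholders are positional the parameters are created in statement order.

```pascal
unit ReceiverUpdate;  // hypothetical unit name

interface

uses
  ADODB;

procedure UpdateCompany(Conn: TADOConnection; const Company, ReceiverCd: WideString);

implementation

uses
  DB;  // ftWideString

// Hypothetical helper: runs the question's UPDATE without ever putting the
// Unicode value into an AnsiString (no QuotedStr, no TStrings).
procedure UpdateCompany(Conn: TADOConnection; const Company, ReceiverCd: WideString);
var
  Cmd: TADOCommand;
begin
  Cmd := TADOCommand.Create(nil);
  try
    Cmd.Connection := Conn;
    Cmd.ParamCheck := False;  // skip Delphi's Ansi :name parsing (TParameters.ParseSQL)
    Cmd.CommandText := 'UPDATE dbo.receivers SET company = ? WHERE receiver_cd = ?';
    // ? placeholders are positional: create the parameters in statement order.
    // Sizes 100 and 20 are assumptions; use the real nvarchar column lengths.
    Cmd.Parameters.CreateParameter('company', ftWideString, pdInput, 100, Company);
    Cmd.Parameters.CreateParameter('receiver_cd', ftWideString, pdInput, 20, ReceiverCd);
    Cmd.ExecuteOptions := [eoExecuteNoRecords];
    Cmd.Execute;
  finally
    Cmd.Free;
  end;
end;

end.
```

The value must already be a WideString when it reaches UpdateCompany, e.g. UpdateCompany(ADOConnection1, Line, 'Receiver') where Line was read without passing through a string variable.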

QuotedStr returns an Ansi string. You need a wide version of this function (for example, from TNT).
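
A minimal sketch of such a wide version, written by hand rather than taken from TNT; the unit WideQuote and the name QuotedStrW are hypothetical. It does what SysUtils.QuotedStr does (wrap the value in single quotes and double every embedded quote) but works on WideString, so no Ansi conversion takes place.

```pascal
unit WideQuote;  // hypothetical unit name

interface

// Hypothetical WideString counterpart of SysUtils.QuotedStr: wraps S in
// single quotes and doubles every embedded single quote, with no Ansi conversion.
function QuotedStrW(const S: WideString): WideString;

implementation

function QuotedStrW(const S: WideString): WideString;
var
  I: Integer;
begin
  Result := '''';
  for I := 1 to Length(S) do
  begin
    if S[I] = '''' then
      Result := Result + '''';   // escape ' as ''
    Result := Result + S[I];
  end;
  Result := Result + '''';
end;

end.
```

With stQuery and Company declared as WideString, the question's statement could then be built as stQuery := 'UPDATE dbo.receivers SET company = N' + QuotedStrW(Company) + ' WHERE receiver_cd = N' + QuotedStrW('Receiver'); and run with ADOConnection1.Execute(stQuery), whose command text parameter is a WideString. The N prefix on the company literal matters too: without it SQL Server treats the literal as varchar in the current database's code page, so characters outside that code page are lost before they reach the nvarchar column. Parameterized queries remain the safer option.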

Also, as @Arioch 'The mentioned, the TNT Unicode Controls suite is your best friend for making a Delphi application Unicode-aware. It has all the controls and classes you need to manage Unicode tasks in your application.

In short, you need to think Wide :)

#2


3  

  1. You did not specify the database server, so that investigation is up to you. You should check how your database server supports Unicode, i.e. how to specify a Unicode charset for the database and for the tables/columns/indices/collations/etc. inside it. You have to make sure the whole DB is Unicode-enabled in every detail to avoid data loss.

  2. You should also check that your database connection (using the database access library of your choice) is Unicode-enabled. Microsoft ADO, just like OLE, should generally be Unicode-enabled, but still check your database server manual for how to specify a Unicode code page or charset in the connection string. A non-Unicode connection may also result in data loss.

  3. When you say you read some Unicode file, that is ambiguous. What is a Unicode file? Is it UTF-8? One of the four flavours of UTF-16? UTF-7? Some other Unicode Transformation Format? The usual Windows WideChar roughly corresponds to legacy UCS-2 and is expected to be the BOM-stripped, Intel-endian (little-endian) flavour of UTF-16. http://msdn.microsoft.com/en-us/library/windows/desktop/ms221069.aspx

  4. If the file is definitely that flavour of UTF-16, you can load it using Delphi TWideStringList or the JEDI Code Library's TJclWideStringList. Review your code to make sure you never handle your data in string variables; use WideString everywhere to avoid data loss.
    Since D6 was one of the buggiest releases, I'd make sure EVERY update to Delphi is installed, and then install and use JCL. JCL also provides code page conversion functions, which might be more flexible than the plain AnsiStringVar := WideStringVar approach.
    A UTF-8 file can be loaded by JCL's TWideStringList class (but not by TJclWideStringList).

  5. When debugging, load lines of the list into a WideString variable and check that their content is preserved.

  6. Don't write queries like that. See http://bobby-tables.com/ Even if you do not expect a malicious cracker, you can make mistakes yourself or meet unexpected data. Use parameterized queries everywhere, every time! EVER!
    See this example: http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/ADODB_TADOQuery_Parameters.html
    Check that every SQL string (VARCHAR/NVARCHAR) parameter is ftWideString, not ftString, so that it can hold Unicode. Check the same for fields (columns).

  7. Consider whether legacy technologies can be cast aside, since supporting them will only get harder over time.

    7.1. Since Microsoft ADO is deprecated (for example, newer versions of Microsoft SQL Server would not support it), consider switching to 'live' data access libraries such as AnyDAC, UniDAC, ZeosDB or some other library. Torry.net may point you to some.

    7.2. Since the Delphi 6 RTL and VCL are not Unicode-ready, consider migrating your application to TNT Unicode Components, if you manage to find their free version or purchase them, or migrating to a newer Delphi release.

    7.3. Since Delphi 6 is very old, long unsupported and one of the buggiest Delphi releases, consider migrating to a newer Delphi version or to free tools like CodeTyphoon or Lazarus. As a bonus, Lazarus started moving to Unicode in its recent beta builds, so by the end of the migration your application may well be Unicode-ready.

    7.4. Migration might be an excuse and a stimulus for refactoring your application and getting rid of legacy spaghetti code.
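
Putting items 3 to 6 together, here is a minimal Delphi 6 sketch. It assumes the input file is UTF-8 (with or without a BOM) and that only System.UTF8Decode, the VCL and ADODB are available, i.e. no TNT or JCL. Every name in it (the unit Utf8Import, TWideStringArray, SplitUtf8Text, LoadUtf8Lines, UpdateCompanyParam) is hypothetical. The file is read as raw bytes and decoded straight to WideString, so the system ANSI code page is never involved; lines are split by hand, and each value travels as an ftWideString parameter. The SQL text itself is pure ASCII, so Delphi's Ansi parameter parsing has nothing to garble.

```pascal
unit Utf8Import;  // hypothetical unit name

interface

uses
  Classes, SysUtils, DB, ADODB;

type
  TWideStringArray = array of WideString;

function SplitUtf8Text(Raw: UTF8String): TWideStringArray;
function LoadUtf8Lines(const FileName: string): TWideStringArray;
procedure UpdateCompanyParam(Qry: TADOQuery; const Company, ReceiverCd: WideString);

implementation

// Decodes UTF-8 bytes (optional BOM) and splits them into CRLF/LF-separated lines.
function SplitUtf8Text(Raw: UTF8String): TWideStringArray;
var
  Text: WideString;
  Lines: TWideStringArray;
  I, Start, Count: Integer;

  procedure AddLine(Line: WideString);
  begin
    if (Length(Line) > 0) and (Line[Length(Line)] = #13) then
      SetLength(Line, Length(Line) - 1);  // drop the CR of a CRLF pair
    SetLength(Lines, Count + 1);
    Lines[Count] := Line;
    Inc(Count);
  end;

begin
  if Copy(Raw, 1, 3) = #$EF#$BB#$BF then
    Delete(Raw, 1, 3);                    // strip the UTF-8 byte order mark
  Text := UTF8Decode(Raw);                // UTF-8 -> UTF-16, no ANSI code page
  Count := 0;
  Start := 1;
  for I := 1 to Length(Text) do
    if Text[I] = #10 then
    begin
      AddLine(Copy(Text, Start, I - Start));
      Start := I + 1;
    end;
  if Start <= Length(Text) then
    AddLine(Copy(Text, Start, MaxInt));   // last line without a trailing LF
  Result := Lines;
end;

function LoadUtf8Lines(const FileName: string): TWideStringArray;
var
  Stream: TFileStream;
  Raw: UTF8String;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Raw, Integer(Stream.Size));
    if Length(Raw) > 0 then
      Stream.ReadBuffer(Raw[1], Length(Raw));  // raw bytes, no conversion
  finally
    Stream.Free;
  end;
  Result := SplitUtf8Text(Raw);
end;

procedure UpdateCompanyParam(Qry: TADOQuery; const Company, ReceiverCd: WideString);
begin
  Qry.Close;
  // ASCII-only SQL text; the Unicode data travels only in the parameters
  Qry.SQL.Text := 'UPDATE dbo.receivers SET company = :company ' +
    'WHERE receiver_cd = :receiver_cd';
  Qry.Parameters.ParamByName('company').DataType := ftWideString;  // not ftString
  Qry.Parameters.ParamByName('company').Value := Company;
  Qry.Parameters.ParamByName('receiver_cd').DataType := ftWideString;
  Qry.Parameters.ParamByName('receiver_cd').Value := ReceiverCd;
  Qry.ExecSQL;
end;

end.
```

For the question's file this would be used roughly as Lines := LoadUtf8Lines('D:\DelphiUnicode\unicode.txt'); followed by UpdateCompanyParam(qry, Lines[0], 'Receiver') when Length(Lines) > 0, with qry connected to ADOConnection1. Splitting by hand only keeps the sketch free of third-party units; with TNT or JCL installed, their wide string lists would do the same job.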