C#: Storing text in SQL Server for full-text search

Date: 2022-09-13 08:01:01

I'm writing an Outlook add-in to file emails according to certain parameters.

I am currently storing the Outlook.MailItem.Body property in a varbinary(max) field in SQL Server 2008 R2. I have also enabled full-text search (FTS) on this column.

I store the Body property of the email as a byte array in the database, and use the ASCIIEncoding.GetBytes() function to convert the plain text. I am seeing some weird results: ? characters occasionally appear in place of apostrophes and new lines.

I have two questions:

  1. Is storing the text as a byte array the best way to keep it in a database? And is ASCIIEncoding the best way to achieve this?
  2. I want to handle Unicode strings correctly. Is there anything I should be aware of?

2 answers

#1


2  

I'm not sure whether FullTextSearch works best on VarBinary columns, though my instinct says "no", but I can answer the second half of your question.

The reason you're getting odd characters is that ASCIIEncoding.GetBytes() can only represent the 128 ASCII characters, and it replaces everything else with ?. Strings in .NET are UTF-16 internally, so an email body can contain characters, such as typographic apostrophes, that have no ASCII equivalent. Use Encoding.UTF8.GetBytes() to encode the string as UTF-8 bytes without losing those characters.
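To make the difference concrete, here is a minimal sketch that round-trips a string through both encodings. The class name EncodingDemo, the helper RoundTrip, and the sample text are made up for illustration; only Encoding.ASCII and Encoding.UTF8 come from the framework.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Encodes text to bytes with the given encoding and decodes it back.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // U+2019 (right single quotation mark) is common in email bodies
        // and has no ASCII representation.
        string body = "It\u2019s here";

        // ASCII substitutes '?' for the character it cannot encode.
        Console.WriteLine(RoundTrip(body, Encoding.ASCII) == "It?s here"); // True
        // UTF-8 preserves the original text.
        Console.WriteLine(RoundTrip(body, Encoding.UTF8) == body);         // True
    }
}
```

Whichever encoding is used to write the bytes must also be used to read them back, otherwise the decoded text will not match what was stored.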

This also answers the second question - is this method useful for Unicode strings? Yes, since you're not storing strings at all. You're storing bytes, which your application happens to know are encoded Unicode strings. SQL won't do anything to them, because they're just bytes.

#2


2  

Since you have to support Unicode characters and handle only text, you should store your data in a column of type nvarchar. That would address both of your problems:

1.) Text is saved as variable-length Unicode character data in the database, so you don't need a byte encoder/decoder to retrieve the data.

2.) See 1.)
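As a rough sketch of what that could look like from C#, the string can be passed straight to an nvarchar parameter with no GetBytes() call. The table dbo.Emails, its Body column (assumed to be declared nvarchar(max)), and the EmailStore class are hypothetical names, and the connection string is left to the caller. System.Data.SqlClient ships with the .NET Framework; on newer runtimes it is a separate package.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class EmailStore
{
    // Builds a parameterized INSERT. dbo.Emails and its Body column are
    // hypothetical names; Body is assumed to be declared as nvarchar(max).
    public static SqlCommand BuildInsert(string body)
    {
        var cmd = new SqlCommand("INSERT INTO dbo.Emails (Body) VALUES (@Body)");
        // Size -1 maps the parameter to nvarchar(max).
        cmd.Parameters.Add("@Body", SqlDbType.NVarChar, -1).Value = body;
        return cmd;
    }

    // Opens a connection and runs the insert; connectionString is supplied by the caller.
    public static void Save(string connectionString, string body)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = BuildInsert(body))
        {
            cmd.Connection = conn;
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```

A call such as EmailStore.Save(connectionString, mailItem.Body) would then store the text as is, and reading the column back yields a .NET string directly.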
