Full-text search on a multi-language column

Is there a way to use FULLTEXT in a multi-language table without giving each language its own column?

I have one column I need to search, but the language in that column varies:

ProductID    int
Description  nvarchar(max)
Language     char(2)

Language can be one of: en, de, it, kr, th

Currently I build a concordance and use that for searching. But this only covers English, German and Italian, and even for those it doesn't support stemming. Everything else falls back to LIKE '%searchterm%', and I'm trying to improve on that.

I'm using SQL Server 2005.

4 answers

#1

Instead of a separate column per language, if you know which rows contain which language, you could create one indexed view per language, each filtered to include only the rows of that language, and put a full-text index (FTI) on each view. You'll need to query each view individually, though.
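
For illustration, here is a rough sketch of what one such view could look like for German. The table name dbo.ProductDescriptions, the catalog name ProductCatalog and the index name are made up for the example, and it assumes there is at most one row per ProductID within a language (the full-text key has to be a unique, single-column, non-nullable index). 1031 is the LCID for German; the same pattern would be repeated per language.

```sql
-- Hypothetical names: dbo.ProductDescriptions, ProductCatalog,
-- dbo.ProductDescriptions_de, UX_ProductDescriptions_de.
CREATE FULLTEXT CATALOG ProductCatalog;
GO

-- Indexed views require SCHEMABINDING and two-part object names.
CREATE VIEW dbo.ProductDescriptions_de
WITH SCHEMABINDING
AS
SELECT ProductID, Description
FROM dbo.ProductDescriptions
WHERE Language = 'de';
GO

-- The unique clustered index materializes the view and also serves
-- as the full-text key. Assumes ProductID is unique among 'de' rows.
CREATE UNIQUE CLUSTERED INDEX UX_ProductDescriptions_de
ON dbo.ProductDescriptions_de (ProductID);
GO

-- Index the column with the German word breaker and stemmer (LCID 1031).
CREATE FULLTEXT INDEX ON dbo.ProductDescriptions_de
    (Description LANGUAGE 1031)
KEY INDEX UX_ProductDescriptions_de
ON ProductCatalog;
GO

-- Query the German view; FORMSOF(INFLECTIONAL, ...) asks for stemmed matches.
SELECT ProductID
FROM dbo.ProductDescriptions_de
WHERE CONTAINS(Description, 'FORMSOF(INFLECTIONAL, "Haus")');
```

The calling code then has to pick the view that matches the language being searched, which is the "query each view individually" part.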

#2

I know this is an old question, but I just encountered it.

One approach I have seen is to use an XML column and specify the xml:lang attribute, as described in the documentation for CREATE FULLTEXT INDEX (Transact-SQL):

For documents stored in XML- or BLOB-type columns, the language encoding within the document will be used at indexing time. For example, in XML columns, the xml:lang attribute in XML documents will identify the language. At query time, the value previously specified in language_term becomes the default language used for full-text queries unless language_term is specified as part of a full-text query.

The main downside of this approach is that it changes the data type to XML, but it seemed to work fine for our needs at the time.
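
A minimal sketch of how that might look, not the exact schema we used. The table dbo.ProductDescriptionsXml, its columns, the element name description and the catalog ProductCatalog are all hypothetical, and the catalog is assumed to exist already. A surrogate RowID is used because the full-text key must be a single-column unique index.

```sql
-- Hypothetical table; assumes full-text catalog ProductCatalog already exists.
CREATE TABLE dbo.ProductDescriptionsXml (
    RowID          int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_ProductDescriptionsXml PRIMARY KEY,
    ProductID      int NOT NULL,
    DescriptionXml xml NOT NULL
);
GO

-- Each document carries its own language in xml:lang.
INSERT INTO dbo.ProductDescriptionsXml (ProductID, DescriptionXml)
VALUES (1, N'<description xml:lang="de">Ein kleines Haus am See</description>');

INSERT INTO dbo.ProductDescriptionsXml (ProductID, DescriptionXml)
VALUES (1, N'<description xml:lang="en">A small house by the lake</description>');
GO

-- No LANGUAGE clause here, so the column gets the server's default
-- full-text language; per the quoted documentation, xml:lang inside
-- each document identifies the language at indexing time.
CREATE FULLTEXT INDEX ON dbo.ProductDescriptionsXml (DescriptionXml)
KEY INDEX PK_ProductDescriptionsXml
ON ProductCatalog;
GO

-- At query time the language can be given explicitly (1031 = German).
SELECT ProductID
FROM dbo.ProductDescriptionsXml
WHERE CONTAINS(DescriptionXml, '"Haus"', LANGUAGE 1031);
```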

#3

Quoting from the Microsoft reference on CREATE FULLTEXT INDEX:

For non-BLOB and non-XML columns containing text data in multiple languages, or for cases when the language of the text stored in the column is unknown, it might be appropriate for you to use the neutral (0x0) language resource. However, first you should understand the possible consequences of using the neutral (0x0) language resource. For information about the possible solutions and consequences of using the neutral (0x0) language resource, see Best Practices for Choosing a Language When Creating a Full-Text Index.
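
As a rough sketch of what the neutral option could look like on the table from the question: the table name dbo.ProductDescriptions, the key index UX_ProductDescriptions_RowID (a unique index on an assumed single-column key) and the catalog ProductCatalog are hypothetical and assumed to exist already. LANGUAGE 0 selects the neutral language resource.

```sql
-- Hypothetical names; assumes the catalog and a single-column
-- unique, non-nullable key index already exist on the table.
CREATE FULLTEXT INDEX ON dbo.ProductDescriptions
    (Description LANGUAGE 0)   -- 0 = neutral language resource
KEY INDEX UX_ProductDescriptions_RowID
ON ProductCatalog;
GO

-- One index serves every language; the Language column narrows the rows.
-- A prefix term stands in for stemming, which the neutral resource lacks.
SELECT ProductID, Language
FROM dbo.ProductDescriptions
WHERE Language = 'de'
  AND CONTAINS(Description, '"searchterm*"');
```

One of the consequences the quote alludes to: the neutral word breaker splits on whitespace and punctuation and does no stemming, so it is likely a poor fit for Thai, which is written without spaces between words.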

#4

I am using views for 20+ languages. It works fine for querying (though selecting the correct view to use in stored procedures is a little complex). However, inserts and updates on the underlying table take a heavy performance hit, as the plan seems to need to include a check on every full-text view, even with no change tracking.
