MS SQL full-text search vs. LIKE expression

I'm currently looking for a way to search a large database (500 MB to 10 GB or more, across 10 tables) with many different fields (nvarchar and bigint). Many of the fields that should be searched are not in the same table.

An example: a search for '5124 Peter' should return all items that ...

have an ID with 5124 in it,

have 'Peter' in the title or description,

have an item type ID with 5124 in it,

were created by a user named 'peter' or a user whose ID has 5124 in it,

were created by a user with '5124' or 'peter' in their street address.

How should I do the search? I read that full-text search in MS SQL performs much better than a query with the LIKE keyword, and I think its syntax is clearer. However, I think it can't search bigint (ID) values, and I read that it has performance problems with indexing and therefore slows down inserts to the DB. My project will do more inserting than reading, so this could matter.

Thanks in advance, Marks

3 Answers

#1

I don't think you're going to get the performance you need out of MS SQL; you're going to need to construct very complex queries to cover all the data/tables that you're going to be searching, and you have the added encumbrance of writing data to the database at the same time as you are querying it.

I would suggest you look at either Apache Solr (http://lucene.apache.org/solr/) or Lucene (http://lucene.apache.org). Solr is built on top of Lucene, and both can be used to create an inverted file index, which is basically like the index in the back of a book (term 1 appears in documents 1, 3, 7, etc.). Solr is a search-engine-in-a-box and has several mechanisms that let you tell it how and where to index data. Lucene is lower-level and lets you set up your indexing and searching architecture with more flexibility.

The good thing about Solr is that it's available as a web service, so if you're not familiar with Java, you can find a Solr client in the language of your choice and write indexing and searching code in whatever language suits you. Here's a list of client libraries for Solr, including some in C#, which is where I'd start: http://wiki.apache.org/solr/IntegratingSolr

#2

You could try a standalone search engine, such as Sphinx Search:

http://www.sphinxsearch.com/index.html

or Apache Solr:

http://lucene.apache.org/solr/

#3

Full-text search is definitely more performant than a LIKE expression. What you can do is create a full-text index on a view instead of a table. Since only the index gets searched, this can save table joins at query time, which can speed things up a bit. The view also lets you convert the bigint columns to varchar so they can be indexed, for example by concatenating all the columns to be searched into one varchar column. To do this, you need to create the view WITH SCHEMABINDING, select at least one column that is unique, and create a unique clustered index on it.
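A rough sketch of what that could look like. The table and column names (dbo.Items, dbo.Users, ItemId, Title, Description, CreatedBy, UserName, Street), the view name and the catalog name are all made up for illustration, and I've used nvarchar rather than varchar because the question's text columns are nvarchar. Treat it as a starting point under those assumptions, not a tested schema.

```sql
-- Hypothetical tables: dbo.Items (ItemId bigint primary key, Title, Description,
-- CreatedBy) and dbo.Users (UserId bigint primary key, UserName, Street).
-- Indexed views need specific SET options (e.g. QUOTED_IDENTIFIER ON, ANSI_NULLS ON),
-- two-part object names, and INNER joins only.
CREATE VIEW dbo.ItemSearch
WITH SCHEMABINDING
AS
SELECT
    i.ItemId,
    -- One searchable text column: the bigint IDs are cast to text and
    -- concatenated with the text fields from both tables.
    CAST(i.ItemId AS nvarchar(20)) + N' ' +
    ISNULL(i.Title, N'')           + N' ' +
    ISNULL(i.Description, N'')     + N' ' +
    CAST(u.UserId AS nvarchar(20)) + N' ' +
    ISNULL(u.UserName, N'')        + N' ' +
    ISNULL(u.Street, N'')          AS SearchText
FROM dbo.Items AS i
INNER JOIN dbo.Users AS u ON u.UserId = i.CreatedBy;
GO

-- The unique clustered index materializes the view and serves as the full-text key.
CREATE UNIQUE CLUSTERED INDEX IX_ItemSearch_ItemId ON dbo.ItemSearch (ItemId);
GO

CREATE FULLTEXT CATALOG ItemSearchCatalog;
GO

CREATE FULLTEXT INDEX ON dbo.ItemSearch (SearchText)
    KEY INDEX IX_ItemSearch_ItemId
    ON ItemSearchCatalog
    WITH CHANGE_TRACKING AUTO;
GO

-- Search for either term; the trailing * makes each one a prefix term.
SELECT ItemId
FROM dbo.ItemSearch
WHERE CONTAINS(SearchText, N'"5124*" OR "peter*"');
```

Keep in mind that CONTAINS matches whole words or word prefixes, not arbitrary substrings, so this approximates LIKE '%5124%' rather than reproducing it exactly. The inner join also means items without a matching user row would not appear in the view.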

As for the effect of full-text indexing on insert performance, I haven't noticed much of an impact on bulk inserts myself (I'm running 2008). However, in * question 3301470 someone mentions that performance was slow on SQL Server 2005 and that this is fixed in SQL Server 2008, because it now updates the index after the bulk insert instead of after every individual row insert. If you are running 2005, you can improve this by disabling change tracking just for the bulk insert and manually calling update index afterwards.
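Something along these lines should do it. dbo.ItemSearch is a hypothetical name for whatever table or view carries the full-text index. I'm reading "disable change tracking" as switching it to MANUAL, which is an assumption on my part: with MANUAL, changes are still recorded but only pushed into the index when you ask, whereas OFF would stop recording them altogether.

```sql
-- Stop automatic propagation to the full-text index; changes are still tracked.
ALTER FULLTEXT INDEX ON dbo.ItemSearch SET CHANGE_TRACKING MANUAL;
GO

-- ... run the bulk insert here ...

-- Push all tracked changes into the full-text index in one go.
ALTER FULLTEXT INDEX ON dbo.ItemSearch START UPDATE POPULATION;
GO

-- Optionally return to automatic propagation for normal operation.
ALTER FULLTEXT INDEX ON dbo.ItemSearch SET CHANGE_TRACKING AUTO;
GO
```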
