I'm trying to improve the search functionality on my web forums. I've got a table of posts, and each post has (among other less interesting things):
- PostID, a unique ID for the individual post.
- ThreadID, an ID of the thread the post belongs to. There can be any number of posts per thread.
- Text, because a forum would be really boring without it.
I want to write an efficient query that will search the threads in the forum for a series of words, and it should return a hit for any ThreadID for which there are posts that include all of the search words. For example, let's say that thread 9 has post 1001 with the word "cat" in it, and also post 1027 with the word "hat" in it. I want a search for cat hat to return a hit for thread 9.
This seems like a straightforward requirement, but I don't know of an efficient way to do it. Using the regular FREETEXT and CONTAINS capabilities for N'cat AND hat' won't return any hits in the above example because the words exist in different posts, even though those posts are in the same thread. (As far as I can tell, when using CREATE FULLTEXT INDEX I have to give it my index on the primary key PostID, and can't tell it to index all posts with the same ThreadID together.)
The solution that I currently have in place works, but sucks: maintain a separate table that contains the entire concatenated post text of every thread, and make a full text index on THAT. I'm looking for a solution that doesn't require me to keep a duplicate copy of the entire text of every thread in my forums. Any ideas? Am I missing something obvious?
2 solutions
#1
1
As far as I can see, there is no "easy" way of doing this.
I would create a stored procedure that splits up the search words, looks for the first word, and puts the matching ThreadIDs in a table variable. Then look for the other words (if any) among the ThreadIDs you just collected (inner join).
If interested, I can write a few bits of code, but I'm guessing you won't need it.
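For reference, the stored-procedure idea above could be sketched roughly as follows. This is only a sketch under assumptions, not a tested implementation: the procedure name dbo.SearchThreads, the parameter @SearchWords, and the table name dbo.Posts are hypothetical, and it assumes a full-text index on the Text column, a non-nullable ThreadID, and SQL Server 2016 or later for STRING_SPLIT. It narrows the collected ThreadIDs with a NOT IN delete, which has the same effect as the inner join described above.

```sql
-- Sketch only: dbo.SearchThreads, @SearchWords and dbo.Posts are hypothetical names.
-- Assumes dbo.Posts(PostID, ThreadID, [Text]) with a full-text index on [Text],
-- ThreadID declared NOT NULL, and SQL Server 2016+ (STRING_SPLIT).
CREATE PROCEDURE dbo.SearchThreads
    @SearchWords nvarchar(1000)          -- e.g. N'cat hat'
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @Words TABLE (Word nvarchar(1000) NOT NULL);
    DECLARE @Hits  TABLE (ThreadID int NOT NULL PRIMARY KEY);
    DECLARE @Word  nvarchar(1000),
            @Term  nvarchar(1010),
            @First bit = 1;

    -- Split on spaces; strip double quotes and drop empty tokens.
    INSERT INTO @Words (Word)
    SELECT REPLACE(value, N'"', N'')
    FROM STRING_SPLIT(@SearchWords, N' ')
    WHERE REPLACE(value, N'"', N'') <> N'';

    WHILE EXISTS (SELECT 1 FROM @Words)
    BEGIN
        -- Take one word and remove it (and any duplicates) from the work list.
        SELECT TOP (1) @Word = Word FROM @Words ORDER BY Word;
        DELETE FROM @Words WHERE Word = @Word;

        -- Wrap the word in double quotes so CONTAINS treats it as a literal term.
        SET @Term = N'"' + @Word + N'"';

        IF @First = 1
            -- First word: collect every thread that has a matching post.
            INSERT INTO @Hits (ThreadID)
            SELECT DISTINCT ThreadID
            FROM dbo.Posts
            WHERE CONTAINS([Text], @Term);
        ELSE
            -- Later words: keep only threads that also have a post with this word.
            DELETE FROM @Hits
            WHERE ThreadID NOT IN (SELECT ThreadID
                                   FROM dbo.Posts
                                   WHERE CONTAINS([Text], @Term));

        SET @First = 0;
    END;

    SELECT ThreadID FROM @Hits;
END;
```

It would be called as EXEC dbo.SearchThreads @SearchWords = N'cat hat'; and, for the example in the question, should return thread 9. Words that are on the full-text stoplist may produce no hits, so you may want to filter those out before the loop.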
#2
0
What are you searching for? "CAT HAT" as a complete phrase? In which case:
CONTAINS(*,'"CAT HAT"')
Or CAT OR HAT? Then:
CONTAINS (*,'CAT OR HAT')
Searching for "CAT HAT" and expecting just the post with CAT in it doesn't make any sense. If the problem is parsing what the user types, you could just replace the spaces with OR to match any of the words (or with AND if all of them are required in the same post). The OR will give you both posts for thread 9.
SELECT DISTINCT ThreadId
FROM Posts
WHERE CONTAINS (*,'CAT OR HAT')
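If every word has to appear somewhere in the thread, though not necessarily in the same post, one option is to run one CONTAINS query per word and intersect the resulting ThreadIDs. A minimal sketch, assuming the table is dbo.Posts with a full-text index on a column named Text (both names are assumptions based on the question):

```sql
-- Assumes dbo.Posts(PostID, ThreadID, [Text]) with a full-text index on [Text].
-- Threads where some post mentions "cat" ...
SELECT ThreadID
FROM dbo.Posts
WHERE CONTAINS([Text], N'"cat"')

INTERSECT

-- ... and some post (possibly a different one) mentions "hat".
SELECT ThreadID
FROM dbo.Posts
WHERE CONTAINS([Text], N'"hat"');
```

INTERSECT already returns distinct rows, so no DISTINCT is needed. With the example from the question (post 1001 containing "cat" and post 1027 containing "hat", both in thread 9), this should return thread 9.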
Better still, if it helps, you could use the brilliant Irony (http://irony.codeplex.com/), which parses a search string into a full-text query. It might help you.
It requires Google-style syntax for the original search, which can only be a good thing, as most people are used to typing Google searches.
Plus, here is an article on how to use it: http://www.sqlservercentral.com/articles/Full-Text+Search+(2008)/64248/