I have a table with about 100,000 rows.
Each row contains a sentence, sentence fragment, or phrase.
I want to write a query that finds all rows containing every word in a given set, even if the words in the criteria appear in a different order than in the sentence.
For example, if my table looks like this:
id sentence
-- ---------------------------------------------------------------------------
1 How now brown cow
2 Alas, poor Yorick! I knew him
3 Call me Ishmael
4 A screaming comes across the sky
5 It was a bright cold day in April, and the clocks were striking thirteen
6 It was the best of times, it was the worst of times
7 You don't know about me without you have read a book
8 In the late summer of that year we lived in a house in a village
9 One summer afternoon Mrs. Oedipa Maas came home from a Tupperware party
10 It was a queer, sultry summer, the summer they electrocuted the Rosenbergs
My query criteria would be one or more words, in no particular order.
The result set should contain all of the sentences that contain all of the words.
For example, if the criteria is "the was", the results should include rows 5, 6, and 10.
Ideally, I'd like to improve this so that the query only needs to include the start of a word. (Note that I want to allow users to only enter the start of a word, but never just the middle or end).
E.g., if the criteria is "elect sul", the result would include row 10.
Currently, here's how I'm doing this:
SELECT
    id, sentence
FROM
    sentence  -- table name assumed
WHERE
    (sentence LIKE 'elect%' OR sentence LIKE '% elect%')
    AND
    (sentence LIKE 'sul%' OR sentence LIKE '% sul%')
This works (I think...) - it finds everything it should. However, it's very slow.
Is there a better way to do this?
For what it's worth - I have flexibility to redesign the table, or create additional "helper" tables.
E.g., I thought about creating a table that contains a row for every unique word, with keys to each sentence row that includes it.
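A minimal sketch of that helper-table idea, written in Python with an in-memory SQLite database standing in for MySQL so it can be run anywhere. The table name sentence_word, its columns, and the function names are all hypothetical, and the tokenizer (lowercase runs of letters, digits, and apostrophes) is just one possible choice. Each search prefix becomes its own subquery against the word table, and a sentence qualifies only if it appears in every one of them.

```python
import re
import sqlite3


def escape_like(prefix):
    """Escape LIKE wildcards so user input is matched literally."""
    return prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_index(conn, rows):
    """Create the (hypothetical) tables and fill them from a list of (id, text) pairs."""
    conn.executescript("""
        CREATE TABLE sentence (id INTEGER PRIMARY KEY, sentence TEXT NOT NULL);
        CREATE TABLE sentence_word (
            word        TEXT    NOT NULL,
            sentence_id INTEGER NOT NULL REFERENCES sentence(id),
            PRIMARY KEY (word, sentence_id)
        );
    """)
    conn.executemany("INSERT INTO sentence (id, sentence) VALUES (?, ?)", rows)
    # One row per distinct (word, sentence) pair; words are stored lowercased.
    pairs = {
        (word, sid)
        for sid, text in rows
        for word in re.findall(r"[a-z0-9']+", text.lower())
    }
    conn.executemany(
        "INSERT INTO sentence_word (word, sentence_id) VALUES (?, ?)", pairs
    )


def find_all_prefixes(conn, prefixes):
    """Return ids of sentences that have a word starting with every given prefix."""
    if not prefixes:
        return []
    # SQLite needs an explicit ESCAPE clause; in MySQL backslash is already the default.
    per_prefix = (
        "id IN (SELECT sentence_id FROM sentence_word "
        "WHERE word LIKE ? ESCAPE '\\')"
    )
    where = " AND ".join([per_prefix] * len(prefixes))
    params = [escape_like(p.lower()) + "%" for p in prefixes]
    sql = f"SELECT id FROM sentence WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
build_index(conn, [
    (8, "In the late summer of that year we lived in a house in a village"),
    (10, "It was a queer, sultry summer, the summer they electrocuted the Rosenbergs"),
])
print(find_all_prefixes(conn, ["elect", "sul"]))  # [10]
```

Because every pattern has the form 'prefix%' with no leading wildcard, MySQL should be able to answer each subquery with a range scan on an index that starts with the word column, which the sentence LIKE '% elect%' form cannot do. The cost is keeping sentence_word in sync whenever a sentence is inserted, updated, or deleted.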
Also - the query needs to work in MySQL.
Many thanks in advance.
1 solution
#1
Your method is fine. If you want to handle multiple words, you can do something like:
select s.id, s.sentence
from sentence s join
     (select 'elect' as word union all
      select 'sul' as word
     ) words
     on s.sentence like concat(word, '%') or
        s.sentence like concat('% ', word, '%')
group by s.id, s.sentence
-- 2 = number of distinct search words; the derived table "words" is not
-- visible to a subquery here, so the count is written out as a literal
having count(*) = 2
This won't be faster (because you have the additional group by). But it does provide a bit more flexibility.
By the way, have you looked into the full text search capabilities in MySQL?
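For reference, a rough sketch of what the full-text route could look like from application code. In MySQL boolean mode, a leading + marks a term as required and a trailing * makes it a prefix match, so "elect sul" becomes '+elect* +sul*'. The helper name, the index name in the comment, and the %s placeholder style (used by common Python MySQL drivers) are assumptions, not something from the question.

```python
import re

# Assumes a FULLTEXT index already exists on the column, for example:
#   ALTER TABLE sentence ADD FULLTEXT INDEX ft_sentence (sentence);
# The index name ft_sentence is hypothetical.
FULLTEXT_SQL = (
    "SELECT id, sentence FROM sentence "
    "WHERE MATCH(sentence) AGAINST (%s IN BOOLEAN MODE)"
)


def boolean_prefix_query(user_input):
    """Build a boolean-mode search string: every term required, matched as a prefix.

    Only runs of letters and digits are kept, so boolean operators typed by the
    user (+, -, *, quotes, parentheses) are dropped rather than interpreted.
    """
    terms = re.findall(r"[A-Za-z0-9]+", user_input)
    return " ".join(f"+{term}*" for term in terms)


print(boolean_prefix_query("elect sul"))  # +elect* +sul*
# With a DB-API cursor from a MySQL driver (not shown here):
#   cursor.execute(FULLTEXT_SQL, (boolean_prefix_query("elect sul"),))
```

Two caveats worth checking before relying on this: full-text indexes skip stopwords and words shorter than the configured minimum token length (innodb_ft_min_token_size or ft_min_word_len), so a search such as "the was" may behave differently than the LIKE version unless those settings are adjusted.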