I'm looking for the fastest possible way to do a partial-word LIKE "%foo%" lookup across two tables in a MySQL database.
Let's say I have two tables, Boxes and Objects, where each Box contains multiple Objects. What we want to do is find the ID of the box (Box.id) by matching a search string against Box.name OR Object.name.
To give you a picture of the scale we're dealing with, Boxes contains ~500,000 entries, while Objects contains ~200,000 entries.
Every Object is in a Box, but not every Box contains Objects. I have indices on Box.id, Object.id, and Object.box_id.
Why?
I need this data fast (200ms) so I can offer suggestions as a user types a search. The data set is essentially static, updated en masse yearly. Box.id will never, ever change. I'm using an initial wildcard because the matching word may not be at the start of the string; for example, "flo" needs to suggest "cake flour" as well as "flour".
What I've tried:
Doing a LEFT JOIN between the two tables:
SELECT b.id, b.name, o.name FROM boxes b LEFT JOIN objects o ON (b.id = o.box_id) WHERE ((b.name LIKE "%test str%") OR (o.name LIKE "%test str%")) LIMIT 10;
Time to search: 3900ms.
Denormalizing everything to one lookup table:
SELECT n.id, n.box_name, n.object_name FROM lookup_table n WHERE ((n.box_name LIKE "%test str%") OR (n.object_name LIKE "%test str%")) LIMIT 10;
Time to search: 1100ms.
Getting rid of that JOIN clearly does wonders; however, this is still too slow. Ideally, this should take 200ms or less. Does anyone have any insight into how to optimize partial-word match queries?
2 Answers
#1
2
Look into full-text indexing. You should not be querying with a wildcard as the first character in a production system: a leading wildcard keeps MySQL from using an ordinary index on the column, so every row has to be scanned.
Do not denormalize. Doing so brings other problems, especially data integrity problems, performance problems caused by tables being too wide, issues when a one-to-one relationship becomes one-to-many, and other affected code that will break. Joins are good. You should want joins; databases like joins. Of course, you should make sure the fields you join on are indexed.
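To make the full-text suggestion a bit more concrete, here is a minimal Python sketch of the idea behind a word-level index: split every name into words, keep the words sorted, and answer a query with a binary search for words that start with the typed prefix. It only illustrates why a word-prefix lookup can avoid scanning every row; it is not how MySQL implements full-text search, and the sample rows and the names build_word_index and suggest are made up for the example. It assumes, as in the question's "flo" example, that the typed text begins a word, so a match in the middle of a word (such as "lou" in "flour") is not found. In MySQL itself, boolean-mode full-text search offers a trailing `*` operator for this kind of prefix match.

```python
import bisect

# Hypothetical sample data standing in for the Boxes and Objects tables.
BOXES = {1: "baking supplies", 2: "flour", 3: "garden tools"}
# Each object is (id, box_id, name).
OBJECTS = [(10, 1, "cake flour"), (11, 1, "sugar"), (12, 3, "flower pot")]


def build_word_index(boxes, objects):
    """Return a sorted list of (word, box_id) pairs covering both tables."""
    entries = set()
    for box_id, name in boxes.items():
        for word in name.lower().split():
            entries.add((word, box_id))
    for _object_id, box_id, name in objects:
        for word in name.lower().split():
            entries.add((word, box_id))
    return sorted(entries)


def suggest(index, prefix, limit=10):
    """Return up to `limit` distinct box IDs that have a word starting with `prefix`."""
    prefix = prefix.lower()
    # Binary search for the first entry whose word is >= prefix.
    start = bisect.bisect_left(index, (prefix,))
    box_ids = []
    for i in range(start, len(index)):
        word, box_id = index[i]
        if not word.startswith(prefix):
            break  # the list is sorted, so no later word can match
        if box_id not in box_ids:
            box_ids.append(box_id)
            if len(box_ids) == limit:
                break
    return box_ids


INDEX = build_word_index(BOXES, OBJECTS)
print(suggest(INDEX, "flo"))
```

With these sample rows, "flo" finds boxes 1, 2 and 3 (via "cake flour", "flour" and "flower pot"), while "lou" finds nothing because no word starts with it.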
#2
0
If this is a JS app in a UI, look for packages that do what you want. They are tuned for good speed, and do not depend on SQL.
如果这是UI中的一个JS应用程序,请查找实现所需功能的包。它们的调优速度很好,不依赖SQL。
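Since the question describes the data set as essentially static, one way to follow this advice is to load the names into the application's memory once and search them there. The sketch below is a plain linear scan in Python, not a tuned package; the row layout and the names load_rows and find_box_ids are hypothetical. Unlike a word-prefix lookup, it matches anywhere inside a name, the same as LIKE "%...%". Whether it meets the 200ms target on the full data set would have to be measured.

```python
# Hypothetical rows as (box_id, name): one row per Box name and one per Object
# name, i.e. the same information as the question's denormalized lookup table.
ROWS = [
    (1, "Baking supplies"),
    (1, "cake flour"),
    (2, "Flour"),
    (3, "garden tools"),
    (3, "flower pot"),
]


def load_rows(rows):
    """Lower-case each name once so that every search is case-insensitive."""
    return [(box_id, name.lower()) for box_id, name in rows]


def find_box_ids(rows, needle, limit=10):
    """Return up to `limit` distinct box IDs with a name containing `needle`."""
    needle = needle.lower()
    found = []
    seen = set()
    for box_id, name in rows:
        if needle in name and box_id not in seen:
            seen.add(box_id)
            found.append(box_id)
            if len(found) == limit:
                break  # stop early, like LIMIT in the SQL queries
    return found


LOOKUP = load_rows(ROWS)
print(find_box_ids(LOOKUP, "lou"))
```

Here "lou" matches "cake flour" and "Flour" but not "flower pot", so boxes 1 and 2 are returned.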