My client has a huge database containing just three fields:
- Primary key (an unsigned number)
- Name (multi-word text)
- Description (a varchar of up to 1000 characters)
This database has more than a few billion entries. I have no previous experience handling such large amounts of data.
He wants me to design an AJAX interface (like Google's) to search this database. My queries are as slow as a turtle.
What is the best way to search text fields in such a large database? If the user misspells a word in the interface, how can I return what he wanted?
3 Solutions
#1
7
If you are using FULLTEXT indexes, you are writing your queries correctly, and the results still do not come back fast enough, you are entering territory where MySQL may simply not be sufficient for you.
You may be able to tweak settings, or purchase enough RAM to make sure that your entire data set fits 100% in memory. The performance gains there could be huge.
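To get a feel for how much RAM "fits 100% in memory" would mean here, a rough estimate helps. The sketch below is only back-of-the-envelope arithmetic: the row count (3 billion), the 4-byte key, and the average field lengths are assumptions picked for illustration, not figures from the question, and the helper `estimate_data_bytes` is a made-up name.

```python
# Back-of-the-envelope estimate of the raw data size.
# All figures are assumptions for illustration, not measurements.
ROWS = 3_000_000_000          # "a few billion" rows, assumed to be 3 billion
KEY_BYTES = 4                 # unsigned 32-bit primary key (assumed)
AVG_NAME_BYTES = 50           # assumed average Name length
AVG_DESCRIPTION_BYTES = 500   # assumed average; the column allows up to 1000


def estimate_data_bytes(rows: int, *column_bytes: int) -> int:
    """Return the row count multiplied by the summed per-row column sizes."""
    return rows * sum(column_bytes)


total = estimate_data_bytes(ROWS, KEY_BYTES, AVG_NAME_BYTES, AVG_DESCRIPTION_BYTES)
print(f"{total / 1e12:.2f} TB before indexes and storage overhead")  # 1.66 TB
```

Under these assumptions the raw data alone is well over a terabyte, so it is worth running the numbers with your real averages before counting on RAM to solve the problem.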
I'd definitely recommend looking into tweaks of your MySQL configuration. We've had some silly settings in the past. Operating system defaults tend to really suck!
However, if you still have trouble at that point, you can:
- Create a separate table containing each word (indexed) along with the id of the record it refers to. This will allow you to search on single words.
- Use a different system that's optimized for solving this problem. Unless my information is now outdated, the two most popular engines for this are:
  - Sphinx
  - Solr / Lucene
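The word-table idea from the first bullet can be sketched roughly as follows. This is a minimal illustration, not a production design: it uses Python's built-in sqlite3 with an in-memory database so it runs anywhere, and the table name `article_words` and the helpers `tokenize`, `add_article`, and `search_word` are hypothetical names chosen for the example. In MySQL the same layout would apply, with the index on the word column doing the heavy lifting.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY,
        name TEXT,
        description TEXT
    );
    -- Hypothetical word table: one row per (word, record) pair.
    -- The composite primary key doubles as the index on word.
    CREATE TABLE article_words (
        word TEXT NOT NULL,
        article_id INTEGER NOT NULL REFERENCES articles(id),
        PRIMARY KEY (word, article_id)
    );
""")


def tokenize(text):
    """Lowercase the text and return its distinct alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def add_article(article_id, name, description):
    """Store a record and index every distinct word it contains."""
    conn.execute("INSERT INTO articles VALUES (?, ?, ?)",
                 (article_id, name, description))
    conn.executemany(
        "INSERT INTO article_words (word, article_id) VALUES (?, ?)",
        [(word, article_id) for word in tokenize(name + " " + description)],
    )


def search_word(word):
    """Return the ids of records containing the word, in ascending order."""
    rows = conn.execute(
        "SELECT article_id FROM article_words WHERE word = ? ORDER BY article_id",
        (word.lower(),),
    )
    return [row[0] for row in rows]


add_article(1, "MySQL tuning", "Notes on database configuration")
add_article(2, "Sphinx setup", "Indexing a large database for search")
print(search_word("database"))  # [1, 2]
print(search_word("sphinx"))    # [2]
```

The lookup is an exact match on an indexed column, which is why it stays fast where a `LIKE '%word%'` scan does not. The cost is a much larger table and extra work on every insert or update.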
#2
0
If your table is MyISAM (InnoDB also supports FULLTEXT indexes as of MySQL 5.6), you can create a FULLTEXT index on the Name and Description fields:
CREATE TABLE articles (
    id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    Name VARCHAR(200),
    Description TEXT,
    FULLTEXT (Name, Description)
);
Then you can use queries like:
SELECT * FROM articles
WHERE MATCH (Name,Description) AGAINST ('database');
You can find more info at http://docs.oracle.com/cd/E17952_01/refman-5.0-en/fulltext-search.html
Before doing any of the above, you might want to back up (or at least make a copy of) your database.
#3
0
You can't. The only fast search in your scenario would be on the primary key, since that's most likely the only indexed column. Text search is as slow as a turtle.
In all seriousness, you have a few solutions:
If you have to stick with NoSQL, you'll have to redesign your schema. It's hard to give you a good recommendation without knowing the requirements. One solution would be to index keywords in a separate table.
Another solution is to switch to a different search engine. You can find suggestions in other questions here, such as "Fast SQL Server search on 40M text records".
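On the misspelling part of the question: if you do end up with a keyword table, one possible approach is to compare what the user typed against the list of known keywords and search for the closest one. The sketch below uses `difflib.get_close_matches` from Python's standard library. The `VOCABULARY` list, the `suggest` helper, and the 0.75 cutoff are all assumptions for illustration. Treat it as a starting point only, since scanning a very large vocabulary this way would need something smarter.

```python
import difflib

# Hypothetical vocabulary; in practice this would be the distinct words
# stored in the keyword table.
VOCABULARY = ["database", "description", "search", "sphinx", "lucene"]


def suggest(word, vocabulary=VOCABULARY, cutoff=0.75):
    """Return the closest known word, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest("databse"))  # database
print(suggest("serach"))   # search
print(suggest("xyz"))      # None
```

The corrected word can then be fed into the normal keyword lookup, and the interface can show a "did you mean" hint next to the results.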