I am a fairly new MySQL developer and am starting a project on which I could do with a bit of initial advice...
I am creating a database that will primarily hold a certain number of items (between 1,000 and 5,000), each with around 40 associated boolean variables. Users will then input their choice for each of these 40 values, and it is the job of the system to determine the 'best' matched items. These may be items that match all 40 variables or, if none exist, those that match 39, and so on.
So, a couple of questions, if anyone has the time!
- From my experience of MySQL, there is no significant speed advantage in splitting the data into separate tables for a database of this size; the overhead of additional tables is simply too large for splitting to make any worthwhile difference to overall performance. I would therefore propose creating one large table with 40 columns and up to 5,000 rows to store all of the information (table locking is not an issue, as all queries will be SELECTs). Does this match others' thinking and experience?
- What would be the most efficient way of returning the 'best' match? Is this even possible through database structure and SQL alone, or will I have to return the entire result set to PHP and run some kind of heuristic function there to determine the best matches?
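For a table of at most 5,000 rows, the PHP-side option in the second question amounts to scoring every item by how many of its flags agree with the user's choices and sorting by that score. Ranking this way covers the "40, then 39, and so on" fallback in a single pass. The sketch below is in Python purely for illustration (in the asker's setup this would be PHP), and it assumes each item arrives as an (id, flags) pair with flags as a sequence of 40 booleans; rank_items and match_score are hypothetical names.

```python
from typing import Iterable, List, Sequence, Tuple


def match_score(flags: Sequence[bool], prefs: Sequence[bool]) -> int:
    """Number of positions where the item's flag equals the user's choice."""
    return sum(1 for f, p in zip(flags, prefs) if f == p)


def rank_items(
    items: Iterable[Tuple[int, Sequence[bool]]],
    prefs: Sequence[bool],
    limit: int = 10,
) -> List[Tuple[int, int]]:
    """Return (item_id, score) pairs, best matches first.

    Sorting by score puts items matching all 40 flags first, then 39,
    and so on, so no separate fallback search is needed.
    """
    scored = [(item_id, match_score(flags, prefs)) for item_id, flags in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    prefs = [i % 2 == 0 for i in range(40)]
    items = [
        (1, list(prefs)),                                            # all 40 match
        (2, [not p if i < 3 else p for i, p in enumerate(prefs)]),   # 37 match
        (3, [not p for p in prefs]),                                 # none match
    ]
    print(rank_items(items, prefs))  # prints [(1, 40), (2, 37), (3, 0)]
```

At this size a full scan per request is cheap; the trade-off is that every row must be transferred to the application, which is what the answers below try to avoid.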
Thanks for your time & help!
3 answers
#1
3
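A single table is surely right. You can store up to 64 boolean variables in a single BIGINT column as a "mask", with one bool per bit, and compute the match extremely fast as BIT_COUNT(~(the_column ^ user_preferences)), which counts how many bits are equal between the column and the mask holding the user's preferences. (Because ~ inverts all 64 bits, the 24 unused bits also count as equal, so with 40 flags the result is the number of matches plus 24; this constant offset does not change the ranking, and 40 - BIT_COUNT(the_column ^ user_preferences) gives the count directly.) Should PHP give you problems manipulating 64-bit integers, you can use two columns of 32 bits each; summing the two bit counts will still be very fast.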
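To make the bit arithmetic concrete, here is a minimal Python sketch, assuming flag i is stored in bit i and there are exactly 40 flags. It packs the booleans into an integer, counts matching flags as 40 minus the bit count of the XOR, emulates the 64-bit BIT_COUNT(~(a ^ b)) expression to show the constant +24 offset, and shows the two-32-bit-column variant. The function names are hypothetical, and Python stands in for the MySQL/PHP side.

```python
from typing import Sequence

NUM_FLAGS = 40
MASK_64 = (1 << 64) - 1  # emulate MySQL's unsigned 64-bit integer arithmetic


def pack_flags(flags: Sequence[bool]) -> int:
    """Pack booleans into one integer, flag i -> bit i (the BIGINT value)."""
    value = 0
    for i, flag in enumerate(flags):
        if flag:
            value |= 1 << i
    return value


def popcount(x: int) -> int:
    """Count set bits, like MySQL's BIT_COUNT() on a non-negative value."""
    return bin(x).count("1")


def matches(item_mask: int, prefs_mask: int) -> int:
    """Number of equal flags: XOR marks differing bits, so subtract them."""
    return NUM_FLAGS - popcount(item_mask ^ prefs_mask)


def mysql_style_score(item_mask: int, prefs_mask: int) -> int:
    """Emulate BIT_COUNT(~(a ^ b)) on 64 bits.

    The 24 unused high bits are 0 in both masks, so ~ turns them into
    'equal' bits and the result is matches() + 24.
    """
    return popcount(~(item_mask ^ prefs_mask) & MASK_64)


def matches_two_halves(item_mask: int, prefs_mask: int) -> int:
    """Same count when the mask is split into low and high 32-bit columns."""
    low = popcount((item_mask ^ prefs_mask) & 0xFFFFFFFF)
    high = popcount((item_mask >> 32) ^ (prefs_mask >> 32))
    return NUM_FLAGS - (low + high)


if __name__ == "__main__":
    prefs = [i % 3 == 0 for i in range(NUM_FLAGS)]
    item = list(prefs)
    item[5] = not item[5]  # one flag disagrees
    p, m = pack_flags(prefs), pack_flags(item)
    print(matches(m, p), mysql_style_score(m, p))  # prints 39 63
```

In MySQL itself the ranking would presumably be a query of the form SELECT id, BIT_COUNT(~(flags ^ ?)) AS score FROM items ORDER BY score DESC LIMIT 10, where items, flags and the placeholder are hypothetical names.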
#2
0
I would use two tables: one for the items and one for the boolean flags that apply to an item. Only add a row to the 'flags' table for the flags that are true for an item. Then the number of matches for an item is simply a count of the records in the 'flags' table that match the itemId from the 'items' table.
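A minimal sketch of this two-table layout, using SQLite through Python's built-in sqlite3 module as a stand-in for MySQL so it runs on its own. The table and column names (items, flags, item_id, flag_id) and the best_matches helper are assumptions. Note that, because only true flags are stored, this counts agreement only on the flags the user selects as true; agreement on false values would need extra handling.

```python
import sqlite3
from typing import List, Sequence, Tuple


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (
            item_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL
        );
        -- One row per flag that is TRUE for an item; no row means FALSE.
        CREATE TABLE flags (
            item_id INTEGER NOT NULL REFERENCES items(item_id),
            flag_id INTEGER NOT NULL,
            PRIMARY KEY (item_id, flag_id)
        );
    """)
    return conn


def best_matches(
    conn: sqlite3.Connection, selected_flags: Sequence[int], limit: int = 10
) -> List[Tuple[int, str, int]]:
    """Rank items by how many of the user's selected flags they have set."""
    placeholders = ",".join("?" for _ in selected_flags)
    sql = f"""
        SELECT i.item_id, i.name, COUNT(f.flag_id) AS matches
        FROM items AS i
        LEFT JOIN flags AS f
               ON f.item_id = i.item_id
              AND f.flag_id IN ({placeholders})
        GROUP BY i.item_id, i.name
        ORDER BY matches DESC, i.item_id
        LIMIT ?
    """
    return conn.execute(sql, (*selected_flags, limit)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    conn.executemany("INSERT INTO flags VALUES (?, ?)",
                     [(1, 1), (1, 2), (1, 3), (2, 2), (3, 7)])
    print(best_matches(conn, [1, 2, 3]))  # prints [(1, 'a', 3), (2, 'b', 1), (3, 'c', 0)]
```

The LEFT JOIN keeps items with no matching flags in the result with a count of 0, so the ordering still falls back gracefully when nothing matches every selection.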
#3
0
I don't think that is the best method for storing this kind of information. It may look good visually, but if all you're storing is boolean values, then I would create two tables and one link table with an entry for each matching true value.
There is no real overhead here, as MySQL prefers searching across rows rather than columns. The COUNT() function will come in handy for this.
I'm pretty sure that if the search fails to find an item matching all 40, you will have to fall back to PHP to search for a match on 39, and so on. A recursive function would be a good way to do this.
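The recursive fallback described above can be sketched as follows, in Python standing in for PHP and assuming the rows have already been fetched as (id, flags) pairs. find_matches and count_matches are hypothetical names. Each recursive step rescans every row; sorting once by match count would give the same top group in a single pass, but this version follows the threshold-lowering idea as stated.

```python
from typing import List, Optional, Sequence, Tuple


def count_matches(flags: Sequence[bool], prefs: Sequence[bool]) -> int:
    """How many flags agree with the user's preferences."""
    return sum(f == p for f, p in zip(flags, prefs))


def find_matches(
    items: Sequence[Tuple[int, Sequence[bool]]],
    prefs: Sequence[bool],
    required: Optional[int] = None,
) -> Tuple[int, List[int]]:
    """Return (required, ids) for the highest threshold that finds any item.

    Starts by requiring every flag to match (40) and, if nothing qualifies,
    recurses with one fewer required match (39, then 38, ...). Because all
    higher thresholds already came back empty, '>= required' here means
    'exactly required'.
    """
    if required is None:
        required = len(prefs)
    hits = [item_id for item_id, flags in items
            if count_matches(flags, prefs) >= required]
    if hits or required == 0:
        return required, hits
    return find_matches(items, prefs, required - 1)


if __name__ == "__main__":
    prefs = [True] * 40
    items = [
        (1, [True] * 38 + [False] * 2),  # 38 matches
        (2, [True] * 39 + [False]),      # 39 matches
        (3, [False] * 40),               # 0 matches
    ]
    print(find_matches(items, prefs))  # prints (39, [2])
```

With 40 flags the recursion is at most 41 levels deep, so Python's default recursion limit is not a concern here.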
e.g.
- table xOption: id, name
- table yOption: id, name
- table xOption_yOption: xOption_id, yOption_id
Another advantage of this approach is that you can easily add more X or Y options to your grid later, and you can store more details about each option too.
Don't forget to use indexes, too.
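The three-table layout above, with indexes on the link table, might look like the following sketch. It uses SQLite via Python's sqlite3 module as a runnable stand-in for MySQL; the column types, the composite primary key, the extra index name idx_link_y and the add_y_option helper are all assumptions rather than part of the answer. It also illustrates the point about adding options later: a new option is a plain INSERT, with no ALTER TABLE.

```python
import sqlite3

SCHEMA = """
CREATE TABLE xOption (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE yOption (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Link table: one row per (x, y) pair whose value is TRUE.
CREATE TABLE xOption_yOption (
    xOption_id INTEGER NOT NULL REFERENCES xOption(id),
    yOption_id INTEGER NOT NULL REFERENCES yOption(id),
    PRIMARY KEY (xOption_id, yOption_id)
);
-- The composite primary key serves lookups by xOption_id; this second
-- index serves lookups and counts by yOption_id.
CREATE INDEX idx_link_y ON xOption_yOption (yOption_id, xOption_id);
"""


def create_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_y_option(conn: sqlite3.Connection, name: str) -> int:
    """Adding a new option is just a row insert; the schema is unchanged."""
    cur = conn.execute("INSERT INTO yOption (name) VALUES (?)", (name,))
    return cur.lastrowid


if __name__ == "__main__":
    conn = create_schema()
    y_id = add_y_option(conn, "waterproof")
    conn.execute("INSERT INTO xOption (id, name) VALUES (1, 'item one')")
    conn.execute("INSERT INTO xOption_yOption VALUES (1, ?)", (y_id,))
    print(conn.execute(
        "SELECT COUNT(*) FROM xOption_yOption WHERE xOption_id = 1"
    ).fetchone())  # prints (1,)
```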