Maybe the solution is obvious, but I can't seem to find a good one.
In my upcoming project, there will be one main table whose data will be read frequently. Update / Insert / Delete speed is not an issue.
The items in that main table are associated with 4 or more categories. An item can have 50 - 100 or more relations within one category.
The most common operations that will be performed on the database:
- select all items that have been assigned to category A, B, C, ... with LIMIT X, Y
- count all items that have been assigned to category A, B, C, ...
My first thought on how to create a database for the above was something like this (the classic approach, I guess):
First, for each of the four categories I create a category table:
id - PK, int(11), index
name - varchar(100)
Then I will have one item table:
id - PK, int(11), index
... some more data fields, about 30 or so ...
To relate items to the category tables, there will be 4 or more lookup / MM (many-to-many) tables like so:
id_item - int(11)
id_category - int(11)
The queries looked something like this:
select
    item.*
from
    item
    inner join mm_1 on mm_1.id_item = item.id
    inner join cat_1 on cat_1.id = mm_1.id_category and cat_1.id in (1, 2, ... , 100)
    inner join mm_2 on mm_2.id_item = item.id
    inner join cat_2 on cat_2.id = mm_2.id_category and cat_2.id in (50, 51, ... , 90)
Of course the above approach with MM tables would work, but since the app should provide very good SELECT performance, I tested it with real-world amounts of data (100,000 records in the item table, 50 - 80 relations in each category). It was not as fast as I expected, even with indexes in place. I also tried using WHERE EXISTS instead of INNER JOIN when selecting.
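For reference, here is a minimal sketch of the two query shapes mentioned above, run against SQLite through Python's sqlite3 module as a stand-in for MySQL. The sample rows, the composite primary key on (id_category, id_item), and the shortened IN lists are assumptions made for illustration; they are not taken from the original test setup, and nothing here says anything about MySQL timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    -- One lookup (MM) table per category group, as in the question.
    -- The composite key doubles as an index (assumed, not from the post).
    CREATE TABLE mm_1 (id_item INTEGER, id_category INTEGER,
                       PRIMARY KEY (id_category, id_item));
    CREATE TABLE mm_2 (id_item INTEGER, id_category INTEGER,
                       PRIMARY KEY (id_category, id_item));
""")
conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO mm_1 VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (3, 7)])
conn.executemany("INSERT INTO mm_2 VALUES (?, ?)",
                 [(1, 50), (2, 99), (3, 50)])

# INNER JOIN shape: yields one row per matching relation combination.
JOIN_QUERY = """
    SELECT item.id
    FROM item
    INNER JOIN mm_1 ON mm_1.id_item = item.id AND mm_1.id_category IN (1, 2)
    INNER JOIN mm_2 ON mm_2.id_item = item.id AND mm_2.id_category IN (50, 51)
"""
join_rows = [row[0] for row in conn.execute(JOIN_QUERY)]

# WHERE EXISTS shape: yields each qualifying item once, so LIMIT pages items.
EXISTS_QUERY = """
    SELECT item.id
    FROM item
    WHERE EXISTS (SELECT 1 FROM mm_1
                  WHERE mm_1.id_item = item.id AND mm_1.id_category IN (1, 2))
      AND EXISTS (SELECT 1 FROM mm_2
                  WHERE mm_2.id_item = item.id AND mm_2.id_category IN (50, 51))
    ORDER BY item.id
    LIMIT ? OFFSET ?
"""
page = [row[0] for row in conn.execute(EXISTS_QUERY, (10, 0))]
```

With this sample data, the join version returns item 1 twice (once per matching row in mm_1), so it would need DISTINCT before LIMIT or COUNT give per-item results; the EXISTS version returns it once.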
My second idea was to just use the item table from above and denormalize the data.
After reading this blog post about using bitmasks I gave it a try and assigned each category a bit value:
category 1.1 - 1
category 1.2 - 2
category 1.3 - 4
category 1.4 - 8
... etc ...
So, if an item was tagged with category 1.1 and category 1.3, it had a bitmask of 5, which I then stored in a field item.bitmask, and I could query it like so:
select count(*) from item where item.bitmask & 5 = 5
But performance was not so great either.
The problems with this bitmasking approach: MySQL does NOT use any indexes when bit operators are involved, and even if item.bitmask were of type BIGINT I could only handle up to 64 relations, but I need to support up to 100 per category.
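The bitmask arithmetic described above can be checked with a small sketch. It uses SQLite through Python's sqlite3 module rather than MySQL, and the BITS mapping, the mask_for helper, and the three sample rows are hypothetical names and data introduced only for this illustration.

```python
import sqlite3

# Bit values for the categories listed in the question.
BITS = {"1.1": 1, "1.2": 2, "1.3": 4, "1.4": 8}

def mask_for(categories):
    """OR together the bit of every category an item is tagged with."""
    mask = 0
    for name in categories:
        mask |= BITS[name]
    return mask

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, bitmask INTEGER)")
conn.executemany("INSERT INTO item VALUES (?, ?)", [
    (1, mask_for(["1.1", "1.3"])),         # 1 | 4 = 5
    (2, mask_for(["1.1", "1.2", "1.3"])),  # 1 | 2 | 4 = 7
    (3, mask_for(["1.2", "1.4"])),         # 2 | 8 = 10
])

# An item matches when every wanted bit is set: (bitmask & wanted) = wanted.
wanted = mask_for(["1.1", "1.3"])
(count,) = conn.execute(
    "SELECT COUNT(*) FROM item WHERE (bitmask & ?) = ?", (wanted, wanted)
).fetchone()
```

Items 1 and 2 match here and item 3 does not. Because the predicate is computed from the column value, the engine has to evaluate it row by row, which fits the observation above that no index is used.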
That was about it. I can't think of anything more except maybe polluting the item table with many, many fields like category_1_1 up to category_4_100, each of which contains either 1 or 0. But that could lead to many ANDs in the WHERE clause of the select, and that does not seem like a good idea either.
So, what are my options? Any better ideas out there?
EDIT: as a response to Cory Petosky's comment "What does "An item can have 50 - 100 or more relations within one category." mean?":
To make it more concrete, the item table represents an image. Images are, among other criteria, categorized by mood (mood would be one of the 4 categories). So it would look like this:
Image:
- Category "mood":
  - bright
  - happy
  - funny
  - ... 50 or so more ...
- Category "XYZ":
  - ... 70 or so more ...
If my image table were a class in C#, it would look like this:
public class Image {
    public List<Mood> Moods; // can contain 0 - 100 items
    public List<Some> SomeCategory; // can contain 0 - 100 items
    // ...
}
3 Answers
#1
2
What about this (pseudocode):
Item (image)
    Id PK, int(11)
    Name varchar(100)
Category (mood, xyz)
    Id PK, int(11)
    Name varchar(100)
Relations (happy, funny)
    Id PK, int(11)
    Name varchar(100)
ItemCategories
    Id PK, int(11)
    ItemId FK, int(11)
    CategoryId FK, int(11)
ItemCategoryRelations
    ItemCategoriesId FK, int(11)
    RelationId FK, int(11)
SELECT *
FROM Item
JOIN ItemCategories ON Item.Id = ItemCategories.ItemId
WHERE ItemCategories.CategoryId IN (1, 2, ..., 10)
The version below uses one fewer table, but it doesn't support categories without relations, and relations can't be reused across categories. So it is only valid if it matches your data structure requirements:
Item (image)
    Id PK, int(11)
    Name varchar(100)
Category (mood, xyz)
    Id PK, int(11)
    Name varchar(100)
Relations (happy, funny)
    Id PK, int(11)
    CategoryId FK, int(11)
    Name varchar(100)
ItemRelations
    ItemId FK, int(11)
    RelationId FK, int(11)
SELECT *
FROM Item
JOIN ItemRelations ON Item.Id = ItemRelations.ItemId
JOIN Relations ON Relations.Id = ItemRelations.RelationId
WHERE Relations.CategoryId IN (1, 2, ..., 10)
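As a rough sketch of how the second schema could serve the two operations from the question (a paged select and a count), the following runs it in SQLite through Python's sqlite3 module. The GROUP BY / HAVING formulation for "tagged with all of these relations", the composite primary key on ItemRelations, the helper names, and the sample rows are all assumptions added for illustration; they are not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Item (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Category (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Relations (
        Id INTEGER PRIMARY KEY,
        CategoryId INTEGER REFERENCES Category(Id),
        Name TEXT
    );
    CREATE TABLE ItemRelations (
        ItemId INTEGER REFERENCES Item(Id),
        RelationId INTEGER REFERENCES Relations(Id),
        PRIMARY KEY (RelationId, ItemId)  -- assumed index, not in the answer
    );
    INSERT INTO Item VALUES (1, 'sunrise'), (2, 'clown'), (3, 'storm');
    INSERT INTO Category VALUES (1, 'mood'), (2, 'xyz');
    INSERT INTO Relations VALUES (1, 1, 'bright'), (2, 1, 'happy'), (3, 1, 'funny');
    INSERT INTO ItemRelations VALUES (1, 1), (1, 2), (2, 2), (2, 3), (3, 1);
""")

def _matching_items_sql(ids):
    """SQL selecting the ItemIds that are linked to every relation in ids."""
    marks = ",".join("?" * len(ids))
    return f"""
        SELECT ItemId FROM ItemRelations
        WHERE RelationId IN ({marks})
        GROUP BY ItemId
        HAVING COUNT(DISTINCT RelationId) = ?
    """

def items_with_all(relation_ids, limit, offset):
    """One page of item ids that carry all of the given relations."""
    ids = sorted(set(relation_ids))
    sql = _matching_items_sql(ids) + " ORDER BY ItemId LIMIT ? OFFSET ?"
    return [row[0] for row in conn.execute(sql, [*ids, len(ids), limit, offset])]

def count_with_all(relation_ids):
    """Number of items that carry all of the given relations."""
    ids = sorted(set(relation_ids))
    sql = f"SELECT COUNT(*) FROM ({_matching_items_sql(ids)})"
    return conn.execute(sql, [*ids, len(ids)]).fetchone()[0]
```

For example, items_with_all([1, 2], 10, 0) returns only item 1 (bright and happy), while count_with_all([2]) counts both items tagged happy. If "assigned to category A, B, C" is meant as "any of", the HAVING clause can be dropped and DISTINCT used instead.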
#2
1
How about this one: each category can have a parent category. In your example, if bright is a child of mood, then linking an item to bright would automatically make it mood\bright.
Diagram: http://www.damirsystems.com/dp_images/itemcategory_model_01.png
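Since the linked diagram is not reproduced here, the following is only a guess at what a parent/child category model could look like, run in SQLite through Python's sqlite3 module. The table names (category, item_category), the parent_id column, and the recursive query are assumptions made for illustration, not the model from the diagram. Note that WITH RECURSIVE is available in MySQL only from version 8.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES category(id),  -- NULL for a top-level category
        name TEXT NOT NULL
    );
    CREATE TABLE item_category (item_id INTEGER, category_id INTEGER);
    INSERT INTO category VALUES (1, NULL, 'mood'), (2, 1, 'bright'), (3, 1, 'happy');
    -- The item is linked to the child category only.
    INSERT INTO item_category VALUES (100, 2);
""")

# Walk from each linked category up to its root, prepending parent names.
PATH_QUERY = """
    WITH RECURSIVE chain(id, parent_id, path) AS (
        SELECT c.id, c.parent_id, c.name
        FROM category c
        JOIN item_category ic ON ic.category_id = c.id
        WHERE ic.item_id = :item_id
        UNION ALL
        SELECT p.id, p.parent_id, p.name || :sep || chain.path
        FROM category p
        JOIN chain ON p.id = chain.parent_id
    )
    SELECT path FROM chain WHERE parent_id IS NULL
"""
paths = [row[0] for row in
         conn.execute(PATH_QUERY, {"item_id": 100, "sep": "\\"})]
```

Linking item 100 to bright alone is enough to derive the full path mood\bright, which is the behaviour described above.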
#3
0
So if I understand correctly, an image falls into one of your four main categories, mood for example. Then within mood it can be linked to 'bright', 'happy', and so on.
While I absolutely love bitmasking (microprocessor programmer here by day), and while I always seem to love applying it to db design as well, there always seems to be a better way.
How about something like this.
tblItems
------------------
item_id
item_name
tblCategories
------------------
category_id
category_name
tblRelations
------------------
relation_id
relation_name
tblCategoryRelationLink (link relations to specific categories)
------------------
cat_rel_id
category_id
relation_id
tblItemRelationLink (set relations to items)
------------------
item_rel_id
item_id
rel_id
If your relations are specific to categories, then you can simply look up which category a specific relation is linked to. If somehow you can have a relation linked to two categories, then you would need an extra table as well (to link an item to a category).
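A minimal sketch of that lookup, using the table and column names from the layout above and SQLite through Python's sqlite3 module. The column types and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblItems (item_id INTEGER PRIMARY KEY, item_name TEXT);
    CREATE TABLE tblCategories (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE tblRelations (relation_id INTEGER PRIMARY KEY, relation_name TEXT);
    CREATE TABLE tblCategoryRelationLink (
        cat_rel_id INTEGER PRIMARY KEY, category_id INTEGER, relation_id INTEGER);
    CREATE TABLE tblItemRelationLink (
        item_rel_id INTEGER PRIMARY KEY, item_id INTEGER, rel_id INTEGER);
    INSERT INTO tblItems VALUES (1, 'sunrise');
    INSERT INTO tblCategories VALUES (1, 'mood'), (2, 'xyz');
    INSERT INTO tblRelations VALUES (1, 'bright'), (2, 'happy');
    INSERT INTO tblCategoryRelationLink VALUES (1, 1, 1), (2, 1, 2);
    INSERT INTO tblItemRelationLink VALUES (1, 1, 1), (2, 1, 2);
""")

# For one item, list its relations together with the category each belongs to.
rows = conn.execute("""
    SELECT c.category_name, r.relation_name
    FROM tblItemRelationLink il
    JOIN tblRelations r ON r.relation_id = il.rel_id
    JOIN tblCategoryRelationLink cl ON cl.relation_id = r.relation_id
    JOIN tblCategories c ON c.category_id = cl.category_id
    WHERE il.item_id = ?
    ORDER BY r.relation_name
""", (1,)).fetchall()
```

The item is linked only to relations; the category comes from tblCategoryRelationLink. If a relation were linked to two categories, this query would return that relation once per category, which is the ambiguity the extra item-to-category table would resolve.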