What is the best-performing way to retrieve MySQL EAV results as a relational table?

Time: 2021-06-05 12:59:14

I want to extract results from EAV (entity-attribute-value) tables, or more specifically entity-metadata tables (think of WordPress's wp_posts and wp_postmeta), as a "nicely formatted relational table", in order to do some sorting and/or filtering.

I've found some examples of how to format the results within the query (as opposed to writing 2 queries and joining the results in code), but I would like to know the "most efficient" method for doing so, especially for larger result sets.

And when I say "most efficient", I mean for something like the following scenarios:

Get all Entities with last name like XYZ

Return a list of Entities sorted by birthday


e.g. turn this:

** ENTITY **
-----------------------
ID  | NAME | whatever
-----------------------
 1  | bob  | etc
 2  | jane | etc
 3  | tom  | etc

** META **
------------------------------------
ID | EntityID | KEY         | VALUE
------------------------------------
 1 |   1      | first name  | Bob
 2 |   1      | last name   | Bobson
 3 |   1      | birthday    | 1983-10-10
 . |   2      | first name  | Jane
 . |   2      | last name   | Janesdotter
 . |   2      | birthday    | 1983-08-10
 . |   3      | first name  | Tom
 . |   3      | last name   | Tomson
 . |   3      | birthday    | 1980-08-10

into this:

** RESULTS **
-----------------------------------------------
EID | NAME | first name | last name    | birthday
-----------------------------------------------
 1  | bob  | Bob        | Bobson       | 1983-10-10
 2  | jane | Jane       | Janesdotter  | 1983-08-10
 3  | tom  | Tom        | Tomson       | 1980-08-10

so I can sort or filter by any of the meta fields.


I found some suggestions here, but I can't find any discussion of which performs better.

Options:

  1. GROUP_CONCAT:
    SELECT e.*, GROUP_CONCAT( CONCAT_WS('||', m.KEY, m.VALUE) ORDER BY m.KEY SEPARATOR ';;' )
    FROM `ENTITY` e JOIN `META` m ON e.ID = m.EntityID
    
  2. Multi-Join:
    SELECT e.*, m1.VALUE as 'first name', m2.VALUE as 'last name', m3.VALUE as 'birthday'
    FROM `ENTITY` e
    LEFT JOIN `META` m1
        ON e.ID = m1.EntityID AND m1.KEY = 'first name'
    LEFT JOIN `META` m2
        ON e.ID = m2.EntityID AND m2.KEY = 'last name'
    LEFT JOIN `META` m3
        ON e.ID = m3.EntityID AND m3.KEY = 'birthday'
    
  3. Coalescing:
    SELECT e.*
       , MAX( IF(m.KEY= 'first name', m.VALUE, NULL) ) as 'first name'
       , MAX( IF(m.KEY= 'last name', m.VALUE, NULL) ) as 'last name'
       , MAX( IF(m.KEY= 'birthday', m.VALUE, NULL) ) as 'birthday'
    FROM `ENTITY` e
    JOIN `META` m
        ON e.ID = m.EntityID
    
  4. Code:
    SELECT e.* FROM `ENTITY` e WHERE e.ID = {whatever};
    
    in PHP, create a placeholder object from result
    SELECT m.* FROM `META` m WHERE m.EntityID = {whatever};
    
    in PHP, loop through results and attach to entity object like: $e->{$result->KEY} = $result->VALUE
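To make the two target scenarios concrete, here is a minimal sketch of the multi-join option with a last-name filter and a birthday sort applied. It is an illustration under assumptions, not a tuned solution: SQLite (via Python's standard sqlite3 module) stands in for MySQL so the example runs anywhere, the helper name find_entities and the underscore column aliases are made up for this sketch, and the data is the sample from the tables above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. "KEY" is double-quoted because it is
# a keyword; in MySQL you would use backticks instead.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ENTITY (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE META (
    ID INTEGER PRIMARY KEY,
    EntityID INTEGER,
    "KEY" TEXT,
    VALUE TEXT
);
INSERT INTO ENTITY (ID, NAME) VALUES (1, 'bob'), (2, 'jane'), (3, 'tom');
INSERT INTO META (EntityID, "KEY", VALUE) VALUES
    (1, 'first name', 'Bob'),  (1, 'last name', 'Bobson'),      (1, 'birthday', '1983-10-10'),
    (2, 'first name', 'Jane'), (2, 'last name', 'Janesdotter'), (2, 'birthday', '1983-08-10'),
    (3, 'first name', 'Tom'),  (3, 'last name', 'Tomson'),      (3, 'birthday', '1980-08-10');
""")

# One LEFT JOIN per meta key; the filter and sort then read like ordinary columns.
MULTI_JOIN = """
SELECT e.ID, e.NAME,
       m1.VALUE AS first_name,
       m2.VALUE AS last_name,
       m3.VALUE AS birthday
FROM ENTITY e
LEFT JOIN META m1 ON e.ID = m1.EntityID AND m1."KEY" = 'first name'
LEFT JOIN META m2 ON e.ID = m2.EntityID AND m2."KEY" = 'last name'
LEFT JOIN META m3 ON e.ID = m3.EntityID AND m3."KEY" = 'birthday'
WHERE m2.VALUE LIKE ?
ORDER BY m3.VALUE
"""

def find_entities(conn, last_name_pattern):
    """Return entities whose last name matches the pattern, oldest first."""
    return conn.execute(MULTI_JOIN, (last_name_pattern,)).fetchall()

rows = find_entities(conn, "%son")
for row in rows:
    print(row)
```

Sorting on m3.VALUE works here only because the birthdays are stored as ISO-formatted strings, which sort lexically in date order.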

Which is better in general, and for filtering/sorting?

Related questions:

  1. Binding EAV results
  2. How to Pivot a MySQL entity

2 Answers

#1


0  

Anything using pivot or aggregates will probably be faster, as they don't require the table to be self-joined. The join-based approaches will require the optimiser to perform several sub-query operations and then join the results together. For a small data set this might not matter so much, but it could significantly degrade performance if you're doing an analytic query on a larger data set.

#2


1  

The best way to find out would be to test, of course. The answer may be different depending on the size of the dataset, the number of different meta-keys, their distribution (do all entities have values for all meta-keys, or only for a few of them?), the settings of your database server and possibly many other factors.
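A test along those lines might be shaped like the sketch below. Everything in it is an assumption made for illustration: SQLite stands in for MySQL (so the timings say nothing about MySQL itself), the synthetic data gives every entity all three keys, the index meta_entity_key is hypothetical, and CASE replaces MySQL's IF() because it is valid in both engines. The useful part is the shape: build representative data, check that both queries return the same rows, then time each one several times.

```python
import random
import sqlite3
import time

def build_db(n_entities, seed=0):
    """Create an in-memory database where every entity has all three meta keys."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ENTITY (ID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE META (ID INTEGER PRIMARY KEY, EntityID INTEGER, "KEY" TEXT, VALUE TEXT);
        -- Hypothetical index; whether your real table has one will change the outcome.
        CREATE INDEX meta_entity_key ON META (EntityID, "KEY");
    """)
    for i in range(1, n_entities + 1):
        birthday = "%04d-%02d-%02d" % (
            rng.randint(1950, 2000), rng.randint(1, 12), rng.randint(1, 28))
        conn.execute("INSERT INTO ENTITY (ID, NAME) VALUES (?, ?)", (i, f"user{i}"))
        conn.executemany(
            'INSERT INTO META (EntityID, "KEY", VALUE) VALUES (?, ?, ?)',
            [(i, "first name", f"First{i}"),
             (i, "last name", f"Last{i}"),
             (i, "birthday", birthday)],
        )
    conn.commit()
    return conn

MULTI_JOIN = """
SELECT e.ID, m1.VALUE, m2.VALUE, m3.VALUE
FROM ENTITY e
LEFT JOIN META m1 ON e.ID = m1.EntityID AND m1."KEY" = 'first name'
LEFT JOIN META m2 ON e.ID = m2.EntityID AND m2."KEY" = 'last name'
LEFT JOIN META m3 ON e.ID = m3.EntityID AND m3."KEY" = 'birthday'
ORDER BY e.ID
"""

PIVOT = """
SELECT e.ID,
       MAX(CASE WHEN m."KEY" = 'first name' THEN m.VALUE END),
       MAX(CASE WHEN m."KEY" = 'last name'  THEN m.VALUE END),
       MAX(CASE WHEN m."KEY" = 'birthday'   THEN m.VALUE END)
FROM ENTITY e
JOIN META m ON e.ID = m.EntityID
GROUP BY e.ID
ORDER BY e.ID
"""

def best_time(conn, sql, repeats=5):
    """Run the query several times; return the fastest wall-clock time and the rows."""
    best = float("inf")
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, rows

conn = build_db(500)
join_time, join_rows = best_time(conn, MULTI_JOIN)
pivot_time, pivot_rows = best_time(conn, PIVOT)
print(f"multi-join: {join_time:.4f}s  pivot: {pivot_time:.4f}s")
```

To get numbers that mean something, point the same harness at the real MySQL server with production-sized data, and vary the number of meta-keys and how sparsely they are filled in.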

If I were to guess, I'd say that the cost of the JOIN operations in option 2 would be smaller than the cost of GROUP BY and aggregate functions needed in options 1 and 3.

So, I would expect to find Option 2 faster than 1 and 3.

To measure Option 4, you'll have to consider more factors: the application may be on another server, so the loads of the two (db and application) servers and the number of clients that will be requesting these results have to be taken into account.
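For reference, the application-side approach of Option 4 could look roughly like this. It is a sketch under assumptions: it is written in Python rather than the PHP mentioned in the question, SQLite stands in for MySQL, it loads all entities in two queries instead of one entity at a time, and load_entities is a name invented here. It shows where the cost moves: the database does two plain scans, while the merge and the sort run in the application.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; sample data follows the question.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.executescript("""
CREATE TABLE ENTITY (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE META (ID INTEGER PRIMARY KEY, EntityID INTEGER, "KEY" TEXT, VALUE TEXT);
INSERT INTO ENTITY (ID, NAME) VALUES (1, 'bob'), (2, 'jane'), (3, 'tom');
INSERT INTO META (EntityID, "KEY", VALUE) VALUES
    (1, 'first name', 'Bob'),  (1, 'last name', 'Bobson'),      (1, 'birthday', '1983-10-10'),
    (2, 'first name', 'Jane'), (2, 'last name', 'Janesdotter'), (2, 'birthday', '1983-08-10'),
    (3, 'first name', 'Tom'),  (3, 'last name', 'Tomson'),      (3, 'birthday', '1980-08-10');
""")

def load_entities(conn):
    """Fetch entities and meta rows separately, then merge them in memory."""
    entities = {row["ID"]: dict(row)
                for row in conn.execute("SELECT ID, NAME FROM ENTITY")}
    for row in conn.execute('SELECT EntityID, "KEY", VALUE FROM META'):
        entity = entities.get(row["EntityID"])
        if entity is not None:  # ignore meta rows that point at no entity
            entity[row["KEY"]] = row["VALUE"]
    return list(entities.values())

people = load_entities(conn)

# Sorting and filtering now happen in the application, not in SQL.
by_birthday = sorted(people, key=lambda p: p.get("birthday") or "")
sons = [p for p in people if (p.get("last name") or "").endswith("son")]
print([p["NAME"] for p in by_birthday])
```

Because the whole result set has to cross the network before it can be filtered, this variant is the one most sensitive to the server and client loads described above.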


Sidenote: you need GROUP BY e.ID in options 1 and 3.
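The effect of the missing GROUP BY can be seen in a small sketch. The assumptions: SQLite stands in for MySQL, CASE replaces IF(), and SQLite's group_concat with the || operator approximates MySQL's GROUP_CONCAT and CONCAT_WS (without the ORDER BY inside the aggregate). SQLite quietly collapses the ungrouped query into a single row; MySQL, depending on whether ONLY_FULL_GROUP_BY is enabled, should either do the same or reject the query.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; sample data follows the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ENTITY (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE META (ID INTEGER PRIMARY KEY, EntityID INTEGER, "KEY" TEXT, VALUE TEXT);
INSERT INTO ENTITY (ID, NAME) VALUES (1, 'bob'), (2, 'jane'), (3, 'tom');
INSERT INTO META (EntityID, "KEY", VALUE) VALUES
    (1, 'first name', 'Bob'),  (1, 'last name', 'Bobson'),      (1, 'birthday', '1983-10-10'),
    (2, 'first name', 'Jane'), (2, 'last name', 'Janesdotter'), (2, 'birthday', '1983-08-10'),
    (3, 'first name', 'Tom'),  (3, 'last name', 'Tomson'),      (3, 'birthday', '1980-08-10');
""")

# Option 3 without GROUP BY: the aggregates run over the whole joined result.
PIVOT_UNGROUPED = """
SELECT e.ID,
       MAX(CASE WHEN m."KEY" = 'first name' THEN m.VALUE END) AS first_name,
       MAX(CASE WHEN m."KEY" = 'last name'  THEN m.VALUE END) AS last_name,
       MAX(CASE WHEN m."KEY" = 'birthday'   THEN m.VALUE END) AS birthday
FROM ENTITY e
JOIN META m ON e.ID = m.EntityID
"""

# Option 3 with GROUP BY e.ID: one output row per entity.
PIVOT_GROUPED = PIVOT_UNGROUPED + " GROUP BY e.ID ORDER BY e.ID"

# Option 1 with GROUP BY e.ID: one concatenated string per entity.
CONCAT_GROUPED = """
SELECT e.ID, group_concat(m."KEY" || '||' || m.VALUE, ';;') AS meta
FROM ENTITY e
JOIN META m ON e.ID = m.EntityID
GROUP BY e.ID
ORDER BY e.ID
"""

ungrouped = conn.execute(PIVOT_UNGROUPED).fetchall()
grouped = conn.execute(PIVOT_GROUPED).fetchall()
concatenated = conn.execute(CONCAT_GROUPED).fetchall()

print(len(ungrouped), "row without GROUP BY")
for row in grouped:
    print(row)
```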
