What is the fastest way to query a historical EAV database?

Time: 2022-01-09 06:44:56

Standard EAV schema: one column for the entity ID, one for the attribute ID, and one for the value ID.

Historical EAV schema: add one or more columns for times or date ranges.
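
For concreteness, a historical EAV table might look like the sketch below. The table name `dbo.EavHistory`, the `val` column, and all data types are assumptions made for illustration; only `eID`, `aID`, and `dt` appear in the answers further down. The sketch stores the value inline rather than as a value ID, purely for brevity.

```sql
-- Hypothetical historical EAV table; the table name, the val column,
-- and all data types are assumptions for illustration.
CREATE TABLE dbo.EavHistory (
    eID INT           NOT NULL,  -- entity ID
    aID INT           NOT NULL,  -- attribute ID
    val NVARCHAR(400) NULL,      -- value stored for this entity/attribute
    dt  DATETIME2(0)  NOT NULL   -- time from which this value applies
);

-- Several rows for one entity/attribute pair form that attribute's history.
INSERT INTO dbo.EavHistory (eID, aID, val, dt)
VALUES (1, 10, N'red',  '2021-01-01'),
       (1, 10, N'blue', '2021-06-01'),
       (1, 20, N'XL',   '2021-03-15');
```

Here entity 1 has two historical values for attribute 10, and only the row dated 2021-06-01 should be returned for it.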

At run time, certain rows will be excluded. After that, there may be 0, 1, or many rows left per entity, per attribute. Of the remaining rows, we want only the most recent value for each attribute.

Our current solution uses the SQL Server RANK() function to mark each row with a rank, and then the WHERE clause includes "and rank = 1".
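
The question does not show the actual query, so the following is only a guess at its general shape. The table `dbo.EavHistory`, the `val` column, and the alias `rnk` are hypothetical names.

```sql
-- Assumed shape of the current approach; dbo.EavHistory and its columns
-- are hypothetical names, not the asker's actual schema.
WITH ranked AS (
    SELECT eID, aID, val, dt,
           RANK() OVER (PARTITION BY eID, aID ORDER BY dt DESC) AS rnk
    FROM dbo.EavHistory
    -- the run-time exclusions would be applied here, before ranking
)
SELECT eID, aID, val, dt
FROM ranked
WHERE rnk = 1;  -- keep only the most recent row(s) per entity/attribute
```

Note that RANK() assigns rank 1 to every row that ties on the latest `dt`, so a pair can still return more than one row.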

However, performance is not satisfactory. During analysis we found that assigning the ranks is quite fast, but filtering on the rank in the WHERE clause requires a second scan of the data and keeps the entire data set in RAM.

What is the fastest way to rank the remaining attribute rows, and return only the latest?

2 solutions

#1


The general idea is to extract the latest date together with the key first, then join back to get the value, which is not part of the aggregate. The fact that it's EAV does not matter.

-- "table" is a placeholder for the real table name
SELECT
    *
FROM
    table t
    JOIN
    -- latest dt per entity/attribute pair
    (SELECT MAX(dt) AS mdt, eID, aID FROM table GROUP BY eID, aID) mt
        ON t.eID = mt.eID AND t.aID = mt.aID AND t.dt = mt.mdt
WHERE
    ...
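
To see the join-back pattern on actual rows, here is a small self-contained T-SQL sketch. The table variable `@eav`, the `val` column, and the sample data are all made up for illustration; the elided WHERE conditions are left out.

```sql
-- Hypothetical sample data in a table variable.
DECLARE @eav TABLE (eID INT, aID INT, val NVARCHAR(50), dt DATE);

INSERT INTO @eav (eID, aID, val, dt)
VALUES (1, 10, N'red',   '2021-01-01'),
       (1, 10, N'blue',  '2021-06-01'),
       (1, 20, N'XL',    '2021-03-15'),
       (2, 10, N'green', '2021-02-01');

-- Step 1 (derived table mt): latest dt per entity/attribute pair.
-- Step 2 (join): go back to the base rows to pick up val.
SELECT t.eID, t.aID, t.val, t.dt
FROM @eav AS t
JOIN (
    SELECT eID, aID, MAX(dt) AS mdt
    FROM @eav
    GROUP BY eID, aID
) AS mt
    ON t.eID = mt.eID AND t.aID = mt.aID AND t.dt = mt.mdt
ORDER BY t.eID, t.aID;
```

With this data the query returns three rows: blue for (1, 10), XL for (1, 20), and green for (2, 10). The older red row is dropped because its `dt` does not match the maximum for its pair.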

#2


While I think gbn's answer is probably sufficient, I wonder whether using an OVER clause to compute the MAX date per ID/attribute, and then filtering on it in the WHERE clause, would be faster than RANK. I had no time to test performance, but here's the query:

select *
from (
  -- attach the latest dt per entity/attribute pair to every row
  select *, max(dt) over (partition by eID, aID) as maxdt
  from table
) t
where t.dt = t.maxdt and ...
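
One property of this query worth checking on sample rows is how it handles ties on the latest date. The sketch below uses a hypothetical table variable `@eav` with a made-up `val` column, and leaves out the elided conditions.

```sql
-- Hypothetical sample data; two rows share the latest date for (1, 10).
DECLARE @eav TABLE (eID INT, aID INT, val NVARCHAR(50), dt DATE);

INSERT INTO @eav (eID, aID, val, dt)
VALUES (1, 10, N'red',  '2021-01-01'),
       (1, 10, N'blue', '2021-06-01'),
       (1, 10, N'navy', '2021-06-01'),  -- tie on the latest date
       (1, 20, N'XL',   '2021-03-15');

-- The windowed MAX copies the latest dt of each pair onto every row,
-- and the outer filter keeps the rows whose dt equals that maximum.
SELECT t.eID, t.aID, t.val, t.dt
FROM (
    SELECT eID, aID, val, dt,
           MAX(dt) OVER (PARTITION BY eID, aID) AS maxdt
    FROM @eav
) AS t
WHERE t.dt = t.maxdt;
```

Both the blue and navy rows come back for (1, 10), along with XL for (1, 20). That matches what RANK() = 1 would return for tied dates, so if exactly one row per pair is required, a tie-breaking rule would have to be added.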

Good luck!
