Choosing between GROUP BY and DISTINCT in MySQL on a large view

Date: 2022-12-30 04:30:22

Consider a view made up of several tables, for example v_active_cars, which joins the car table to body, engine, wheels and stereo. It might look something like this:

v_active_cars view

SELECT * FROM car
    INNER JOIN body ON car.body = body.body_id
    INNER JOIN engine ON car.engine = engine.engine_id
    INNER JOIN wheels ON car.wheels = wheels.wheels_id
    INNER JOIN stereo ON car.stereo = stereo.stereo_id
    WHERE car.active = 1
    AND engine.active = 1
    AND wheels.active = 1
    AND stereo.active = 1

Each component of the car has an "active" flag. Now, I need to find all the stereos that are available in active cars. To do this I need to use the whole view, not just the stereo table: just because a stereo is active doesn't mean it's available in a car.

So I can do

SELECT DISTINCT stereo_id FROM v_active_cars

Even though this may return a very small number of rows, it's still a very slow query.

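On the question in the title: for a plain column list with no aggregates, DISTINCT and GROUP BY return the same rows, and the MySQL manual describes DISTINCT as, in most cases, a special case of GROUP BY, so swapping one for the other is unlikely to change the speed much on its own. The sketch below shows the two equivalent forms. It assumes a simplified, hypothetical schema that keeps only car and stereo (the car_id column is an assumption; body, engine and wheels would be joined the same way).

```sql
-- Hypothetical, simplified schema: only car and stereo are kept.
CREATE TABLE stereo (
    stereo_id INT PRIMARY KEY,
    active    INT NOT NULL
);

CREATE TABLE car (
    car_id INT PRIMARY KEY,  -- assumed column, not from the question
    stereo INT NOT NULL,     -- references stereo.stereo_id
    active INT NOT NULL
);

-- Columns are listed explicitly: in MySQL a view cannot expose two
-- columns with the same name, and both tables have an "active" column.
CREATE VIEW v_active_cars AS
SELECT car.car_id, car.stereo, stereo.stereo_id
FROM car
    INNER JOIN stereo ON car.stereo = stereo.stereo_id
WHERE car.active = 1
  AND stereo.active = 1;

-- Two equivalent ways to list the stereos fitted to at least one active car.
SELECT DISTINCT stereo_id FROM v_active_cars;

SELECT stereo_id FROM v_active_cars GROUP BY stereo_id;
```
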
I've tried this, but it's even slower:

SELECT stereo_id FROM stereo WHERE EXISTS
(SELECT 1 FROM v_active_cars WHERE stereo_id = stereo.stereo_id)

Is there anything else I could do to make this faster?

4 answers

#1


  1. Make sure that there are indexes for all the JOINs.
    • In your case, each level is selected by both a key and a flag. Adding the flag to the index might allow the DB to use only the index instead of reading the whole record.

    • Make sure you have enough RAM to hold the result set. InnoDB tables in particular have lots of knobs that you have to tune; most of the defaults assume very old hardware and too little RAM.
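
For the RAM point, the main InnoDB memory setting is the buffer pool, which caches table and index pages; its default size in MySQL 5.7 and 8.0 is 128 MB. A minimal sketch for checking and raising it follows. The 4 GB figure is a placeholder assumption, not a recommendation, and should be sized to the server's data and available RAM.

```sql
-- Current buffer pool size, in bytes.
SELECT @@innodb_buffer_pool_size;

-- Placeholder value (4 GB). Online resizing requires MySQL 5.7.5 or later.
-- Also set innodb_buffer_pool_size in the server's option file (my.cnf),
-- otherwise the change is lost on restart.
SET GLOBAL innodb_buffer_pool_size = 4294967296;
```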

#2


You seem to be doing everything right. The next step would be checking index coverage.

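In MySQL, index coverage can be checked with EXPLAIN: for each table in the plan, "Using index" in the Extra column means its rows were read from the index alone, without touching the table rows. A minimal sketch, assuming the same kind of simplified, hypothetical car/stereo schema (car_id and the index name are illustrative):

```sql
-- Hypothetical, simplified schema: only car and stereo are kept.
CREATE TABLE stereo (stereo_id INT PRIMARY KEY, active INT NOT NULL);
CREATE TABLE car (car_id INT PRIMARY KEY, stereo INT NOT NULL, active INT NOT NULL);

CREATE VIEW v_active_cars AS
SELECT car.car_id, car.stereo, stereo.stereo_id
FROM car
    INNER JOIN stereo ON car.stereo = stereo.stereo_id
WHERE car.active = 1
  AND stereo.active = 1;

-- Hypothetical index holding the filter flag and the join column of car.
CREATE INDEX ix_car_active_stereo ON car (active, stereo);

-- Check the key and Extra columns of each row in the output.
EXPLAIN SELECT DISTINCT stereo_id FROM v_active_cars;
```
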
#3


Try this:

-- s.stereo_id is qualified because v also exposes a stereo_id column
SELECT DISTINCT s.stereo_id
FROM stereo s, (
  SELECT *
  FROM v_active_cars
  ORDER BY stereo_id
  ) v
WHERE s.active = 1
  AND v.stereo = s.stereo_id

The ORDER BY here should stop the optimizer from pushing the predicate into the view, so that it chooses a hash join instead. (MySQL supports hash joins only from version 8.0.18 onwards.)

#4


You can try creating a view for each part that shows only the active rows, and then joining to those views. For example:

CREATE VIEW activeCar AS
SELECT * FROM car WHERE car.active = 1;

CREATE VIEW activeEngine AS
SELECT * FROM engine WHERE engine.active = 1;

Then your final view can be:

SELECT * FROM activeCar
INNER JOIN activeEngine ON activeCar.engine = activeEngine.engine_id

Obviously make sure you have an index on the active column.

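Applied to the stereo question, the per-part views and an index on each active column might look like this. It is a sketch on a simplified, hypothetical schema with only car and stereo (car_id, activeStereo and the index names are assumptions). MySQL can usually merge simple filtered views like these into the outer query, so the resulting plan may be much the same as the original; EXPLAIN will show whether it actually helps.

```sql
-- Hypothetical, simplified schema: only car and stereo are kept.
CREATE TABLE stereo (stereo_id INT PRIMARY KEY, active INT NOT NULL);
CREATE TABLE car (car_id INT PRIMARY KEY, stereo INT NOT NULL, active INT NOT NULL);

-- One view per part, each showing only the active rows.
CREATE VIEW activeCar AS SELECT * FROM car WHERE car.active = 1;
CREATE VIEW activeStereo AS SELECT * FROM stereo WHERE stereo.active = 1;

-- An index on each active column (index names are illustrative).
CREATE INDEX ix_car_active ON car (active);
CREATE INDEX ix_stereo_active ON stereo (active);

-- Stereos available in active cars, built from the filtered views.
SELECT DISTINCT activeStereo.stereo_id
FROM activeCar
    INNER JOIN activeStereo ON activeCar.stereo = activeStereo.stereo_id;
```
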
Another alternative is to have a composite index on both the id and the active flag. You can then apply the active = 1 condition as part of the join. This way a single index serves the join, rather than one index for the id and another for active.

SELECT * FROM car
INNER JOIN body ON car.body = body.body_id AND body.active = 1
INNER JOIN engine ON car.engine = engine.engine_id AND engine.active = 1
INNER JOIN wheels ON car.wheels = wheels.wheels_id AND wheels.active = 1
INNER JOIN stereo ON car.stereo = stereo.stereo_id AND stereo.active = 1
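
A minimal sketch of the composite (id, active) index described above, applied to the stereo question. It assumes a simplified, hypothetical schema with only car and stereo; car_id and the index name ix_stereo_id_active are illustrative. With the join key first and the flag second, the lookup on stereo_id and the active = 1 test can both be answered from the same index.

```sql
-- Hypothetical, simplified schema: only car and stereo are kept.
CREATE TABLE stereo (stereo_id INT PRIMARY KEY, active INT NOT NULL);
CREATE TABLE car (car_id INT PRIMARY KEY, stereo INT NOT NULL, active INT NOT NULL);

-- Join key first, flag second: one index serves both the join lookup
-- and the active check (index name is illustrative).
CREATE INDEX ix_stereo_id_active ON stereo (stereo_id, active);

-- The active condition for stereo sits in the ON clause, as in the query above.
SELECT DISTINCT stereo.stereo_id
FROM car
    INNER JOIN stereo ON car.stereo = stereo.stereo_id AND stereo.active = 1
WHERE car.active = 1;
```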
