多表选择使用where where子句,有没有比自联接更好的方法?

时间:2021-05-03 15:52:01

I've got a bunch of tables that I'm joining using a unique item id. The majority of the where clause conditions will be built programatically from a user sumbitted form (search box) and multiple conditions will often be tested against the same table, in this case item tags.

我有一堆表,我正在使用一个唯一的项目ID加入。大多数where子句条件将以编程方式从用户求和形式(搜索框)构建,并且多个条件通常将针对同一个表进行测试,在本例中为项标签。

My experience with SQL is minimal, but I understand the basics. I want to find the ids of active (status=1) items that have been tagged with a tag of a certain type, with the values "cats" and "kittens". Tags are stored as (id, product_id, tag_type_id, value), with id being the only column requiring a unique value. My first attempt was;

我对SQL的经验很少,但我理解基础知识。我想找到标记有特定类型标记的活动(status = 1)项的ID,其值为“cats”和“kittens”。标签存储为(id,product_id,tag_type_id,value),id是唯一需要唯一值的列。我的第一次尝试是;

   select 
      distinct p2c.product_id 
   from '.TABLE_PRODUCT_TO_CATEGORY.' p2c
      inner join '.TABLE_PRODUCT.' p on p2c.product_id = p.id 
      inner join '.TABLE_PRODUCT_TAG.' pt on p.id = pt.product_id
      inner join '.TABLE_TAG_TYPE.' tt on pt.tag_type_id = tt.id
   where 
      tt.id = '.PRODUCT_TAG_TYPE_FREE_TAG.'
      and p.status = 1
      and lower(pt.value) = "cats"
      and lower(pt.value) = "kittens"

but that returned nothing. I realised that the final AND condition was the problem, so tried using a self-join instead;

但那没有任何回报。我意识到最终的AND条件是问题,所以尝试使用自连接;

   select 
      distinct p2c.product_id 
   from '.TABLE_PRODUCT_TO_CATEGORY.' p2c
      inner join '.TABLE_PRODUCT.' p on p2c.product_id = p.id 
      inner join '.TABLE_PRODUCT_TAG.' pt on p.id = pt.product_id
      inner join '.TABLE_PRODUCT_TAG.' pt2 on p.id = pt2.product_id
      inner join '.TABLE_TAG_TYPE.' tt on pt.tag_type_id = tt.id
   where 
      tt.id = '.PRODUCT_TAG_TYPE_FREE_TAG.'
      and p.status = 1
      and lower(pt.value) = "cats"
      and lower(pt2.value) = "kittens"

Now everything works as expected and the result set is correct. So what do I want to know? To re-iterate, the results I'm after are the ids of active (status = 1) items that have been tagged with a tag of a certain type, with the values "cats" AND "kittens"...

现在一切都按预期工作,结果集是正确的。那么我想知道什么?要重新迭代,我所追求的结果是活动(状态= 1)项目的ID,这些项目已被标记为某种类型的标记,其值为“cats”和“kittens”......

  1. Are self-joins the best way of achieving these results?
  2. 自我加入是实现这些结果的最佳方式吗?

  3. This query has the potential to be huge (I've omitted a category condition, of which there may be ~300), so does this self-join approach scale well? If not, is there an alternative?
  4. 这个查询有可能是巨大的(我省略了一个类别条件,其中可能有~300),那么这种自连接方法是否可以很好地扩展?如果没有,有替代方案吗?

  5. Will the self-join approach be the best way forward (assuming there is an alternative) if I allow users to specify complex tag searches? ie "cats" and ("kittens" or "dogs") not "parrots".
  6. 如果我允许用​​户指定复杂的标签搜索,自加入方法是否是最好的前进方式(假设有替代方法)?即“猫”和(“小猫”或“狗”)不是“鹦鹉”。

7 个解决方案

#1


The problem with the initial query was this:

初始查询的问题是:

  and lower(pt.value) = "cats"
  and lower(pt.value) = "kittens"

There exists no tag for which the value is both "cats" and "kittens", therefore no records will be returned. Using an IN clause as SQLMenace suggests would be the solution - that way you're saying, "give me back any active item that has been tagged 'cats' or 'kittens'".

没有标记的值为“cats”和“kittens”,因此不会返回任何记录。使用IN子句作为SQLMenace建议将是解决方案 - 你说的方式,“给我回复任何被标记为'猫'或'小猫'的活动项目”。

But if you want any active item that has BOTH tags - then you need to do something like your second query. It's not perfectly clear from your question if that's what you're after.

但是如果你想要任何有BOTH标签的活动项目 - 那么你需要做类似你的第二个查询。从您的问题来看,这是不是很清楚,如果这就是你所追求的。

For something like your Question #3:

对于像你的问题#3这样的东西:

"cats" and ("kittens" or "dogs") not "parrots".

“猫”和(“小猫”或“狗”)不是“鹦鹉”。

you would want pt1, pt2, and (in a subquery) pt3, and something like this:

你会想要pt1,pt2和(在子查询中)pt3,以及类似的东西:

and lower(pt1.value) = "cats"
and lower(pt2.value) in ("kittens", "dogs")
and not exists (select * from '.TABLE_PRODUCT_TAG.' pt3 where pt3.product_id = p.id and lower(pt3.value) = "parrots")

The broadly general case could get quite messy...

广泛的一般情况可能会变得非常混乱......

#2


wouldn't this work in your first query?

这不会在你的第一个查询中工作吗?

instead of

and lower(pt.value) = "cats"
and lower(pt.value) = "kittens"

do this

and lower(pt.value) in ("cats","kittens")

#3


select 
  distinct p2c.product_id
from '.TABLE_PRODUCT_TO_CATEGORY.' p2c
  inner join '.TABLE_PRODUCT.' p on p2c.product_id = p.id
  inner join '.TABLE_PRODUCT_TAG.' pt on p.id = pt.product_id
  inner join '.TABLE_TAG_TYPE.' tt on pt.tag_type_id = tt.id   
where 
  tt.id = '.PRODUCT_TAG_TYPE_FREE_TAG.'
  and p.status = 1  
  and (lower(pt.value) = "cats" or lower(pt.value) = "kittens")

#4


Your answer is "yes, that's a scalable technique". As far as adding complexity, I think you'll overreach your users' ability to understand what they are doing before you have an efficient-query problem.

你的答案是“是的,这是一种可扩展的技术”。至于增加复杂性,我认为在你遇到高效查询问题之前,你会超出用户理解他们正在做什么的能力。

#5


Ok, let me re-state the question to make sure I understand:

好的,让我重新陈述这个问题,以确保我理解:

You are trying to show all products that have two different specific tags ("cats" and "kittens") but the tags are stored in a 1-to-many table.

您试图显示具有两个不同特定标签(“猫”和“小猫”)的所有产品,但标签存储在1对多表中。

The double-join does work, but here's another alternative:

双连接确实有效,但这是另一种选择:

SELECT ...
FROM P
WHERE p.status = 1
  AND p.ProductID IN (SELECT Product_ID FROM tags WHERE value = "cats")
  AND p.ProductID IN (SELECT Product_ID FROM tags WHERE value = "kittens")

Just add additional AND statements depending on the options the user selects.

只需添加其他AND语句,具体取决于用户选择的选项。

The SQL optimizer should actually treat this the same way it treats a join, so I don't think performance would scale any worse than your version. Worth testing with your dataset, though, to make sure.

SQL优化器实际上应该像处理连接一样处理它,所以我认为性能不会比你的版本更糟糕。但是,值得测试您的数据集,以确保。

#6


You're building yet another entity-attribute-value data model. Since you asked about scalability, here's a warning: EAV models usually don't scale and don't perform on top of RDBMS. Ultimately this 'flexible' data model ends up clobbering the optimizer and you'll be scanning millions and millions of rows to fetch your few dogs and kittens. Wikipedia has a topic covering this model and some of the downsides. Don't know what your target DB is, for instance SQL Server CAT published a white paper with common problems in the EAV model.

您正在构建另一个实体 - 属性 - 值数据模型。由于您询问了可扩展性,这里有一个警告:EAV模型通常不会扩展,也不会在RDBMS之上执行。最终,这种“灵活”的数据模型最终会破坏优化器,您将扫描数百万行来获取您的少数狗和小猫。*有一个涵盖这个模型的主题和一些缺点。不知道你的目标数据库是什么,例如SQL Server CAT发布了一份白皮书,其中包含EAV模型中的常见问题。

#7


AIR CODE

select 
  distinct p2c.product_id 
from '.TABLE_PRODUCT_TO_CATEGORY.' p2c
  inner join '.TABLE_PRODUCT.'     p  on p2c.product_id = p.id 
where 
  and p.status = 1
  and 2 = (
      SELECT  COUNT(1)
      FROM '.TABLE_PRODUCT_TAG.' pt
        INNER JOIN  '.TABLE_TAG_TYPE.' tt ON pt.tag_type_id = tt.id
      WHERE tt.id = '.PRODUCT_TAG_TYPE_FREE_TAG.'
      AND pt.product_id = p.id /* edit */
      lower(pt.value) IN( "cats", "kittens" )
)

#1


The problem with the initial query was this:

初始查询的问题是:

  and lower(pt.value) = "cats"
  and lower(pt.value) = "kittens"

There exists no tag for which the value is both "cats" and "kittens", therefore no records will be returned. Using an IN clause as SQLMenace suggests would be the solution - that way you're saying, "give me back any active item that has been tagged 'cats' or 'kittens'".

没有标记的值为“cats”和“kittens”,因此不会返回任何记录。使用IN子句作为SQLMenace建议将是解决方案 - 你说的方式,“给我回复任何被标记为'猫'或'小猫'的活动项目”。

But if you want any active item that has BOTH tags - then you need to do something like your second query. It's not perfectly clear from your question if that's what you're after.

但是如果你想要任何有BOTH标签的活动项目 - 那么你需要做类似你的第二个查询。从您的问题来看,这是不是很清楚,如果这就是你所追求的。

For something like your Question #3:

对于像你的问题#3这样的东西:

"cats" and ("kittens" or "dogs") not "parrots".

“猫”和(“小猫”或“狗”)不是“鹦鹉”。

you would want pt1, pt2, and (in a subquery) pt3, and something like this:

你会想要pt1,pt2和(在子查询中)pt3,以及类似的东西:

and lower(pt1.value) = "cats"
and lower(pt2.value) in ("kittens", "dogs")
and not exists (select * from '.TABLE_PRODUCT_TAG.' pt3 where pt3.product_id = p.id and lower(pt3.value) = "parrots")

The broadly general case could get quite messy...

广泛的一般情况可能会变得非常混乱......

#2


wouldn't this work in your first query?

这不会在你的第一个查询中工作吗?

instead of

and lower(pt.value) = "cats"
and lower(pt.value) = "kittens"

do this

and lower(pt.value) in ("cats","kittens")

#3


select 
  distinct p2c.product_id
from '.TABLE_PRODUCT_TO_CATEGORY.' p2c
  inner join '.TABLE_PRODUCT.' p on p2c.product_id = p.id
  inner join '.TABLE_PRODUCT_TAG.' pt on p.id = pt.product_id
  inner join '.TABLE_TAG_TYPE.' tt on pt.tag_type_id = tt.id   
where 
  tt.id = '.PRODUCT_TAG_TYPE_FREE_TAG.'
  and p.status = 1  
  and (lower(pt.value) = "cats" or lower(pt.value) = "kittens")

#4


Your answer is "yes, that's a scalable technique". As far as adding complexity, I think you'll overreach your users' ability to understand what they are doing before you have an efficient-query problem.

你的答案是“是的,这是一种可扩展的技术”。至于增加复杂性,我认为在你遇到高效查询问题之前,你会超出用户理解他们正在做什么的能力。

#5


Ok, let me re-state the question to make sure I understand:

好的,让我重新陈述这个问题,以确保我理解:

You are trying to show all products that have two different specific tags ("cats" and "kittens") but the tags are stored in a 1-to-many table.

您试图显示具有两个不同特定标签(“猫”和“小猫”)的所有产品,但标签存储在1对多表中。

The double-join does work, but here's another alternative:

双连接确实有效,但这是另一种选择:

SELECT ...
FROM P
WHERE p.status = 1
  AND p.ProductID IN (SELECT Product_ID FROM tags WHERE value = "cats")
  AND p.ProductID IN (SELECT Product_ID FROM tags WHERE value = "kittens")

Just add additional AND statements depending on the options the user selects.

只需添加其他AND语句,具体取决于用户选择的选项。

The SQL optimizer should actually treat this the same way it treats a join, so I don't think performance would scale any worse than your version. Worth testing with your dataset, though, to make sure.

SQL优化器实际上应该像处理连接一样处理它,所以我认为性能不会比你的版本更糟糕。但是,值得测试您的数据集,以确保。

#6


You're building yet another entity-attribute-value data model. Since you asked about scalability, here's a warning: EAV models usually don't scale and don't perform on top of RDBMS. Ultimately this 'flexible' data model ends up clobbering the optimizer and you'll be scanning millions and millions of rows to fetch your few dogs and kittens. Wikipedia has a topic covering this model and some of the downsides. Don't know what your target DB is, for instance SQL Server CAT published a white paper with common problems in the EAV model.

您正在构建另一个实体 - 属性 - 值数据模型。由于您询问了可扩展性,这里有一个警告:EAV模型通常不会扩展,也不会在RDBMS之上执行。最终,这种“灵活”的数据模型最终会破坏优化器,您将扫描数百万行来获取您的少数狗和小猫。*有一个涵盖这个模型的主题和一些缺点。不知道你的目标数据库是什么,例如SQL Server CAT发布了一份白皮书,其中包含EAV模型中的常见问题。

#7


AIR CODE

select 
  distinct p2c.product_id 
from '.TABLE_PRODUCT_TO_CATEGORY.' p2c
  inner join '.TABLE_PRODUCT.'     p  on p2c.product_id = p.id 
where 
  and p.status = 1
  and 2 = (
      SELECT  COUNT(1)
      FROM '.TABLE_PRODUCT_TAG.' pt
        INNER JOIN  '.TABLE_TAG_TYPE.' tt ON pt.tag_type_id = tt.id
      WHERE tt.id = '.PRODUCT_TAG_TYPE_FREE_TAG.'
      AND pt.product_id = p.id /* edit */
      lower(pt.value) IN( "cats", "kittens" )
)