A clear explanation of "theta join" in relational algebra?

Time: 2022-10-03 23:50:10

I'm looking for a clear, basic explanation of the concept of theta join in relational algebra, and perhaps an example (possibly in SQL) to illustrate its usage.

If I understand it correctly, the theta join is a natural join with a condition added in. So, whereas the natural join enforces equality between attributes of the same name (and removes the duplicate?), the theta join does the same thing but adds in a condition. Do I have this right? Any clear explanation, in simple terms (for a non-mathematician), would be greatly appreciated.

Also (sorry to just throw this in at the end, but it's sort of related), could someone explain the importance or idea of cartesian product? I think I'm missing something with regard to the basic concept, because to me it just seems like a restating of a basic fact, i.e. that a set of 13 X a set of 4 = 52...

3 answers

#1


12  

Leaving SQL aside for a moment...

A relational operator takes one or more relations as parameters and results in a relation. Because a relation has no attributes with duplicate names by definition, the relational operations theta join and natural join will both "remove the duplicate attributes." [A big problem with posting examples in SQL to explain relational operations, as you requested, is that the result of a SQL query is not a relation because, among other sins, it can have duplicate rows and/or columns.]

The relational Cartesian product operation (results in a relation) differs from set Cartesian product (results in a set of pairs). The word 'Cartesian' isn't particularly helpful here. In fact, Codd called his primitive operator 'product'.

The truly relational language Tutorial D lacks a product operator and product is not a primitive operator in the relational algebra proposed by co-author of Tutorial D, Hugh Darwen**. This is because the natural join of two relations with no attribute names in common results in the same relation as the product of the same two relations i.e. natural join is more general and therefore more useful.

Consider these examples (Tutorial D):

WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
     RELATION { TUPLE { X 1 } , TUPLE { X 2 } } AS R2 :
R1 JOIN R2

returns the product of the relations i.e. degree of two (i.e. two attributes, X and Y) and cardinality of 6 (2 x 3 = 6 tuples).

However,

WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
     RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } } AS R2 :
R1 JOIN R2

returns the natural join of the relations i.e. degree of one (i.e. the set union of the attributes yielding one attribute Y) and cardinality of 2 (i.e. only the tuples whose Y value appears in both relations, each appearing once).

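For readers without a Tutorial D implementation to hand, the two examples can be approximated in Python. This is only a rough sketch: modelling a relation as a list of dicts and a result tuple as a frozenset of (attribute, value) pairs is my own assumption for illustration, and `natural_join` is a hypothetical helper, not part of any library.

```python
def natural_join(r1, r2):
    """Join two relations given as lists of dicts (attribute name -> value)."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            # Tuples combine only if they agree on every shared attribute.
            if all(t1[a] == t2[a] for a in common):
                # Merging the dicts keeps a single copy of each shared attribute.
                result.add(frozenset({**t1, **t2}.items()))
    return result


r1 = [{"Y": 1}, {"Y": 2}, {"Y": 3}]

# No attribute names in common: the join behaves like product.
no_shared = natural_join(r1, [{"X": 1}, {"X": 2}])

# Attribute Y in common: only tuples with matching Y values survive.
shared = natural_join(r1, [{"Y": 1}, {"Y": 2}])

print(len(no_shared))                         # 6
print(sorted(dict(t)["Y"] for t in shared))   # [1, 2]
```

With no shared attribute names the condition is vacuously true for every pair, which is why the same operator yields the product.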

I hope the above examples explain why your statement "that a set of 13 X a set of 4 = 52" is not strictly correct.

Similarly, Tutorial D does not include a theta join operator. This is essentially because other operators (e.g. natural join and restriction) make it both unnecessary and not terribly useful. In contrast, Codd's primitive operators included product and restriction which can be used to perform a theta join.

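To make "product and restriction can be used to perform a theta join" concrete, here is a minimal Python sketch. It assumes the two relations share no attribute names, so that product is well defined; `theta_join` and the list-of-dicts representation are hypothetical, chosen only for illustration.

```python
import operator
from itertools import product


def theta_join(r1, r2, left_attr, theta, right_attr):
    """Product followed by restriction; assumes r1 and r2 share no attribute names."""
    return [
        {**t1, **t2}                              # product: pair every t1 with every t2
        for t1, t2 in product(r1, r2)
        if theta(t1[left_attr], t2[right_attr])   # restriction: keep pairs satisfying theta
    ]


r1 = [{"Y": 1}, {"Y": 2}, {"Y": 3}]
r2 = [{"X": 1}, {"X": 2}]

less_than = theta_join(r1, r2, "Y", operator.lt, "X")   # Y < X
at_least = theta_join(r1, r2, "Y", operator.ge, "X")    # Y >= X

print(less_than)       # [{'Y': 1, 'X': 2}]
print(len(at_least))   # 5
```

Passing `operator.eq` as `theta` would give an equijoin, which is the special case most people meet first.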


SQL has an explicit product operator named CROSS JOIN which forces the result to be the product even if it entails violating 1NF by creating duplicate columns (attributes). Consider the SQL equivalent to the latter Tutorial D example above:

WITH R1 AS (SELECT * FROM (VALUES (1), (2), (3)) AS T (Y)), 
     R2 AS (SELECT * FROM (VALUES (1), (2)) AS T (Y))
SELECT * 
  FROM R1 CROSS JOIN R2;

This returns a table expression with two columns (rather than one attribute), both called Y (!!), and 6 rows, i.e. the same result as this:

SELECT c1 AS Y, c2 AS Y 
  FROM (VALUES (1, 1), 
               (2, 1), 
               (3, 1), 
               (1, 2), 
               (2, 2), 
               (3, 2)
       ) AS T (c1, c2);

** That is, although there is only one relational model (i.e. Codd's), there can be more than one relational algebra (i.e. Codd's is but one).

#2


3  

You're not quite right: a theta join is a join whose condition may use a comparison other than =, in SQL typically < or >= etc. See TechNet

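Since the question asked for SQL, here is a small sketch that runs an equijoin and a theta join side by side using Python's built-in sqlite3 module. The tables R1 and R2 and their contents are made up for illustration, and other SQL products may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R1 (Y INTEGER);
    CREATE TABLE R2 (X INTEGER);
    INSERT INTO R1 VALUES (1), (2), (3);
    INSERT INTO R2 VALUES (1), (2);
""")

# Equijoin: the join condition uses =.
equi_rows = conn.execute(
    "SELECT Y, X FROM R1 JOIN R2 ON R1.Y = R2.X ORDER BY Y, X"
).fetchall()

# Theta join: same shape of query, but the condition uses < instead of =.
theta_rows = conn.execute(
    "SELECT Y, X FROM R1 JOIN R2 ON R1.Y < R2.X ORDER BY Y, X"
).fetchall()

print(equi_rows)    # [(1, 1), (2, 2)]
print(theta_rows)   # [(1, 2)]
```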

As for cartesian product (or CROSS JOIN), it is an operation rather than an idea or concept. It's important because sometimes you need to use it! It is a basic fact that a set of 13 x a set of 4 = 52, and cartesian product is based on this fact.

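The 13 x 4 = 52 figure reads like a deck of cards, so as a quick illustration (the rank and suit names are my assumption, not something stated in the question), the set Cartesian product can be sketched in Python with `itertools.product`:

```python
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]   # 13 ranks
suits = ["clubs", "diamonds", "hearts", "spades"]                   # 4 suits

# Every rank paired with every suit: the set Cartesian product.
deck = list(product(ranks, suits))

print(len(deck))   # 52
print(deck[0])     # ('A', 'clubs')
```

The count is the "basic fact"; the operation is what actually produces the 52 pairs, which is the part you sometimes need.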

#3


2  

In my opinion, to keep it simple: if you understand equijoin, you should be able to understand theta join. If you change the = (equals) symbol in an equijoin to >=, you have already done a theta join. However, I think it is quite difficult to see the practicality of theta join compared to equijoin, since the join condition we normally use is V.primarykey = C.foreignkey. And if you change it to a theta join, the result depends on the values, so in effect you are doing a selection.

A natural join is similar to an equijoin; the difference is just that it gets rid of the redundant attributes. Easy! :)

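A quick way to see the "redundant attributes" difference is to count the result columns. The sketch below uses Python's sqlite3 module with two made-up single-column tables; it shows SQLite's behaviour, and I'm assuming other SQL products behave similarly for `SELECT *`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R1 (Y INTEGER);
    CREATE TABLE R2 (Y INTEGER);
    INSERT INTO R1 VALUES (1), (2), (3);
    INSERT INTO R2 VALUES (1), (2);
""")

# Equijoin: both Y columns are kept in the result.
equi = conn.execute("SELECT * FROM R1 JOIN R2 ON R1.Y = R2.Y")
equi_columns = len(equi.description)
equi_rows = sorted(equi.fetchall())

# Natural join: the shared column appears only once.
natural = conn.execute("SELECT * FROM R1 NATURAL JOIN R2")
natural_columns = len(natural.description)
natural_rows = sorted(natural.fetchall())

print(equi_columns, equi_rows)         # 2 [(1, 1), (2, 2)]
print(natural_columns, natural_rows)   # 1 [(1,), (2,)]
```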

Hope this explanation helps.

