So in my database, I have two tables which have a many to one relationship. I am trying to update the 'parent' table by looking at all the rows on the 'child' table (sorry if I'm not using the correct terminology here) and applying different sets of rules to the data to determine the values to update with. But I want to do this efficiently (which is to say, quickly).
在我的数据库中,我有两个表它们之间有很多对一的关系。我试图通过查看“child”表上的所有行来更新“父表”(如果我没有使用正确的术语),并对数据应用不同的规则集,以确定要更新的值。但我想要高效地做到这一点(也就是说,快速地)。
So, assume the following tables.
因此,假设下面的表。
PARENT(
ID NUMBER,
NAME VARCHAR(20),
NUMBER_OF_CHILDREN NUMBER,
AVERAGE_CHILD_AGE NUMBER,
OLDEST_CHILD_AGE NUMBER,
YOUNGEST_CHILD_AGE NUMBER,
MODE_EYE_COLOR VARCHAR(20),
EVERY_CHILD_MADE_A VARCHAR(1),
BLOODTYPES_THAT_CAN_BE_ACCEPTED VARCHAR(100),
SOMETHING_COMPLEX COMPLEX_OBJECT_1
)
CHILD(
ID NUMBER,
PARENT_ID NUMBER,
AGE NUMBER,
EYE_COLOR VARCHAR(20),
MADE_AN_A VARCHAR(1),
BLOODTYPE VARCHAR(5),
COMPLEXITY COMPLEX_OBJECT_2
)
I've used simplified examples, the actual rules that need to be applied are a decent bit more complicated that min/max/average. Now, these are the two ways I'm thinking this can be done. The first is to just have the procedure pass the parent ID on to functions (I use separate functions so later going back and maintaining this code is easier) and each one selects the children and then processes them. The second way is to open a cursor that selects the children and then pass the cursor into each function.
我用了简化的例子,实际需要应用的规则比最小/最大值/平均值要复杂一些。这是我认为可以做到的两种方式。第一种方法是让过程将父ID传递给函数(我使用单独的函数,以便稍后返回并维护这段代码),每个过程选择子函数,然后处理它们。第二种方法是打开一个游标,该游标选择子对象,然后将游标传递给每个函数。
PROCEDURE UPDATE_PARENT_1 (PARENT_ID IN NUMBER)
BEGIN
UPDATE PARENT
SET
NUMBER_OF_CHILDREN = CHILD_COUNT_FUNCTION(PARENT_ID),
AVERAGE_CHILD_AGE = CHILD_AGE_AVERAGE_FUNCTION(PARENT_ID),
OLDER_CHILD_AGE = PICK_OLDEST_AGE_FUNCTION(PARENT_ID),
YOUNGEST_CHILD_AGE = PICK_YOUNGEST_AGE_FUNCTION(PARENT_ID),
MODE_EYE_COLOR = MOST_OFTEN_EYE_COLOR_FUNCTION(PARENT_ID),
BLOODTYPES_THAT_CAN_BE_ACCEPTED = DETERMINE_BLOOD_DONOR_TYPES(PARENT_ID),
SOMETHING_COMPLEX = COMPLEX_FUNCTION(PARENT_ID)
WHERE
ID = PARENT_ID;
END;
PROCEDURE UPDATE_PARENT_2 (PARENT_ID IN NUMBER)
CURSOR C IS SELECT * FROM CHILD WHERE CHILD.PARENT_ID = PARENT_ID
BEGIN
OPEN C;
UPDATE PARENT
SET
NUMBER_OF_CHILDREN = CHILD_COUNT_FUNCTION(C),
AVERAGE_CHILD_AGE = CHILD_AGE_AVERAGE_FUNCTION(C),
OLDER_CHILD_AGE = PICK_OLDEST_AGE_FUNCTION(C),
YOUNGEST_CHILD_AGE = PICK_YOUNGEST_AGE_FUNCTION(C),
MODE_EYE_COLOR = MOST_OFTEN_EYE_COLOR_FUNCTION(C)
BLOODTYPES_THAT_CAN_BE_ACCEPTED = DETERMINE_BLOOD_DONOR_TYPES(C),
SOMETHING_COMPLEX = COMPLEX_FUNCTION(C)
WHERE
ID = PARENT_ID;
CLOSE C;
END;
With either way, I feel like things I'm doing extra work. The first way feels the worse, because it appears I'm doing far too many select statements (1 for each rule I have to apply, and there are many). The second way I only need to go back to the front of the cursor instead of doing another select, but it still feels as if there should be a more efficient way. At the same time, oracle has great behind the scenes optimization, so either way may be being optimized to the best way to do it behind the scenes.
不管怎样,我觉得我在做额外的工作。第一种方式感觉更糟,因为似乎我做了太多的select语句(对于我必须应用的每个规则,都有一个,还有很多)。第二种方法,我只需要返回到光标的前面,而不需要执行另一个选择,但仍然感觉应该有一个更有效的方法。与此同时,oracle在后台有很好的优化,因此两种方法都可能被优化为在后台进行优化的最佳方式。
So my question is what is the quickest way to do this sort of update, or can I not worry about optimizing it and oracle will take care of it for me?
所以我的问题是做这种更新最快的方法是什么,或者我可以不担心优化它,而oracle会为我处理它吗?
EDIT: Made the example a bit more complex.
编辑:使这个示例稍微复杂一点。
3 个解决方案
#1
1
You can use various STATS_* functions in addition to the more standard MIN(), MAX() etc. If these still aren't enough you can create user defined aggregate functions. (sample SQL taken from another answer)
除了更标准的MIN()、MAX()等之外,还可以使用各种STATS_*函数。(来自另一个答案的SQL示例)
UPDATE Parent
SET (Number_Of_Children, Average_Child_Age, Oldest_Child_Age,
Youngest_Child_Age, MODE_EYE_COLOR, BLOODTYPES_THAT_CAN_BE_ACCEPTED,
SOMETHING_COMPLEX ) =
(
SELECT COUNT(*), AVG(Age), MAX(Age), MIN(Age), STATS_MODE(EYE_COLOR),
ListBloodTypes(BLOODTYPE), ComplexCombine(SOMETHING_COMPLEX)
FROM Child
WHERE Parent.ID = Child.Parent_ID
)
Your user defined aggregate functions ListBloodTypes and ComplexCombine would then need to be defined with: Using User-Defined Aggregate Functions as a guide.
然后,您的用户定义的聚合函数ListBloodTypes和ComplexCombine需要使用:使用用户定义的聚合函数作为指南。
#2
3
You can do everything but the mode of the eye color like this:
你可以做任何事情,除了眼睛颜色的模式如下:
UPDATE Parent
SET (Number_Of_Children, Average_Child_Age, Oldest_Child_Age, Youngest_Child_Age) = (
SELECT COUNT(*), AVG(Age), MAX(Age), MIN(Age)
FROM Child
WHERE Parent.ID = Child.Parent_ID
)
I couldn't think of a way to fit the mode in there. It's a tough calculation in general in SQL, and I don't think it lends itself to storage in a column because of these scenarios:
我想不出一个方法来适应那里的模式。在SQL中,这是一个很难的计算,我认为它不适合存储在列中,因为这些场景:
- Three children, each with a different eye color: that's either no mode or three modes (one for each eye color) depending on who you ask - and some will answer "both".
- 三个孩子,每个孩子都有不同的眼睛颜色:要么没有模式,要么有三种模式(每种眼睛颜色各有一种),这取决于你问谁——有些孩子会回答“两者都有”。
- Three children, two with green eyes: OK, green is the mode here, no problem.
- 三个孩子,两个绿眼睛:好的,绿色是这里的模式,没问题。
- Four children, two with brown eyes and two with blue eyes: brown and blue are both modes.
- 四个孩子,两个棕色眼睛,两个蓝色眼睛:棕色和蓝色眼睛都是模式。
I hope this helps; it could be that your efforts to simplify the question, while excellent, sent me on a wrong path :) Let me know.
我希望这可以帮助;可能是你简化问题的努力,虽然很出色,却让我走上了一条错误的道路:)让我知道。
#3
2
First, I am shamelessly borrowing from Ed Gibb's answer. My only addition is to show how to get the mode.
首先,我无耻地借用了艾德·吉布的回答。我唯一添加的是如何获得模式。
To do this, I am using analytic functions instead of aggregation. Most of the new columns are the same, just with an over (partition by parent_id)
clause. The innermost subquery also includes the count of the number of children with a given eye color. The next level of subquery orders by that value, and the outermost chooses one of the rows -- which will have the mode.
为此,我使用的是解析函数而不是聚合。大多数新列都是相同的,只有一个over (partition by parent_id)子句。最内部的子查询还包括给定眼睛颜色的子查询数。下一级别的子查询命令按此值进行,而最外层则选择其中一个行——它将具有该模式。
UPDATE Parent
SET (Number_Of_Children, Average_Child_Age, Oldest_Child_Age, Youngest_Child_Age
Mode_Eye_Color) =
(select cnt, avg_age, min_age, max_age, eyecolor
from (select cnt, avg_age, min_age, max_age, eyecolor
ROW_NUMBER() over (order by cnt_ec desc) as seqnum
from (select COUNT(*) over (partition by Parent_id) as cnt,
AVG(Age) over (partition by Parent_id) as avg_age,
MIN(Age) over (partition by Parent_id) as min_age,
MAX(Age) over (partition by Parent_id) as max_age,
COUNT(*) over (partition by Parent_id, eyecolor) as cnt_ec,
eyecolor
from Child
where Parent.ID = Child.Parent_ID
) t
) t
where seqnum = 1
)
#1
1
You can use various STATS_* functions in addition to the more standard MIN(), MAX() etc. If these still aren't enough you can create user defined aggregate functions. (sample SQL taken from another answer)
除了更标准的MIN()、MAX()等之外,还可以使用各种STATS_*函数。(来自另一个答案的SQL示例)
UPDATE Parent
SET (Number_Of_Children, Average_Child_Age, Oldest_Child_Age,
Youngest_Child_Age, MODE_EYE_COLOR, BLOODTYPES_THAT_CAN_BE_ACCEPTED,
SOMETHING_COMPLEX ) =
(
SELECT COUNT(*), AVG(Age), MAX(Age), MIN(Age), STATS_MODE(EYE_COLOR),
ListBloodTypes(BLOODTYPE), ComplexCombine(SOMETHING_COMPLEX)
FROM Child
WHERE Parent.ID = Child.Parent_ID
)
Your user defined aggregate functions ListBloodTypes and ComplexCombine would then need to be defined with: Using User-Defined Aggregate Functions as a guide.
然后,您的用户定义的聚合函数ListBloodTypes和ComplexCombine需要使用:使用用户定义的聚合函数作为指南。
#2
3
You can do everything but the mode of the eye color like this:
你可以做任何事情,除了眼睛颜色的模式如下:
UPDATE Parent
SET (Number_Of_Children, Average_Child_Age, Oldest_Child_Age, Youngest_Child_Age) = (
SELECT COUNT(*), AVG(Age), MAX(Age), MIN(Age)
FROM Child
WHERE Parent.ID = Child.Parent_ID
)
I couldn't think of a way to fit the mode in there. It's a tough calculation in general in SQL, and I don't think it lends itself to storage in a column because of these scenarios:
我想不出一个方法来适应那里的模式。在SQL中,这是一个很难的计算,我认为它不适合存储在列中,因为这些场景:
- Three children, each with a different eye color: that's either no mode or three modes (one for each eye color) depending on who you ask - and some will answer "both".
- 三个孩子,每个孩子都有不同的眼睛颜色:要么没有模式,要么有三种模式(每种眼睛颜色各有一种),这取决于你问谁——有些孩子会回答“两者都有”。
- Three children, two with green eyes: OK, green is the mode here, no problem.
- 三个孩子,两个绿眼睛:好的,绿色是这里的模式,没问题。
- Four children, two with brown eyes and two with blue eyes: brown and blue are both modes.
- 四个孩子,两个棕色眼睛,两个蓝色眼睛:棕色和蓝色眼睛都是模式。
I hope this helps; it could be that your efforts to simplify the question, while excellent, sent me on a wrong path :) Let me know.
我希望这可以帮助;可能是你简化问题的努力,虽然很出色,却让我走上了一条错误的道路:)让我知道。
#3
2
First, I am shamelessly borrowing from Ed Gibb's answer. My only addition is to show how to get the mode.
首先,我无耻地借用了艾德·吉布的回答。我唯一添加的是如何获得模式。
To do this, I am using analytic functions instead of aggregation. Most of the new columns are the same, just with an over (partition by parent_id)
clause. The innermost subquery also includes the count of the number of children with a given eye color. The next level of subquery orders by that value, and the outermost chooses one of the rows -- which will have the mode.
为此,我使用的是解析函数而不是聚合。大多数新列都是相同的,只有一个over (partition by parent_id)子句。最内部的子查询还包括给定眼睛颜色的子查询数。下一级别的子查询命令按此值进行,而最外层则选择其中一个行——它将具有该模式。
UPDATE Parent
SET (Number_Of_Children, Average_Child_Age, Oldest_Child_Age, Youngest_Child_Age
Mode_Eye_Color) =
(select cnt, avg_age, min_age, max_age, eyecolor
from (select cnt, avg_age, min_age, max_age, eyecolor
ROW_NUMBER() over (order by cnt_ec desc) as seqnum
from (select COUNT(*) over (partition by Parent_id) as cnt,
AVG(Age) over (partition by Parent_id) as avg_age,
MIN(Age) over (partition by Parent_id) as min_age,
MAX(Age) over (partition by Parent_id) as max_age,
COUNT(*) over (partition by Parent_id, eyecolor) as cnt_ec,
eyecolor
from Child
where Parent.ID = Child.Parent_ID
) t
) t
where seqnum = 1
)