Assume I have the following tables:
tableA
a_name | age | country
Jordan | 5 | Germany
Jordan | 6 | Spain
Molly | 6 | Spain
Paris | 7 | France
John | 7 | Saudi Arabia
John | 5 | Saudi Arabia
John | 6 | Spain
tableB
id (auto increment primary key) | age | country | group_num (initially null)
1 | 5 | Germany |
2 | 6 | Spain |
3 | 7 | France |
4 | 7 | Spain |
5 | 8 | France |
6 | 9 | France |
7 | 2 | Mexico |
8 | 7 | Saudi Arabia |
9 | 5 | Saudi Arabia |
I want some kind of select/update that produces the following values for the "group_num" column:
tableB
id (auto increment primary key) | age | country | group_num
1 | 5 | Germany | 1
2 | 6 | Spain | 1
3 | 7 | France | 1
4 | 7 | Spain |
5 | 7 | France | 2
6 | 9 | France |
7 | 2 | Mexico |
8 | 7 | Saudi Arabia | 1
9 | 5 | Saudi Arabia | 1
group_num is assigned based on two criteria:
1) The places each person (a_name) went.
2) Whether other people visited that same country (regardless of age).
Ids 1, 2, 3, 8, and 9 all have the same group_num because Jordan, Molly, and Paris are linked through the two criteria above (they all went to Spain), and the link extends to other countries: Germany was visited by Jordan, who also visited Spain, so it has the same group_num. Saudi Arabia was visited by John, who also visited Spain, so it has the same group_num.
Is there some SQL query or set of queries (which may or may not involve creating other "complementary" tables) that gets to the desired result shown above? It is okay if group_num first has to be filled with auto-incrementing values like "id" and then updated later if necessary. It is also okay to have non-null values in the group_num fields currently shown as empty.
Cursors/iteration are very slow. The following are the steps I would perform to fill in those values. It is a very slow process using cursors, and it would be great if I could get rid of it:
- In tableA, Jordan visited Germany at age 5 (group_num in tableB for [5, Germany] updated to 1).
- Jordan visits Spain at age 6 (group_num for [6, Spain] updated to 1 to show it is in the same grouping, since the same person, Jordan, visited Spain).
- Molly visits Spain at age 6 (group_num for [6, Spain] updated to 1, since even though it's a different person, the same age/country pair was hit).
- Paris visited France at age 7 (group_num for [7, France] in tableB updated to 2, since she is a different person who visited a completely different country, regardless of age).
- John visits Saudi Arabia at age 7 (group_num for [7, Saudi Arabia] in tableB updated to 3 for the age+country pair).
- John visits Saudi Arabia at age 5 (group_num for [5, Saudi Arabia] in tableB updated to 3 for the age+country pair, since it's still John).
- John visits Spain at age 6 (group_num for [6, Spain] is already 1 because Jordan visited there before, so there may be some grouping; group_num for all the places John visited, [6, Spain], [5, Saudi Arabia], and [7, Saudi Arabia], is therefore updated to 1).
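The steps above amount to merging sets of [age, country] pairs, which can be sketched without a cursor as a union-find (disjoint-set) pass. This is only a minimal Python sketch, not SQL and not part of the original post: the names parent, find, union, first_visit and group_num are hypothetical, and it assumes pairs are linked only through a shared a_name, as in the steps.

```python
# Rows of tableA from the question: (a_name, age, country).
table_a = [
    ("Jordan", 5, "Germany"),
    ("Jordan", 6, "Spain"),
    ("Molly", 6, "Spain"),
    ("Paris", 7, "France"),
    ("John", 7, "Saudi Arabia"),
    ("John", 5, "Saudi Arabia"),
    ("John", 6, "Spain"),
]

parent = {}  # union-find forest over (age, country) pairs


def find(x):
    """Return the representative of x's set, registering x if unseen."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def union(a, b):
    """Merge the sets containing a and b."""
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[rb] = ra


first_visit = {}  # a_name -> first (age, country) pair seen for that person
for name, age, country in table_a:
    pair = (age, country)
    find(pair)  # register the pair even if it is never merged
    if name in first_visit:
        union(first_visit[name], pair)  # same person links the two pairs
    else:
        first_visit[name] = pair

# Number the groups in order of first appearance in tableA.
group_of_root = {}
group_num = {}
for _, age, country in table_a:
    root = find((age, country))
    group_of_root.setdefault(root, len(group_of_root) + 1)
    group_num[(age, country)] = group_of_root[root]
```

Under these assumptions the Germany, Spain, and Saudi Arabia pairs end up in group 1 and [7, France] in group 2. Pairs that nobody visited, such as [7, Spain], never enter group_num and would stay null in tableB.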
2 solutions
#1
1
You will need an iterative approach driven by each new item added to Table1 (your tableA). If you execute the following statements for each such item, it will be fast and efficient:
Here is a SQLFiddle with the state of the db just before inserting the last record into Table1.
BTW: Your example is not entirely consistent with your description. I assume you assigned France 7 to group 1 by mistake, since Paris has no relation to anyone in group 1.
Notice the selects that i'm executing:
- The first one looks up the group_num of the places I have previously visited (this is my disjoint group, e.g. group_num 3).
- The second one checks whether there is another disjoint group that the inserted record may be related to, by looking up the group_num for Spain and age 6.
After finding out that you have two disjoint sets that become joined as a result of the newly inserted record, you can UPDATE every group_num previously assigned the second group number to the first one, like this:
UPDATE Table2 SET group_num = 1 WHERE group_num = 3;
So I have not used any cursors, but this update runs once per insert into Table1.
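To make the two selects and the merging UPDATE concrete, here is a rough sketch of the per-insert routine, driven from Python's sqlite3 module so that it runs on its own. It is not the content of the SQLFiddle: the function name insert_visit is hypothetical, the tables use the question's names (tableA/tableB rather than Table1/Table2), and I am assuming that each (age, country) pair appears at most once in tableB and that the smaller group number is the one kept.

```python
import sqlite3


def insert_visit(conn, name, age, country):
    """Insert one tableA row and keep tableB.group_num merged (sketch)."""
    cur = conn.cursor()
    # Select 1: group already assigned to places this person visited before.
    mine = cur.execute(
        "SELECT MIN(b.group_num) FROM tableA a "
        "JOIN tableB b ON b.age = a.age AND b.country = a.country "
        "WHERE a.a_name = ?", (name,)).fetchone()[0]
    # Select 2: group already assigned to the (age, country) being inserted.
    row = cur.execute(
        "SELECT group_num FROM tableB WHERE age = ? AND country = ?",
        (age, country)).fetchone()
    theirs = row[0] if row else None
    cur.execute("INSERT INTO tableA VALUES (?, ?, ?)", (name, age, country))
    groups = [g for g in (mine, theirs) if g is not None]
    if groups:
        keep = min(groups)
        if len(groups) == 2 and mine != theirs:
            # Two disjoint groups just became joined: fold one into the other.
            cur.execute("UPDATE tableB SET group_num = ? WHERE group_num = ?",
                        (keep, max(groups)))
    else:
        # Neither side has a group yet: open a new one.
        keep = cur.execute(
            "SELECT COALESCE(MAX(group_num), 0) + 1 FROM tableB").fetchone()[0]
    cur.execute("UPDATE tableB SET group_num = ? WHERE age = ? AND country = ?",
                (keep, age, country))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (a_name TEXT, age INTEGER, country TEXT);
CREATE TABLE tableB (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    age INTEGER, country TEXT, group_num INTEGER
);
INSERT INTO tableB (age, country) VALUES
    (5, 'Germany'), (6, 'Spain'), (7, 'France'), (7, 'Spain'), (8, 'France'),
    (9, 'France'), (2, 'Mexico'), (7, 'Saudi Arabia'), (5, 'Saudi Arabia');
""")
for visit in [("Jordan", 5, "Germany"), ("Jordan", 6, "Spain"),
              ("Molly", 6, "Spain"), ("Paris", 7, "France"),
              ("John", 7, "Saudi Arabia"), ("John", 5, "Saudi Arabia"),
              ("John", 6, "Spain")]:
    insert_visit(conn, *visit)
result = dict(conn.execute("SELECT id, group_num FROM tableB").fetchall())
```

With the question's data, the last call (John, 6, Spain) is the one that finds group 3 on John's side and group 1 on Spain's side, and issues the same merge as the UPDATE above.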
#2
0
@Damascusi you can see if triggers can work instead of cursors. Triggers can be faster than cursors, provided you can update group_num on the fly as the data is inserted into Table A.
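As a rough illustration of the trigger idea, here is a sketch in SQLite's dialect, run through Python's sqlite3 module so it is self-contained. The question does not say which database is used, so the syntax is an assumption and other engines will need adjustments (some do not allow an UPDATE to read the table it is modifying in a subquery). The trigger name tableA_assign_group is hypothetical, and keeping the smallest group number on a merge is my choice, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (a_name TEXT, age INTEGER, country TEXT);
CREATE TABLE tableB (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    age INTEGER, country TEXT, group_num INTEGER
);
INSERT INTO tableB (age, country) VALUES
    (5, 'Germany'), (6, 'Spain'), (7, 'France'), (7, 'Spain'), (8, 'France'),
    (9, 'France'), (2, 'Mexico'), (7, 'Saudi Arabia'), (5, 'Saudi Arabia');

-- Hypothetical trigger; fires once per row inserted into tableA.
CREATE TRIGGER tableA_assign_group AFTER INSERT ON tableA
BEGIN
    -- Step 1: if the visited pair has no group yet, reuse the person's
    -- existing group, or open a new one (current maximum + 1).
    UPDATE tableB
    SET group_num = COALESCE(
        (SELECT MIN(b.group_num) FROM tableA a
         JOIN tableB b ON b.age = a.age AND b.country = a.country
         WHERE a.a_name = NEW.a_name),
        (SELECT COALESCE(MAX(group_num), 0) + 1 FROM tableB))
    WHERE age = NEW.age AND country = NEW.country AND group_num IS NULL;

    -- Step 2: fold every group this person now touches into the smallest one.
    UPDATE tableB
    SET group_num = (
        SELECT MIN(b.group_num) FROM tableA a
        JOIN tableB b ON b.age = a.age AND b.country = a.country
        WHERE a.a_name = NEW.a_name)
    WHERE group_num IN (
        SELECT b.group_num FROM tableA a
        JOIN tableB b ON b.age = a.age AND b.country = a.country
        WHERE a.a_name = NEW.a_name);
END;
""")

visits = [("Jordan", 5, "Germany"), ("Jordan", 6, "Spain"),
          ("Molly", 6, "Spain"), ("Paris", 7, "France"),
          ("John", 7, "Saudi Arabia"), ("John", 5, "Saudi Arabia"),
          ("John", 6, "Spain")]
# Plain inserts; the trigger maintains tableB.group_num as a side effect.
conn.executemany("INSERT INTO tableA VALUES (?, ?, ?)", visits)
groups = dict(conn.execute("SELECT id, group_num FROM tableB").fetchall())
```

The application code then only inserts into tableA. Whether this actually beats a cursor depends on the engine and the data volume, so it is worth measuring on real data.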