The problem:
I want to move the category links from the companies_1 table into the company_categories table. The company_id in the company_categories table needs to equal the id in the companies_2 table. The records of the companies_1 and companies_2 tables are linked by the "name" column.
- The current code below ran for over a night and still did not finish! I want to learn to be more efficient and speed this process up. I feel there is a lot to optimize, because there are A LOT of company records.
- Another issue was that I found no way to check how far my query had got while looping (so I could not track the progress). Because it took so long, I killed the query, and I'm searching for a better way to solve this.
The information:
There is a table with companies like:
----------------------------------------
|             companies_1              |
----------------------------------------
| id | category_id | name              |
----------------------------------------
| 1  | 1           | example-1         |
| 2  | 2           | example-1         |
| 3  | 1           | example-2         |
| 4  | 2           | example-2         |
| 5  | 3           | example-2         |
| 6  | 1           | example-3         |
----------------------------------------
A table with the DISTINCT company names:
-------------------------
|      companies_2      |
-------------------------
| id | name             |
-------------------------
| 1  | example-1        |
| 2  | example-2        |
| 3  | example-3        |
-------------------------
A categories table, like:
-------------------------
|      categories       |
-------------------------
| id | name             |
-------------------------
And a junction table, like:
---------------------------------
|      company_categories       |
---------------------------------
| company_id | category_id      |
---------------------------------
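For anyone who wants to reproduce the layout, the tables above could be set up roughly like this. It is only a sketch: the table and column names come from the tables above, but every column type, length, and key is an assumption.

```sql
-- Assumed column types and keys; only the table and column names
-- are taken from the question.
CREATE TABLE companies_1 (
    id          INT PRIMARY KEY,
    category_id INT NOT NULL,
    name        VARCHAR(255) NOT NULL
);

CREATE TABLE companies_2 (
    id   INT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE categories (
    id   INT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE company_categories (
    company_id  INT NOT NULL,
    category_id INT NOT NULL
);

-- Sample rows copied from the tables shown above.
INSERT INTO companies_1 (id, category_id, name) VALUES
    (1, 1, 'example-1'), (2, 2, 'example-1'),
    (3, 1, 'example-2'), (4, 2, 'example-2'), (5, 3, 'example-2'),
    (6, 1, 'example-3');

INSERT INTO companies_2 (id, name) VALUES
    (1, 'example-1'), (2, 'example-2'), (3, 'example-3');
```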
The current code:
This code works, but is far from efficient.
DELIMITER $$
DROP PROCEDURE IF EXISTS fill_junc_table$$
CREATE PROCEDURE fill_junc_table()
BEGIN
    DECLARE r INT;
    DECLARE i INT;
    DECLARE i2 INT;
    DECLARE loop_length INT;
    DECLARE company_old_len INT;
    DECLARE _href VARCHAR(255);
    DECLARE cat_id INT;
    DECLARE comp_id INT;

    SET r = 0;
    SET i = 0;
    SET company_old_len = 0;

    SELECT COUNT(*) INTO loop_length FROM companies;

    -- Outer loop: one pass per row offset i
    WHILE i < loop_length DO
        SELECT href INTO _href FROM company_old LIMIT i,1;
        SELECT id INTO comp_id FROM companies WHERE site_href=_href;
        SELECT COUNT(*) INTO company_old_len FROM company_old WHERE href=_href;
        SET i2 = 0;

        -- Inner loop: one INSERT per category of the current company
        WHILE i2 < company_old_len DO
            SELECT category_id INTO cat_id FROM company_old WHERE href=_href LIMIT i2,1;
            INSERT INTO company_categories (company_id, category_id) VALUES (comp_id, cat_id);
            SET r = r + 1;
            SET i2 = i2 + 1;
        END WHILE;

        SET i = i + 1;
    END WHILE;

    SELECT r;
END$$
DELIMITER ;
CALL fill_junc_table();
Edit (new idea):
I am going to test another way to solve this problem by fully copying the companies_1 table with the following columns (company_id empty on copy):
---------------------------------------------
| company_id | category_id | name           |
---------------------------------------------
Then, I will loop through the companies_2 table to fill the correct company_id related to the name-column.
I hope you can give your thoughts about this. When I finish my test I will leave the result over here for others.
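The copy-and-fill idea probably does not need a loop at all. Below is a minimal sketch of a set-based version, assuming MySQL; company_categories_staging is a hypothetical table name, and the INT type for company_id is an assumption.

```sql
-- Copy companies_1 into a staging table with an empty company_id column.
-- In MySQL, columns declared before the SELECT are placed first in the new table.
-- company_categories_staging is a hypothetical name.
CREATE TABLE company_categories_staging (company_id INT NULL)
SELECT category_id, name
FROM companies_1;

-- Fill company_id for all rows in one statement by joining on name,
-- instead of looping through companies_2 row by row.
UPDATE company_categories_staging S
INNER JOIN companies_2 C2 ON C2.name = S.name
SET S.company_id = C2.id;
```

Rows whose name has no match in companies_2 keep a NULL company_id, which makes them easy to find afterwards.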
2 Solutions
#1
2
To clarify, I don't see any PIVOT transformation in company_categories. What I see is that you want a JUNCTION TABLE, because the companies and categories tables appear to have a many-to-many relationship.
In your case, a company can have multiple categories, and a category can be assigned to multiple companies.
Now, based on your requirement:
I want to move the category links from the companies_1 table into the company_categories table. The company_id in the company_categories table needs to equal the id in the companies_2 table. The records of the companies_1 and companies_2 tables are linked by the "name" column.
I arrived at this query:
INSERT INTO company_categories (company_id, category_id)
SELECT C2.id
     , C1.category_id
FROM companies_1 C1
INNER JOIN companies_2 C2 ON C2.name = C1.name;
Let me know if this works. The nested loops that you created will really take a while.
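Since the join matches rows on name, an index on that column usually helps when there are many records. A minimal sketch, assuming MySQL, that no such index exists yet, and that name is a VARCHAR short enough to index in full; the index name is hypothetical.

```sql
-- Hypothetical index name; assumes companies_2.name is not indexed yet.
CREATE INDEX idx_companies_2_name ON companies_2 (name);

-- Inspect the execution plan of the SELECT part before running the INSERT,
-- to see whether the index is actually used for the join.
EXPLAIN
SELECT C2.id, C1.category_id
FROM companies_1 C1
INNER JOIN companies_2 C2 ON C2.name = C1.name;
```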
As @DanielE pointed out, this query works on the assumption that company_categories is empty. Otherwise, we will need to use UPDATE.
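If company_categories already holds some rows, one alternative (not from the original answer) is to insert only the pairs that are still missing. This is a sketch, assuming MySQL and that a row is identified by its (company_id, category_id) pair.

```sql
-- Insert only the (company_id, category_id) pairs not yet present.
-- The LEFT JOIN against the target table plus the IS NULL filter
-- acts as an anti-join; DISTINCT guards against duplicate source rows.
INSERT INTO company_categories (company_id, category_id)
SELECT DISTINCT C2.id, C1.category_id
FROM companies_1 C1
INNER JOIN companies_2 C2 ON C2.name = C1.name
LEFT JOIN company_categories CC
       ON CC.company_id = C2.id
      AND CC.category_id = C1.category_id
WHERE CC.company_id IS NULL;
```

Because existing pairs are skipped, the statement can be run again without creating duplicates.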
#2
2
Why not just update companies_1?
ALTER TABLE companies_1 ADD (company_id INT);
UPDATE companies_1 SET company_id = (SELECT id FROM companies_2 WHERE companies_2.name = companies_1.name);
ALTER TABLE companies_1 DROP name, RENAME TO company_categories;
SELECT * FROM `company_categories`;
Output
id  category_id  company_id
1   1            1
2   2            1
3   1            2
4   2            2
5   3            2
6   1            3
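Because the last ALTER TABLE drops the name column, it may be worth running two checks around the UPDATE first. This is only a sketch, assuming MySQL and the tables as they stand before the rename; the column aliases are made up for illustration.

```sql
-- Before the UPDATE: names that occur more than once in companies_2
-- would make the scalar subquery return more than one row and fail.
SELECT name, COUNT(*) AS occurrences
FROM companies_2
GROUP BY name
HAVING COUNT(*) > 1;

-- After the UPDATE, before dropping name: rows that found no match
-- in companies_2 still have a NULL company_id.
SELECT name, COUNT(*) AS unmatched_rows
FROM companies_1
WHERE company_id IS NULL
GROUP BY name;
```

If both queries return no rows, every record was matched to exactly one company.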