I have a MySQL database with the following tables and fields:
- Student (id)
- Class (id)
- Grade (id, student_id, class_id, grade)
The student and class tables are indexed on id (primary key). The grade table is indexed on id (primary key) and has separate single-column indexes on student_id, class_id and grade.
I need to construct a query which, given a class ID, lists every other class along with the number of students who scored higher in that other class than in the given class.
Essentially, given the following data in the grades table:
student_id | class_id | grade
--------------------------------------
1 | 1 | 87
1 | 2 | 91
1 | 3 | 75
2 | 1 | 68
2 | 2 | 95
2 | 3 | 84
3 | 1 | 76
3 | 2 | 88
3 | 3 | 71
Querying with class ID 1 should yield:
class_id | total
-------------------
2 | 3
3 | 1
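To pin down the expected result, here is a minimal Python sketch that computes the same counts directly from the sample rows above. The function name `count_higher` and the in-memory list are only for illustration; it is meant as a reference for checking query output, not as something that would scale to the real table.

```python
from collections import Counter

# Sample rows from the grades table above: (student_id, class_id, grade)
GRADES = [
    (1, 1, 87), (1, 2, 91), (1, 3, 75),
    (2, 1, 68), (2, 2, 95), (2, 3, 84),
    (3, 1, 76), (3, 2, 88), (3, 3, 71),
]


def count_higher(rows, ref_class):
    """For each other class, count students whose grade there beats their grade in ref_class."""
    # Each student's grade in the reference class
    baseline = {s: g for s, c, g in rows if c == ref_class}
    totals = Counter()
    for s, c, g in rows:
        # Only compare students who actually have a grade in the reference class
        if c != ref_class and s in baseline and g > baseline[s]:
            totals[c] += 1
    return dict(totals)


print(count_higher(GRADES, 1))  # {2: 3, 3: 1}
```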
Ideally I'd like this to execute in a few seconds, as I'd like it to be part of a web interface.
The issue I have is that in my database, I have over 1300 classes and 160,000 students. My grade table has almost 15 million rows and as such, the query takes a long time to execute.
Here's what I've tried so far along with the times each query took:
-- I manually stopped execution after 2 hours
SELECT c.id, COUNT(*) AS total
FROM classes c
INNER JOIN grades a ON a.class_id = c.id
INNER JOIN grades b ON b.grade < a.grade AND
a.student_id = b.student_id AND
b.class_id = 1
WHERE c.id != 1
GROUP BY c.id;
-- I manually stopped execution after 20 minutes
SELECT c.id,
(
SELECT COUNT(*)
FROM grades g
WHERE g.class_id = c.id AND g.grade > (
SELECT grade
FROM grades
WHERE student_id = g.student_id AND
class_id = 1
)
) AS total
FROM classes c
WHERE c.id != 1;
-- 1 min 12 sec
CREATE TEMPORARY TABLE temp_blah (student_id INT(11) PRIMARY KEY, grade INT);
INSERT INTO temp_blah SELECT student_id, grade FROM grades WHERE class_id = 1;
SELECT c.id,
(
SELECT COUNT(*)
FROM grades g
INNER JOIN temp_blah t ON g.student_id = t.student_id
WHERE g.class_id = c.id AND t.grade < g.grade
) AS total
FROM classes c
WHERE c.id != 1;
-- Same thing but with joins instead of a subquery - 1 min 54 sec
SELECT c.id,
COUNT(*) AS total
FROM classes c
INNER JOIN grades g ON c.id = g.class_id
INNER JOIN temp_blah t ON g.student_id = t.student_id
WHERE c.id != 1 AND t.grade < g.grade
GROUP BY c.id;
I also considered creating a 2D table, with students as rows and classes as columns, however I can see two issues with this:
- MySQL implements a maximum column count (4096) and maximum row size (in bytes) which may be exceeded by this approach
- I can't think of a good way to query that structure to get the results I need
I also considered performing these calculations as background jobs and storing the results somewhere, but for the information to remain current (it must), they would need to be recalculated every time a student, class or grade record was created or updated.
Does anyone know a more efficient way to construct this query?
EDIT: Create table statements:
CREATE TABLE `classes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1331 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci$$
CREATE TABLE `students` (
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=160803 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci$$
CREATE TABLE `grades` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`student_id` int(11) DEFAULT NULL,
`class_id` int(11) DEFAULT NULL,
`grade` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_grades_on_student_id` (`student_id`),
KEY `index_grades_on_class_id` (`class_id`),
KEY `index_grades_on_grade` (`grade`)
) ENGINE=InnoDB AUTO_INCREMENT=15507698 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci$$
Output of explain on the most efficient query (the 1 min 12 sec one):
id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | PRIMARY | c | range | PRIMARY | PRIMARY | 4 | | 683 | Using where; Using index
2 | DEPENDENT SUBQUERY | g | ref | index_grades_on_student_id,index_grades_on_class_id,index_grades_on_grade | index_grades_on_class_id | 5 | mydb.c.id | 830393 | Using where
2 | DEPENDENT SUBQUERY | t | eq_ref | PRIMARY | PRIMARY | 4 | mydb.g.student_id | 1 | Using where
Another edit - explain output for sgeddes's suggestion:
+----+-------------+------------+--------+---------------+------+---------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+------+---------+------+----------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 14953992 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | <derived3> | system | NULL | NULL | NULL | NULL | 1 | Using filesort |
| 2 | DERIVED | G | ALL | NULL | NULL | NULL | NULL | 15115388 | |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+--------+---------------+------+---------+------+----------+----------------------------------------------+
2 Solutions
#1
3
I think this should work for you using SUM and CASE:
SELECT C.Id,
SUM(
CASE
WHEN G.Grade > C2.Grade THEN 1 ELSE 0
END
)
FROM Class C
INNER JOIN Grade G ON C.Id = G.Class_Id
LEFT JOIN (
SELECT Grade, Student_Id, Class_Id
FROM Class
JOIN Grade ON Class.Id = Grade.Class_Id
WHERE Class.Id = 1
) C2 ON G.Student_Id = C2.Student_Id
WHERE C.Id <> 1
GROUP BY C.Id
Sample Fiddle Demo
--EDIT--
In response to your comment, here is another attempt that should be much faster:
SELECT
Class_Id,
SUM(CASE WHEN Grade > minGrade THEN 1 ELSE 0 END)
FROM
(
SELECT
Student_Id,
@classToCheck:=
IF(G.Class_Id = 1, Grade, @classToCheck) minGrade ,
Class_Id,
Grade
FROM Grade G
JOIN (SELECT @classToCheck:= 0) t
ORDER BY Student_Id, IF(Class_Id = 1, 0, 1)
) t
WHERE Class_Id <> 1
GROUP BY Class_ID
And more sample fiddle.
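For readers unfamiliar with the user-variable trick, here is a rough Python sketch of what the query above appears to be doing: sort each student's rows so the class 1 row comes first, remember that grade, then compare every later row for the same student against it. The function name `single_pass_counts` is made up for illustration. One deliberate difference, labeled in the code: the sketch resets the remembered grade whenever the student changes, whereas `@classToCheck` in the SQL would carry over from the previous student if a student has no grade in class 1. The SQL version also depends on MySQL evaluating the variable in sorted order, which is worth verifying on your server version before relying on it.

```python
# Sample rows from the question: (student_id, class_id, grade)
GRADES = [
    (1, 1, 87), (1, 2, 91), (1, 3, 75),
    (2, 1, 68), (2, 2, 95), (2, 3, 84),
    (3, 1, 76), (3, 2, 88), (3, 3, 71),
]


def single_pass_counts(rows, ref_class):
    # Mirrors ORDER BY Student_Id, IF(Class_Id = ref_class, 0, 1):
    # within each student, the reference-class row sorts first.
    ordered = sorted(rows, key=lambda r: (r[0], 0 if r[1] == ref_class else 1))

    totals = {}
    current_student = None
    baseline = None  # plays the role of @classToCheck

    for student_id, class_id, grade in ordered:
        if student_id != current_student:
            # Assumption / deviation from the SQL: reset per student so a
            # student without a reference-class grade is never compared
            # against the previous student's grade.
            current_student, baseline = student_id, None
        if class_id == ref_class:
            baseline = grade
            continue  # corresponds to WHERE Class_Id <> ref_class
        totals.setdefault(class_id, 0)  # SUM(CASE ...) yields 0 when nothing matches
        if baseline is not None and grade > baseline:
            totals[class_id] += 1
    return totals


print(single_pass_counts(GRADES, 1))  # {2: 3, 3: 1}
```

The point of the approach is that the grades table is read once and sorted, rather than being joined to itself.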
#2
0
Can you give this a try on the original data as well? It is only one join :)
select
final.class_id, count(*) as total
from
(
select * from
(select student_id as p_student_id, grade as p_grade from table1 where class_id = 1) as partial
inner join table1 on table1.student_id = partial.p_student_id
where table1.class_id <> 1 and table1.grade > partial.p_grade
) as final
group by
final.class_id;
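As a quick sanity check, the sketch below runs a flattened form of this query (same self-join and filter, without the derived tables) against the question's sample rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it only checks the logic, not the timing. The composite index `idx_student_class_grade` is hypothetical and not part of the original schema; the idea is that an index covering (student_id, class_id, grade) might let the join be answered from the index alone, but whether that helps on the real 15-million-row table is something to confirm with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (student_id INT, class_id INT, grade INT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, 1, 87), (1, 2, 91), (1, 3, 75),
        (2, 1, 68), (2, 2, 95), (2, 3, 84),
        (3, 1, 76), (3, 2, 88), (3, 3, 71),
    ],
)

# Hypothetical composite index (an assumption, not in the original schema)
conn.execute(
    "CREATE INDEX idx_student_class_grade ON table1 (student_id, class_id, grade)"
)

# p = each student's row in the reference class, t = the same student's other rows
QUERY = """
SELECT t.class_id, COUNT(*) AS total
FROM table1 AS p
INNER JOIN table1 AS t ON t.student_id = p.student_id
WHERE p.class_id = ? AND t.class_id <> ? AND t.grade > p.grade
GROUP BY t.class_id
ORDER BY t.class_id
"""

result = conn.execute(QUERY, (1, 1)).fetchall()
print(result)  # [(2, 3), (3, 1)]
```

Note that, because this is an inner join with the grade comparison in the WHERE clause, a class where no student scored higher is omitted from the result rather than listed with a total of 0.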