这是否需要递归CTE,只需要创意窗口函数,循环?

时间:2022-12-09 22:58:54

I cannot for the life of me figure out how to get a weighted ranking for scores across X categories. For example, the student needs to answer 10 questions across 3 categories (both # of questions and # of categories will be variable eventually). To get a total score the top 1 score in each of the X (3) categories will be added to whatever is left to add up to 10 total question scores.

我不能为我的生活弄清楚如何获得X类别的分数的加权排名。例如,学生需要回答3个类别中的10个问题(问题#和类别#最终都是变量)。要获得总分,每个X(3)类别中的前1个分数将被添加到剩下的任何内容中以添加总共10个问题分数。

Here is the data. I used a CASE WHEN Row_Number() to get the TopInCat

这是数据。我使用了一个CASE WHEN Row_Number()来获取TopInCat

http://sqlfiddle.com/#!6/e6e9f/1

http://sqlfiddle.com/#!6/e6e9f/1

The fiddle has more students.

小提琴有更多的学生。

| Question | Student | Category | Score | TopInCat |
|----------|---------|----------|-------|----------|
|   120149 |     125 |        6 | 1     |        1 |
|   120127 |     125 |        6 | 0.9   |        0 |
|   120124 |     125 |        6 | 0.8   |        0 |
|   120125 |     125 |        6 | 0.7   |        0 |
|   120130 |     125 |        6 | 0.6   |        0 |
|   120166 |     125 |        6 | 0.5   |        0 |
|   120161 |     125 |        6 | 0.4   |        0 |
|   120138 |     125 |        4 | 0.15  |        1 |
|   120069 |     125 |        4 | 0.15  |        0 |
|   120022 |     125 |        4 | 0.15  |        0 |
|   120002 |     125 |        4 | 0.15  |        0 |
|   120068 |     125 |        2 | 0.01  |        1 |
|   120050 |     125 |        3 | 0.05  |        1 |
|   120139 |     125 |        2 | 0     |        0 |
|   120156 |     125 |        2 | 0     |        0 |

This is how I envision it needs to look, but it doesn't have to be exactly this. I just need to have 10 questions by 3 categories detail data in a way that would allow me to sum and average the Sort 1-10 column below. The 999's could be null or whatever as long as I can sum whats important and present the details.

这就是我想象它需要看的东西,但它不一定是这个。我只需要通过3个类别的详细数据来提出10个问题,这样我就可以对下面的1-10排列进行求和。 999可能是无效的,只要我可以总结什么重要并提供细节。

| Question | Student | Category | Score | TopInCat | Sort |
|----------|---------|----------|-------|----------|------|
|   120149 |     125 |        6 | 1     |        1 |    1 |
|   120138 |     125 |        4 | 0.15  |        1 |    2 |
|   120068 |     125 |        2 | 0.01  |        1 |    3 |
|   120127 |     125 |        6 | 0.9   |        0 |    4 |
|   120124 |     125 |        6 | 0.8   |        0 |    5 |
|   120125 |     125 |        6 | 0.7   |        0 |    6 |
|   120130 |     125 |        6 | 0.6   |        0 |    7 |
|   120166 |     125 |        6 | 0.5   |        0 |    8 |
|   120161 |     125 |        6 | 0.4   |        0 |    9 |
|   120069 |     125 |        4 | 0.15  |        0 |   10 |
|   120022 |     125 |        4 | 0.15  |        0 |  999 |
|   120002 |     125 |        4 | 0.15  |        0 |  999 |
|   120050 |     125 |        3 | 0.05  |        1 |  999 |
|   120139 |     125 |        2 | 0     |        0 |  999 |
|   120156 |     125 |        2 | 0     |        0 |  999 |

One last thing, the category no longer matters once the X (3) threshold is met. So a 4th category would just sort normally.

最后一件事,一旦满足X(3)阈值,该类别就不再重要。所以第四类只是正常排序。

| Question | Student | Category | Score | TopInCat | Sort |
|----------|---------|----------|-------|----------|------|
|   120149 |     126 |        6 | 1     |        1 |    1 |
|   120138 |     126 |        4 | 0.75  |        1 |    2 |
|   120068 |     126 |        2 | 0.50  |        1 |    3 |
|   120127 |     126 |        6 | 0.9   |        0 |    4 |
|   120124 |     126 |        6 | 0.8   |        0 |    5 |
|   120125 |     126 |        6 | 0.7   |        0 |    6 |
|   120130 |     126 |        6 | 0.6   |        0 |    7 |
|   120166 |     126 |        6 | 0.5   |        0 |    8 |
|   120050 |     126 |        3 | 0.45  |        1 |    9 |********
|   120161 |     126 |        6 | 0.4   |        0 |   10 |
|   120069 |     126 |        4 | 0.15  |        0 |  999 |
|   120022 |     126 |        4 | 0.15  |        0 |  999 |
|   120002 |     126 |        4 | 0.15  |        0 |  999 |
|   120139 |     126 |        2 | 0     |        0 |  999 |
|   120156 |     126 |        2 | 0     |        0 |  999 |

I really appreciate any help. Been banging my head on this for a few days.

我非常感谢任何帮助。几天来我一直在敲打着我。

1 个解决方案

#1


2  

With such matters I like to proceed with a 'building blocks' approach. Following the maxim of first make it work, then if you need to make it fast, this first step is often enough.

有了这些事情,我喜欢继续采用“积木”的方法。遵循首先使其工作的格言,然后如果你需要快速,这第一步通常就足够了。

So, given

所以,给定

CREATE TABLE WeightedScores
    ([Question] int, [Student] int, [Category] int, [Score] dec(3,2))
;

and your sample data

和你的样本数据

INSERT INTO WeightedScores
    ([Question], [Student], [Category], [Score])
VALUES
    (120161, 123, 6, 1),    (120166, 123, 6, 0.64),    (120138, 123, 4, 0.57),    (120069, 123, 4, 0.5),
    (120068, 123, 2, 0.33),    (120022, 123, 4, 0.18),    (120061, 123, 6, 0),    (120002, 123, 4, 0),
    (120124, 123, 6, 0),    (120125, 123, 6, 0),    (120137, 123, 6, 0),    (120154, 123, 6, 0),
    (120155, 123, 6, 0),   (120156, 123, 6, 0),    (120139, 124, 2, 1),    (120156, 124, 2, 1),
    (120050, 124, 3, 0.88),    (120068, 124, 2, 0.87),    (120161, 124, 6, 0.87),    (120138, 124, 4, 0.85),
    (120069, 124, 4, 0.51),    (120166, 124, 6, 0.5),    (120022, 124, 4, 0.43),    (120002, 124, 4, 0),
    (120130, 124, 6, 0),    (120125, 124, 6, 0),    (120124, 124, 6, 0),    (120127, 124, 6, 0),
    (120149, 124, 6, 0),    (120149, 125, 6, 1),    (120127, 125, 6, 0.9),    (120124, 125, 6, 0.8),
    (120125, 125, 6, 0.7),    (120130, 125, 6, 0.6),    (120166, 125, 6, 0.5),    (120161, 125, 6, 0.4),
    (120138, 125, 4, 0.15),    (120069, 125, 4, 0.15),    (120022, 125, 4, 0.15),    (120002, 125, 4, 0.15),
    (120068, 125, 2, 0.01),    (120050, 125, 3, 0.05),    (120139, 125, 2, 0),    (120156, 125, 2, 0),
    (120149, 126, 6, 1),    (120138, 126, 4, 0.75),    (120068, 126, 2, 0.50),    (120127, 126, 6, 0.9),
    (120124, 126, 6, 0.8),    (120125, 126, 6, 0.7),    (120130, 126, 6, 0.6),    (120166, 126, 6, 0.5),
    (120050, 126, 3, 0.45),    (120161, 126, 6, 0.4),    (120069, 126, 4, 0.15),    (120022, 126, 4, 0.15),
    (120002, 126, 4, 0.15),    (120139, 126, 2, 0),    (120156, 126, 2, 0)
;

let's proceed.

我们继续吧


The complicated part here is identifying the top three top-in-category questions; the others of the ten questions of interest per student are simply sorted by score, which is easy. So let's start with identifying the top three top-in-category questions.

这里复杂的部分是确定前三个*问题;每个学生感兴趣的十个问题中的其他问题只是按分数排序,这很容易。因此,让我们首先确定前三个*问题。

First, assign to each row a row number giving the ordering of that score within the category, for the student:

首先,为每个行分配一个行号,为该学生分配该类别中该分数的顺序:

;WITH Numbered1 ( Question, Student, Category, Score, SeqInStudentCategory ) AS
(
    SELECT Question, Student, Category, Score
        , ROW_NUMBER() OVER (PARTITION BY Student, Category ORDER BY Score DESC) SeqInStudentCategory 
    FROM WeightedScores
)

Now we are only interested in rows where SeqInStudentCategory is 1. Considering only such rows, let's order them by score within student, and number those rows:

现在我们只对SeqInStudentCategory为1的行感兴趣。只考虑这些行,让我们按学生的分数对它们进行排序,然后对这些行进行编号:

-- within the preceding WITH
, Numbered2 ( Question, Student, Category, Score, SeqInStudent ) AS
(
    SELECT 
        Question, Student, Category, Score
        , ROW_NUMBER() OVER (PARTITION BY Student ORDER BY Score DESC) SeqInStudent
    FROM
        Numbered1
    WHERE
        SeqInStudentCategory = 1
)

Now we are only interested in rows where SeqInStudent is at most 3. Let's pull them out, so that we know to include it (and exclude it from the simple sort by score, that we will use to make up the remaining seven rows):

现在我们只对SeqInStudent最多的行感兴趣。让我们把它们拉出来,这样我们就知道要包含它(并将它从简单的排序中排除,我们将用它来构成剩下的七行):

-- within the preceding WITH
, TopInCat ( Question, Student, Category, Score, SeqInStudent ) AS
(
     SELECT Question, Student, Category, Score, SeqInStudent FROM Numbered2 WHERE SeqInStudent <= 3
)

Now we have the three top-in-category questions for each student. We now need to identify and order by score the not top-in-category questions for each student:

现在我们为每个学生提供三个*类别的问题。我们现在需要通过分数识别并按顺序排列每个学生的不是*问题:

-- within the preceding WITH
, NotTopInCat ( Question, Student, Category, Score, SeqInStudent ) AS
(
    SELECT
        Question, Student, Category, Score
        , ROW_NUMBER() OVER (PARTITION BY Student ORDER BY Score DESC) SeqInStudent
    FROM
        WeightedScores WS
    WHERE
        NOT EXISTS ( SELECT 1 FROM TopInCat T WHERE T.Question = WS.Question AND T.Student = WS.Student )
)

Finally we combine TopInCat with NotTopInCat, applying an appropriate offset and restriction to NotTopInCat.SeqInStudent - we need to add 3 to the raw value, and take the top 7 (which is 10 - 3):

最后,我们将TopInCat与NotTopInCat结合起来,对NotTopInCat.SeqInStudent应用适当的偏移和限制 - 我们需要将3添加到原始值,并取前7(即10 - 3):

-- within the preceding WITH
, Combined ( Question, Student, Category, Score, CombinedSeq ) AS
(
    SELECT
        Question, Student, Category, Score, SeqInStudent AS CombinedSeq
    FROM
        TopInCat
    UNION
    SELECT
        Question, Student, Category, Score, SeqInStudent + 3 AS CombinedSeq
    FROM
        NotTopInCat
    WHERE
        SeqInStudent <= 10 - 3
)

To get our final results:

要获得我们的最终结果:

SELECT * FROM Combined ORDER BY Student, CombinedSeq
;

You can see the results on sqlfiddle.

你可以在sqlfiddle上看到结果。


Note that here I have assumed that every student will always have answers from at least three categories. Also, the final output doesn't have a TopInCat column, but hopefully you will see how to regain that if you want it.

请注意,在这里我假设每个学生总是会得到至少三个类别的答案。此外,最终输出没有TopInCat列,但希望你会看到如果你想要它如何重新获得它。

Also, "(both # of questions and # of categories will be variable eventually)" should be relatively straightforward to deal with here. But watch out for my assumption that (in this case) 3 categories will definitely be present in the answers of each student.

此外,“(问题的#和类别的#最终将是变量的)”应该在这里处理相对简单。但请注意我的假设(在这种情况下)3个类别肯定会出现在每个学生的答案中。

#1


2  

With such matters I like to proceed with a 'building blocks' approach. Following the maxim of first make it work, then if you need to make it fast, this first step is often enough.

有了这些事情,我喜欢继续采用“积木”的方法。遵循首先使其工作的格言,然后如果你需要快速,这第一步通常就足够了。

So, given

所以,给定

CREATE TABLE WeightedScores
    ([Question] int, [Student] int, [Category] int, [Score] dec(3,2))
;

and your sample data

和你的样本数据

INSERT INTO WeightedScores
    ([Question], [Student], [Category], [Score])
VALUES
    (120161, 123, 6, 1),    (120166, 123, 6, 0.64),    (120138, 123, 4, 0.57),    (120069, 123, 4, 0.5),
    (120068, 123, 2, 0.33),    (120022, 123, 4, 0.18),    (120061, 123, 6, 0),    (120002, 123, 4, 0),
    (120124, 123, 6, 0),    (120125, 123, 6, 0),    (120137, 123, 6, 0),    (120154, 123, 6, 0),
    (120155, 123, 6, 0),   (120156, 123, 6, 0),    (120139, 124, 2, 1),    (120156, 124, 2, 1),
    (120050, 124, 3, 0.88),    (120068, 124, 2, 0.87),    (120161, 124, 6, 0.87),    (120138, 124, 4, 0.85),
    (120069, 124, 4, 0.51),    (120166, 124, 6, 0.5),    (120022, 124, 4, 0.43),    (120002, 124, 4, 0),
    (120130, 124, 6, 0),    (120125, 124, 6, 0),    (120124, 124, 6, 0),    (120127, 124, 6, 0),
    (120149, 124, 6, 0),    (120149, 125, 6, 1),    (120127, 125, 6, 0.9),    (120124, 125, 6, 0.8),
    (120125, 125, 6, 0.7),    (120130, 125, 6, 0.6),    (120166, 125, 6, 0.5),    (120161, 125, 6, 0.4),
    (120138, 125, 4, 0.15),    (120069, 125, 4, 0.15),    (120022, 125, 4, 0.15),    (120002, 125, 4, 0.15),
    (120068, 125, 2, 0.01),    (120050, 125, 3, 0.05),    (120139, 125, 2, 0),    (120156, 125, 2, 0),
    (120149, 126, 6, 1),    (120138, 126, 4, 0.75),    (120068, 126, 2, 0.50),    (120127, 126, 6, 0.9),
    (120124, 126, 6, 0.8),    (120125, 126, 6, 0.7),    (120130, 126, 6, 0.6),    (120166, 126, 6, 0.5),
    (120050, 126, 3, 0.45),    (120161, 126, 6, 0.4),    (120069, 126, 4, 0.15),    (120022, 126, 4, 0.15),
    (120002, 126, 4, 0.15),    (120139, 126, 2, 0),    (120156, 126, 2, 0)
;

let's proceed.

我们继续吧


The complicated part here is identifying the top three top-in-category questions; the others of the ten questions of interest per student are simply sorted by score, which is easy. So let's start with identifying the top three top-in-category questions.

这里复杂的部分是确定前三个*问题;每个学生感兴趣的十个问题中的其他问题只是按分数排序,这很容易。因此,让我们首先确定前三个*问题。

First, assign to each row a row number giving the ordering of that score within the category, for the student:

首先,为每个行分配一个行号,为该学生分配该类别中该分数的顺序:

;WITH Numbered1 ( Question, Student, Category, Score, SeqInStudentCategory ) AS
(
    SELECT Question, Student, Category, Score
        , ROW_NUMBER() OVER (PARTITION BY Student, Category ORDER BY Score DESC) SeqInStudentCategory 
    FROM WeightedScores
)

Now we are only interested in rows where SeqInStudentCategory is 1. Considering only such rows, let's order them by score within student, and number those rows:

现在我们只对SeqInStudentCategory为1的行感兴趣。只考虑这些行,让我们按学生的分数对它们进行排序,然后对这些行进行编号:

-- within the preceding WITH
, Numbered2 ( Question, Student, Category, Score, SeqInStudent ) AS
(
    SELECT 
        Question, Student, Category, Score
        , ROW_NUMBER() OVER (PARTITION BY Student ORDER BY Score DESC) SeqInStudent
    FROM
        Numbered1
    WHERE
        SeqInStudentCategory = 1
)

Now we are only interested in rows where SeqInStudent is at most 3. Let's pull them out, so that we know to include it (and exclude it from the simple sort by score, that we will use to make up the remaining seven rows):

现在我们只对SeqInStudent最多的行感兴趣。让我们把它们拉出来,这样我们就知道要包含它(并将它从简单的排序中排除,我们将用它来构成剩下的七行):

-- within the preceding WITH
, TopInCat ( Question, Student, Category, Score, SeqInStudent ) AS
(
     SELECT Question, Student, Category, Score, SeqInStudent FROM Numbered2 WHERE SeqInStudent <= 3
)

Now we have the three top-in-category questions for each student. We now need to identify and order by score the not top-in-category questions for each student:

现在我们为每个学生提供三个*类别的问题。我们现在需要通过分数识别并按顺序排列每个学生的不是*问题:

-- within the preceding WITH
, NotTopInCat ( Question, Student, Category, Score, SeqInStudent ) AS
(
    SELECT
        Question, Student, Category, Score
        , ROW_NUMBER() OVER (PARTITION BY Student ORDER BY Score DESC) SeqInStudent
    FROM
        WeightedScores WS
    WHERE
        NOT EXISTS ( SELECT 1 FROM TopInCat T WHERE T.Question = WS.Question AND T.Student = WS.Student )
)

Finally we combine TopInCat with NotTopInCat, applying an appropriate offset and restriction to NotTopInCat.SeqInStudent - we need to add 3 to the raw value, and take the top 7 (which is 10 - 3):

最后,我们将TopInCat与NotTopInCat结合起来,对NotTopInCat.SeqInStudent应用适当的偏移和限制 - 我们需要将3添加到原始值,并取前7(即10 - 3):

-- within the preceding WITH
, Combined ( Question, Student, Category, Score, CombinedSeq ) AS
(
    SELECT
        Question, Student, Category, Score, SeqInStudent AS CombinedSeq
    FROM
        TopInCat
    UNION
    SELECT
        Question, Student, Category, Score, SeqInStudent + 3 AS CombinedSeq
    FROM
        NotTopInCat
    WHERE
        SeqInStudent <= 10 - 3
)

To get our final results:

要获得我们的最终结果:

SELECT * FROM Combined ORDER BY Student, CombinedSeq
;

You can see the results on sqlfiddle.

你可以在sqlfiddle上看到结果。


Note that here I have assumed that every student will always have answers from at least three categories. Also, the final output doesn't have a TopInCat column, but hopefully you will see how to regain that if you want it.

请注意,在这里我假设每个学生总是会得到至少三个类别的答案。此外,最终输出没有TopInCat列,但希望你会看到如果你想要它如何重新获得它。

Also, "(both # of questions and # of categories will be variable eventually)" should be relatively straightforward to deal with here. But watch out for my assumption that (in this case) 3 categories will definitely be present in the answers of each student.

此外,“(问题的#和类别的#最终将是变量的)”应该在这里处理相对简单。但请注意我的假设(在这种情况下)3个类别肯定会出现在每个学生的答案中。