SQL pattern length in a new column

Time: 2021-11-22 23:07:39

I have a (very very large) table of similar format to the following:

+--------+-------+
| id     | value |
+--------+-------+
|      1 | 5     |
|      2 | 6     |
|      3 | 6     |
|      4 | 4     |
|      5 | 3     |
|      6 | 2     |
|      7 | 4     |
|      8 | 5     |
+--------+-------+

What I'd like to do is return, in a third column, the length of the current run of increases or decreases in the value column (negative for a decreasing run, positive for an increasing one), while ignoring IDs where there is no change. The pattern should reset to 1 or -1 when the run is broken.

I've not explained that well at all, so with the table above, ideally the result would be:

+--------+-------+---------+
| id     | value | pattern |
+--------+-------+---------+
|      1 | 5     | 0/NULL  |
|      2 | 6     | 1       |
|      3 | 6     | 1       |
|      4 | 4     | -1      |
|      5 | 3     | -2      |
|      6 | 2     | -3      |
|      7 | 4     | 1       |
|      8 | 5     | 2       |
+--------+-------+---------+
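
The rule behind the desired column can be written down as a small reference implementation, which is handy for checking any SQL attempt against. This is only a sketch in Python: the function name `pattern_lengths` is made up for illustration, and it assumes that a row with an unchanged value repeats the previous pattern without breaking the run (which is what id 3 in the table suggests).

```python
from typing import List, Optional


def pattern_lengths(values: List[int]) -> List[Optional[int]]:
    """Signed run length of consecutive increases/decreases.

    Assumption: a row whose value equals the previous row's value
    repeats the previous pattern and does not break the run.
    """
    result: List[Optional[int]] = []
    prev_value: Optional[int] = None
    current: Optional[int] = None  # pattern of the most recent row
    for value in values:
        if prev_value is None:
            current = None  # first row: nothing to compare with
        elif value > prev_value:
            # extend an increasing run, or start a new one at 1
            current = current + 1 if current is not None and current > 0 else 1
        elif value < prev_value:
            # extend a decreasing run, or start a new one at -1
            current = current - 1 if current is not None and current < 0 else -1
        # value == prev_value: leave `current` unchanged
        result.append(current)
        prev_value = value
    return result


print(pattern_lengths([5, 6, 6, 4, 3, 2, 4, 5]))
# [None, 1, 1, -1, -2, -3, 1, 2]
```

The printed list corresponds to the pattern column above, with None standing in for 0/NULL on the first row.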

I did some research and came across pattern matching, but it turns out that either the version of SQL I'm using doesn't support it, or I'm being very silly. (I'm on Amazon Redshift, which according to them is 'based on' PostgreSQL 8.0.2: http://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html)

So, is this something that is even possible with SQL, and if so how should I go about it? Many thanks.

2 solutions

#1


1  

In SQL Server 2012, you can do this with lead() and lag() and cumulative sum.

Something that comes quite close is this:

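-- "table" is a placeholder for the real table name
select t.*, sum(nextinc) over (order by id) as pattern
from (select t.*,
             -- lead()/lag() are window functions and need an over clause
             (case when lead(t.value) over (order by id) > t.value then 1
                   when lead(t.value) over (order by id) = t.value then 0
                   else -1 end) as nextinc,
             (case when lag(t.value) over (order by id) > t.value then 1 else 0 end) as previnc
      from table t
     ) t;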
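
To see what that running total looks like on the sample data, here is a rough trace in Python rather than SQL. It assumes lead() is evaluated over (order by id) and that the last row, whose lead is NULL, falls into the else branch; the helper name `next_increments` is made up for illustration.

```python
from itertools import accumulate
from typing import List, Optional


def next_increments(values: List[int]) -> List[int]:
    """Mirror the CASE on lead(value): 1 if the next value is larger, 0 if equal, else -1."""
    incs = []
    for i, value in enumerate(values):
        nxt: Optional[int] = values[i + 1] if i + 1 < len(values) else None
        if nxt is not None and nxt > value:
            incs.append(1)
        elif nxt is not None and nxt == value:
            incs.append(0)
        else:
            incs.append(-1)  # smaller next value, or NULL lead on the last row
    return incs


values = [5, 6, 6, 4, 3, 2, 4, 5]
nextinc = next_increments(values)
running = list(accumulate(nextinc))  # sum(nextinc) over (order by id)
print(nextinc)  # [1, 0, -1, -1, -1, 1, 1, -1]
print(running)  # [1, 1, 0, -1, -2, -1, 0, -1]
```

The running total drifts through 0 when the direction flips; it never restarts at 1 or -1.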

However, the pattern goes up and down in increments of 1 instead of starting over. So, we need to find the pattern breaks. The following defines the breaks in the pattern and then increments pattern within each sequence of increasing/decreasing values:

select t.*,
       sum(nextinc) over (partition by grp order by id) as pattern
from (select t.*,
             sum(case when (prev_value <= value and value <= next_value) or
                           (prev_value >= value and value >= next_value)
                      then 0 else 1
                 end) over (order by id) as grp
      from (select t.*, lead(t.value) over (order by id) as next_value,
                   lag(t.value) over (order by id) as prev_value,
                   (case when lead(t.value) over (order by id) > t.value then 1
                         when lead(t.value) over (order by id) = t.value then 0
                         else -1 end) as nextinc  
            from table t
           ) t
      ) t
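
Before running this against a very large table, it may be worth tracing the query by hand on the eight sample rows. The Python sketch below follows the three nested selects row by row. It assumes standard SQL NULL handling (a NULL prev_value or next_value makes the when condition unknown, so the else 1 branch applies), and the function name `trace_grouped_query` is made up for illustration.

```python
from typing import Dict, List, Optional


def trace_grouped_query(values: List[int]) -> List[int]:
    """Follow the nested selects row by row (rows ordered by id)."""
    n = len(values)
    grp = 0
    totals: Dict[int, int] = {}  # running sum(nextinc) per grp
    pattern = []
    for i, value in enumerate(values):
        prev_value: Optional[int] = values[i - 1] if i > 0 else None
        next_value: Optional[int] = values[i + 1] if i + 1 < n else None

        # nextinc compares the current row with the following row
        if next_value is not None and next_value > value:
            nextinc = 1
        elif next_value is not None and next_value == value:
            nextinc = 0
        else:
            nextinc = -1

        # a NULL neighbour means the WHEN condition is not true, so ELSE 1 applies
        if prev_value is None or next_value is None:
            is_break = 1
        elif prev_value <= value <= next_value or prev_value >= value >= next_value:
            is_break = 0
        else:
            is_break = 1

        grp += is_break                             # sum(...) over (order by id)
        totals[grp] = totals.get(grp, 0) + nextinc  # partition by grp order by id
        pattern.append(totals[grp])
    return pattern


print(trace_grouped_query([5, 6, 6, 4, 3, 2, 4, 5]))
# [1, 1, 0, -1, -2, 1, 2, -1]
```

If the trace is faithful, the result is close to but not the same as the requested column (NULL, 1, 1, -1, -2, -3, 1, 2): nextinc looks at the following row rather than the previous one, and the repeated 6 does not start a new group. Some adjustment, for example basing the increment on lag() instead of lead(), would probably still be needed.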

#2


0  

For the given example, the following seems to do the job:

SELECT
  S3.id
  , S3.value
  , S3.pattern
  , SUM(minusNullPlus) OVER (PARTITION BY sequenceID ORDER BY id) calculated
FROM
  (SELECT
    S2.*
    , SUM(newSequence) OVER (ORDER BY id) sequenceID
  FROM
    (SELECT
      S1.*
      , CASE
          WHEN minusNullPlus = LAG(minusNullPlus, 1, NULL) OVER (ORDER BY id)
               OR
               minusNullPlus = 0
               OR
               (minusNullPlus = 1
                AND
                value - LAG(value, 1, NULL) OVER (ORDER BY id) = 1
               )
               OR
               (minusNullPlus = -1
                AND
                value - LAG(value, 1, NULL) OVER (ORDER BY id) = -1
               )
            THEN 0
          ELSE 1
        END newSequence
    FROM
      (SELECT
        id
        , value
        , CASE
            WHEN value > LAG(value, 1, NULL) OVER (ORDER BY id) THEN 1
            WHEN value < LAG(value, 1, NULL) OVER (ORDER BY id) THEN -1
            WHEN value = LAG(value, 1, NULL) OVER (ORDER BY id) THEN 0
            ELSE 0
          END minusNullPlus
        , CASE
            WHEN value - LAG(value, 1, NULL) OVER (ORDER BY id) = 0 THEN 0
            ELSE 1
          END change
      , pattern
      FROM SomeTable
      ) S1
    ) S2
  ) S3
ORDER BY id
;

See it in action: SQL Fiddle
It uses some additional data to check against; please verify that the resulting patterns are actually in line with your expectations/requirements.

NB: The suggested solution relies on some of the particularities of the provided sample data (and its expansion in above SQL Fiddle).
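
To make that caveat concrete, the query's logic can be followed by hand. The sketch below is a Python trace of S1, S2 and S3, not the SQL itself. It assumes standard NULL handling for LAG on the first row, and the function name `trace_second_answer` is made up for illustration.

```python
from typing import Dict, List, Optional


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def trace_second_answer(values: List[int]) -> List[int]:
    """Follow S1 -> S2 -> S3 row by row (rows ordered by id)."""
    sequence_id = 0
    totals: Dict[int, int] = {}
    calculated = []
    prev_dir: Optional[int] = None  # LAG(minusNullPlus); NULL on the first row
    for i, value in enumerate(values):
        diff: Optional[int] = value - values[i - 1] if i > 0 else None
        direction = sign(diff) if diff is not None else 0  # minusNullPlus

        # the four OR-ed conditions that yield newSequence = 0
        same_sequence = (
            (prev_dir is not None and direction == prev_dir)
            or direction == 0
            or (direction == 1 and diff == 1)
            or (direction == -1 and diff == -1)
        )
        sequence_id += 0 if same_sequence else 1  # SUM(newSequence) OVER (ORDER BY id)
        totals[sequence_id] = totals.get(sequence_id, 0) + direction
        calculated.append(totals[sequence_id])
        prev_dir = direction
    return calculated


print(trace_second_answer([5, 6, 6, 4, 3, 2, 4, 5]))  # [0, 1, 1, -1, -2, -3, 1, 2]
print(trace_second_answer([5, 6, 5]))                  # [0, 1, 0]
```

On the sample values the trace reproduces the requested column. On 5, 6, 5, however, the drop by exactly 1 is not treated as a new sequence, so the last row comes out as 0 rather than -1. That appears to be one of the particularities the query depends on.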

Please comment if adjustments or further detail are required.
