Condense multiple rows into a single row with counts based on unique values in SQLite

Time: 2021-05-09 13:21:54

I am trying to condense a table that contains multiple rows per event into a smaller table that contains counts of key sub-events within each event. Each event is defined by a unique combination of values across several columns.

As a specific example, say I have the following data on customer visits to various stores on different dates, with one row per item purchased:

cust  date  store    item_type
a     1     Main St  1
a     1     Main St  2
a     1     Main St  2
a     1     Main St  2
b     1     Main St  1
b     1     Main St  2
b     1     Main St  2
c     1     Main St  1
d     2     Elm St   1
d     2     Elm St   3
e     2     Main St  1
e     2     Main St  1
a     3     Main St  1
a     3     Main St  2

I would like to restructure the data into a table with a single row per customer visit on a given day, along with the corresponding counts. I am trying to understand how to use SQLite to condense it to:

Index  cust  date  store    n_items  item1  item2  item3  item4
1      a     1     Main St  4        1      3      0      0
2      b     1     Main St  3        1      2      0      0
3      c     1     Main St  1        1      0      0      0
4      d     2     Elm St   2        1      0      1      0
5      e     2     Main St  2        2      0      0      0
6      a     3     Main St  2        1      1      0      0

I can do this in Excel for this trivial example: begin with SUMPRODUCT(customer * date) as suggested here, follow with a cumulative sum on this column to generate Index, then use COUNTIF and COUNTIFS to generate the desired counts.

Excel is poorly suited to doing this for thousands of rows, so I am looking for a solution using SQLite.

Sadly, my SQLite kung-fu is weak.

I think this is the closest I have found, but I am having trouble understanding exactly how to adapt it.

When I tried a more basic approach, starting by creating a unique index:

CREATE UNIQUE INDEX ui ON t(cust, date);

I get:

Error: indexed columns are not unique
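This error is expected for data shaped like this: each row is one purchased item, so the same (cust, date) pair appears once per item, and a UNIQUE index cannot be built over repeated keys. The sketch below is an illustration added here, not part of the original question. It uses Python's built-in sqlite3 module to load the sample rows into an in-memory table named t, lists the repeated keys with GROUP BY ... HAVING, and then attempts the same index. The helper name duplicate_keys is hypothetical, and the exact error wording depends on the SQLite version.

```python
import sqlite3

# Sample data from the question: one row per purchased item.
ROWS = [
    ("a", 1, "Main St", 1), ("a", 1, "Main St", 2), ("a", 1, "Main St", 2),
    ("a", 1, "Main St", 2), ("b", 1, "Main St", 1), ("b", 1, "Main St", 2),
    ("b", 1, "Main St", 2), ("c", 1, "Main St", 1), ("d", 2, "Elm St", 1),
    ("d", 2, "Elm St", 3), ("e", 2, "Main St", 1), ("e", 2, "Main St", 1),
    ("a", 3, "Main St", 1), ("a", 3, "Main St", 2),
]


def duplicate_keys(conn):
    """Hypothetical helper: (cust, date, n_rows) for every pair seen more than once."""
    return conn.execute(
        """
        SELECT cust, date, COUNT(*) AS n_rows
        FROM t
        GROUP BY cust, date
        HAVING COUNT(*) > 1
        ORDER BY date, cust
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cust TEXT, date INTEGER, store TEXT, item_type INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

dups = duplicate_keys(conn)
print(dups)

# A UNIQUE index cannot cover duplicate keys, so SQLite refuses to build it.
index_created = True
try:
    conn.execute("CREATE UNIQUE INDEX ui ON t(cust, date)")
except sqlite3.DatabaseError as exc:  # IntegrityError is a DatabaseError subclass
    index_created = False
    print("index rejected:", exc)
```

Only customer c on day 1 bought a single item, so every other visit shows up as a duplicate key.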

I would greatly appreciate any pointers on where to start. Many thanks in advance!

1 solution

#1

To create one result record for each unique combination of column values, use GROUP BY.

COUNT(*) gives the number of records in each group.

To count specific item types, use a boolean expression such as item_type = x, which SQLite evaluates to 1 (true) or 0 (false), and sum it over all records in the group:

SELECT cust,
       date,
       store,
       COUNT(*) AS n_items,          -- number of item rows in this visit
       SUM(item_type = 1) AS item1,  -- item_type = 1 is 1 or 0 per row
       SUM(item_type = 2) AS item2,
       SUM(item_type = 3) AS item3,
       SUM(item_type = 4) AS item4
FROM t
GROUP BY cust,
         date,
         store
ORDER BY date,
         cust;
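To check the query against the question's sample data, here is a minimal sketch using Python's built-in sqlite3 module (an addition, not part of the original answer). It also numbers the grouped rows to mimic the question's Index column using the ROW_NUMBER() window function, which assumes SQLite 3.25 or newer. The column name idx is a hypothetical choice, since INDEX is an SQL keyword.

```python
import sqlite3

# Sample data from the question: one row per purchased item.
ROWS = [
    ("a", 1, "Main St", 1), ("a", 1, "Main St", 2), ("a", 1, "Main St", 2),
    ("a", 1, "Main St", 2), ("b", 1, "Main St", 1), ("b", 1, "Main St", 2),
    ("b", 1, "Main St", 2), ("c", 1, "Main St", 1), ("d", 2, "Elm St", 1),
    ("d", 2, "Elm St", 3), ("e", 2, "Main St", 1), ("e", 2, "Main St", 1),
    ("a", 3, "Main St", 1), ("a", 3, "Main St", 2),
]

# The answer's GROUP BY query plus a hypothetical "idx" column. Window functions
# are evaluated after grouping, so ROW_NUMBER() numbers visits, not item rows.
QUERY = """
SELECT ROW_NUMBER() OVER (ORDER BY date, cust, store) AS idx,
       cust,
       date,
       store,
       COUNT(*)           AS n_items,
       SUM(item_type = 1) AS item1,
       SUM(item_type = 2) AS item2,
       SUM(item_type = 3) AS item3,
       SUM(item_type = 4) AS item4
FROM t
GROUP BY cust, date, store
ORDER BY date, cust, store
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cust TEXT, date INTEGER, store TEXT, item_type INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

visits = conn.execute(QUERY).fetchall()
for row in visits:
    print(row)
```

The six result rows match the desired table. The idx numbering follows the ORDER BY inside OVER (...), so it lines up with the question's Index column because the desired output happens to be listed by date and then by customer.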
