如何在SQL中有效地将不同的值计数到不同的行中?

时间:2021-11-22 23:07:57

Problem:

问题:

Say there is a simple (yet big) table foods

假设有一种简单(但很大)的餐桌食品

id   name 
--   -----------  
01   ginger beer
02   white wine
03   red wine
04   ginger wine

I'd like to count how many entries have specific hardcoded patterns, say contain words 'ginger' (LIKE '%ginger%') or 'wine' (LIKE '%wine%'), or whatever else in them, and write these numbers into rows along comments. The result I'm looking for is the following

我想要计算有多少条目具有特定的硬编码模式,比如包含单词“ginger”(比如“%ginger%”)或“wine”(比如“%wine%”),或者其中的其他内容,并将这些数字写在注释中。我寻找的结果如下。

comment           total 
---------------   -----  
contains ginger   2
for wine lovers   3

Solution 1 (good format but inefficient):

解决方案1(格式好但效率低):

It is possible to use UNION ALL and construct the following

可以使用UNION ALL并构造以下内容

SELECT * FROM
(
  (
    SELECT
      'contains ginger' AS comment,
      sum((name LIKE '%ginger%')::INT) AS total
    FROM foods
  )
  UNION ALL
  (
    SELECT
      'for wine lovers' AS comment,
      sum((name LIKE '%wine%')::INT) AS total
    FROM foods
  )
)

Apparently it works similarly to simply executing multiple queries and sewing them together afterwards. It is very inefficient.

显然,它的工作原理类似于简单地执行多个查询并在之后将它们组装在一起。这是非常低效的。

Solution 2 (efficient but bad format):

解决方案2(高效但糟糕的格式):

The following is multiple times faster compared to previous solution

与以前的解决方案相比,下面的方法要快很多倍

SELECT
  sum((name LIKE '%ginger%')::INT) AS contains_ginger,
  sum((name LIKE '%wine%')::INT) AS for_wine_lovers
FROM foods

And the result is

结果是

contains_ginger   for_wine_lovers 
---------------   ---------------  
2                 3

So it is definitely possible to get the same information much faster, but in a wrong format...

因此,获得同样的信息肯定是有可能的,但是以一种错误的格式……

Discussion:

讨论:

What is the best overall approach? What should I do to get the result I want in an efficient manner and preferable format? Or is it really impossible?

什么是最好的整体方法?我应该怎样做才能以一种有效的方式和更好的格式得到我想要的结果?还是真的不可能?

By the way, I am writing this for Redshift (based on PostgreSQL).

顺便说一下,我写这篇文章是为了Redshift(基于PostgreSQL)。

Thanks.

谢谢。

12 个解决方案

#1


1  

option 1: manually reshape

选项1:手动重塑

CREATE TEMPORARY TABLE wide AS (
  SELECT
    sum((name LIKE '%ginger%')::INT) AS contains_ginger,
    sum((name LIKE '%wine%')::INT) AS for_wine_lovers
    ...
  FROM foods;
SELECT
  'contains ginger', contains_ginger FROM wide

UNION ALL
SELECT 
  'for wine lovers', contains_wine FROM wine

UNION ALL
...;

option 2: create a categories table & use a join

选项2:创建一个目录表并使用一个连接

-- not sure if redshift supports values, hence I'm using the union all to build the table
WITH categories (category_label, food_part) AS (
    SELECT 'contains ginger', 'ginger'
    union all
    SELECT 'for wine lovers', 'wine'
    ...
)
SELECT
categories.category_label, COUNT(*)
FROM categories
LEFT JOIN foods ON foods.name LIKE ('%' || categories.food_part || '%')
GROUP BY 1

Since your solution 2 you consider to be fast enough, option 1 should work for you.

由于您认为解决方案2足够快,所以选项1应该适合您。

Option 2 should also be fairly efficient, and it is much easier to write & extend, and as an added bonus, this query will let you know if no foods exist in a given category.

选项2也应该是相当高效的,并且更容易编写和扩展,作为附加的好处,这个查询将让您知道在给定的类别中是否不存在任何食物。

Option 3: Reshape & redistribute your data to better match the grouping keys.

选项3:重新配置和重新分配数据,以更好地匹配分组键。

You could also pre-process your dataset if the query execution time is very important. A lot the benefits of this depend on your data volume and data distribution. Do you only have a few hard categories, or will they be searched dynamically from some sort of interface.

如果查询执行时间非常重要,也可以预先处理数据集。这在很大程度上取决于您的数据量和数据分布。您是否只有几个硬类别,或者它们将从某种接口动态地搜索。

For example:

例如:

If the dataset were reshaped like this:

如果数据集的形状是这样的:

content   name 
--------  ----
ginger    01
ginger    04
beer      01
white     02
wine      02 
wine      04
wine      03

Then you could shard & distribute on content, and each instance could execute that part of the aggregation in parallel.

然后可以对内容进行切分和分发,每个实例可以并行地执行聚合的这一部分。

Here an equivalent query might look like this:

在这里,类似的查询可能是这样的:

WITH content_count AS (
  SELECT content, COUNT(*) total
  FROM reshaped_food_table 
  GROUP BY 1
)
SELECT
    CASE content 
      WHEN 'ginger' THEN 'contains ginger'
      WHEN 'wine' THEN 'for wine lovers'
      ELSE 'other' 
    END category
  , total
FROM content_count

#2


2  

In both the queries LIKE operator is used. Alternatively We can use Position to find the location of the hardcoded words in the name. If hardcoded words are available in the name then a number greater than 0 will be returned.

在这两个查询中都使用了操作符。或者,我们可以使用Position来查找名称中的硬编码词的位置。如果硬编码的单词在名称中可用,那么将返回大于0的数字。

SELECT 
       unnest(array['ginger', 'wine']) AS comments,
       unnest(array[ginger, wine]) AS count
FROM(
     (SELECT sum(contains_ginger) ginger , sum(contains_wine) wine
        FROM
             (SELECT CASE WHEN Position('ginger' in name)>0 
                          THEN 1 
                           END contains_ginger,
                     CASE WHEN Position('wine' in name) > 0 
                          THEN 1
                           END contains_wine
                 FROM foods) t) t1

#3


1  

I don't know about Redshift, but in Postgres I'd start with something like this:

我不知道红移,但在Postgres中我会以这样的方式开始:

WITH foods (id, name) AS (VALUES 
  (1, 'ginger beer'), (2, 'white wine'), (3, 'red wine'), (4, 'ginger wine'))
SELECT hardcoded.comment, count(*)
FROM (VALUES ('%ginger%', 'contains ginger'), ('%wine%', 'for wine lovers'))
  AS hardcoded (pattern, comment)
JOIN foods ON foods.name LIKE hardcoded.pattern
GROUP BY hardcoded.comment;

┌─────────────────┬───────┐
│     comment     │ count │
├─────────────────┼───────┤
│ contains ginger │     2 │
│ for wine lovers │     3 │
└─────────────────┴───────┘
(2 rows)

and if it's OK then I'd go on to create appropriate indexes on foods.name. That might include indexes on name and reverse(name); or perhaps (name gist_trgm_ops), but I don't expect Redshift to provide pg_trgm.

如果可以的话,我将继续在food .name上创建适当的索引。这可能包括关于名称和反向(名称)的索引;或者(name gist_trgm_ops),但是我不期望Redshift提供pg_trgm。

#4


1  

Try this for size:

试试这个尺寸:

Declare @searchTerms table (term varchar(100), comment varchar(100))
insert into @searchTerms values
('ginger','contains ginger')
,('wine','for wine lovers')
-- Add any others here

select t.comment, isnull(count(f.id),0) [total]
from @searchTerms t
left join food f on (f.name like '%'+t.term+'%')
group by t.term
order by 1

I'm not sure what the temp table syntax is for postgresql - this example is for MS SQL Server, but I'm sure you get the idea

我不确定postgresql的临时表语法是什么——这个示例是针对MS SQL Server的,但是我肯定您已经理解了

UPDATE: According to the online converter at SQLines the syntax is effectively the same

更新:根据SQLines的在线转换器,语法实际上是相同的

#5


1  

Redshift is rather limited in comparison to modern Postgres.
No unnest(), no array_agg(), no ARRAY constructor, no VALUES expression, no LATERAL joins, no tablefunc module. All the tools that would make this nice an simple. At least we have CTEs ...

与现代的Postgres相比,Redshift是相当有限的。没有unnest()、没有array_agg()、没有数组构造函数、没有值表达式、没有横向连接、没有tablefunc模块。所有的工具都可以使这个简单。至少我们有CTEs…

This should work and be fast and relatively simple to expand:

这应该是可行的,并且要快速和相对简单地扩展:

WITH ct AS (
   SELECT a.arr
        , count(name ~ arr[1] OR NULL) AS ct1
        , count(name ~ arr[2] OR NULL) AS ct2
        , count(name ~ arr[3] OR NULL) AS ct3
     -- , ... more
   FROM   foods
   CROSS  JOIN (SELECT '{ginger, wine, bread}'::text[]) AS a(arr)
   )
SELECT arr[1] AS comment, ct1 AS total FROM ct
UNION ALL SELECT arr[2], ct2 FROM ct
UNION ALL SELECT arr[3], ct3 FROM ct
--  ... more

I use the Posix operator ~ to replace LIKE, just because it's shorter and no need for added placeholder %. Performance is about the same for this simple form in Postgres, not sure about Redshift.

我使用Posix操作符~来替换LIKE,只是因为它更短,不需要添加占位符%。对于Postgres中的这个简单表单,性能大致相同,但对Redshift并不确定。

count(boolean_expression OR NULL) should be a bit faster than sum(boolean_expression::int).

count(boolean_expression或NULL)应该比sum快一点(boolean_expression::int)。

Indexes won't be able to improve performance of this single sequential scan over the whole table.

索引将无法改进对整个表的这种单次顺序扫描的性能。

#6


0  

A little bit of searching suggests that you could use your second approach for efficiency, and place the result into a CTE, which you then unnest(), as per: unpivot and PostgreSQL

稍微搜索一下就会发现,您可以使用第二种方法来提高效率,并将结果放入CTE中,然后将其unnest(),如:unpivot和PostgreSQL

#7


0  

Try this -

试试这个,

SELECT 'contains ginger' AS comment
      , Count(*) AS total
FROM foods
WHERE name LIKE '%ginger%'
UNION ALL
SELECT 'for wine lovers',
      , count(*)
FROM foods
WHERE name LIKE '%wine%'

#8


0  

Have you considered using cursors?

你考虑过使用游标吗?

Here is an example I wrote for SQL Server.

下面是我为SQL Server编写的一个示例。

You just need to have some table with all the values you want to make search (I called it SearchWordTable in the example below and the column name SearchWord) in the foods table.

您只需要有一些表,其中包含要进行搜索的所有值(我在下面的示例中将其称为SearchWordTable和列名SearchWord)。

CREATE TABLE #TemporaryTable 
(
    KeyWord nvarchar(50),
    ResultCount int
);

DECLARE @searchWord nvarchar(50)
DECLARE @count INT

DECLARE statistic_cursor CURSOR FOR   
SELECT SearchWord
FROM SearchWordTable

OPEN statistic_cursor  
FETCH NEXT FROM statistic_cursor INTO @searchWord  

WHILE @@FETCH_STATUS = 0  
BEGIN  
    SELECT @count = COUNT(1) FROM foods
    WHERE name LIKE '%'+@searchWord+'%'

    INSERT INTO #TemporaryTable (KeyWord, ResultCount) VALUES (@searchWord, @count)

    FETCH NEXT FROM product_cursor INTO @product  
END  

CLOSE product_cursor  
DEALLOCATE product_cursor

SELECT * FROM #TemporaryTable

DROP #TemporaryTable

#9


0  

I think the best option is to split the ingredients list into parts and then to count them.

我认为最好的选择是把成分表分成几个部分,然后再数一数。

"Pass0".."Pass4" and "Numbers" is just a Tally table to get a list of numbers 1..256 to emulate the unnest.

“Pass0”……““Pass4”和“Numbers”只是一个用来获取数字1列表的计数表。256模拟unnest。

"comments" is a a simple table you should have somewhere with ingredients and their comments

“comments”是一个简单的表格,你应该有一些成分和它们的注释

use your table "foods" instead of mine ;)

用你的“食物”代替我的;

let's take a look

让我们看一看

with
Pass0 as (select '1' as C union all select '1'), --2 rows
Pass1 as (select '1' as C from Pass0 as A, Pass0 as B),--4 rows
Pass2 as (select '1' as C from Pass1 as A, Pass1 as B),--16 rows
Pass3 as (select '1' as C from Pass2 as A, Pass2 as B),--256 rows
numbers as (
    select ROW_NUMBER() OVER(ORDER BY C) AS N FROM Pass3
),    
comments as (
    select 'ginger' ingredient, 'contains ginger' comment union all 
    select 'wine', 'for wine lovers' union all 
    select 'ale', 'a warm kind of beer' union all 
    select 'beer', 'your old friend'
),
foods as (
    select 01 id, 'ginger beer' name union all 
    select 02   ,'white wine' union all 
    select 03   ,'red wine' union all 
    select 04   ,'ginger wine' union all 
    select 05   ,'ginger ale' union all 
    select 06   ,'pale ale' union all 
    select 07   ,'ginger beer' union all 
),
ingredients as (
    select ingredient, COUNT(*) n
    from foods d
    CROSS JOIN LATERAL(
        select SPLIT_PART(d.name, ' ', n.n) ingredient
        from numbers n
        where SPLIT_PART(d.name, ' ', n.n)<>''
    ) ingredients
    group by ingredient
)
select i.*, isnull(c.comment, 'no comment..') comment
from ingredients i
left join comments c on c.ingredient = i.ingredient

ingredient  n   comment
ale         2   a warm kind of beer
beer        2   your old friend
ginger      4   contains ginger
pale        1   no comment..
red         1   no comment..
white       1   no comment..
wine        3   for wine lovers

#10


0  

Here you go. ????

给你。????

The WHERE filter reduces the rows going into the GROUP BY aggregation. It's not necessary for smaller data but will help if the table is in the billions of rows. Add additional patterns to the REGEXP filter and the CASE statement.

WHERE过滤器通过聚合减少进入组的行。对于较小的数据来说,这是不必要的,但如果表位于数十亿行中,这将有所帮助。向REGEXP过滤器和CASE语句添加其他模式。

SELECT CASE WHEN name LIKE '%ginger%' THEN 'contains ginger' 
            WHEN name LIKE '%wine%'   THEN 'for wine lovers'
       ELSE NULL END "comment"
      ,COUNT(*) total
FROM grouping_test
WHERE REGEXP_INSTR(name,'ginger|wine')
GROUP BY 1
;

#11


0  

From your sample it seems like your product names contain up to 2 words. It's more efficient to break by space and check if individual chunks match than like, and then manually reshape as said in the other response

从你的样品来看,你的产品名称似乎最多包含两个字。根据空间进行分解并检查各个块是否匹配,然后按照另一个响应中所说的手动重构,这样做会更有效

WITH counts as (
    SELECT 
      sum(('ginger' in (split_part(name,' ',1),split_part(name,' ',2)))::INT) AS contains_ginger,
      sum(('wine' in (split_part(name,' ',1),split_part(name,' ',2)))::INT) AS for_wine_lovers
    FROM foods
)
-- manually reshape

#12


0  

Try with SQL like this:

试试下面的SQL语句:

SELECT count(1) as total,'contains ginger' result
FROM foods where names LIKE '%ginger%' 
union all
SELECT count(1),'for wine lovers' 
FROM foods where names LIKE '%wine%'

#1


1  

option 1: manually reshape

选项1:手动重塑

CREATE TEMPORARY TABLE wide AS (
  SELECT
    sum((name LIKE '%ginger%')::INT) AS contains_ginger,
    sum((name LIKE '%wine%')::INT) AS for_wine_lovers
    ...
  FROM foods;
SELECT
  'contains ginger', contains_ginger FROM wide

UNION ALL
SELECT 
  'for wine lovers', contains_wine FROM wine

UNION ALL
...;

option 2: create a categories table & use a join

选项2:创建一个目录表并使用一个连接

-- not sure if redshift supports values, hence I'm using the union all to build the table
WITH categories (category_label, food_part) AS (
    SELECT 'contains ginger', 'ginger'
    union all
    SELECT 'for wine lovers', 'wine'
    ...
)
SELECT
categories.category_label, COUNT(*)
FROM categories
LEFT JOIN foods ON foods.name LIKE ('%' || categories.food_part || '%')
GROUP BY 1

Since your solution 2 you consider to be fast enough, option 1 should work for you.

由于您认为解决方案2足够快,所以选项1应该适合您。

Option 2 should also be fairly efficient, and it is much easier to write & extend, and as an added bonus, this query will let you know if no foods exist in a given category.

选项2也应该是相当高效的,并且更容易编写和扩展,作为附加的好处,这个查询将让您知道在给定的类别中是否不存在任何食物。

Option 3: Reshape & redistribute your data to better match the grouping keys.

选项3:重新配置和重新分配数据,以更好地匹配分组键。

You could also pre-process your dataset if the query execution time is very important. A lot the benefits of this depend on your data volume and data distribution. Do you only have a few hard categories, or will they be searched dynamically from some sort of interface.

如果查询执行时间非常重要,也可以预先处理数据集。这在很大程度上取决于您的数据量和数据分布。您是否只有几个硬类别,或者它们将从某种接口动态地搜索。

For example:

例如:

If the dataset were reshaped like this:

如果数据集的形状是这样的:

content   name 
--------  ----
ginger    01
ginger    04
beer      01
white     02
wine      02 
wine      04
wine      03

Then you could shard & distribute on content, and each instance could execute that part of the aggregation in parallel.

然后可以对内容进行切分和分发,每个实例可以并行地执行聚合的这一部分。

Here an equivalent query might look like this:

在这里,类似的查询可能是这样的:

WITH content_count AS (
  SELECT content, COUNT(*) total
  FROM reshaped_food_table 
  GROUP BY 1
)
SELECT
    CASE content 
      WHEN 'ginger' THEN 'contains ginger'
      WHEN 'wine' THEN 'for wine lovers'
      ELSE 'other' 
    END category
  , total
FROM content_count

#2


2  

In both the queries LIKE operator is used. Alternatively We can use Position to find the location of the hardcoded words in the name. If hardcoded words are available in the name then a number greater than 0 will be returned.

在这两个查询中都使用了操作符。或者,我们可以使用Position来查找名称中的硬编码词的位置。如果硬编码的单词在名称中可用,那么将返回大于0的数字。

SELECT 
       unnest(array['ginger', 'wine']) AS comments,
       unnest(array[ginger, wine]) AS count
FROM(
     (SELECT sum(contains_ginger) ginger , sum(contains_wine) wine
        FROM
             (SELECT CASE WHEN Position('ginger' in name)>0 
                          THEN 1 
                           END contains_ginger,
                     CASE WHEN Position('wine' in name) > 0 
                          THEN 1
                           END contains_wine
                 FROM foods) t) t1

#3


1  

I don't know about Redshift, but in Postgres I'd start with something like this:

我不知道红移,但在Postgres中我会以这样的方式开始:

WITH foods (id, name) AS (VALUES 
  (1, 'ginger beer'), (2, 'white wine'), (3, 'red wine'), (4, 'ginger wine'))
SELECT hardcoded.comment, count(*)
FROM (VALUES ('%ginger%', 'contains ginger'), ('%wine%', 'for wine lovers'))
  AS hardcoded (pattern, comment)
JOIN foods ON foods.name LIKE hardcoded.pattern
GROUP BY hardcoded.comment;

┌─────────────────┬───────┐
│     comment     │ count │
├─────────────────┼───────┤
│ contains ginger │     2 │
│ for wine lovers │     3 │
└─────────────────┴───────┘
(2 rows)

and if it's OK then I'd go on to create appropriate indexes on foods.name. That might include indexes on name and reverse(name); or perhaps (name gist_trgm_ops), but I don't expect Redshift to provide pg_trgm.

如果可以的话,我将继续在food .name上创建适当的索引。这可能包括关于名称和反向(名称)的索引;或者(name gist_trgm_ops),但是我不期望Redshift提供pg_trgm。

#4


1  

Try this for size:

试试这个尺寸:

Declare @searchTerms table (term varchar(100), comment varchar(100))
insert into @searchTerms values
('ginger','contains ginger')
,('wine','for wine lovers')
-- Add any others here

select t.comment, isnull(count(f.id),0) [total]
from @searchTerms t
left join food f on (f.name like '%'+t.term+'%')
group by t.term
order by 1

I'm not sure what the temp table syntax is for postgresql - this example is for MS SQL Server, but I'm sure you get the idea

我不确定postgresql的临时表语法是什么——这个示例是针对MS SQL Server的,但是我肯定您已经理解了

UPDATE: According to the online converter at SQLines the syntax is effectively the same

更新:根据SQLines的在线转换器,语法实际上是相同的

#5


1  

Redshift is rather limited in comparison to modern Postgres.
No unnest(), no array_agg(), no ARRAY constructor, no VALUES expression, no LATERAL joins, no tablefunc module. All the tools that would make this nice an simple. At least we have CTEs ...

与现代的Postgres相比,Redshift是相当有限的。没有unnest()、没有array_agg()、没有数组构造函数、没有值表达式、没有横向连接、没有tablefunc模块。所有的工具都可以使这个简单。至少我们有CTEs…

This should work and be fast and relatively simple to expand:

这应该是可行的,并且要快速和相对简单地扩展:

WITH ct AS (
   SELECT a.arr
        , count(name ~ arr[1] OR NULL) AS ct1
        , count(name ~ arr[2] OR NULL) AS ct2
        , count(name ~ arr[3] OR NULL) AS ct3
     -- , ... more
   FROM   foods
   CROSS  JOIN (SELECT '{ginger, wine, bread}'::text[]) AS a(arr)
   )
SELECT arr[1] AS comment, ct1 AS total FROM ct
UNION ALL SELECT arr[2], ct2 FROM ct
UNION ALL SELECT arr[3], ct3 FROM ct
--  ... more

I use the Posix operator ~ to replace LIKE, just because it's shorter and no need for added placeholder %. Performance is about the same for this simple form in Postgres, not sure about Redshift.

我使用Posix操作符~来替换LIKE,只是因为它更短,不需要添加占位符%。对于Postgres中的这个简单表单,性能大致相同,但对Redshift并不确定。

count(boolean_expression OR NULL) should be a bit faster than sum(boolean_expression::int).

count(boolean_expression或NULL)应该比sum快一点(boolean_expression::int)。

Indexes won't be able to improve performance of this single sequential scan over the whole table.

索引将无法改进对整个表的这种单次顺序扫描的性能。

#6


0  

A little bit of searching suggests that you could use your second approach for efficiency, and place the result into a CTE, which you then unnest(), as per: unpivot and PostgreSQL

稍微搜索一下就会发现,您可以使用第二种方法来提高效率,并将结果放入CTE中,然后将其unnest(),如:unpivot和PostgreSQL

#7


0  

Try this -

试试这个,

SELECT 'contains ginger' AS comment
      , Count(*) AS total
FROM foods
WHERE name LIKE '%ginger%'
UNION ALL
SELECT 'for wine lovers',
      , count(*)
FROM foods
WHERE name LIKE '%wine%'

#8


0  

Have you considered using cursors?

你考虑过使用游标吗?

Here is an example I wrote for SQL Server.

下面是我为SQL Server编写的一个示例。

You just need to have some table with all the values you want to make search (I called it SearchWordTable in the example below and the column name SearchWord) in the foods table.

您只需要有一些表,其中包含要进行搜索的所有值(我在下面的示例中将其称为SearchWordTable和列名SearchWord)。

CREATE TABLE #TemporaryTable 
(
    KeyWord nvarchar(50),
    ResultCount int
);

DECLARE @searchWord nvarchar(50)
DECLARE @count INT

DECLARE statistic_cursor CURSOR FOR   
SELECT SearchWord
FROM SearchWordTable

OPEN statistic_cursor  
FETCH NEXT FROM statistic_cursor INTO @searchWord  

WHILE @@FETCH_STATUS = 0  
BEGIN  
    SELECT @count = COUNT(1) FROM foods
    WHERE name LIKE '%'+@searchWord+'%'

    INSERT INTO #TemporaryTable (KeyWord, ResultCount) VALUES (@searchWord, @count)

    FETCH NEXT FROM product_cursor INTO @product  
END  

CLOSE product_cursor  
DEALLOCATE product_cursor

SELECT * FROM #TemporaryTable

DROP #TemporaryTable

#9


0  

I think the best option is to split the ingredients list into parts and then to count them.

我认为最好的选择是把成分表分成几个部分,然后再数一数。

"Pass0".."Pass4" and "Numbers" is just a Tally table to get a list of numbers 1..256 to emulate the unnest.

“Pass0”……““Pass4”和“Numbers”只是一个用来获取数字1列表的计数表。256模拟unnest。

"comments" is a a simple table you should have somewhere with ingredients and their comments

“comments”是一个简单的表格,你应该有一些成分和它们的注释

use your table "foods" instead of mine ;)

用你的“食物”代替我的;

let's take a look

让我们看一看

with
Pass0 as (select '1' as C union all select '1'), --2 rows
Pass1 as (select '1' as C from Pass0 as A, Pass0 as B),--4 rows
Pass2 as (select '1' as C from Pass1 as A, Pass1 as B),--16 rows
Pass3 as (select '1' as C from Pass2 as A, Pass2 as B),--256 rows
numbers as (
    select ROW_NUMBER() OVER(ORDER BY C) AS N FROM Pass3
),    
comments as (
    select 'ginger' ingredient, 'contains ginger' comment union all 
    select 'wine', 'for wine lovers' union all 
    select 'ale', 'a warm kind of beer' union all 
    select 'beer', 'your old friend'
),
foods as (
    select 01 id, 'ginger beer' name union all 
    select 02   ,'white wine' union all 
    select 03   ,'red wine' union all 
    select 04   ,'ginger wine' union all 
    select 05   ,'ginger ale' union all 
    select 06   ,'pale ale' union all 
    select 07   ,'ginger beer' union all 
),
ingredients as (
    select ingredient, COUNT(*) n
    from foods d
    CROSS JOIN LATERAL(
        select SPLIT_PART(d.name, ' ', n.n) ingredient
        from numbers n
        where SPLIT_PART(d.name, ' ', n.n)<>''
    ) ingredients
    group by ingredient
)
select i.*, isnull(c.comment, 'no comment..') comment
from ingredients i
left join comments c on c.ingredient = i.ingredient

ingredient  n   comment
ale         2   a warm kind of beer
beer        2   your old friend
ginger      4   contains ginger
pale        1   no comment..
red         1   no comment..
white       1   no comment..
wine        3   for wine lovers

#10


0  

Here you go. ????

给你。????

The WHERE filter reduces the rows going into the GROUP BY aggregation. It's not necessary for smaller data but will help if the table is in the billions of rows. Add additional patterns to the REGEXP filter and the CASE statement.

WHERE过滤器通过聚合减少进入组的行。对于较小的数据来说,这是不必要的,但如果表位于数十亿行中,这将有所帮助。向REGEXP过滤器和CASE语句添加其他模式。

SELECT CASE WHEN name LIKE '%ginger%' THEN 'contains ginger' 
            WHEN name LIKE '%wine%'   THEN 'for wine lovers'
       ELSE NULL END "comment"
      ,COUNT(*) total
FROM grouping_test
WHERE REGEXP_INSTR(name,'ginger|wine')
GROUP BY 1
;

#11


0  

From your sample it seems like your product names contain up to 2 words. It's more efficient to break by space and check if individual chunks match than like, and then manually reshape as said in the other response

从你的样品来看,你的产品名称似乎最多包含两个字。根据空间进行分解并检查各个块是否匹配,然后按照另一个响应中所说的手动重构,这样做会更有效

WITH counts as (
    SELECT 
      sum(('ginger' in (split_part(name,' ',1),split_part(name,' ',2)))::INT) AS contains_ginger,
      sum(('wine' in (split_part(name,' ',1),split_part(name,' ',2)))::INT) AS for_wine_lovers
    FROM foods
)
-- manually reshape

#12


0  

Try with SQL like this:

试试下面的SQL语句:

SELECT count(1) as total,'contains ginger' result
FROM foods where names LIKE '%ginger%' 
union all
SELECT count(1),'for wine lovers' 
FROM foods where names LIKE '%wine%'