I have 150,000 rows of data which I'm attempting to query in Google BigQuery.
The column Text contains text of varying lengths, which I want to search for particular keywords.
I've gotten as far as the query below, which returns all rows containing a particular keyword (e.g. facebook):
SELECT Text From Data.Set_1
WHERE Text CONTAINS 'facebook'
Questions:
1) How do I improve the query so that it returns a total count of all occurrences of the keyword 'facebook' across 'Text' in a new column?
2) How do I scale this up to multiple keywords (facebook, cnn, bbc, twitter) and return a total count for each keyword present in the data (e.g. facebook 42, cnn 54, bbc 88, twitter 49)?
2 answers
#1
For BigQuery Legacy SQL:
SELECT
keyword,
COUNT(1) AS rows,
SUM(INTEGER((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword))) AS occurrences
FROM YourTable
CROSS JOIN keywords
WHERE Text CONTAINS keyword
GROUP BY keyword
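The occurrences expression deletes every copy of the keyword with REPLACE and divides the drop in length by the keyword's length. Below is a minimal Python sketch of that arithmetic. It assumes Python's str.replace behaves like REPLACE here: both replace every non-overlapping match, scanning left to right. count_occurrences is a hypothetical helper name used only for illustration. Because matches are non-overlapping, a keyword such as 'aa' is counted once in 'aaa', not twice.

```python
def count_occurrences(text: str, keyword: str) -> int:
    """Mirror (LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword)."""
    # Number of characters that disappear when every match is removed
    removed = len(text) - len(text.replace(keyword, ""))
    # Each removed match accounts for exactly len(keyword) characters
    return removed // len(keyword)


text = "facebookfacebookcnnbbccnn"
for kw in ("facebook", "cnn", "bbc", "twitter"):
    # str.count also counts non-overlapping matches, so both should agree
    assert count_occurrences(text, kw) == text.count(kw)
    print(kw, count_occurrences(text, kw))
```

For the sample string this prints facebook 2, cnn 2, bbc 1 and twitter 0, which matches str.count.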
Example to play with
SELECT
keyword,
COUNT(1) AS rows,
SUM(INTEGER((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword))) AS occurrences
FROM (
SELECT Text FROM
(SELECT 'facebookfacebookcnnbbccnn' AS Text),
(SELECT 'facebook' AS Text),
(SELECT 'cnn' AS Text)
) AS words
CROSS JOIN (
SELECT keyword FROM
(SELECT 'facebook' AS keyword),
(SELECT 'cnn' AS keyword),
(SELECT 'bbc' AS keyword)
) AS keywords
WHERE Text CONTAINS keyword
GROUP BY keyword
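Here is a rough Python emulation of the same pipeline (cross join, CONTAINS filter, GROUP BY) that shows what the example should return. It is only a sketch, not BigQuery's execution model. keyword_stats is a hypothetical helper, and Python's substring test `in` stands in for CONTAINS.

```python
from collections import defaultdict

# Sample rows and keywords copied from the example query above
words = ["facebookfacebookcnnbbccnn", "facebook", "cnn"]
keywords = ["facebook", "cnn", "bbc"]


def keyword_stats(texts, kws):
    """Mimic CROSS JOIN + WHERE Text CONTAINS keyword + GROUP BY keyword."""
    stats = defaultdict(lambda: {"rows": 0, "occurrences": 0})
    for text in texts:          # CROSS JOIN: every text ...
        for kw in kws:          # ... paired with every keyword
            if kw in text:      # WHERE Text CONTAINS keyword
                stats[kw]["rows"] += 1  # COUNT(1)
                removed = len(text) - len(text.replace(kw, ""))
                stats[kw]["occurrences"] += removed // len(kw)  # SUM(INTEGER(...))
    return dict(stats)


for kw, s in keyword_stats(words, keywords).items():
    print(kw, s["rows"], s["occurrences"])
```

The output is keyword, rows, occurrences: facebook 2 3, cnn 2 3 and bbc 1 1. For example, facebook matches two rows and appears twice in the first row plus once in the second.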
For BigQuery Standard SQL (see Enabling Standard SQL)
SELECT
keyword,
COUNT(1) AS `rows`,
SUM((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword)) AS occurrences
FROM YourTable
JOIN keywords
ON STRPOS(Text, keyword) > 0
GROUP BY keyword
Example to play with
WITH keywords AS (
SELECT 'facebook' AS keyword UNION ALL
SELECT 'cnn' AS keyword UNION ALL
SELECT 'bbc' AS keyword
),
words AS (
SELECT 'facebookfacebookcnnbbccnn' AS Text UNION ALL
SELECT 'facebook' AS Text UNION ALL
SELECT 'cnn' AS Text
)
SELECT
keyword,
COUNT(1) AS `rows`,
SUM((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword)) AS occurrences
FROM words
JOIN keywords
ON STRPOS(Text, keyword) > 0
GROUP BY keyword
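The Standard SQL example can also be tried locally without BigQuery. The sketch below runs it in SQLite through Python's built-in sqlite3 module, which is a substitution and not BigQuery itself. SQLite has no STRPOS, so its instr() function is used instead, and an ORDER BY is added so the output order is predictable. One difference to be aware of: SQLite's `/` truncates when both operands are integers, whereas BigQuery Standard SQL's `/` returns FLOAT64. BigQuery therefore reports the occurrence totals as 3.0 rather than 3.

```python
import sqlite3

# SQLite stand-in for the BigQuery Standard SQL example.
# Assumption: SQLite's instr() plays the role of BigQuery's STRPOS().
QUERY = """
WITH keywords AS (
  SELECT 'facebook' AS keyword UNION ALL
  SELECT 'cnn' AS keyword UNION ALL
  SELECT 'bbc' AS keyword
),
words AS (
  SELECT 'facebookfacebookcnnbbccnn' AS Text UNION ALL
  SELECT 'facebook' AS Text UNION ALL
  SELECT 'cnn' AS Text
)
SELECT
  keyword,
  COUNT(1) AS `rows`,
  SUM((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword)) AS occurrences
FROM words
JOIN keywords
  ON instr(Text, keyword) > 0
GROUP BY keyword
ORDER BY keyword
"""

with sqlite3.connect(":memory:") as conn:
    results = conn.execute(QUERY).fetchall()

for keyword, rows, occurrences in results:
    print(keyword, rows, occurrences)
```

This returns ('bbc', 1, 1), ('cnn', 2, 3) and ('facebook', 2, 3). To keep integer totals in BigQuery itself, one option is DIV(LENGTH(Text) - LENGTH(REPLACE(Text, keyword, '')), LENGTH(keyword)) in place of `/`.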
#2
You can use a derived table to list all the keywords you are looking for, and then use aggregation to count the rows that contain each keyword:
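-- BigQuery Standard SQL: STRPOS replaces the Legacy-only CONTAINS operator,
-- and COUNTIF returns 0 for keywords that match no rows
SELECT w.keyword, COUNTIF(STRPOS(s.Text, w.keyword) > 0) AS matching_rows
FROM (SELECT 'facebook' AS keyword UNION ALL
      SELECT 'cnn'
     ) w CROSS JOIN
     Data.Set_1 s
GROUP BY w.keyword;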
Note that this is not particularly efficient: every row is compared against every keyword, so the cost grows roughly linearly with the number of keywords.
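This answer counts rows rather than occurrences, so it is worth checking how keywords with no matches are reported. The sketch below builds a hypothetical table set_1 with three made-up rows in an in-memory SQLite database (via Python's sqlite3) and runs the derived-table idea there. SQLite's instr() stands in for the keyword match. The keyword list sits on the left of a LEFT JOIN, and COUNT(s.Text) skips the NULLs that appear when a keyword has no match. As a result, a keyword that never matches, such as 'twitter', still appears with a count of 0.

```python
import sqlite3

# Hypothetical sample rows standing in for Data.Set_1
SAMPLE_TEXTS = ["facebook and cnn", "bbc news", "more facebook"]

QUERY = """
SELECT w.keyword, COUNT(s.Text) AS matching_rows
FROM (SELECT 'facebook' AS keyword UNION ALL
      SELECT 'cnn' UNION ALL
      SELECT 'twitter') w
LEFT JOIN set_1 s
  ON instr(s.Text, w.keyword) > 0
GROUP BY w.keyword
ORDER BY w.keyword
"""

with sqlite3.connect(":memory:") as conn:
    conn.execute("CREATE TABLE set_1 (Text TEXT)")
    conn.executemany("INSERT INTO set_1 (Text) VALUES (?)", [(t,) for t in SAMPLE_TEXTS])
    # Each result row is (keyword, matching_rows); turn it into a dict
    counts = dict(conn.execute(QUERY).fetchall())

print(counts)
```

The result is {'cnn': 1, 'facebook': 2, 'twitter': 0}. Each matching row adds only 1 to a keyword's count, so a row that mentions 'facebook' twice still contributes 1.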