How do I count push events on GitHub using BigQuery?

Time: 2021-11-08 15:05:27

I'm trying to use the public GitHub dataset on BigQuery to count events - PushEvents, in this case - on a per repository basis over time.

SELECT COUNT(*)
FROM [githubarchive:github.timeline]
WHERE type = 'PushEvent' 
    AND repository_name = "account/repo"
GROUP BY pushed_at
ORDER BY pushed_at DESC

Basically, just retrieve the count for a specified repo and event type, group the count by date, and return the list. BigQuery validates the query above, but then fails it with:

Field 'pushed_at' not found.

As far as I can tell from GitHub's PushEvent documentation, however, pushed_at is an available field. Anybody have examples of related queries that execute properly? Any suggestions as to what's being done incorrectly here?

1 Answer

#1


The field is called repository_pushed_at, and you also probably meant to include it in the SELECT list, i.e.

SELECT repository_pushed_at, COUNT(*)
FROM [githubarchive:github.timeline]
WHERE type = 'PushEvent' 
    AND repository_name = "account/repo"
GROUP BY repository_pushed_at
ORDER BY repository_pushed_at DESC
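
Note that grouping by repository_pushed_at yields one row per distinct value of that field, which is not necessarily one row per date as the question intended. If the field holds full timestamps, the rows can be collapsed into per-day totals after the query returns. The following is only a minimal sketch in Python: the (timestamp, count) row shape, the sample values, the helper name pushes_per_day, and the timestamp format are all assumptions for illustration, not something confirmed by the dataset.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format; adjust to match what the table actually returns.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def pushes_per_day(rows):
    """Sum per-timestamp counts into per-day counts, newest day first.

    `rows` is an iterable of (repository_pushed_at, count) pairs, i.e. the
    two columns selected by the query above (hypothetical export shape).
    """
    totals = Counter()
    for pushed_at, count in rows:
        day = datetime.strptime(pushed_at, TIMESTAMP_FORMAT).date()
        totals[day] += count
    # Sort by date descending, mirroring ORDER BY ... DESC in the query.
    return sorted(totals.items(), reverse=True)


# Made-up sample rows, for illustration only.
sample_rows = [
    ("2013-05-02 18:30:01", 2),
    ("2013-05-02 09:12:45", 1),
    ("2013-05-01 23:59:59", 4),
]

for day, total in pushes_per_day(sample_rows):
    print(day.isoformat(), total)
```

Doing the same bucketing inside the query itself would avoid the extra step, but the exact function to use depends on how the field is stored, so it is left out here.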
