Scraping real-time visitors from Google Analytics

Time: 2022-01-15 14:01:40

I have a lot of sites and want to build a dashboard showing the number of real-time visitors on each of them on a single page. (Would anyone else want this?) Right now the only way to view this information is to open a new tab for each site.

Google doesn't have a real-time API, so I'm wondering if it is possible to scrape this data. Eduardo Cereto found out that Google transfers the real-time data over the realtime/bind network request. Does anyone more savvy have an idea of how I should start? Here's what I'm thinking:

  1. Figure out how to authenticate programmatically.
  2. Inspect all of the realtime/bind requests to see how they change. Does each request have a unique key? Where does that come from? Below is my breakdown of the request:

    https://www.google.com/analytics/realtime/bind?VER=8

    &key=[What is this? Where does it come from? 21-character lowercase alphanumeric, stays the same on each request]

    &ds=[What is this? Where does it come from? 21-character lowercase alphanumeric, stays the same on each request]

    &pageId=rt-standard%2Frt-overview

    &q=t%3A0%7C%3A1%3A0%3A%2Ct%3A11%7C%3A1%3A5%3A%2Cot%3A0%3A0%3A4%2Cot%3A0%3A0%3A3%2Ct%3A7%7C%3A1%3A10%3A6%3D%3DREFERRAL%3B%2Ct%3A10%7C%3A1%3A10%3A%2Ct%3A18%7C%3A1%3A10%3A%2Ct%3A4%7C5%7C2%7C%3A1%3A10%3A2!%3Dzz%3B%2C&f

    The q variable URI-decodes to this (what the?): t:0|:1:0:,t:11|:1:5:,ot:0:0:4,ot:0:0:3,t:7|:1:10:6==REFERRAL;,t:10|:1:10:,t:18|:1:10:,t:4|5|2|:1:10:2!=zz;,&f

    &RID=rpc

    &SID=[What is this? Where does it come from? 16-character uppercase alphanumeric, stays the same on each request]

    &CI=0

    &AID=[What is this? Where does it come from? Integer, starts at 1, increments weirdly to 150 and then 298]

    &TYPE=xmlhttp

    &zx=[What is this? Where does it come from? 12-character lowercase alphanumeric, changes on each request]

    &t=1

  3. Inspect all of the realtime/bind responses to see how they change. How does the data come in? It looks like some altered JSON. How many times do I need to connect to get the data? Where in there is the number of active visitors on the site? Here is a dump of sample data:

    19 [[151,["noop"] ] ] 388 [[152,["rt",[{"ot:0:0:4":{"timeUnit":"MINUTES","overTimeData":[{"values":[49,53,52,40,42,55,49,41,51,52,47,42,62,82,76,71,81,66,81,86,71,66,65,65,55,51,53,73,71,81],"name":"Total"}]},"ot:0:0:3":{"timeUnit":"SECONDS","overTimeData":[{"values":[0,1,1,1,1,0,1,0,1,1,1,0,2,0,2,2,1,0,0,0,0,0,2,1,1,2,1,2,0,5,1,0,2,1,1,1,2,0,2,1,0,5,1,1,2,0,0,0,0,0,0,0,0,0,1,1,0,3,2,0],"name":"Total"}]}}]]] ] 388 [[153,["rt",[{"ot:0:0:4":{"timeUnit":"MINUTES","overTimeData":[{"values":[52,53,52,40,42,55,49,41,51,52,47,42,62,82,76,71,81,66,81,86,71,66,65,65,55,51,53,73,71,81],"name":"Total"}]},"ot:0:0:3":{"timeUnit":"SECONDS","overTimeData":[{"values":[2,1,1,1,1,1,0,1,0,1,1,1,0,2,0,2,2,1,0,0,0,0,0,2,1,1,2,1,2,0,5,1,0,2,1,1,1,2,0,2,1,0,5,1,1,2,0,0,0,0,0,0,0,0,0,1,1,0,3,2],"name":"Total"}]}}]]] ] 388 [[154,["rt",[{"ot:0:0:4":{"timeUnit":"MINUTES","overTimeData":[{"values":[53,53,52,40,42,55,49,41,51,52,47,42,62,82,76,71,81,66,81,86,71,66,65,65,55,51,53,73,71,81],"name":"Total"}]},"ot:0:0:3":{"timeUnit":"SECONDS","overTimeData":[{"values":[0,3,1,1,1,1,1,0,1,0,1,1,1,0,2,0,2,2,1,0,0,0,0,0,2,1,1,2,1,2,0,5,1,0,2,1,1,1,2,0,2,1,0,5,1,1,2,0,0,0,0,0,0,0,0,0,1,1,0,3],"name":"Total"}]}}]]] ]

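A minimal PHP sketch for reading a response like the dump above. Everything about the format is inferred from the sample, not documented: each chunk seems to be a number (apparently a length) followed by a JSON array shaped like [[arrayId, payload]], with ["noop"] payloads acting as keep-alives, a layout that resembles Google's Closure BrowserChannel framing. Because pasting can alter whitespace, the sketch finds the end of each chunk by bracket matching instead of trusting the length. In the sample, "ot:0:0:4" carries 30 per-minute values and "ot:0:0:3" carries 60 per-second values, and new values appear to be prepended, so the sketch reports index 0 of each series. Those look like the overview's over-time charts; the active-visitor total presumably arrives in the result for another query (perhaps t:0), which the dump doesn't include. parseBindChunks and latestSeriesValues are made-up names.

```php
<?php
// Sketch: split a realtime/bind response body into JSON chunks and pull out
// the newest value of each over-time series. The framing is inferred from a
// sample dump, not from any documentation.

function parseBindChunks(string $body): array
{
    $chunks = [];
    $len = strlen($body);
    $i = 0;
    while ($i < $len) {
        // Skip whitespace and the numeric prefix in front of each chunk.
        while ($i < $len && (ctype_space($body[$i]) || ctype_digit($body[$i]))) {
            $i++;
        }
        if ($i >= $len || $body[$i] !== '[') {
            break; // nothing left that looks like a chunk
        }
        // Find the bracket that closes this chunk, ignoring brackets in strings.
        $start = $i;
        $depth = 0;
        $inString = false;
        for (; $i < $len; $i++) {
            $ch = $body[$i];
            if ($inString) {
                if ($ch === '\\') {
                    $i++; // skip the escaped character
                } elseif ($ch === '"') {
                    $inString = false;
                }
            } elseif ($ch === '"') {
                $inString = true;
            } elseif ($ch === '[' || $ch === '{') {
                $depth++;
            } elseif ($ch === ']' || $ch === '}') {
                if (--$depth === 0) {
                    $i++; // move past the closing bracket
                    break;
                }
            }
        }
        $decoded = json_decode(substr($body, $start, $i - $start), true);
        if (is_array($decoded)) {
            $chunks[] = $decoded;
        }
    }
    return $chunks;
}

function latestSeriesValues(array $chunks): array
{
    $latest = [];
    foreach ($chunks as $chunk) {
        // Observed shape: [[arrayId, ["rt", [{queryId: result, ...}]]]]
        $payload = $chunk[0][1] ?? null;
        if (!is_array($payload) || ($payload[0] ?? null) !== 'rt') {
            continue; // e.g. the ["noop"] keep-alive frames
        }
        foreach ($payload[1] ?? [] as $results) {
            if (!is_array($results)) {
                continue;
            }
            foreach ($results as $queryId => $result) {
                $values = $result['overTimeData'][0]['values'] ?? null;
                if (is_array($values) && $values !== []) {
                    // New values appear to be prepended, so index 0 is the newest.
                    $latest[$queryId] = $values[0];
                }
            }
        }
    }
    return $latest;
}
```

Later chunks overwrite earlier ones, so after reading a whole response the result holds the most recent value seen for each query ID.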

Let me know if you can help with any of the items above!

4 solutions

#1


9  

To get the same data, Google has launched a new Real Time API. With it you can easily retrieve the number of real-time online visitors, as well as several other Google Analytics dimensions and metrics: https://developers.google.com/analytics/devguides/reporting/realtime/dimsmets/

It is quite similar to the Google Analytics API. To start developing with it, see https://developers.google.com/analytics/devguides/reporting/realtime/v3/devguide
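A minimal PHP sketch of the multi-site dashboard using this API, assuming the v3 endpoint https://www.googleapis.com/analytics/v3/data/realtime, an OAuth 2.0 access token already obtained with read-only Analytics scope, and one view (profile) ID per site. buildRealtimeUrl, activeUsersFromResponse and fetchActiveUsers are illustrative names, not part of any Google library. Note that this v3 API serves Universal Analytics properties, which Google has since retired; GA4 properties offer a comparable report through the Analytics Data API's runRealtimeReport method.

```php
<?php
// Sketch: active users per site via the v3 Real Time Reporting API.
// Assumes an OAuth 2.0 access token (read-only Analytics scope) obtained
// elsewhere and one view (profile) ID per site. Function names are made up.

const REALTIME_ENDPOINT = 'https://www.googleapis.com/analytics/v3/data/realtime';

function buildRealtimeUrl(string $viewId): string
{
    // http_build_query percent-encodes the ':' in "ga:" and "rt:".
    return REALTIME_ENDPOINT . '?' . http_build_query([
        'ids'     => 'ga:' . $viewId,
        'metrics' => 'rt:activeUsers',
    ]);
}

function activeUsersFromResponse(string $json): int
{
    $data = json_decode($json, true);
    // Totals may come back as strings such as "12", hence the cast;
    // a missing value is treated as zero.
    return (int) ($data['totalsForAllResults']['rt:activeUsers'] ?? 0);
}

function fetchActiveUsers(string $viewId, string $accessToken): int
{
    $context = stream_context_create(['http' => [
        'method' => 'GET',
        'header' => 'Authorization: Bearer ' . $accessToken,
    ]]);
    $body = file_get_contents(buildRealtimeUrl($viewId), false, $context);
    if ($body === false) {
        throw new RuntimeException("Realtime request failed for view $viewId");
    }
    return activeUsersFromResponse($body);
}

// Hypothetical usage, one line per site (IDs and token are placeholders):
// foreach (['example.com' => '12345678'] as $site => $viewId) {
//     echo $site, ': ', fetchActiveUsers($viewId, $token), PHP_EOL;
// }
```

Keeping the HTTP call apart from URL building and JSON parsing lets the latter two be checked without network access.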

#2


6  

In Google Chrome I can see the data in the Network panel.

The request endpoint is https://www.google.com/analytics/realtime/bind

It seems the connection stays open for about 2.5 minutes, and during that time it keeps receiving more data.

After about 2.5 minutes the connection is closed and a new one is opened.

In the Network panel you can only see the data for connections that have terminated, so leave the page open for 5 minutes or so and you will start to see the data.
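The reconnect behaviour described above is a long-polling pattern. Below is a minimal PHP sketch of a client loop for it. $openStream is a hypothetical callback returning a readable stream (for example from fopen on an authenticated bind URL, which is not shown); unlike the DevTools Network panel, the loop passes each piece of data on as it arrives instead of waiting for the connection to close.

```php
<?php
// Sketch of a long-polling client: read each connection until the server
// closes it (roughly every 2.5 minutes for realtime/bind), then reconnect.
// $openStream is a hypothetical callback returning a readable stream or false;
// building the authenticated request is deliberately left out.

function longPoll(callable $openStream, callable $onData, int $maxConnections): void
{
    for ($n = 0; $n < $maxConnections; $n++) {
        $stream = $openStream();
        if ($stream === false) {
            sleep(5); // brief back-off before the next attempt
            continue;
        }
        // Pass data on as soon as it arrives instead of waiting for the
        // connection to finish.
        while (!feof($stream)) {
            $chunk = fread($stream, 8192);
            if ($chunk === false) {
                break; // read error: treat it like a closed connection
            }
            if ($chunk !== '') {
                $onData($chunk);
            }
        }
        fclose($stream);
        // Connection closed by the server: loop round and open a new one.
    }
}
```

$maxConnections bounds the loop for testing; a dashboard process could pass PHP_INT_MAX and let it run.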

I hope that can give you a place to start.

#3


6  

Having Google in the loop seems pretty redundant. I suggest serving a common element (such as a tracking image) on demand from the dashboard server and including it by absolute URL on every page you want to monitor for a given site. The script that outputs the element can read the IP address of the requesting browser; these IPs can all be logged to a database and filtered for uniqueness, giving a real-time head count.

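<?php
// Log the requesting browser's IP for this site, then return a tracking GIF.
$user_ip = $_SERVER['REMOTE_ADDR'] ?? '';

// Hypothetical connection details and table:
// visits(site VARCHAR(255), ip VARCHAR(45), seen_at DATETIME)
$pdo = new PDO('mysql:host=localhost;dbname=dashboard', 'dbuser', 'dbpass');
$stmt = $pdo->prepare('INSERT INTO visits (site, ip, seen_at) VALUES (?, ?, NOW())');
$stmt->execute(['XXX', $user_ip]);

$file = 'tracking_image.gif';
header('Content-Type: image/gif');
header('Content-Length: ' . filesize($file));
// Stop browsers caching the image, otherwise repeat views may not reach this script.
header('Cache-Control: no-store, no-cache, must-revalidate');
readfile($file);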

Addendum: the database can also add a timestamp to every row it stores. This can be used to filter the results further and give the number of visitors in the last hour or minute.
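A sketch of the timestamp filtering from the addendum, assuming each logged row holds an IP and a Unix timestamp (table and column names are hypothetical). The SQL constant shows one way to express it in MySQL; countActiveVisitors applies the same filter to rows already fetched. One IP can hide several visitors behind NAT, and one visitor can change IP, so the result is an approximate head count.

```php
<?php
// Sketch: count unique visitor IPs seen within the last $windowSeconds.
// Table and column names are hypothetical; the SQL shows a MySQL version,
// and countActiveVisitors does the same filtering on rows already fetched.

const ACTIVE_VISITORS_SQL = 'SELECT COUNT(DISTINCT ip) FROM visits '
    . 'WHERE site = ? AND seen_at >= NOW() - INTERVAL 5 MINUTE';

/**
 * @param array<array{ip: string, seen_at: int}> $hits seen_at as a Unix timestamp
 */
function countActiveVisitors(array $hits, int $now, int $windowSeconds): int
{
    $unique = [];
    foreach ($hits as $hit) {
        if ($hit['seen_at'] >= $now - $windowSeconds) {
            $unique[$hit['ip']] = true; // array keys de-duplicate the IPs
        }
    }
    return count($unique);
}
```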

Client-side JavaScript with AJAX, for fine-tuning (or overkill): the onblur and onfocus JavaScript events can be used to tell whether the page's window has focus, and that information can be passed back to the dashboard server via AJAX. http://www.thefutureoftheweb.com/demo/2007-05-16-detect-browser-window-focus/

When a visitor closes a page, this can also be detected with the JavaScript onunload handler on the body tag, and AJAX can be used to send data back to the server one last time before the browser closes the page.
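On the dashboard server, the reports sent by those focus, blur and unload handlers could be folded into a per-visitor status. A minimal PHP sketch: the event names and visitor IDs are hypothetical placeholders for whatever your AJAX calls actually send, and in a real endpoint $state would be persisted (database or cache) between requests.

```php
<?php
// Sketch: track each visitor's status from client-side event reports.
// Event names ('focus', 'blur', 'unload') and visitor IDs are hypothetical.

function applyVisitorEvent(array $state, string $visitorId, string $event): array
{
    switch ($event) {
        case 'focus':
            $state[$visitorId] = 'active';
            break;
        case 'blur':
            $state[$visitorId] = 'away';   // page open but not focused
            break;
        case 'unload':
            unset($state[$visitorId]);     // page closed: stop counting
            break;
        default:
            // Ignore unknown events rather than corrupting the state.
            break;
    }
    return $state;
}

function countByStatus(array $state): array
{
    // Returns how many visitors are in each status, keyed by status.
    return array_count_values($state);
}
```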

You may also want to collect some information about the visitor, as Google Analytics does; the page https://panopticlick.eff.org/ has a lot of JavaScript that can be examined and adapted.

#4


4  

I needed/wanted real-time data for personal use, so I reverse-engineered their system a little bit.

Instead of binding to /bind, I get data from /getData (no pun intended).

At /getData the minimum request is apparently: https://www.google.com/analytics/realtime/realtime/getData?pageId&key={{propertyID}}&q=t:0|:1

Here's a short explanation of the possible query parameters and syntax. Please remember that these are all guesses, and I don't know all of them:

Query Syntax: pageId&key=propertyID&q=dataType:dimensions|:page|:limit:filters

Values:

pageId: Required, but it seems to be used only for internal analytics.

propertyID: a{{accountID}}w{{webPropertyID}}p{{profileID}}, as specified at the Documentation link below. You can also find it in the URL of every Analytics page in the UI.
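Since the propertyID can be read off a UI URL, a small PHP sketch can extract it. It assumes the three IDs are purely numeric and that the a…w…p… token appears literally in the string; extractPropertyId is a made-up helper name.

```php
<?php
// Sketch: pull a propertyID of the form a{accountID}w{webPropertyID}p{profileID}
// out of a string such as an Analytics UI URL. Assumes all three IDs are
// numeric; extractPropertyId is a made-up helper name.

function extractPropertyId(string $text): ?array
{
    if (!preg_match('/a(\d+)w(\d+)p(\d+)/', $text, $m)) {
        return null; // no token of that shape found
    }
    return [
        'propertyId'    => $m[0],
        'accountId'     => $m[1],
        'webPropertyId' => $m[2],
        'profileId'     => $m[3],
    ];
}
```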


dataType:
    t: Current data
    ot: Over time (past data)
    c: Unknown, returns only a "count" value


dimensions (| separated, or alone); most values apply only to t:
    1:  Country
    2:  City
    3:  Location code?
    4:  Latitude
    5:  Longitude
    6:  Traffic source type (Social, Referral, etc.)
    7:  Source
    8:  ?? Returns (not set)
    9:  Another location code? longer.
    10: Page URL
    11: Visitor Type (new/returning)
    12: ?? Returns (not set)
    13: ?? Returns (not set)
    14: Medium
    15: ?? Returns "1"
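The dimension numbers above can be kept in a lookup table so queries are written with names rather than numbers. A PHP sketch that includes only the entries the list could identify (the numbers themselves are the guesses above); REALTIME_DIMENSIONS and dimensionIds are made-up names.

```php
<?php
// Sketch: the guessed dimension numbers as a name => number table.
// Entries marked "??" or "location code?" in the list are left out.

const REALTIME_DIMENSIONS = [
    'Country'             => 1,
    'City'                => 2,
    'Latitude'            => 4,
    'Longitude'           => 5,
    'Traffic source type' => 6,
    'Source'              => 7,
    'Page URL'            => 10,
    'Visitor type'        => 11,
    'Medium'              => 14,
];

function dimensionIds(array $names): array
{
    $ids = [];
    foreach ($names as $name) {
        if (!array_key_exists($name, REALTIME_DIMENSIONS)) {
            throw new InvalidArgumentException("Unknown dimension: $name");
        }
        $ids[] = REALTIME_DIMENSIONS[$name];
    }
    return $ids;
}
```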

page:
    At first this seems to work for pagination, but after further analysis it looks like it's also used to specify which of the 6 pages (Overview, Locations, Traffic Sources, Content, Events and Conversions) to return data for.

    For some reason, 0 returns an impossibly high metric total.

limit: Result limit per page, maximum of 50

filters:
    The syntax is as specified at the Documentation 2 link below, except that OR is written with | instead of a comma. Example: 6==CUSTOM;1==United%20States
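Applying these guesses, a q value can be split into its comma-separated queries and each t: query broken into dimensions, page, limit and filters. A PHP sketch (parseRealtimeQ is a made-up name). It accepts both the ":1:" and ":1|:" forms of the page field, because the q captured in the question uses the first and the examples in this answer use the second. Query types whose layout isn't described here, such as ot:, are returned unparsed.

```php
<?php
// Sketch: split a realtime "q" value into queries and decode the "t:" ones
// using the guessed syntax dataType:dimensions|:page|:limit:filters.
// Field meanings are unconfirmed; other query types are returned as-is.

function parseRealtimeQ(string $q): array
{
    $queries = [];
    foreach (explode(',', $q) as $raw) {
        if ($raw === '') {
            continue; // the q captured in the question ends with a comma
        }
        // t:<dims joined by |>|:<page>[|][:<limit>][:<filters>]
        if (preg_match('/^t:([\d|]+?)\|:(\d+)\|?(?::(\d+))?(?::(.*))?$/', $raw, $m)) {
            $queries[] = [
                'type'       => 't',
                'dimensions' => array_map('intval', explode('|', $m[1])),
                'page'       => (int) $m[2],
                'limit'      => isset($m[3]) && $m[3] !== '' ? (int) $m[3] : null,
                'filters'    => $m[4] ?? '',
            ];
        } else {
            $queries[] = ['type' => 'raw', 'query' => $raw];
        }
    }
    return $queries;
}

// Usage with a shortened copy of the raw query string from the question:
// parse_str('q=t%3A0%7C%3A1%3A0%3A%2Cot%3A0%3A0%3A4%2C&f', $params);
// print_r(parseRealtimeQ($params['q']));
```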


You can also combine multiple queries in one request by separating them with commas (e.g. q=t:1|2|:1|:10,t:6|:1|:10).

Following the above "documentation", if you wanted to build a query that requests the page URL and city of the top 10 active visitors with a traffic source type of CUSTOM located in the US, you would use this URL: https://www.google.com/analytics/realtime/realtime/getData?key={{propertyID}}&pageId&q=t:10|2|:1|:10:6==CUSTOM;1==United%20States
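The same kind of URL can be assembled programmatically. A PHP sketch under this answer's assumptions (the endpoint path and parameter meanings are reverse-engineered guesses, undocumented by Google, and may no longer work); buildGetDataUrl is a hypothetical helper. It percent-encodes the whole q value, as the UI does in the realtime/bind request captured in the question; the server presumably decodes it to the same string.

```php
<?php
// Sketch: build a /getData URL from query parts. The endpoint and parameter
// meanings are the reverse-engineered guesses above, not a documented API,
// so they may change or stop working at any time.

const GETDATA_ENDPOINT = 'https://www.google.com/analytics/realtime/realtime/getData';

/**
 * @param array<array{type: string, dimensions: int[], page: int, limit: int, filters?: string}> $queries
 */
function buildGetDataUrl(string $propertyId, array $queries): string
{
    $parts = [];
    foreach ($queries as $query) {
        // dataType:dimensions|:page|:limit[:filters]
        $part = $query['type'] . ':' . implode('|', $query['dimensions'])
            . '|:' . $query['page'] . '|:' . $query['limit'];
        if (($query['filters'] ?? '') !== '') {
            $part .= ':' . $query['filters'];
        }
        $parts[] = $part;
    }
    // Multiple queries are comma-separated; the whole q value is percent-encoded.
    return GETDATA_ENDPOINT . '?key=' . rawurlencode($propertyId)
        . '&pageId&q=' . rawurlencode(implode(',', $parts));
}

// Hypothetical usage reproducing the example above (propertyID is a placeholder):
// echo buildGetDataUrl('a1w2p3', [[
//     'type' => 't', 'dimensions' => [10, 2], 'page' => 1, 'limit' => 10,
//     'filters' => '6==CUSTOM;1==United States',
// ]]);
```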


Documentation

Documentation 2

I hope that my answer is readable and (although it's a little late) sufficiently answers your question and helps others in the future.
