Take my profile, or the view count of any question on this site, as an example. What is the process for logging the number of visits per page or object on a website? I presume the options include:
- Counting each registered user once (the db must record which pages/objects the user has visited). This will not include unregistered users.
- IP: log the visit of each IP per page/object. This could be troublesome, as two different people may reach the site from the same IP; or you may really want to track repeat visitors.
- Cookie: people with multiple computers would probably be counted twice.
- other method goes here ....
The question is, what is the process and best practice to count user requests?
EDIT
I've added the computer languages to the list of tags as they are of interest to me. Feel free to include any libraries, modules, and/or extensions that achieve the task.
The question could be rephrased into:
- How does someone go about measuring the number of impressions when a user visits a page? The question is not about what Google Analytics does; rather, it is about something like the view count you see when you open a question or profile on this site.
6 solutions
#1
17
The "correct" answer varies according to the situation; primarily the statistic you most want and the resources available to gather and process it. For example:
Server Side
Raw web server logs
All web servers have some facility to log requests. The trouble is that it takes a lot of processing to get meaningful data out of the logs and, for your example scenario, they won't record application-specific details, such as whether or not the request was associated with a registered user.
This option won't work for what you're interested in.
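To make the limitation concrete, here is a minimal sketch of pulling per-page counts out of a raw access log. It assumes the Common/Combined Log Format; the sample lines and the `count_page_views` helper are made up for illustration. Note that nothing in such a line says whether the visitor was a registered user.

```python
import re
from collections import Counter

# Matches the request and status part of a Common/Combined Log Format line.
LINE_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})')

def count_page_views(log_lines):
    """Count successful GET requests per path (query string stripped)."""
    views = Counter()
    for line in log_lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        if match["method"] == "GET" and match["status"] == "200":
            views[match["path"].split("?", 1)[0]] += 1
    return views

# Hypothetical sample lines; a real script would iterate over the log file.
sample = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /users/42 HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2023:13:55:40 +0000] "GET /users/42?tab=top HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2023:13:56:02 +0000] "GET /missing HTTP/1.1" 404 153',
]
print(count_page_views(sample))
```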
File based application logs
The application programmer can add custom code to the application to record the stuff you're most interested in to a log file. This is similar to the web server log, except that it can be application-aware and record things like the member making the request.
The programmers may also need to build scripts which extract from these logs the stuff you're most interested in. This option might be suited to a high-traffic site with lots of disk space and sysadmins who know how to ensure the logs get rotated and pruned from the production servers before bad things happen.
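As a rough sketch of the file-based approach, the standard library's `RotatingFileHandler` can write one line per view and rotate the file by size. The logger name, the tab-separated layout, and the `log_view` helper are assumptions made for this example, not a prescribed format.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

def make_view_logger(path, max_bytes=1_000_000, backups=5):
    """Return a logger that appends one line per page view and rotates the file."""
    logger = logging.getLogger("page_views")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep view lines out of the application's main log
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s\t%(message)s"))
    logger.addHandler(handler)
    return logger

def log_view(logger, page, user_id=None):
    """Record a view; user_id is None for visitors who are not logged in."""
    logger.info("%s\t%s", page, user_id if user_id is not None else "anonymous")

# Demo: write two views to a temporary log file.
log_path = os.path.join(tempfile.mkdtemp(), "views.log")
view_logger = make_view_logger(log_path)
log_view(view_logger, "/users/42", user_id=7)
log_view(view_logger, "/users/42")
```

A separate script would then aggregate these lines into counts, which is the extra processing step mentioned above.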
Database based application logs
The application programmer can write custom code for the application which records every request in a database. This makes it relatively easy to run reports and makes the data instantly accessible. This solution incurs more system overhead at the time of each request, so it is better suited to lower-traffic sites, or scenarios where the data is highly valued.
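A minimal sketch of the database-backed variant, using SQLite only so the example is self-contained. The `page_views` table and its columns are assumptions; a real application would use its own schema and database.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open the database and make sure the (hypothetical) page_views table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS page_views (
               page      TEXT NOT NULL,
               user_id   INTEGER,            -- NULL for anonymous visitors
               ip        TEXT,
               viewed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn

def record_view(conn, page, user_id=None, ip=None):
    """Insert one row per request; this is the per-request overhead."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO page_views (page, user_id, ip) VALUES (?, ?, ?)",
            (page, user_id, ip),
        )

def view_count(conn, page):
    """Reports are plain queries, available as soon as the row is written."""
    row = conn.execute(
        "SELECT COUNT(*) FROM page_views WHERE page = ?", (page,)
    ).fetchone()
    return row[0]

conn = open_store()
record_view(conn, "/questions/1", user_id=7, ip="203.0.113.5")
record_view(conn, "/questions/1", ip="203.0.113.9")
print(view_count(conn, "/questions/1"))  # 2
```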
Client Side
Javascript postback
This is a consideration on top of the above options. Google Analytics does this.
Each page includes some javascript code which tells the client to report back to the webserver that the page was viewed. The data might be recorded in a database, or written to file.
This has the strong advantage of improving accuracy in scenarios where impressions get lost due to heavy caching/proxying between the client and server.
Cookies
Every time a request arrives from someone who doesn't present a cookie, you assume they're new, record that hit as 'anonymous', and return a uniquely identifying cookie after they log in. How accurate this proves depends on your application. Some applications don't lend themselves to caching, so it will be quite accurate; others (high traffic) encourage caching, which will reduce the accuracy. Obviously the cookie is of little use whenever they switch browsers/location, until they re-authenticate there.
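The cookie check can be sketched with the standard library alone. This is a variation on the description above: it hands out the identifying cookie on the first anonymous request rather than after login, and the cookie name `visitor_id` and the `identify_visitor` helper are assumptions for illustration.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name

def identify_visitor(cookie_header):
    """Return (visitor_id, is_new, set_cookie_value) for one request.

    set_cookie_value is None when the visitor already presented a cookie.
    """
    cookie = SimpleCookie(cookie_header or "")
    if COOKIE_NAME in cookie:
        return cookie[COOKIE_NAME].value, False, None
    # No cookie presented: treat the visitor as new and issue an identifier.
    visitor_id = uuid.uuid4().hex
    out = SimpleCookie()
    out[COOKIE_NAME] = visitor_id
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # one year
    out[COOKIE_NAME]["httponly"] = True
    return visitor_id, True, out[COOKIE_NAME].OutputString()

# First request carries no Cookie header; the second presents the issued value.
first_id, first_is_new, set_cookie = identify_visitor(None)
second_id, second_is_new, _ = identify_visitor(f"{COOKIE_NAME}={first_id}")
```

The returned `set_cookie` string is what the application would send in a `Set-Cookie` response header. A second browser or machine starts without the cookie, which is exactly the double-counting the question worries about.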
What's most interesting to you?
Then there's the question of what statistics are important to you. For example, in some situations you're keen to know:
- how many times a page was viewed, period,
- how many times a page was viewed, by a known user
- how many of your known users have viewed a specific page
You then typically want to break these down into periods of time to see trends. Respectively:
- are we getting more views from random people?
- or are we getting more views from registered users?
- or has pretty much every one who is going to see the page now seen it?
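If every view is stored as a row, the three statistics and their per-period breakdown can come out of a single query. The sketch below assumes a hypothetical `page_views(page, user_id, day)` table in SQLite, with `user_id` left NULL for anonymous visitors; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, user_id INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        ("/q/1", None, "2024-01-01"),  # anonymous view
        ("/q/1", 7, "2024-01-01"),
        ("/q/1", 7, "2024-01-02"),     # same member again
        ("/q/1", 9, "2024-01-02"),
    ],
)

def stats_per_day(conn, page):
    """Per day: all views, views by known users, distinct known users."""
    return conn.execute(
        """SELECT day,
                  COUNT(*)                AS total_views,
                  COUNT(user_id)          AS member_views,      -- NULLs are skipped
                  COUNT(DISTINCT user_id) AS distinct_members
           FROM page_views
           WHERE page = ?
           GROUP BY day
           ORDER BY day""",
        (page,),
    ).fetchall()

for row in stats_per_day(conn, "/q/1"):
    print(row)
```

Comparing the columns from one day to the next answers the three trend questions in order: anonymous traffic is `total_views - member_views`, member traffic is `member_views`, and saturation shows up when `distinct_members` stops growing.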
So back to your question: best practice for "number of imprints when a user goes on a page"?
It depends on your application.
My guess is that you're best off with a database backed application which records what is most interesting to your application and uses cookies to trace the member's sessions.
#2
4
The best practice for a hit counter depends on how much traffic you expect your site to receive. As wybiral suggested, you can implement something that writes to a database after every request. This might include the IP address if you want to count unique visitors, or it could be as simple as incrementing a running total for each page or for each (page, user) pair.
But that requires a database write for every request, even if you just want to serve a static page. Ideally speaking, a scalable web app should serve as much as possible from an in-memory cache. Database or disk I/O should be avoided as much as possible.
So the ideal set up would be to build up some representation of the server's activity in-memory and then occasionally (say every 15 minutes) write those events to the database. You could conceivably queue up thousands of requests and then store them with a single database write.
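A minimal single-process sketch of that buffering idea, without a task queue: hits are counted in memory and written out in one transaction when `flush()` is called. The `BufferedHitCounter` class and the `hits` table are assumptions for illustration; something else (a timer, a cron job, a worker) would have to call `flush()` on a schedule.

```python
import sqlite3
import threading
from collections import Counter

class BufferedHitCounter:
    """Accumulate hits in memory and write them to the database in batches."""

    def __init__(self, conn):
        self._conn = conn
        self._pending = Counter()
        self._lock = threading.Lock()
        conn.execute(
            "CREATE TABLE IF NOT EXISTS hits (page TEXT PRIMARY KEY, total INTEGER NOT NULL)"
        )

    def hit(self, page):
        # Called on every request: no database I/O, just an in-memory increment.
        with self._lock:
            self._pending[page] += 1

    def flush(self):
        # Called occasionally: swap out the buffer, then write it in one transaction.
        with self._lock:
            batch, self._pending = self._pending, Counter()
        with self._conn:
            for page, n in batch.items():
                # Add to an existing row, or create the row if the page is new.
                self._conn.execute(
                    "UPDATE hits SET total = total + ? WHERE page = ?", (n, page)
                )
                self._conn.execute(
                    "INSERT OR IGNORE INTO hits (page, total) VALUES (?, ?)", (page, n)
                )
        return sum(batch.values())

conn = sqlite3.connect(":memory:")
counter = BufferedHitCounter(conn)
for _ in range(1000):
    counter.hit("/q/1")
counter.hit("/q/2")
flushed = counter.flush()  # 1001 hits stored with a single commit
```

The trade-off is that hits still sitting in memory are lost if the process dies before the next flush, and each server process keeps its own buffer.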
There's a tutorial describing how to do exactly this in python using Celery and Carrot: http://packages.python.org/celery/tutorials/clickcounter.html. It also includes some examples of how to set up your database tables using Django models and what code to call whenever someone accesses a page.
This tutorial will certainly be helpful to you regardless of what you choose to implement, although this level of architecture might be overkill if you don't expect thousands of hits each hour.
#3
1
Use a database to keep a record of the unique IPs (if the IP doesn't exist in the DB, create it, otherwise continue as planned) and then query the database for the number of those entries. Index this by IP and URL to store views for individual pages. You won't have to worry about tracking registered users this way; they will be included in the unique IP count. As for multiple people behind one IP, there's not much you can do short of requiring an account and counting (user, page) views in the same way.
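One way to sketch this is to let the database enforce uniqueness with a composite key on (URL, IP), so the "create it if it doesn't exist" step becomes a single statement. SQLite and the `unique_views` table are assumptions chosen to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE unique_views (
           url TEXT NOT NULL,
           ip  TEXT NOT NULL,
           PRIMARY KEY (url, ip)   -- one row per (page, visitor IP)
       )"""
)

def record_visit(conn, url, ip):
    """Insert the (url, ip) pair unless it is already recorded."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO unique_views (url, ip) VALUES (?, ?)", (url, ip)
        )

def unique_views(conn, url):
    """Number of distinct IPs that have viewed this URL."""
    return conn.execute(
        "SELECT COUNT(*) FROM unique_views WHERE url = ?", (url,)
    ).fetchone()[0]

record_visit(conn, "/q/1", "203.0.113.5")
record_visit(conn, "/q/1", "203.0.113.5")  # repeat visit, ignored
record_visit(conn, "/q/1", "203.0.113.9")
print(unique_views(conn, "/q/1"))  # 2
```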
#4
1
I would suggest using a persistent key/value store like Redis. If you use a list with the list key being the serialized identifier, you can store other serialized entries and use llen to find the list size.
Example (python) after initializing your Redis store:
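    import redis

    # Assumes a reachable Redis server; decode_responses=True makes lrange
    # return str values, so the membership test below compares like with like.
    redisStore = redis.Redis(decode_responses=True)

    def initializeAndPush(serializedKey, serializedValue):
        # lrange on a missing key returns an empty list, so a new key
        # needs no separate exists() check.
        if serializedValue not in redisStore.lrange(serializedKey, 0, -1):
            redisStore.rpush(serializedKey, serializedValue)

    def getSizeOf(serializedKey):
        # llen returns 0 for a key that does not exist.
        return redisStore.llen(serializedKey)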
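Using this technique, you can use anything as serializedKey or serializedValue. If you want to store IPs with today's date or serialized login information, both are just as simple. Also, only unique serializedValues are stored, because a value is pushed only when it is not already in the list. Note that this check-then-push is not atomic, so two concurrent requests can occasionally store the same value twice; a Redis set (SADD to add, SCARD to count) avoids that.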
#5
0
I would try implementing pixel tracking to track views of your page/object. This method is used by Google (Google Analytics) and other high-profile media companies.
#6
0
Pixel tracking will be fine, since you can point the tracking pixel to an HttpHandler dedicated to that purpose. That way you can separate the load and even use some kind of queue for high-load scenarios.
Also, you can incorporate user-specific information in the tracking pixel, such as who has visited the page.
eg:
<img src="fakeimages/imba.gif?uid=123&info2=a&info3=b" style="height:1px;width:1px;" />
Then you need to handle the requests going to fakeimages/*.gif with a specific HttpHandler / PHP redirect / controller (whatever language you're using) and process the information.
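For readers not on .NET or PHP, the handler side might look roughly like the following WSGI sketch in Python. The `recorded` list stands in for a database or queue, and the `tracking_pixel_app` and `call` names are made up for this example; the query parameters mirror the `uid` example above.

```python
import base64
from urllib.parse import parse_qs

# A transparent 1x1 GIF, decoded once at import time.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

recorded = []  # stand-in for a database table or a queue

def tracking_pixel_app(environ, start_response):
    """WSGI app: record the query parameters, then answer with the pixel."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    recorded.append({
        "path": environ.get("PATH_INFO", ""),
        "uid": params.get("uid", [None])[0],
        "ip": environ.get("REMOTE_ADDR"),
    })
    start_response("200 OK", [
        ("Content-Type", "image/gif"),
        ("Content-Length", str(len(PIXEL))),
        ("Cache-Control", "no-store"),  # ask the browser to fetch it on every view
    ])
    return [PIXEL]

def call(app, path, query):
    """Invoke the app once with a minimal fake request, without a real server."""
    captured = []
    body = app(
        {"PATH_INFO": path, "QUERY_STRING": query, "REMOTE_ADDR": "203.0.113.5"},
        lambda status, headers: captured.append((status, headers)),
    )
    return captured[0][0], b"".join(body)

status, body = call(tracking_pixel_app, "/fakeimages/imba.gif", "uid=123&info2=a&info3=b")
```

To try it in a browser, the app can be served with `wsgiref.simple_server.make_server` from the standard library.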
regards