Capturing implicit signals of interest in Django

Time: 2022-04-07 20:29:43

To set the background: I'm interested in:

  • Capturing implicit signals of interest in books as users browse around a site. The site is written in Django (Python) using MySQL, memcached, nginx, and Apache.

Let's say, for instance, my site sells books. As a user browses around my site I'd like to keep track of which books they've viewed, and how many times they've viewed them.

Not that I'd store the data this way, but ideally I could have on-the-fly access to a structure like:

{user_id : {book_id: number_of_views, book_id_2: number_of_views}}
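For illustration, that structure can be built on the fly from raw view events like this (the event data here is hypothetical):

```python
from collections import defaultdict

# Hypothetical raw view events, as (user_id, book_id) pairs.
events = [(1, 101), (1, 101), (1, 102), (2, 101)]

# {user_id: {book_id: number_of_views}}
views = defaultdict(lambda: defaultdict(int))
for user_id, book_id in events:
    views[user_id][book_id] += 1
```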

I realize there are a few approaches here:

  • Some flat-file log

  • Writing an object to a database every time

  • Writing to an object in memcached

I don't really know the performance implications, but I'd rather not write to the database on every single page view; the lag of writing to a log and computing the structure later seems too slow to give good on-the-fly recommendations as you use the site; and the memcached approach seems fine, but there's a cost to keeping this object only in memory: you might lose it, and it never gets written anywhere 'permanent'.

What approach would you suggest? (doesn't have to be one of the above) Thanks!

4 solutions

#1


If this data isn't just an unimportant statistic that may or may not be available, I'd suggest taking the simple approach and using a model. Yes, it will hit the database every time.

Unless you are absolutely, positively sure these queries are actually degrading the overall experience, there is no need to worry about it. Even if you optimize this one, there's a good chance other unexpected queries are wasting more CPU time. I assume you wouldn't be asking this question if you were testing all the other queries, so why risk premature optimization on this one?

An advantage of the model approach is having an API in place. When you have tested and decided to optimize, you can keep this API and swap the underlying model for something else (which will most probably be more complex than a model).

I'd definitely go with a model first and see how it performs (and also how other parts of the project perform).
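The "API in place" point can be sketched as a thin wrapper: callers only see record_view/get_views, so the storage behind it can later be swapped without touching the rest of the site. A plain in-memory dict stands in for the Django model here to keep the example self-contained; all names are hypothetical.

```python
class ViewCounter:
    """Thin API over whatever backend stores per-user view counts.

    The backend here is an in-memory dict standing in for a Django
    model; callers only use record_view/get_views, so the storage
    could later be swapped for memcached or anything else.
    """

    def __init__(self):
        self._counts = {}  # (user_id, book_id) -> number_of_views

    def record_view(self, user_id, book_id):
        key = (user_id, book_id)
        self._counts[key] = self._counts.get(key, 0) + 1

    def get_views(self, user_id, book_id):
        return self._counts.get((user_id, book_id), 0)


counter = ViewCounter()
counter.record_view(1, 101)
counter.record_view(1, 101)
```

In a real Django version, record_view would be a get_or_create plus an F-expression increment, but the callers wouldn't change.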

#2


What approach would you suggest? (doesn't have to be one of the above) Thanks!

Hmmm... this is like being in a four-walled room with only one door and saying you want to get out of the room, but not through the only door...

There was an article I was reading some time back (can't find the link now) saying that memcache can handle huge sets of data in memory (Facebook uses it) with very little degradation in performance... My advice is to explore memcache further; I think it will do the trick.

#3


Either a document datastore (MongoDB/CouchDB) or a persistent key-value store (tokyodb, memcachedb, etc.) could be explored.

No definite recommendations from me, as the final solution depends on multiple factors: load, your willingness to learn/deploy a new technology, the size of the data...

#4


It seems to me that one approach could be to use memcached to keep the counter, but have a cron job running regularly to store the value from memcached to the database or to disk. That way you'd get all the performance of memcached, but in the case of a crash you wouldn't lose more than a couple of minutes' data.
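A sketch of that counter-plus-cron idea, with plain dicts standing in for memcached and the permanent store to keep it self-contained (names are hypothetical; a real memcached client would use its incr operation, and flush_to_db is what the periodic cron job would run):

```python
cache = {}     # stand-in for memcached
database = {}  # stand-in for the permanent store (MySQL, a file, ...)


def incr_view(user_id, book_id):
    """Called on every page view; a memcached client would use incr() here."""
    key = "views:%s:%s" % (user_id, book_id)
    cache[key] = cache.get(key, 0) + 1


def flush_to_db():
    """Run periodically from cron: persist the counters and reset them.

    A crash between flushes loses at most one interval's worth of counts.
    """
    for key, count in cache.items():
        database[key] = database.get(key, 0) + count
    cache.clear()


incr_view(1, 101)
incr_view(1, 101)
flush_to_db()      # cron fires: counts move to the permanent store
incr_view(1, 101)  # new counts accumulate until the next flush
```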
