I apologise if this has been asked before, but I can't seem to find an answer to a question that I have about calculating on the fly vs storing fields in a database.
I read a few articles that suggested it was preferable to calculate when you can, but I would just like to know if that still applies to the following 2 examples.
Example 1. Say you are storing data relating to a car. You store the fuel tank size in litres, and how many litres it uses per 100km. You also want to know how many KMs it can travel, which can be calculated from the tank size and economy. I see 2 ways of doing this:
- When a car is added or updated, calculate the amount of KMs and store this as a static field in the database.
- Every time a car is accessed, calculate the amount of KMs on the fly.
Because the car's economy/tank size doesn't change (although it could be edited), the KMs is a pretty static value. I don't see why we would calculate it every single time the car is accessed. Wouldn't this waste CPU time as opposed to simply storing it in a separate field in the database and calculating only when a car is added or updated?
My next example, which is almost an entirely different question (but on the same topic), relates to counting children.
Let's say we have an app which has categories and items. We have a view where we display all the categories, and a count of all the items inside each category. Again, I'm wondering what's better: to perform a MySQL query to count all the items in each category every single time the page is accessed, or to store the count in a field in the categories table and update it when an item is added / deleted?
I know it is redundant to store anything that can be calculated, but I worry that calculating fields or counting records might be slow compared to storing the data in a field. If it's not, please let me know; I just want to learn when to use either method. On a small scale I guess it wouldn't matter either way, but would apps like Facebook really count the number of friends you have every time someone views your profile, or would they just store it as a field?
I'd appreciate any responses to both of these scenarios, and any resource that might explain the benefits of calculating vs storing.
Thanks in advance,
Christian
4 Solutions
#1
8
One thing to consider is how your data is used. If several applications, or several layers of your application (maybe old code and new code in the same app), are accessing your data, you reduce the risk of calculation errors by pre-calculating in the database. The calculated data will then always be the same, no matter which application requests it.
For your first example, there is little reason to think anyone will ever need to change how the KMs are computed, so I would store the value in the database (via triggers, or via PHP on insert/update, because MySQL triggers are... well, they are... not as good as some other databases' triggers).
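As a rough illustration of the two options for the car example, here is a minimal sketch. It uses Python with an in-memory SQLite database rather than PHP and MySQL purely so that it runs on its own; the table layout and the names `cars`, `range_km`, `save_car`, `stored_range` and `computed_range` are assumptions made for this example, not anything from the question.

```python
import sqlite3


def range_km(tank_litres: float, litres_per_100km: float) -> float:
    """Distance in km that a full tank covers at the given economy."""
    if litres_per_100km <= 0:
        raise ValueError("litres_per_100km must be positive")
    return tank_litres / litres_per_100km * 100


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars ("
    "id INTEGER PRIMARY KEY, tank_litres REAL, "
    "litres_per_100km REAL, range_km REAL)"  # range_km is the redundant column
)


def save_car(car_id: int, tank_litres: float, litres_per_100km: float) -> None:
    # Option 1: calculate once, at insert/update time, and store the result.
    conn.execute(
        "INSERT OR REPLACE INTO cars VALUES (?, ?, ?, ?)",
        (car_id, tank_litres, litres_per_100km,
         range_km(tank_litres, litres_per_100km)),
    )


def stored_range(car_id: int) -> float:
    # Reads the pre-calculated column; no arithmetic on the read path.
    row = conn.execute(
        "SELECT range_km FROM cars WHERE id = ?", (car_id,)
    ).fetchone()
    return row[0]


def computed_range(car_id: int) -> float:
    # Option 2: ignore the stored column and calculate on every access.
    tank, economy = conn.execute(
        "SELECT tank_litres, litres_per_100km FROM cars WHERE id = ?", (car_id,)
    ).fetchone()
    return range_km(tank, economy)


save_car(1, 60, 7.5)
print(stored_range(1), computed_range(1))
```

Both reads return the same number as long as every write goes through `save_car`; the stored column only drifts if something updates `tank_litres` or `litres_per_100km` without recalculating it.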
Now, taking your second example, it is far from certain that nobody will one day want to add filters to that per-category count, for example counting only children which are between 2 and 5. Then all your pre-computed results are useless. If you need optimization and caching for these things, what you want is probably an application-layer cache, something like memcache, or pre-computed results stored in a cache table. But this is an application cache, tied to your application's parameters (requests with different filters would use a different record in the cache).
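A cache keyed by the request's filters could look something like the sketch below. It is only an illustration: a plain Python dict stands in for memcache or a cache table, SQLite stands in for MySQL, and the `age` column, `count_items` and `add_item` are hypothetical names invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, category_id INTEGER, age INTEGER);
    INSERT INTO items (category_id, age) VALUES (1, 1), (1, 3), (1, 4), (2, 9);
""")

# Application-level cache: one entry per (category, filter) combination.
_count_cache = {}


def count_items(category_id, min_age=None, max_age=None):
    """Count items in a category, optionally filtered, caching each variant."""
    key = (category_id, min_age, max_age)
    if key not in _count_cache:
        sql = "SELECT COUNT(*) FROM items WHERE category_id = ?"
        params = [category_id]
        if min_age is not None:
            sql += " AND age >= ?"
            params.append(min_age)
        if max_age is not None:
            sql += " AND age <= ?"
            params.append(max_age)
        _count_cache[key] = conn.execute(sql, params).fetchone()[0]
    return _count_cache[key]


def add_item(category_id, age):
    conn.execute(
        "INSERT INTO items (category_id, age) VALUES (?, ?)", (category_id, age)
    )
    # Invalidate every cached count for this category, whatever its filters.
    for key in [k for k in _count_cache if k[0] == category_id]:
        del _count_cache[key]


print(count_items(1))
print(count_items(1, min_age=2, max_age=5))
```

A single `item_count` column on the categories table could only answer the unfiltered call; here each filter combination gets its own cache record, and a write simply discards the records it may have made stale.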
Note that MySQL also has a nice query cache, which avoids recomputing the same query over and over (it was deprecated in MySQL 5.7.20 and removed in MySQL 8.0, so this only applies to older versions).
#2
11
Introducing redundancy into the database is a valid means of optimization. As with all optimizations, don't do it unless you have confirmed that this is where the bottleneck actually is.
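One way to check whether the on-the-fly count is a bottleneck at all is simply to time it against a realistic amount of data before adding any redundant column. The sketch below does that with Python and an in-memory SQLite table; the table, the index name `idx_items_category`, the row count and the function `counts_per_category` are all made up for illustration, and timings on a real MySQL server with real data will differ.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category_id INTEGER)")
conn.execute("CREATE INDEX idx_items_category ON items (category_id)")
# 100,000 hypothetical items spread evenly over 50 categories.
conn.executemany(
    "INSERT INTO items (category_id) VALUES (?)",
    ((i % 50,) for i in range(100_000)),
)


def counts_per_category():
    """Count the items in every category on the fly, in a single query."""
    rows = conn.execute(
        "SELECT category_id, COUNT(*) FROM items GROUP BY category_id"
    )
    return dict(rows)


start = time.perf_counter()
counts = counts_per_category()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{len(counts)} categories counted in {elapsed_ms:.2f} ms")
```

If the measured time is negligible next to the rest of the page, the stored counter buys nothing; if it is not, the measurement gives you a baseline to compare the denormalized version against.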
#3
8
Others have touched on the technical aspects, so let me give you another viewpoint to consider:
For every anomaly you introduce, you are making the development process slower.
Denormalized data, aggregates, prejoined data etcetera are all examples of stuff that greatly complicates development, because you have to:
- Keep rewriting the aggregation logic whenever you change the detailed tables
- Test more (and often seemingly unrelated parts of your application)
- Write more documentation
- Deal with more complicated upgrades and patches
In many cases, it's worth it and in some cases absolutely necessary, but it would be very stupid to sacrifice development speed if you don't have to.
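To make that extra work concrete, here is a sketch of what a stored item count needs in order to stay correct. It uses SQLite triggers from Python so that it is runnable as-is; MySQL trigger syntax differs, and the schema, the trigger names and the helpers `stored_counts` and `actual_counts` are assumptions for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (
        id INTEGER PRIMARY KEY,
        name TEXT,
        item_count INTEGER NOT NULL DEFAULT 0  -- the redundant, stored count
    );
    CREATE TABLE items (id INTEGER PRIMARY KEY, category_id INTEGER NOT NULL);

    -- Every way an item can enter, leave or change category needs its own rule.
    CREATE TRIGGER items_after_insert AFTER INSERT ON items
    BEGIN
        UPDATE categories SET item_count = item_count + 1
        WHERE id = NEW.category_id;
    END;

    CREATE TRIGGER items_after_delete AFTER DELETE ON items
    BEGIN
        UPDATE categories SET item_count = item_count - 1
        WHERE id = OLD.category_id;
    END;

    CREATE TRIGGER items_after_move AFTER UPDATE OF category_id ON items
    BEGIN
        UPDATE categories SET item_count = item_count - 1
        WHERE id = OLD.category_id;
        UPDATE categories SET item_count = item_count + 1
        WHERE id = NEW.category_id;
    END;
""")


def stored_counts():
    """The denormalized counts, read straight from the categories table."""
    return dict(conn.execute("SELECT id, item_count FROM categories"))


def actual_counts():
    """The same counts calculated on the fly, as a consistency check."""
    return dict(conn.execute(
        "SELECT c.id, COUNT(i.id) FROM categories c "
        "LEFT JOIN items i ON i.category_id = c.id GROUP BY c.id"
    ))


conn.execute("INSERT INTO categories (id, name) VALUES (1, 'books'), (2, 'films')")
conn.executemany("INSERT INTO items (category_id) VALUES (?)", [(1,), (1,), (2,)])
conn.execute("UPDATE items SET category_id = 2 WHERE id = 1")  # move an item
conn.execute("DELETE FROM items WHERE id = 3")                  # delete an item
print(stored_counts(), actual_counts())
```

The on-the-fly version is the single query inside `actual_counts`; the stored version is that query's result plus three triggers that must be revisited whenever the items table or the counting rule changes.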
#4
1
In both examples, the values you're talking about are static, and recalculating static values makes no sense. Furthermore, if we assume that the tables are queried more often than they are updated, calculating the data on every read is also a loss of performance.