I'm dealing with a database of about 25,000 users, each of whom adds about 6 rows per day on average (employees keeping logs for work orders). Basically, the database grows indefinitely and contains millions of rows (divided among those 25,000 users).
After a user logs in, I would like the system to display some of their totals, such as miles driven in truck number xyz over their entire work career, total time worked on order item xyz, and so on. Basically, every time a user logs in, these totals need to be available instantly. In addition, once a user adds a row for a work order, the totals need to reflect the change instantly.
Is it advisable to build a totals table per user that gets updated with every entry? Or should I just query the database and have it calculate the totals on the fly each time a user logs in (no totals tables)? Wouldn't that create a bottleneck, though, if users log in every second and the database has to produce a total from millions of rows? How does Google do it? :)
Thanks.
4 Answers
#1
5
You might find that a simple query is fast enough with an appropriate index (e.g. an index on user_id). This should reduce the number of rows that need to be scanned.
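For example, assuming a work_logs table (all table and column names here are placeholders, not from the question), the index and the per-user aggregate might look like this:

```sql
-- Hypothetical schema; names are assumptions for illustration.
CREATE INDEX idx_work_logs_user ON work_logs (user_id);

-- With the index in place, the aggregate scans only this user's rows
-- instead of the whole multi-million-row table.
SELECT SUM(miles)   AS total_miles,
       SUM(minutes) AS total_minutes
FROM work_logs
WHERE user_id = 12345;
```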
But if this is not fast enough, you could calculate the result for all users overnight and cache it in another table. You can then do the following (a sketch follows the list):
- Get the total up to the last cache update directly from the cache table.
- Get the total since the last cache update from the main table.
- Add these two numbers to get the overall total.
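A rough sketch of that cache-plus-delta lookup, assuming the nightly job writes into a user_totals table with a cached_through timestamp (again, illustrative names only):

```sql
-- Assumed cache table, refreshed overnight:
--   user_totals(user_id, total_miles, total_minutes, cached_through)
SELECT t.total_miles
       + (SELECT COALESCE(SUM(l.miles), 0)   -- delta since the last refresh
          FROM work_logs l
          WHERE l.user_id = t.user_id
            AND l.logged_at > t.cached_through) AS overall_miles
FROM user_totals t
WHERE t.user_id = 12345;
```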
Another option is to use triggers to keep the pre-calculated result accurate, even when rows are inserted, updated or deleted.
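If you go the trigger route, an insert trigger along these lines (MySQL syntax, same placeholder names) would keep the running totals in step; you would need matching UPDATE and DELETE triggers to stay fully accurate:

```sql
DELIMITER //
CREATE TRIGGER trg_work_logs_after_insert
AFTER INSERT ON work_logs
FOR EACH ROW
BEGIN
  -- Fold each new log row into the per-user running totals.
  UPDATE user_totals
     SET total_miles   = total_miles   + NEW.miles,
         total_minutes = total_minutes + NEW.minutes
   WHERE user_id = NEW.user_id;
END//
DELIMITER ;
```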
#2
0
Rather than doing a join against the million-row table, I think you can create a summary table. It can be populated by a cron job that runs at night, for example.
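For example, the nightly cron job could rebuild the summary table in a single pass (a sketch only; all names are assumed):

```sql
-- Nightly job: recompute every user's totals from scratch.
TRUNCATE TABLE user_totals;
INSERT INTO user_totals (user_id, total_miles, total_minutes, cached_through)
SELECT user_id, SUM(miles), SUM(minutes), NOW()
FROM work_logs
GROUP BY user_id;
```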
#3
0
If you want it "instant", then stay away from keeping the totals in tables, since then you have to worry about updating them through some process every time the underlying data changes.
As long as your indexes are good and you have some decent hardware, I don't see a problem with querying for these totals every time.
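One quick way to check that the index is actually being used (MySQL shown; table and column names are assumed):

```sql
-- A healthy plan shows type = ref on the user_id index;
-- type = ALL means a full table scan and a missing/unused index.
EXPLAIN
SELECT SUM(miles) FROM work_logs WHERE user_id = 12345;
```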
As for Google: they have lots and lots of servers, basically keep their entire index in RAM, and have virtually unlimited computing power.
#4
0
If you find that, even after indexing your tables, search/update is still too slow for your liking, consider splitting the logs table into several. Depending on your design and how much of a speed-up you want, it could be split multiple ways:
log_truck_miles (driver, truck_id, miles)
log_work_times (worker, job_id, minutes) ...etc.
Another way you could split is to quantize worker IDs -- log entries for user_id below 5,000 go into table log_0_5, those from 5,000 to 10,000 go into log_5_10, and so on.
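For what it's worth, MySQL can express this kind of range split natively with table partitioning, which saves you from juggling table names by hand (a sketch; the schema is assumed):

```sql
CREATE TABLE work_logs (
  user_id   INT NOT NULL,
  truck_id  INT,
  miles     INT,
  minutes   INT,
  logged_at DATETIME
)
PARTITION BY RANGE (user_id) (
  PARTITION p0_5   VALUES LESS THAN (5000),
  PARTITION p5_10  VALUES LESS THAN (10000),
  PARTITION p10_15 VALUES LESS THAN (15000),
  PARTITION pmax   VALUES LESS THAN MAXVALUE
);
```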