存储和分析历史数据 - 什么样的数据库？

I'm currently designing a system that watches the ranks / views of youtube videos. of LOTS of youtube videos (> 500.000 and growing) on a daily basis.

我目前正在设计一个监视YouTube视频排名/视图的系统。很多youtube视频(> 500.000并且每天都在增长)。

I'm currently considering storing this in a MySQL database, but what disturbs me, is that the table would grow into billions and trillions of rows, which I don't think would perform well.

我目前正在考虑将其存储在MySQL数据库中,但令我感到不安的是,该表将增长到数十亿和数万亿行,我认为这些行不会很好。

I need to analyse this data, for example:

我需要分析这些数据,例如:

Which videos grew a lot in the time between X and Y

在X和Y之间的时间里,哪些视频增长了很多

Plot the clicks per day

绘制每日点击次数

Plot the clicks per week ...

绘制每周点击次数......

some more things I don't know yet about

还有一些我还不知道的事情

So, what came into my web 2.0 mind was, is there a way a NoSQL database could handle this better? I didn't quite learn these (almost) new databases and don't know what they are capable of.

那么,我的Web 2.0思想是什么,NoSQL数据库是否有办法更好地处理这个问题?我不太了解这些(几乎)新数据库,也不知道它们的功能。

What would your advice be, what type of database to use? Relational or not? If not, which NoSQL database?

您的建议是什么,使用什么类型的数据库?关系与否?如果没有,哪个NoSQL数据库?

PS: first priority is the fast evaluation and insertion of the results, second is high availability (or just replication)

PS:首要任务是快速评估和插入结果,其次是高可用性(或只是复制)

1 个解决方案

#1

It is very difficult to give an advice for a database system, because it always depends. However, considering that Facebook is built on MySQL, it shows that there probably performance is not a limit on MySQL for you.

为数据库系统提供建议非常困难,因为它总是取决于。然而,考虑到Facebook建立在MySQL上,它表明可能性能不是对你的MySQL限制。

What is helpful and you'll probably have done, is creating a structure of how your table structure should look like. Then also think of queries you would like to run against the tables.

什么是有用的,你可能已经做了,是创建一个表结构应该如何的结构。然后还要考虑要对表运行的查询。

If you have the right indexes (which is the main and crucial factor query speed relies on), you will not have to worry about performance in MySQL. What you should consider are (what I've had to experience), that there are many interesting things how MySQL deals with indexes. Let me give a few examples I had to figure out during the time:

如果您有正确的索引(这是查询速度依赖的主要和关键因素),您将不必担心MySQL的性能。你应该考虑的是(我必须经历的),MySQL有很多有趣的东西如何处理索引。让我举几个例子,我必须在此期间弄清楚:

if you want to use an index for a range scan, the index cannot be used for ORDER BY anymore

如果要使用索引进行范围扫描,则索引不能再用于ORDER BY

a range column has to be the last in an concatenated index for the full index to be used, same for ORDER BY again

范围列必须是要使用的完整索引的连锁索引中的最后一列,对于ORDER BY再次相同

For more information, a useful link on mysqlperformanceblog.com: http://www.mysqlperformanceblog.com/2009/09/12/3-ways-mysql-uses-indexes/

有关更多信息,请访问mysqlperformanceblog.com:http://www.mysqlperformanceblog.com/2009/09/12/3-ways-mysql-uses-indexes/

In general, if the structure of the database is well thought and the indexing is good, in my experience it does not matter actually if you only have 10.000 rows or 10 billion, the query time would be about the same.

一般来说,如果数据库的结构经过充分考虑并且索引很好,根据我的经验,如果你只有10.000行或100亿,实际上并不重要,查询时间大致相同。

#1