A database for tracking users on a website

Time: 2022-07-07 15:22:35

Consider a website getting approximately 50K unique visitors daily. I want to track users visiting the website using pixel tracking. Before starting any development, I would like to decide on the storage database I will use for this project.

Clearly, this will be a write-intensive database: a very high volume of writes, and only occasional reads when someone (an admin) looks at the analytics data.

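For concreteness, here is a minimal sketch of the write path this implies. Flask and SQLite are hypothetical stand-ins, since the storage layer is exactly what this question is asking about:

```python
# Sketch only: a tracking-pixel endpoint. Flask and SQLite are
# placeholders; the real storage backend is the open question here.
import sqlite3
import time
from io import BytesIO

from flask import Flask, request, send_file

app = Flask(__name__)

# A 1x1 GIF, the classic tracking-pixel payload.
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00"     # header + screen descriptor
         b"\x00\x00\x00\x00\x00\x00"               # 2-entry color table
         b"!\xf9\x04\x01\x00\x00\x00\x00"          # graphic control extension
         b",\x00\x00\x00\x00\x01\x00\x01\x00\x00"  # image descriptor
         b"\x02\x02D\x01\x00;")                    # image data + trailer

db = sqlite3.connect("visits.db", check_same_thread=False)
db.execute("""CREATE TABLE IF NOT EXISTS visits (
                  visited_at REAL, ip TEXT, user_agent TEXT, page TEXT)""")

@app.route("/pixel.gif")
def pixel():
    # One INSERT per page view: this is the write-heavy hot path.
    db.execute("INSERT INTO visits VALUES (?, ?, ?, ?)",
               (time.time(), request.remote_addr,
                request.headers.get("User-Agent", ""),
                request.args.get("page", "")))
    db.commit()
    return send_file(BytesIO(PIXEL), mimetype="image/gif")
```

Each page embeds `<img src="/pixel.gif?page=/some/path">`, so every view becomes one row, which is why writes dominate.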

So, what type of database, MySQL or NoSQL, should I use for this project?

Please comment if my question is unclear.

Thanks!

2 Solutions

#1

Given the provided load estimates and a reasonable retention policy, say two years of data, I believe that a regular relational database should do. MySQL supports partitioning of tables and archival of partitions.

The user visit data can be naturally partitioned by date, and analytic queries over that kind of data usually involve a date or a date range as well. To avoid the performance problems of managing too many small partitions, I suggest range partitioning by week.

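As an illustrative sketch of that suggestion (connection details, table, and column names are all hypothetical), weekly RANGE partitions in MySQL could be created like this, here generated from Python via mysql-connector:

```python
# Sketch only: create a `visits` table with one RANGE partition per week,
# covering roughly a two-year retention window. All names are made up.
import datetime
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="analytics",
                               password="secret", database="tracking")
cur = conn.cursor()

start = datetime.date(2022, 7, 11)  # a Monday
bounds = [start + datetime.timedelta(weeks=i) for i in range(1, 106)]
partitions = ",\n        ".join(
    "PARTITION p{:03d} VALUES LESS THAN (TO_DAYS('{}'))".format(i, d)
    for i, d in enumerate(bounds))

cur.execute(f"""
    CREATE TABLE visits (
        visited_at DATETIME NOT NULL,
        ip         VARCHAR(45),
        user_agent VARCHAR(255),
        page       VARCHAR(255)
    )
    PARTITION BY RANGE (TO_DAYS(visited_at)) (
        {partitions},
        PARTITION pmax VALUES LESS THAN MAXVALUE
    )""")

# Retention then becomes a cheap metadata operation instead of a big DELETE:
# cur.execute("ALTER TABLE visits DROP PARTITION p000")
```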

If your data grows by, say, two orders of magnitude (10M records per day rather than 100K), you should seriously consider a Big Data solution. A combination of Flume/Hadoop/Hive would allow you to reuse your analytic queries with minimal modifications.

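To illustrate the reuse point: MySQL drivers and Hive clients such as PyHive both follow the Python DB-API, so an aggregate can be written once and pointed at either backend. The schema and query below are hypothetical:

```python
# Sketch: the same analytic query served by MySQL today and Hive later.
# YEARWEEK() is MySQL-specific; Hive would need e.g. weekofyear(), which
# is exactly the kind of "minimal modification" meant above.
WEEKLY_UNIQUES = """
    SELECT YEARWEEK(visited_at) AS week, COUNT(DISTINCT ip) AS uniques
    FROM visits
    GROUP BY YEARWEEK(visited_at)
"""

def weekly_uniques(conn):
    # Works with any DB-API connection (mysql.connector, PyHive, ...).
    cur = conn.cursor()
    cur.execute(WEEKLY_UNIQUES)
    return cur.fetchall()
```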

#2

From a scaling perspective, the easiest approach is to write the information to files (simple log files). You can then process the data with Hadoop, starting without a cluster (Hadoop in local, standalone mode) and adding as many nodes as you like later.
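A minimal sketch of that file-based write path (the field layout is an assumption): append one tab-separated line per hit, which Hadoop streaming or a Hive external table can later consume directly.

```python
# Sketch only: append-only log writer; field layout is made up.
import time

def log_visit(ip, user_agent, page, path="visits.log"):
    # Sequential appends are about the cheapest write there is,
    # which easily suits a workload of ~50K hits per day.
    line = "\t".join([str(time.time()), ip, user_agent, page])
    with open(path, "a") as f:
        f.write(line + "\n")

log_visit("203.0.113.7", "Mozilla/5.0", "/pricing")
```

Rotating the file daily also gives you the natural date partitioning mentioned in answer #1.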

But a critical question is also how you want to process your data, that is, what your analytical expectations are. Do you want to visualize the data? How fast do you need your answers? How quickly should new data be integrated? Will you always ask the same questions, or do you want to explore the data ad hoc? Do you want to mix it with other data, etc.?

MySQL is probably more mature in terms of the analytical tools built on top of it, and it is likely to be faster as long as your data size isn't too big. With Hadoop you could use Hive and related tools to help you process the data, but visualization might not be as straightforward.
