I'm developing software for conducting online surveys. When a lot of users are filling in a survey simultaneously, I'm having trouble handling the high database write load. My current table (MySQL, InnoDB) for storing survey data has the following columns: dataID, userID, item_1 .. item_n. The item_* columns have different data types corresponding to the type of data acquired by the specific items. Most item columns are TINYINT(1), but there are also some TEXT item columns. Large surveys can have more than a hundred items, leading to a table with more than a hundred columns. A user answers around 20 items in one HTTP POST, and the corresponding row has to be updated accordingly. Users may skip a lot of items, leading to a lot of NULL values in the row.
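Roughly, the current layout looks like this (a simplified sketch, column names illustrative):

```sql
-- Sketch of the current wide table (simplified; real surveys have 100+ item_* columns)
CREATE TABLE survey_data (
    dataID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    userID INT UNSIGNED NOT NULL,
    item_1 TINYINT(1) NULL,
    item_2 TINYINT(1) NULL,
    item_3 TEXT NULL,
    -- ... item_4 .. item_n, mostly TINYINT(1), some TEXT
    KEY idx_user (userID)
) ENGINE=InnoDB;

-- Each HTTP POST currently updates ~20 of the item_* columns in one row
UPDATE survey_data
SET item_1 = 1, item_2 = 0, item_3 = 'free text answer' -- , ...
WHERE dataID = 4711;
```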
I'm considering the following solution to my write-load problem. Instead of having a single table with many columns, I would set up several tables corresponding to the data types used, e.g. data_tinyint_1, data_smallint_6, data_text. Each of these tables would have only the following columns: userID, itemID, value (the value column has the data type corresponding to its table). For one HTTP POST with e.g. 20 items, I might then have to create 19 rows in data_tinyint_1 and one row in data_text (instead of updating one large row with many columns). However, for every item I need to determine its data type (via two table joins) so I know in which table to create the new row. My Zend Framework-based application code will get more complicated with this approach.
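A sketch of the proposed per-datatype tables (the composite primary key on (userID, itemID) is my assumption, based on each user answering an item at most once):

```sql
-- One narrow table per data type instead of one wide table per survey
CREATE TABLE data_tinyint_1 (
    userID INT UNSIGNED NOT NULL,
    itemID INT UNSIGNED NOT NULL,
    value  TINYINT(1) NULL,
    PRIMARY KEY (userID, itemID)
) ENGINE=InnoDB;

CREATE TABLE data_text (
    userID INT UNSIGNED NOT NULL,
    itemID INT UNSIGNED NOT NULL,
    value  TEXT NULL,
    PRIMARY KEY (userID, itemID)
) ENGINE=InnoDB;

-- One POST then becomes a batch of small inserts instead of one wide update
INSERT INTO data_tinyint_1 (userID, itemID, value)
VALUES (42, 1, 1), (42, 2, 0), (42, 5, 3); -- ... up to ~19 rows

INSERT INTO data_text (userID, itemID, value)
VALUES (42, 7, 'free text answer');
```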
My questions:
- Will my solution be better for heavy write load?
- Do you have a better solution?
2 Answers
#1
Since you're getting to the point of abstracting this schema to mimic actual datatypes, it might stand to reason that you should simply create a new table set per survey instead. The benefit is that locking will lessen, and you could isolate heavy loads onto separate machines if the load becomes unbearable.
The single-survey database structure can then more accurately reflect your real-world conditions and data input handlers. It ought to make your abstraction headaches go away.
There's nothing wrong with creating tables on the fly. In some configurations, soft sharding is preferable.
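For example, a per-survey table set could be as simple as this (a minimal sketch; the survey id in the table name and the column set are just for illustration):

```sql
-- Hypothetical table created on the fly when survey 123 is published,
-- containing only the columns that survey actually needs
CREATE TABLE survey_123_responses (
    dataID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    userID INT UNSIGNED NOT NULL,
    item_1 TINYINT(1) NULL,
    item_2 TINYINT(1) NULL,
    item_3 TEXT NULL,
    KEY idx_user (userID)
) ENGINE=InnoDB;
```

A heavily used survey's table set can then be moved to its own schema or server without touching the rest.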
#2
It looks like the obvious solution would be to use a document database for fast writes and then bulk-insert the answers into MySQL asynchronously using cron or something like that. You can create a view in the document database for quick statistics, but allow filtering and other complicated queries only in MySQL if you're not a fan of document DBMSs.
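For illustration, the MySQL side of that cron job could flush the buffered answers in one batched transaction instead of row by row (a sketch using the table names from your proposed schema; the values are made up):

```sql
-- Hypothetical batch the cron job flushes every few minutes; one transaction
-- means one log flush at COMMIT instead of one per answer row
START TRANSACTION;

INSERT INTO data_tinyint_1 (userID, itemID, value)
VALUES
    (42, 1, 1), (42, 2, 0),
    (43, 1, 1), (43, 4, 2);
    -- ... hundreds of buffered answers per batch

INSERT INTO data_text (userID, itemID, value)
VALUES
    (42, 7, 'free text answer'),
    (43, 7, 'another answer');

COMMIT;
```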