I have the following choice:
I have a huge table (9999999999999 rows), let's call it tableHuge, and I would like to split it into multiple tables to optimize queries. The table contains dates (days of the month), and most queries use a specified month as the search key in the SELECT. This leads me to the following choices:
Choice one: Split the table into multiple tables, using the month as the tail of the name (like lessHugeTable_01, lessHugeTable_02, etc.). Then my app can take care of accessing the table it needs. The main downside is losing the ability to join in cases that include more than one month (or having to join with a UNION... well... complications).
Choice two: Use table partitioning.
Since I have never used partitioning before (so I have no knowledge to compare with), I would like some advice on how to do it, with pros and cons if possible (except obvious things like "if one of your manually split tables gets broken you lose only that data, while with a partitioned table you lose all of it").
Thank you for your time.
1 answer
The answer here is really "it depends".
More specifically it depends on the nature of your data, what accesses your data and how that data is accessed.
From the sound of it you might be best off with a table partitioned by year and month. I am making wild assumptions here: that you will need to access older data less frequently (or never), and hence will be able to archive it off to keep data volumes down in your main table (like I said, "depends"!).
If your table is, and always will be, accessed by a single application into which you can build the logic to handle your 'tail' naming convention, then you might want to go down the multiple tables route.
Here is how I see pros and cons stacking up:
Multiple Tables Pros
- Smaller individual table if just selecting data for a single month
- Errr. I can only think of one actually
Multiple Tables Cons
- Difficulty in querying/updating multi-month datasets
- What happens if you get data from January in the February table? "But it will never happen!". Really? Really?!
- If multiple applications need to access these tables then they will all have to have your 'tail' naming convention logic in place, i.e. they all need to know that lessHugeTable_02 holds February's data.
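To make the multi-month pain concrete, here is a rough Python sketch of the routing logic every client application would have to carry. The table prefix comes from the question; the `day` column name and both function names are my own assumptions, so adjust to your schema.

```python
from datetime import date

def month_tables(start: date, end: date, prefix: str = "lessHugeTable") -> list[str]:
    """Return the per-month tables that hold rows between start and end (inclusive)."""
    if start > end:
        raise ValueError("start must not be after end")
    months_spanned = (end.year - start.year) * 12 + (end.month - start.month) + 1
    # The tail carries only the month, so a range of a year or more touches all 12 tables.
    count = min(months_spanned, 12)
    return [f"{prefix}_{(start.month - 1 + i) % 12 + 1:02d}" for i in range(count)]

def build_range_query(start: date, end: date, date_column: str = "day") -> str:
    """Stitch the per-month tables back together with UNION ALL.

    Each SELECT has two placeholders, so the caller must pass (start, end)
    once per table when executing the statement.
    """
    selects = [
        f"SELECT * FROM {table} WHERE {date_column} BETWEEN %s AND %s"
        for table in month_tables(start, end)
    ]
    return "\nUNION ALL\n".join(selects)
```

A query from mid-January to early March already needs three SELECTs glued together, and that is before any joins, ordering or aggregation across the months.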
and now partitioning:
Partitioning Tables Pros
- You're letting MySQL take care of your data sharding for you. So no "this month = this table" logic needed in your application
- No risk of January data getting into the February table
- Joining becomes easier since you have a single logical (if not physical) table
- If you are using MySQL 5.5 or newer then you can truncate partitions. Really handy for any housekeeping you might want to do
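For comparison, here is a sketch of what the partitioning route might look like, written as a small Python helper that generates the MySQL statements. I am assuming a DATE column called `day` and partitions named `pYYYYMM`; both are made up for illustration. Also bear in mind that MySQL requires the partitioning column to be part of every unique key on the table, primary key included, so check your keys first.

```python
from datetime import date

def next_month(d: date) -> date:
    """First day of the month after d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)

def monthly_partition_ddl(table: str, column: str, first: date, months: int) -> str:
    """Build an ALTER TABLE statement with one RANGE partition per month."""
    parts = []
    current = date(first.year, first.month, 1)
    for _ in range(months):
        upper = next_month(current)
        # Each partition holds rows strictly below the first day of the next month.
        parts.append(
            f"  PARTITION p{current:%Y%m} VALUES LESS THAN (TO_DAYS('{upper.isoformat()}'))"
        )
        current = upper
    # Catch-all so rows dated after the last listed month are still accepted.
    parts.append("  PARTITION pmax VALUES LESS THAN MAXVALUE")
    return (
        f"ALTER TABLE {table}\n"
        f"PARTITION BY RANGE (TO_DAYS({column})) (\n" + ",\n".join(parts) + "\n)"
    )

def truncate_partition_sql(table: str, month: date) -> str:
    """Housekeeping statement that empties a single month's partition."""
    return f"ALTER TABLE {table} TRUNCATE PARTITION p{month:%Y%m}"

print(monthly_partition_ddl("tableHuge", "day", date(2013, 11, 1), 3))
print(truncate_partition_sql("tableHuge", date(2013, 11, 1)))
```

Applications keep querying tableHuge as one table; only the housekeeping job needs to know the partition names.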
Partitioning Tables Cons
- Potentially you have a much much larger dataset to query. If you run a query that spans multiple partitions then it will probably take a while. Choose your partition key wisely!
- Probably more but I am running out of time and wild assumptions!
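On the "choose your partition key wisely" point: with monthly partitions, the work a date-range query does grows with the number of months it covers, and MySQL can only skip partitions when the WHERE clause filters on the partitioning column. The Python sketch below just counts the partitions a range overlaps. The `pYYYYMM` naming is a hypothetical convention of mine, and this is a model of the idea, not MySQL's actual pruning code.

```python
from datetime import date

def partitions_touched(start: date, end: date) -> list[str]:
    """Names of the monthly partitions a date range overlaps."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    # Walk month by month until we pass the month containing `end`.
    while (year, month) <= (end.year, end.month):
        names.append(f"p{year:04d}{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names
```

A single-month query stays inside one partition, while a query with no filter on the date column has to look at all of them. On a real server you can check which partitions a query reads in the EXPLAIN output.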
PS There is a good answer on some of the points here