I have a MySQL InnoDB table with 350,000+ rows, containing columns such as id, otherId, shortTitle and so on. I now need a Bool/Bit field for perhaps a few hundred or a few thousand of those rows. Should I just add that bool field to the table, or would it be better to create a new table referencing the IDs of the old table, so that I don't risk causing performance issues for all the existing functions that access the first table?
(Side info: I'm never using "SELECT * ...". The main table has lots of reading, rarely writing.)
4 solutions
#1
Adding a field can indeed hamper performance a little, since your table rows grow larger, but it's hardly a problem for a BIT field.
Most probably, you will have exactly the same row count per page, which means no performance decrease at all.
On the other hand, using an extra JOIN to access the row value in another table will be much slower.
I'd add the column right into the table.
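A minimal sketch of that approach, assuming the table is called `items` and the new column `is_flagged` (both names are made up for illustration; the ids in the UPDATE are placeholders). `BOOLEAN` is a MySQL synonym for `TINYINT(1)`; a `BIT(1)` column would work much the same way.

```sql
-- Hypothetical names: table `items`, new column `is_flagged`.
-- NOT NULL DEFAULT 0 means every existing row simply reads as "not flagged".
ALTER TABLE items
    ADD COLUMN is_flagged BOOLEAN NOT NULL DEFAULT 0;

-- Flag the few hundred or thousand rows that need it (placeholder ids).
UPDATE items
SET is_flagged = 1
WHERE id IN (101, 205, 3307);

-- New code reads the flag directly, with no JOIN.
SELECT id, shortTitle
FROM items
WHERE is_flagged = 1;
```

Since the existing queries name their columns rather than using `SELECT *`, they do not need to change and never fetch the new column.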
#2
What does the new column denote?
From the data modelling perspective, if the column belongs with the data under whichever normal form is in use, then put it with the data; performance impact be damned. If the column doesn't directly belong to the table, then put it in a second table with a foreign key.
Realistically, the performance impact of adding a new column to a table with ~350,000 rows isn't going to be particularly large. Have you tried issuing the ALTER TABLE statement against a copy, perhaps on a local workstation?
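One way to try this on a copy might look like the following. The names `items`, `items_copy` and `is_flagged` are assumptions for illustration only. Note that `CREATE TABLE ... LIKE` copies column definitions and indexes but not foreign key constraints.

```sql
-- Hypothetical names throughout; run this on a test machine, not production.

-- 1. Clone the structure (columns and indexes) of the real table.
CREATE TABLE items_copy LIKE items;

-- 2. Copy the data across.
INSERT INTO items_copy SELECT * FROM items;

-- 3. Time the schema change on the copy.
ALTER TABLE items_copy
    ADD COLUMN is_flagged BOOLEAN NOT NULL DEFAULT 0;

-- 4. Run a typical read query against both tables and compare timings.
SELECT id, shortTitle FROM items      WHERE otherId = 42;
SELECT id, shortTitle FROM items_copy WHERE otherId = 42;

-- 5. Clean up.
DROP TABLE items_copy;
```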
#3
I don't know why people insist on calling 350K-row tables big. In the mainframe world, that's how big the DBMS configuration tables are :-).
That said, you should be designing your tables in third normal form. Only if you have performance problems should you consider denormalizing.
If you have a column that will apply only to certain of the rows, it's (probably) not going to be 3NF to put it in the same table. You should have a separate table with a foreign key into your 'primary' table.
Keep in mind that this holds only if the boolean field genuinely doesn't apply to some of the rows. That's a different situation from the field applying to all rows but not being known for some. In that case, a nullable column in the primary table would be better. But that doesn't sound like what you're describing.
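The separate-table design described above could be sketched like this. The names `items`, `item_flags` and `is_flagged` are hypothetical, and `INT UNSIGNED` is an assumption: the type of `item_id` has to match the real type of `items.id` for the foreign key to be accepted.

```sql
-- Hypothetical names; only rows the flag applies to get an entry here.
CREATE TABLE item_flags (
    item_id    INT UNSIGNED NOT NULL PRIMARY KEY,  -- must match the type of items.id
    is_flagged BOOLEAN      NOT NULL,
    CONSTRAINT fk_item_flags_item
        FOREIGN KEY (item_id) REFERENCES items (id)
        ON DELETE CASCADE
) ENGINE=InnoDB;

-- Reading the flag alongside the main row; f.is_flagged is NULL
-- for rows that have no entry in item_flags.
SELECT i.id, i.shortTitle, f.is_flagged
FROM items AS i
LEFT JOIN item_flags AS f ON f.item_id = i.id;

-- The alternative for "applies to every row, but unknown for some":
-- a nullable column in the main table instead of a second table.
-- ALTER TABLE items ADD COLUMN is_flagged BOOLEAN NULL;
```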
#4
Requiring a bit field for the next entries only sounds like you want to implement inheritance. If that is the case, I would add it to a new table to keep things readable. Otherwise, it doesn't matter if you add it to the main table or not, unless your queries are not using indexes, in which case I would change that first before making any other decisions regarding performance.
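To check whether a query uses an index, `EXPLAIN` is the usual starting point. The table name `items`, the filter on `otherId` and the index name below are assumptions for illustration.

```sql
-- Hypothetical query; substitute one of the real, frequently run queries.
EXPLAIN
SELECT id, shortTitle
FROM items
WHERE otherId = 42;

-- In the output, `type` = ALL together with `key` = NULL indicates a full
-- table scan. If that is the case, an index on the filtered column may help:
CREATE INDEX idx_items_otherId ON items (otherId);
```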