Consider a set of tables that all link to "countries" or "currencies" tables.
To make the data easier to read, I'd like to make a CHAR field holding the country code (e.g. US, GB, AU) or currency code (e.g. USD, AUD) the primary key in each of those two tables, and have all other tables use this CHAR as a foreign key.
The database is MySQL with the InnoDB engine.
Is it going to cause performance issues? Is it something I should avoid?
3 Answers
#1
21
Performance isn't really the main issue, at least not for me. The issue is more about surrogate vs natural keys.
Country codes aren't static. They can and do change. Countries change names (e.g. Zaire to the Democratic Republic of the Congo). They come into being (e.g. Eritrea separating from Ethiopia, or the breakup of Yugoslavia or the Soviet Union) and they cease to exist (e.g. West and East Germany). When this happens, the ISO standard codes change.
More in Name Changes Since 1990: Countries, Cities, and More
Surrogate keys tend to be better because when these events happen the keys don't change, only columns in the reference table do.
For that reason I'd be more inclined to create country and currency tables with an int primary key instead.
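As a rough illustration of the surrogate-key approach described above, here is a minimal sketch. It uses Python's built-in sqlite3 module purely so that it runs anywhere; it is not MySQL/InnoDB, and the table names, column names, and sample rows are hypothetical. The ZR to CD change (Zaire becoming the Democratic Republic of the Congo) is used only as an example of a code change.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL/InnoDB (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE country (
    id   INTEGER PRIMARY KEY,        -- surrogate key, never changes
    code CHAR(2) NOT NULL UNIQUE,    -- ISO code, allowed to change
    name TEXT NOT NULL
);
CREATE TABLE customer (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES country(id)
);
INSERT INTO country (id, code, name) VALUES (1, 'ZR', 'Zaire');
INSERT INTO customer (id, name, country_id) VALUES (10, 'Example Ltd', 1);
""")

# The country is renamed and its code changes: only the reference row is touched.
conn.execute(
    "UPDATE country SET code = 'CD', name = 'Democratic Republic of the Congo' "
    "WHERE id = 1"
)

# The referencing row still points at id 1 and picks up the new code via a join.
row = conn.execute(
    "SELECT customer.name, country.code "
    "FROM customer JOIN country ON country.id = customer.country_id"
).fetchone()
print(row)
```

The trade-off is visible in the last query: reading a customer's country code now requires a join against the reference table.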
That being said, char or varchar key fields can use more space than integer keys and carry some performance cost, though that probably won't be an issue unless you're performing a huge number of queries.
For completeness, you may want to refer to Database Development Mistakes Made by Application Developers.
#2
2
James Skidmore's link is important to read.
If you're limiting yourself to country and currency codes (2 and 3 characters, respectively), you may very well be able to get away with declaring the columns char(2) and char(3).
I would guess that this would be fine. If you're using an 8-bit character encoding, those columns take 2 and 3 bytes, the same size as a smallint or mediumint, respectively.
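The size comparison above can be checked with a few lines of arithmetic. This is only a sketch: latin-1 is picked as one example of an 8-bit encoding, and the 2-byte and 3-byte figures are the storage sizes MySQL documents for SMALLINT and MEDIUMINT. It says nothing about how a particular engine or multi-byte character set lays out the column on disk.

```python
# Sample ISO-style codes from the question.
country_code = "US"
currency_code = "USD"

# One byte per character under a single-byte (8-bit) encoding such as latin-1.
country_bytes = len(country_code.encode("latin-1"))
currency_bytes = len(currency_code.encode("latin-1"))

# Documented MySQL integer storage sizes, in bytes.
SMALLINT_BYTES = 2
MEDIUMINT_BYTES = 3

print(country_bytes == SMALLINT_BYTES, currency_bytes == MEDIUMINT_BYTES)
```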
#3
0
My answer is that there isn't a clear-cut answer. Just pick an approach within your project and be consistent. Both have their pluses and minuses.
@cletus makes a good point about using generated keys, but when you run into a situation where the data is relatively static, like country codes, introducing a generated key for them seems overly complex. Despite real-world politics, having country codes appear and disappear isn't really going to be much of an issue for most business problems (but if your data actively concerns all 190-210 countries, follow that advice).
Using surrogate keys universally is a good and popular strategy. But remember, it comes in response to modeling databases using natural keys for everything. Ack! Open up a 15-year-old database book. Using natural keys everywhere definitely gets you into difficult situations, as your initial understanding of the problem domain proves wrong. You do want to have consistency in your modelling practices, but using different techniques for clearly different situations is OK.
I suspect that performance for most modern databases on char(2) or varchar(2) foreign keys will be the same as (or better than) on int fields. Databases have supported textual foreign keys for years.
Given that we have no other information about the project, if your preference is to use the country codes as foreign keys, and you have the option to do so, I'd say it's OK. It'll be easier to work with the data. It is a little against current practices, but in this case it's not going to back you into a corner.
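To make the "easier to work with the data" point concrete, here is a minimal sketch of the natural-key layout. As an assumption for runnability it uses Python's built-in sqlite3 module rather than MySQL/InnoDB, and the table names, column names, and sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL/InnoDB (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE currency (
    code CHAR(3) PRIMARY KEY,        -- natural key: the currency code itself
    name TEXT NOT NULL
);
CREATE TABLE invoice (
    id            INTEGER PRIMARY KEY,
    amount        INTEGER NOT NULL,
    currency_code CHAR(3) NOT NULL REFERENCES currency(code)
);
INSERT INTO currency (code, name)
    VALUES ('USD', 'US Dollar'), ('AUD', 'Australian Dollar');
INSERT INTO invoice (id, amount, currency_code)
    VALUES (1, 100, 'USD'), (2, 250, 'AUD');
""")

# The referencing table is readable on its own: no join is needed to see the code.
rows = conn.execute(
    "SELECT id, amount, currency_code FROM invoice ORDER BY id"
).fetchall()
print(rows)

# The foreign key still rejects codes that are missing from the reference table.
try:
    conn.execute(
        "INSERT INTO invoice (id, amount, currency_code) VALUES (3, 5, 'ZZZ')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The cost of this layout is the one raised in the first answer: if a code ever changes, every referencing row has to change with it.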