Why do I have to set the max length of every single text column in the database?

Time: 2020-12-25 16:56:44

Why is it that every RDBMS insists that you tell it what the max length of a text field is going to be? Why can't it just infer this information from the data that's put into the database?

I've mostly worked with MS SQL Server, but every other database I know also demands that you set these arbitrary limits on your data schema. The reality is that this is not particularly helpful or friendly to work with, because the business requirements change all the time and almost every day some end user is trying to put a lot of text into that column.

Does anyone who knows the inner workings of an RDBMS know why we don't just infer the limits from the data that's put into storage? I'm not talking about guessing the type information, just the limits of a particular text column.

I mean, there's a reason why I don't use nvarchar(max) on every text column in the database.

9 answers

#1


5  

Because computers (and databases) are stupid. Computers don't guess very well and, unless you tell them, they can't tell whether a column is going to be used for a phone number or a copy of War and Peace. Obviously, the DB could be designed so that every column could contain an infinite amount of data -- or at least as much as disk space allows -- but that would be a very inefficient design. To get efficiency, then, we make a trade-off and make the designer tell the database how much we expect to put in the column. Presumably, there could be a default that is used if you don't specify a length. Unfortunately, any default would probably be inappropriate for the vast majority of people from an efficiency perspective.

#2


2  

It has to do with speed. If the max size of a string is specified, you can optimize the way information is stored for faster I/O on it. When speed is key, the last thing you want is a sudden shuffling of all your data just because you changed a state abbreviation to the full name.

With the max size set, the database can allocate the max space to every entry in that column, and no matter how the value changes, no address needs to change.
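
To make the in-place update concrete, here is a minimal Python sketch of fixed-width slots. It is only an illustration, not how any particular engine lays out its pages: the 14-byte slot width and the helper names (`make_table`, `update`, `read`) are assumptions invented for the example. It also models fixed-width types in the style of char(n); variable-length types such as varchar(n) generally store only the bytes actually used.

```python
import struct

SLOT = 14          # hypothetical declared max width, in bytes
FMT = f"{SLOT}s"   # struct pads shorter values with NUL bytes


def make_table(values):
    """Pack each value into its own fixed-width slot."""
    buf = bytearray()
    for value in values:
        buf += struct.pack(FMT, value.encode("ascii"))
    return buf


def update(buf, row, value):
    """Overwrite one slot in place; no other row has to move."""
    data = value.encode("ascii")
    if len(data) > SLOT:
        # struct would silently truncate, so enforce the limit explicitly
        raise ValueError(f"value exceeds declared max of {SLOT} bytes")
    struct.pack_into(FMT, buf, row * SLOT, data)


def read(buf, row):
    """Read one slot back and strip the NUL padding."""
    (raw,) = struct.unpack_from(FMT, buf, row * SLOT)
    return raw.rstrip(b"\x00").decode("ascii")


table = make_table(["NC", "TX", "CA"])
size_before = len(table)
update(table, 0, "North Carolina")  # abbreviation replaced by the full name
```

After the update the buffer is the same size as before and rows 1 and 2 sit at the same offsets. The cost is that "NC" occupied all 14 bytes from the start.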

#3


1  

This post not only answers your question about whether to use nvarchar(max) everywhere, but it also gives some insight into why databases historically didn't allow this.

#4


1  

This is like saying, why can't we just tell the database we want a table and let it infer what type and how many columns we need from the data we give it.

Simply put, we know better than the database will. Suppose you have a one-in-a-million chance of putting a 2,000-character string into the database; most of the time, it's 100 characters. The database would probably blow up or refuse the 2k-character string. It simply cannot know that you're going to need a length of 2k if for the first three years you've only entered strings of length 100.

Also, the declared length of a character column is used to optimize row placement so that rows can be read/skipped faster.
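
As a rough illustration of a database refusing the one-in-a-million long string, here is a small sketch using Python's built-in `sqlite3` module. The table name `notes`, the column `body`, the 100-character limit and the helper `try_insert` are all made up for the example. SQLite does not enforce a declared VARCHAR(n) length, so the sketch uses a CHECK constraint to stand in for the limit; other engines reject or truncate over-long values in their own ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes ("
    " body TEXT NOT NULL CHECK (length(body) <= 100)"  # hypothetical limit
    ")"
)


def try_insert(text):
    """Return True if the row was stored, False if the limit rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO notes (body) VALUES (?)", (text,))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("x" * 100)    # the usual case fits
rejected = try_insert("x" * 2000)   # the rare 2k string violates the CHECK
row_count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
```

The limit was chosen by a person when the table was created; three years of 100-character rows would have given the database nothing to infer the 2k case from.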

#5


0  

I think it is because an RDBMS uses random data access. To do random access, it must know which address on the hard disk to jump to in order to read the data quickly. If every row of a single column has a different data length, it cannot work out the starting address to jump to directly. The only way is to load all the data and check it.

If the RDBMS changed the data length of a column to a fixed number (for example, the max length of all rows) every time you add, update, or delete a row, it would be extremely time-consuming.
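
The address arithmetic described above can be sketched in a few lines of Python. The function names and the sample lengths are assumptions for illustration only. The third helper shows a common way around the scan: storing a small table of row offsets, which is roughly what the slot array on a database page does.

```python
def fixed_offset(row, row_size):
    """With fixed-size rows, the start of row N is plain arithmetic."""
    return row * row_size


def variable_offset(row, lengths):
    """With variable-size rows and no index, the lengths of all
    earlier rows have to be added up to find where row N starts."""
    return sum(lengths[:row])


def build_offset_table(lengths):
    """Precompute where each variable-size row starts."""
    offsets, position = [], 0
    for length in lengths:
        offsets.append(position)
        position += length
    return offsets


lengths = [10, 200, 35, 80]            # hypothetical row sizes in bytes
offsets = build_offset_table(lengths)  # a direct lookup replaces the scan
```

With the offset table in place, variable-length rows can still be reached directly, at the price of maintaining the table whenever a row grows or shrinks.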

#6


0  

What would the DB base its guess on? If the business requirements change regularly, it's going to be just as surprised as you. If there's a reason you don't use nvarchar(max), there's probably a reason it doesn't default to that as well...

#7


0  

Check this thread: http://www.sqlservercentral.com/Forums/Topic295948-146-1.aspx

#8


0  

For the sake of an example, I'm going to step into some quicksand and suggest you compare it with applications allocating memory (RAM). Why don't programmers ask for/allocate all the memory they need when the program starts up? Because often they don't know how much they'll need. This can lead to apps grabbing more and more memory as they run, and perhaps also releasing memory. And you have multiple apps running at the same time, and new apps starting, and old apps closing. And apps always want contiguous blocks of memory; they work poorly (if at all) if their memory is scattered all over the address space. Over time, this leads to fragmented memory, and all those garbage collection issues that people have been tearing their hair out over for decades.

Jump back to databases. Do you want that to happen to your hard drives? (Remember, hard drive performance is very, very slow when compared with memory operations...)

#9


0  

Sounds like your business rule is: Enter as much information as you want in any text box so you don't get mad at the DBA.

You don't allow users to enter 5000 character addresses since they won't fit on the envelope.

That's why Twitter has a text limit and saves everyone the trouble of reading through a bunch of mindless drivel that just goes on and on and never gets to the point, but only manages to infuriate the reader, making them wonder why you have such disregard for their time by choosing a self-centered and inhumane lifestyle focused on promoting the act of copying and pasting as much data as the memory buffer gods will allow...
