I'm working on a database for a small web app at my school using SQL Server 2005. I see a couple of schools of thought on the issue of varchar vs nvarchar:
- Use varchar unless you deal with a lot of internationalized data, then use nvarchar.
- Just use nvarchar for everything.
I'm beginning to see the merits of view 2. I know that nvarchar does take up twice as much space, but that isn't necessarily a huge deal since this is only going to store data for a few hundred students. To me it seems like it would be easiest not to worry about it and just allow everything to use nvarchar. Or is there something I'm missing?
14 Answers
#1
141
Always use nvarchar.
You may never need double-byte characters in most applications. However, if you later need to support double-byte languages and your database schema only has single-byte support, it is really expensive to go back and modify it throughout your application.
The cost of migrating one application from varchar to nvarchar will be much more than the little bit of extra disk space you'll use in most applications.
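To give a rough idea of what that migration looks like for a single column, here is a minimal sketch. The table dbo.Student, the column LastName, and the index IX_Student_LastName are hypothetical names, not anything from the question, and this only covers the database side; parameters, stored procedures and string literals in the application would need the same treatment.

```sql
-- Hypothetical table, column, and index names, for illustration only.
CREATE TABLE dbo.Student
(
    StudentID INT          NOT NULL PRIMARY KEY,
    LastName  VARCHAR(100) NOT NULL
);
CREATE INDEX IX_Student_LastName ON dbo.Student (LastName);

-- An index on the column has to be dropped before its data type can change.
DROP INDEX IX_Student_LastName ON dbo.Student;

-- Change VARCHAR(100) to NVARCHAR(100). State NOT NULL explicitly,
-- otherwise the column may end up nullable.
ALTER TABLE dbo.Student
    ALTER COLUMN LastName NVARCHAR(100) NOT NULL;

-- Recreate the index afterwards.
CREATE INDEX IX_Student_LastName ON dbo.Student (LastName);
```

Multiply that by every string column, index, foreign key and procedure parameter that touches the data and the cost argument becomes clearer.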
#2
219
Disk space is not the issue... but memory and performance will be. Double the page reads, double the index size, strange LIKE and = constant behaviour, etc.
Do you need to store Chinese etc. script? Yes or no...
And from MS BOL: "Storage and Performance Effects of Unicode"
Edit:
A recent SO question highlighting how bad nvarchar performance can be...
SQL Server uses high CPU when searching inside nvarchar strings
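The "double" figure is easy to check for yourself. A small sketch (the variable names are arbitrary) comparing character count and byte count for the same ASCII text in both types:

```sql
-- LEN returns the number of characters, DATALENGTH the number of bytes.
DECLARE @a VARCHAR(50);
DECLARE @u NVARCHAR(50);
SET @a = 'Hello world';
SET @u = N'Hello world';

SELECT LEN(@a)        AS varchar_chars,   -- 11
       DATALENGTH(@a) AS varchar_bytes,   -- 11
       LEN(@u)        AS nvarchar_chars,  -- 11
       DATALENGTH(@u) AS nvarchar_bytes;  -- 22
```

The same ratio carries over to every page and index entry that stores the value.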
#3
59
Be consistent! Joining a VARCHAR column to an NVARCHAR column has a big performance hit.
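A minimal sketch of the kind of mismatch meant here, using two hypothetical tables (dbo.Course and dbo.Enrollment are made-up names). NVARCHAR has higher data type precedence than VARCHAR, so the VARCHAR side is implicitly converted for the comparison; depending on the collation, that can turn an index seek into a scan.

```sql
-- Hypothetical tables: the same logical key stored with two different types.
CREATE TABLE dbo.Course
(
    CourseCode VARCHAR(10) NOT NULL PRIMARY KEY
);
CREATE TABLE dbo.Enrollment
(
    EnrollmentID INT          NOT NULL PRIMARY KEY,
    CourseCode   NVARCHAR(10) NOT NULL
);

-- Mismatched join: c.CourseCode (VARCHAR) is implicitly converted to NVARCHAR.
-- Look for CONVERT_IMPLICIT in the execution plan.
SELECT e.EnrollmentID
FROM dbo.Enrollment AS e
JOIN dbo.Course     AS c ON c.CourseCode = e.CourseCode;

-- One way to be consistent: give both columns the same type.
-- (Only safe if the NVARCHAR column holds nothing outside the VARCHAR code page.)
ALTER TABLE dbo.Enrollment
    ALTER COLUMN CourseCode VARCHAR(10) NOT NULL;
```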
#4
39
nvarchar is going to have significant overhead in memory, storage, working set and indexing, so if the specs dictate that it really will never be necessary, don't bother.
I would not have a hard and fast "always nvarchar" rule because it can be a complete waste in many situations, particularly ETL from ASCII/EBCDIC, or identifier and code columns, which are often keys and foreign keys.
On the other hand, there are plenty of columns where I would be sure to ask this question early, and if I didn't get a hard and fast answer immediately, I would make the column nvarchar.
#5
20
For your application, nvarchar is fine because the database size is small. Saying "always use nvarchar" is a vast oversimplification. If you're not required to store things like Kanji or other crazy characters, use VARCHAR; it'll use a lot less space. My predecessor at my current job designed something using NVARCHAR when it wasn't needed. We recently switched it to VARCHAR and saved 15 GB on just that table (it was heavily written to). Furthermore, if you then have an index on that table and you want to include that column or make a composite index, you've just made your index file size larger.
Just be thoughtful in your decision; in SQL development and data definitions there seems to rarely be a "default answer" (other than avoid cursors at all costs, of course).
#6
10
Since your application is small, there is essentially no appreciable cost increase to using nvarchar over varchar, and you save yourself potential headaches down the road if you have a need to store unicode data.
#7
9
I hesitate to add yet another answer here as there are already quite a few, but a few points need to be made that have either not been made or not been made clearly.
First: Do not always use NVARCHAR. That is a very dangerous, and often costly, attitude / approach. And it is no better to say "Never use cursors" since they are sometimes the most efficient means of solving a particular problem, and the common work-around of doing a WHILE loop will almost always be slower than a properly done Cursor.
The only time you should use the term "always" is when advising to "always do what is best for the situation". Granted, that is often difficult to determine, especially when trying to balance short-term gains in development time (manager: "we need this feature -- that you didn't know about until just now -- a week ago!") with long-term maintenance costs (manager who initially pressured team to complete a 3-month project in a 3-week sprint: "why are we having these performance problems? How could we have possibly done X which has no flexibility? We can't afford a sprint or two to fix this. What can we get done in a week so we can get back to our priority items? And we definitely need to spend more time in design so this doesn't keep happening!").
Second: @gbn's answer touches on some very important points to consider when making certain data modeling decisions when the path isn't 100% clear. But there is even more to consider:
- size of transaction log files
- time it takes to replicate (if using replication)
- time it takes to ETL (if ETLing)
- time it takes to ship logs to a remote system and restore (if using Log Shipping)
- size of backups
- length of time it takes to complete the backup
- length of time it takes to do a restore (this might be important some day ;-)
- size needed for tempdb
- performance of triggers (for inserted and deleted tables that are stored in tempdb)
- performance of row versioning (if using SNAPSHOT ISOLATION, since the version store is in tempdb)
- ability to get new disk space when the CFO says that they just spent $1 million on a SAN last year and so they will not authorize another $250k for additional storage
- length of time it takes to do INSERT and UPDATE operations
- length of time it takes to do index maintenance
- etc, etc, etc.
Wasting space has a huge cascade effect on the entire system. I wrote an article going into explicit detail on this topic: Disk Is Cheap! ORLY? (free registration required; sorry, I don't control that policy).
Third: While some answers are incorrectly focusing on the "this is a small app" aspect, and some are correctly suggesting to "use what is appropriate", none of the answers have provided real guidance to the O.P. An important detail mentioned in the Question is that this is a web page for their school. Great! So we can suggest that:
- Fields for Student and/or Faculty names should probably be NVARCHAR since, over time, it is only getting more likely that names from other cultures will be showing up in those places.
- But for street address and city names? The purpose of the app was not stated (it would have been helpful), but assuming the address records, if any, pertain to just a particular geographical region (i.e. a single language / culture), then use VARCHAR with the appropriate Code Page (which is determined from the Collation of the field).
- If storing State and/or Country ISO codes (no need to store INT / TINYINT since ISO codes are fixed length, human readable, and, well, standard :) use CHAR(2) for two-letter codes and CHAR(3) if using three-letter codes.
- If storing postal codes (i.e. zip codes), use VARCHAR since it is an international standard to never use any letter outside of A-Z. And yes, still use VARCHAR even if only storing US zip codes, and not INT, since zip codes are not numbers, they are strings, and some of them have a leading "0".
- If storing email addresses and/or URLs, use NVARCHAR since both of those can now contain Unicode characters.
- and so on....
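Put together, the suggestions in this list might translate into something like the following. This is only a sketch: the table name, column names and lengths are all my assumptions, since the question does not describe the actual schema.

```sql
-- Hypothetical schema; every name and length here is an assumption.
CREATE TABLE dbo.Student
(
    StudentID   INT            NOT NULL PRIMARY KEY,
    FirstName   NVARCHAR(50)   NOT NULL,  -- names: Unicode
    LastName    NVARCHAR(50)   NOT NULL,
    StreetAddr  VARCHAR(100)   NULL,      -- single region: code page comes from the column's collation
    City        VARCHAR(50)    NULL,
    StateCode   CHAR(2)        NULL,      -- fixed-length code
    CountryCode CHAR(2)        NULL,      -- two-letter ISO country code
    PostalCode  VARCHAR(10)    NULL,      -- a string, so leading zeros survive
    Email       NVARCHAR(320)  NULL,      -- may contain Unicode characters
    WebsiteUrl  NVARCHAR(2048) NULL
);
```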
Fourth: Now that you have NVARCHAR data taking up twice as much space as it needs to for data that fits nicely into VARCHAR ("fits nicely" = doesn't turn into "?") and somehow, as if by magic, the application did grow and now there are millions of records in at least one of these fields, where most rows are standard ASCII but some contain Unicode characters so you have to keep NVARCHAR, consider the following:
- If you are using SQL Server 2008 or newer, and are on Enterprise Edition, then you can enable Data Compression. Data Compression can (but won't "always") compress Unicode data in NCHAR and NVARCHAR fields. The determining factors are:
    - NCHAR(1 - 4000) and NVARCHAR(1 - 4000) use the Standard Compression Scheme for Unicode, but only starting in SQL Server 2008 R2, AND only for IN ROW data, not OVERFLOW! This appears to be better than the regular ROW / PAGE compression algorithm.
    - NVARCHAR(MAX) and XML (and I guess also VARBINARY(MAX), TEXT, and NTEXT) data that is IN ROW (not off row in LOB or OVERFLOW pages) can be at least PAGE compressed, and maybe also ROW compressed (not sure about this last one).
    - Any OFF ROW data, LOB or OVERFLOW = No Compression For You!
- If using a version older than 2008 or not on Enterprise Edition, you can have two fields: one VARCHAR and one NVARCHAR. For example, let's say you are storing URLs which are mostly all base ASCII characters (values 0 - 127) and hence fit into VARCHAR, but sometimes have Unicode characters. Your schema can include the following 3 fields:

    ...
    URLa VARCHAR(2048) NULL,
    URLu NVARCHAR(2048) NULL,
    URL AS (ISNULL(CONVERT(NVARCHAR(2048), [URLa]), [URLu])),
    CONSTRAINT [CK_TableName_OneUrlMax] CHECK (
        ([URLa] IS NOT NULL OR [URLu] IS NOT NULL)
        AND ([URLa] IS NULL OR [URLu] IS NULL))
    );

In this model you only SELECT from the [URL] computed column. For inserting and updating, you determine which field to use by seeing if converting alters the incoming value, which has to be of NVARCHAR type:

    INSERT INTO TableName (..., URLa, URLu)
    VALUES (...,
        -- CASE is used instead of IIF because IIF needs SQL Server 2012 or newer.
        -- Value survives the round trip through VARCHAR: store it in URLa.
        CASE WHEN CONVERT(VARCHAR(2048), @URL) = @URL THEN @URL END,
        -- Value is altered by the conversion: store it in URLu instead.
        CASE WHEN CONVERT(VARCHAR(2048), @URL) <> @URL THEN @URL END
    );
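For the Data Compression option in the list above, enabling it could look roughly like this. It is a sketch only: dbo.TableName is a placeholder, it needs SQL Server 2008 or newer (so not the 2005 instance in the question), and whether the Unicode data actually shrinks depends on the factors already listed.

```sql
-- dbo.TableName is a placeholder for an existing table.

-- Estimate the savings first, without changing anything.
EXEC sys.sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'TableName',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = N'PAGE';

-- Rebuild the table (heap or clustered index) with PAGE compression.
ALTER TABLE dbo.TableName REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Indexes are compressed separately from the table.
ALTER INDEX ALL ON dbo.TableName REBUILD WITH (DATA_COMPRESSION = PAGE);
```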
#8
7
For the last few years all of our projects have used NVARCHAR for everything, since all of these projects are multilingual. Imported data from external sources (e.g. an ASCII file, etc.) is up-converted to Unicode before being inserted into the database.
I've yet to encounter any performance-related issues from the larger indexes, etc. The indexes do use more memory, but memory is cheap.
Whether you use stored procedures or construct SQL on the fly, ensure that all string constants are prefixed with N (e.g. SET @foo = N'Hello world.';) so the constant is also Unicode. This avoids any string type conversion at runtime.
YMMV.
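The N prefix matters for more than conversion cost. A small sketch (variable names are arbitrary) of what can happen when it is left off: the literal is treated as VARCHAR and pushed through the database's default code page before it ever reaches the NVARCHAR variable, so characters outside that code page are replaced, typically with "?".

```sql
DECLARE @with_n    NVARCHAR(20);
DECLARE @without_n NVARCHAR(20);

SET @with_n    = N'Zoë 北京';  -- Unicode literal, stored as typed
SET @without_n =  'Zoë 北京';  -- VARCHAR literal, converted via the database's code page first

-- Compare the two values; what survives in @without_n depends on the
-- default collation of the database the batch runs in.
SELECT @with_n AS with_n, @without_n AS without_n;
```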
#9
6
Generally speaking: start out with the most expensive datatype that has the fewest constraints. Put it in production. If performance starts to be an issue, find out what's actually being stored in those nvarchar columns. Are there any characters in there that wouldn't fit into varchar? If not, switch to varchar. Don't try to pre-optimize before you know where the pain is. My guess is that the choice between nvarchar/varchar is not what's going to slow down your application in the foreseeable future. There will be other parts of the application where performance tuning will give you much more bang for the buck.
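One way to "find out what's actually being stored" is to round-trip each value through VARCHAR and see whether it comes back changed. A minimal sketch, assuming a hypothetical dbo.Student table with an NVARCHAR(100) LastName column:

```sql
-- Hypothetical table and column names.
-- Rows returned here contain characters that do not survive conversion to
-- VARCHAR under the column's collation (code page).
SELECT StudentID, LastName
FROM dbo.Student
WHERE LastName COLLATE Latin1_General_BIN2
      <> CONVERT(NVARCHAR(100), CONVERT(VARCHAR(100), LastName));
```

The binary collation on the comparison is there so that a character silently mapped to a look-alike is not treated as equal to the original. If the query returns no rows, the column is a candidate for switching to varchar.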
#10
6
I can speak from experience on this: beware of nvarchar. Unless you absolutely require it, this data field type destroys performance on larger databases. I inherited a database that was hurting in terms of performance and space. We were able to reduce a 30GB database in size by 70%! There were some other modifications made to help with performance, but I'm sure the varchars helped out significantly with that as well. If your database has the potential for growing tables to a million+ records, stay away from nvarchar at all costs.
#11
4
I deal with this question at work often:
- FTP feeds of inventory and pricing: item descriptions and other text were in nvarchar when varchar worked fine. Converting these to varchar reduced file size almost in half and really helped with uploads.
- The above scenario worked fine until someone put a special character in the item description (maybe trademark, can't remember).
I still do not use nvarchar every time over varchar. If there is any doubt or potential for special characters, I use nvarchar. I find I use varchar mostly when I am in 100% control of what is populating the field.
#12
3
Why, in all this discussion, has there been no mention of UTF-8? Being able to store the full Unicode span of characters does not mean one has to always allocate two bytes per character (or "code point", to use the Unicode term). All of ASCII is UTF-8. Does SQL Server check for VARCHAR() fields that the text is strict ASCII (i.e. top bit of each byte zero)? I would hope not.
If you then want to store Unicode and want compatibility with older ASCII-only applications, I would think using VARCHAR() and UTF-8 would be the magic bullet: it only uses more space when it needs to.
For those of you unfamiliar with UTF-8, might I recommend a primer.
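For what it's worth, this only became possible much later: SQL Server 2019 added collations with a _UTF8 suffix that make VARCHAR columns store UTF-8. That is of no help on the SQL Server 2005 instance in the question, but as a sketch of the idea (dbo.Utf8Demo is a made-up table name):

```sql
-- Requires SQL Server 2019 or newer; not available on SQL Server 2005.
CREATE TABLE dbo.Utf8Demo
(
    -- For a UTF-8 collation the declared length is in bytes, not characters.
    Txt VARCHAR(100) COLLATE Latin1_General_100_CI_AS_SC_UTF8 NOT NULL
);

INSERT INTO dbo.Utf8Demo (Txt)
VALUES (N'abc'), (N'Zoë'), (N'北京');

-- In UTF-8, ASCII characters take 1 byte, 'ë' takes 2, and each of the
-- Chinese characters here takes 3, so DATALENGTH varies per row.
SELECT Txt, LEN(Txt) AS chars, DATALENGTH(Txt) AS bytes
FROM dbo.Utf8Demo;
```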
#13
1
There'll be exceptional instances when you'll want to deliberately restrict the data type to ensure it doesn't contain characters from a certain set. For example, I had a scenario where I needed to store the domain name in a database. Internationalisation for domain names wasn't reliable at the time so it was better to restrict the input at the base level, and help to avoid any potential issues.
#14
0
If you are using NVARCHAR just because a system stored procedure requires it (the most frequent occurrence being, inexplicably, sp_executesql) and your dynamic SQL is very long, you would be better off from a performance perspective doing all string manipulations (concatenation, replacement, etc.) in VARCHAR, then converting the end result to NVARCHAR and feeding it into the proc parameter. So no, do not always use NVARCHAR!
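A minimal sketch of that pattern; the query text and the {type} placeholder are made up for illustration:

```sql
DECLARE @sql  VARCHAR(MAX);
DECLARE @nsql NVARCHAR(MAX);

-- Build and manipulate the statement as VARCHAR.
SET @sql = 'SELECT name FROM sys.objects WHERE type = ''{type}''';
SET @sql = REPLACE(@sql, '{type}', 'U');
SET @sql = @sql + ' ORDER BY name;';

-- Convert once at the end; sp_executesql requires a Unicode statement.
SET @nsql = CONVERT(NVARCHAR(MAX), @sql);
EXEC sys.sp_executesql @nsql;
```

This only makes sense when the statement itself contains nothing outside the VARCHAR code page; otherwise the conversion would have lost it already.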