Using varchar(MAX) vs TEXT on SQL Server

Time: 2021-01-25 10:35:58

I just read that the VARCHAR(MAX) data type (which can store close to 2 GB of char data) is the recommended replacement for the TEXT data type in SQL Server 2005 and later versions.

If I want to search inside a column for any string, which operation is quicker?

  1. Using the LIKE clause against a VARCHAR(MAX) column?

    WHERE COL1 LIKE '%search string%'

  2. Using a TEXT column with a Full-Text Index/Catalog on it, and then searching with the CONTAINS clause?

    WHERE CONTAINS (Col1, 'MyToken')

5 answers

#1


280  

The VARCHAR(MAX) type is a replacement for TEXT. The basic difference is that, by default, a TEXT value is always stored in a blob (out of row), whereas a VARCHAR(MAX) value is stored directly in the row unless it exceeds the 8,000-byte limit, at which point it is stored in a blob.
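
You can observe this difference on your own server by storing the same short string in a TEXT column and in a VARCHAR(MAX) column, then comparing the pages each allocation unit uses. This is a minimal sketch that assumes default table options. The table names dbo.TextDemo and dbo.MaxDemo are hypothetical. With the defaults, you would expect the TEXT table to show used LOB_DATA pages even for this tiny value, while the VARCHAR(MAX) value stays on its IN_ROW_DATA page.

```sql
-- Hypothetical demo tables; names are placeholders.
IF OBJECT_ID('dbo.TextDemo') IS NOT NULL DROP TABLE dbo.TextDemo;
IF OBJECT_ID('dbo.MaxDemo') IS NOT NULL DROP TABLE dbo.MaxDemo;

CREATE TABLE dbo.TextDemo (id int NOT NULL PRIMARY KEY, body text);
CREATE TABLE dbo.MaxDemo  (id int NOT NULL PRIMARY KEY, body varchar(max));

-- The same short value goes into both tables.
INSERT INTO dbo.TextDemo (id, body) VALUES (1, 'short value');
INSERT INTO dbo.MaxDemo  (id, body) VALUES (1, 'short value');

-- Pages used per allocation unit type (IN_ROW_DATA, LOB_DATA, ...).
-- IN_ROW_DATA (1) and ROW_OVERFLOW_DATA (3) join on hobt_id;
-- LOB_DATA (2) joins on partition_id.
SELECT OBJECT_NAME(p.object_id) AS table_name,
       au.type_desc,
       au.used_pages
FROM sys.partitions AS p
JOIN sys.allocation_units AS au
  ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
  OR (au.type = 2 AND au.container_id = p.partition_id)
WHERE p.object_id IN (OBJECT_ID('dbo.TextDemo'), OBJECT_ID('dbo.MaxDemo'))
ORDER BY table_name, au.type_desc;
```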

LIKE works identically with both data types. VARCHAR(MAX) adds one capability: it can also be used with = and GROUP BY, like any other VARCHAR column. However, with a lot of data, these methods will cause serious performance problems.
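
Here is a small sketch of those extra operations. It uses a hypothetical temporary table, #MaxOps. Both the = comparison and GROUP BY are accepted on a VARCHAR(MAX) column. The same statements against a TEXT column are rejected.

```sql
-- Temporary table used only for illustration.
IF OBJECT_ID('tempdb..#MaxOps') IS NOT NULL DROP TABLE #MaxOps;
CREATE TABLE #MaxOps (a varchar(max));
INSERT INTO #MaxOps (a) VALUES ('a'), ('a'), ('b');

-- Equality comparison: allowed on VARCHAR(MAX), rejected for TEXT.
SELECT a FROM #MaxOps WHERE a = 'a';

-- Grouping: also allowed on VARCHAR(MAX), rejected for TEXT.
SELECT a, COUNT(*) AS occurrences
FROM #MaxOps
GROUP BY a;
```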

As for whether to search with LIKE or with Full-Text Indexing and CONTAINS: the answer is the same whether the column is VARCHAR(MAX) or TEXT.

If you are searching large amounts of text and performance is key then you should use a Full Text Index.

LIKE is simpler to implement and is often suitable for small amounts of data. However, it performs extremely poorly on large data, because a pattern with a leading wildcard ('%search string%') cannot use an index.
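
The index point can be made concrete. A pattern with a fixed prefix can be answered with an index seek, while a leading % rules out a seek. This is a minimal sketch that assumes a hypothetical dbo.Articles table. It uses varchar(200) because a VARCHAR(MAX) or TEXT column cannot be the key of an ordinary index at all, so LIKE against those types has no regular index to seek on. Check the actual execution plans on a realistically sized table: with only a few rows, the optimizer may choose a scan for both queries.

```sql
-- Hypothetical table; Title is varchar(200) because a VARCHAR(MAX) or TEXT
-- column cannot be the key column of a regular index.
IF OBJECT_ID('dbo.Articles') IS NOT NULL DROP TABLE dbo.Articles;
CREATE TABLE dbo.Articles (
    ArticleId int NOT NULL PRIMARY KEY,
    Title     varchar(200) NOT NULL
);
CREATE INDEX IX_Articles_Title ON dbo.Articles (Title);

INSERT INTO dbo.Articles (ArticleId, Title) VALUES
    (1, 'search string basics'),
    (2, 'advanced search string tricks'),
    (3, 'unrelated title');

-- Fixed prefix: the optimizer can seek on IX_Articles_Title.
SELECT ArticleId FROM dbo.Articles WHERE Title LIKE 'search string%';

-- Leading wildcard: no seek is possible, so every row is examined.
SELECT ArticleId FROM dbo.Articles WHERE Title LIKE '%search string%';
```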

#2


17  

For large text, a full-text index is much faster. You can put a full-text index on a varchar(max) column as well.
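
The following sketch shows a full-text index on a varchar(max) column. It assumes the Full-Text Search feature is installed. The names dbo.Documents, PK_Documents and DocsCatalog are hypothetical; 'MyToken' is the search term from the question. Index population runs in the background, so the final query may return nothing until population has finished.

```sql
-- Hypothetical table and catalog names. The KEY INDEX must be a unique,
-- single-column, non-nullable index; the primary key satisfies that.
IF OBJECT_ID('dbo.Documents') IS NOT NULL DROP TABLE dbo.Documents;
CREATE TABLE dbo.Documents (
    DocId int NOT NULL CONSTRAINT PK_Documents PRIMARY KEY,
    Body  varchar(max) NULL
);

IF NOT EXISTS (SELECT 1 FROM sys.fulltext_catalogs WHERE name = 'DocsCatalog')
    CREATE FULLTEXT CATALOG DocsCatalog;

-- The full-text index goes directly on the varchar(max) column.
CREATE FULLTEXT INDEX ON dbo.Documents (Body)
    KEY INDEX PK_Documents
    ON DocsCatalog
    WITH CHANGE_TRACKING AUTO;

INSERT INTO dbo.Documents (DocId, Body) VALUES
    (1, 'This document mentions MyToken once.'),
    (2, 'This one does not.');

-- Population is asynchronous, so this may return no rows at first.
SELECT DocId FROM dbo.Documents WHERE CONTAINS(Body, 'MyToken');
```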

#3


15  

You can't search a text field with = without first converting it from text to varchar.

declare @table table (a text)
insert into @table values ('a')
insert into @table values ('a')
insert into @table values ('b')
insert into @table values ('c')
insert into @table values ('d')


select *
from @table
where a ='a'

This gives an error:

The data types text and varchar are incompatible in the equal to operator.

Whereas declaring the column as varchar(max) does not:

declare @table table (a varchar(max))

Interestingly, LIKE still works on the text column, e.g.:

where a like '%a%'
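
The conversion can be done inline with CAST, which makes = legal against the text column. This minimal sketch reuses the sample data from this answer. It uses a hypothetical temporary table, #TextCol, instead of a table variable, so the data survives across batches.

```sql
-- Same data as the example above, but in a temp table.
IF OBJECT_ID('tempdb..#TextCol') IS NOT NULL DROP TABLE #TextCol;
CREATE TABLE #TextCol (a text);
INSERT INTO #TextCol (a) VALUES ('a'), ('a'), ('b'), ('c'), ('d');

-- Converting the TEXT value to VARCHAR(MAX) makes the = comparison valid.
SELECT * FROM #TextCol WHERE CAST(a AS varchar(max)) = 'a';
```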

#4


7  

  • Basic Definition

TEXT and VARCHAR(MAX) are non-Unicode, large, variable-length character data types that can store a maximum of 2,147,483,647 non-Unicode characters (i.e., a maximum storage capacity of 2 GB).

  • Which one to Use?

According to MSDN, Microsoft recommends avoiding the TEXT data type, which will be removed in a future version of SQL Server. VARCHAR(MAX) is the recommended data type for storing large string values instead of TEXT.
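
In practice, moving off TEXT means finding the remaining columns and changing their type in place. The same Microsoft guidance pairs ntext with nvarchar(max) and image with varbinary(max). This is a minimal sketch: dbo.LegacyNotes and NoteBody are hypothetical names standing in for your own table and column.

```sql
-- Hypothetical legacy table used to demonstrate the conversion.
IF OBJECT_ID('dbo.LegacyNotes') IS NOT NULL DROP TABLE dbo.LegacyNotes;
CREATE TABLE dbo.LegacyNotes (
    NoteId   int NOT NULL PRIMARY KEY,
    NoteBody text NULL
);
INSERT INTO dbo.LegacyNotes (NoteId, NoteBody) VALUES (1, 'legacy note');

-- Find TEXT columns still present in the current database.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE = 'text';

-- Change the column type in place; state the nullability explicitly.
ALTER TABLE dbo.LegacyNotes ALTER COLUMN NoteBody varchar(max) NULL;
```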

  • In-Row or Out-of-Row Storage

By default, data in a TEXT column is stored out of row, in separate LOB data pages. The row on the table's data page holds only a 16-byte pointer to the LOB data page where the actual data resides. Data in a VARCHAR(MAX) column, by contrast, is stored in-row if it is 8,000 bytes or less. If a VARCHAR(MAX) value exceeds 8,000 bytes, it is stored in separate LOB data pages, and the row holds only a 16-byte pointer to them. So in-row VARCHAR(MAX) storage is better for searching and retrieval.
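
To check which VARCHAR(MAX) values exceed the 8,000-byte threshold, use DATALENGTH, which returns the stored size in bytes. This sketch uses a hypothetical temp table, #SizeDemo. A value of 8,000 bytes or less is only eligible for in-row storage. It can still be stored out of row if there is not enough space in the row.

```sql
IF OBJECT_ID('tempdb..#SizeDemo') IS NOT NULL DROP TABLE #SizeDemo;
CREATE TABLE #SizeDemo (id int NOT NULL PRIMARY KEY, Body varchar(max) NULL);

-- REPLICATE returns more than 8,000 bytes only if its input is already varchar(max).
INSERT INTO #SizeDemo (id, Body) VALUES
    (1, 'short value'),
    (2, REPLICATE(CAST('x' AS varchar(max)), 9000));

SELECT id,
       DATALENGTH(Body) AS bytes,
       CASE WHEN DATALENGTH(Body) > 8000
            THEN 'must be stored out of row'
            ELSE 'eligible for in-row storage' END AS storage_note
FROM #SizeDemo;
```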

  • Supported/Unsupported Functionalities

Some string functions, operators, and constructs don't work on a TEXT column but do work on a VARCHAR(MAX) column, for example:

  1. The = (equal to) operator on a VARCHAR(MAX) column
  2. The GROUP BY clause on a VARCHAR(MAX) column

  • System IO Considerations

As noted above, VARCHAR(MAX) values are stored out of row only if the value is longer than 8,000 bytes or there is not enough space in the row; otherwise they are stored in-row. So if most of the values in a VARCHAR(MAX) column are large and stored out of row, data retrieval behaves almost the same as with a TEXT column.

If, however, most of the values in a VARCHAR(MAX) column are small enough to be stored in-row, queries that do not include the LOB column must read more data pages. This is because the LOB values are stored in-row, on the same data pages as the non-LOB columns. If the SELECT query does include the LOB column, it needs to read fewer pages than it would with a TEXT column.
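
If most queries skip the VARCHAR(MAX) column, SQL Server lets you force out-of-row storage for large value types per table with sp_tableoption. That keeps the data pages compact at the cost of extra reads when the column is selected. This is a minimal sketch; dbo.OutOfRowDemo is a hypothetical table.

```sql
IF OBJECT_ID('dbo.OutOfRowDemo') IS NOT NULL DROP TABLE dbo.OutOfRowDemo;
CREATE TABLE dbo.OutOfRowDemo (
    id    int NOT NULL PRIMARY KEY,
    title varchar(100) NOT NULL,
    body  varchar(max) NULL
);

-- ON: values of max types such as varchar(max) are stored out of row,
-- leaving only a pointer in the data row (similar to TEXT's default).
-- OFF (the default): values that fit are kept in-row.
EXEC sp_tableoption 'dbo.OutOfRowDemo', 'large value types out of row', 'ON';

-- Confirm the setting.
SELECT name, large_value_types_out_of_row
FROM sys.tables
WHERE name = 'OutOfRowDemo';
```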

Conclusion

Use the VARCHAR(MAX) data type rather than TEXT for better performance.

Source

#5


4  

If you use MS Access (especially older versions such as 2003), you are forced to use the TEXT data type on SQL Server. MS Access does not recognize nvarchar(MAX) as a Memo field, whereas it does recognize TEXT as one.
