SQL Server 2008 split string fails because of an ampersand

Time: 2021-08-23 22:08:05

I have created a function that attempts to replicate the STRING_SPLIT function that is now in SQL Server 2016.

So far I have got this:

CREATE FUNCTION MySplit
    (@delimited NVARCHAR(MAX), @delimiter NVARCHAR(100)) 
RETURNS @t TABLE
(
-- Id column can be commented out, not required for SQL splitting string
  id INT IDENTITY(1,1), -- I use this column for numbering split parts
  val NVARCHAR(MAX)
)
AS
BEGIN
    DECLARE @xml XML
    SET @xml = N'<root><r>' + replace(@delimited,@delimiter,'</r><r>') + '</r></root>'

    INSERT INTO @t(val)
        SELECT
            r.value('.','varchar(max)') AS item
        FROM
            @xml.nodes('//root/r') AS records(r)

    RETURN
END
GO

And it does work, but it fails to split the text string if any part of it contains an ampersand (&).

I have found hundreds of examples of splitting a string, but none seem to deal with special characters.

So using this:

select * 
from MySplit('Test1,Test2,Test3', ',') 

works ok, but

select * 
from MySplit('Test1 & Test4,Test2,Test3', ',') 

does not. It fails with

XML parsing: line 1, character 17, illegal name character.

What have I done wrong?

UPDATE

Firstly, thanks to @marcs for showing me the error of my ways in writing this question.

Secondly, thanks for all of the help below, especially from @PanagiotisKanavos and @MatBailie.

As this is throwaway code for migrating data from an old system to a new one, I have chosen to use @MatBailie's solution: quick and very dirty, but also perfect for this task.

In the future, though, I will be going with @PanagiotisKanavos's solution.

2 solutions

#1


0  

First of all, SQL Server 2016 introduced a STRING_SPLIT table-valued function (TVF). You can write CROSS APPLY STRING_SPLIT(thatField,',') as items.
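
As a minimal sketch of that call, assuming SQL Server 2016 or later with the database at compatibility level 130 or higher. The table dbo.Orders and its columns OrderId and Tags are hypothetical names used only for illustration:

```sql
-- STRING_SPLIT takes a single-character separator and returns one column named "value".
-- No XML is involved, so the ampersand needs no special handling.
SELECT value
FROM STRING_SPLIT(N'Test1 & Test4,Test2,Test3', N',');

-- Splitting a delimited column once per row.
-- dbo.Orders, OrderId and Tags are hypothetical names.
SELECT o.OrderId, items.value
FROM dbo.Orders AS o
CROSS APPLY STRING_SPLIT(o.Tags, N',') AS items;
```

Note that the order of the returned rows is not guaranteed, so this is not a drop-in replacement when the position of each part matters.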

In previous versions you still need to create a custom splitting function. There are various techniques. The fastest solution is to use a SQLCLR function.

In some cases, the second fastest is what you used: convert the text to XML and select the nodes. A well-known problem with this splitting technique is that illegal XML characters will break it, as you found out. That's why Aaron Bertrand doesn't consider this a generic splitter.

You can replace invalid characters with their encoded values, e.g. & with &amp;, but you have to be certain that your text will never contain such encodings.
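
A rough sketch of that replacement, applied to the function from the question. The name MySplitEscaped is hypothetical, and the sketch assumes the delimiter does not contain &, < or > and does not occur inside the entity text itself (for example, a ; delimiter would cut &amp; in half):

```sql
CREATE FUNCTION dbo.MySplitEscaped   -- hypothetical name
    (@delimited NVARCHAR(MAX), @delimiter NVARCHAR(100))
RETURNS @t TABLE
(
    id  INT IDENTITY(1,1),
    val NVARCHAR(MAX)
)
AS
BEGIN
    -- Escape '&' first, so the entities produced for '<' and '>' are not escaped again.
    DECLARE @escaped NVARCHAR(MAX);
    SET @escaped = REPLACE(REPLACE(REPLACE(@delimited,
                       N'&', N'&amp;'),
                       N'<', N'&lt;'),
                       N'>', N'&gt;');

    -- Build the XML exactly as in the question, but from the escaped text.
    DECLARE @xml XML;
    SET @xml = N'<root><r>' + REPLACE(@escaped, @delimiter, N'</r><r>') + N'</r></root>';

    -- value() decodes the entities again, so callers get the original characters back.
    INSERT INTO @t (val)
    SELECT r.value('.', 'nvarchar(max)')
    FROM @xml.nodes('/root/r') AS records(r);

    RETURN;
END
GO

-- The call that failed in the question should now parse:
SELECT * FROM dbo.MySplitEscaped(N'Test1 & Test4,Test2,Test3', N',');
```

This only covers the three characters that are reserved in XML text. Control characters that are not allowed in XML at all would still raise a parsing error, which is one reason this technique is not a general-purpose splitter.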

Perhaps you should investigate different techniques, like the Moden function, which can be faster in many situations:

CREATE FUNCTION dbo.SplitStrings_Moden
(
   @List NVARCHAR(MAX),
   @Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
  WITH E1(N)        AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 
                         UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 
                         UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
       E2(N)        AS (SELECT 1 FROM E1 a, E1 b),
       E4(N)        AS (SELECT 1 FROM E2 a, E2 b),
       E42(N)       AS (SELECT 1 FROM E4 a, E2 b),
       cteTally(N)  AS (SELECT 0 UNION ALL SELECT TOP (DATALENGTH(ISNULL(@List,1))) 
                         ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E42),
       cteStart(N1) AS (SELECT t.N+1 FROM cteTally t
                         WHERE (SUBSTRING(@List,t.N,1) = @Delimiter OR t.N = 0))
  SELECT Item = SUBSTRING(@List, s.N1, ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,s.N1),0)-s.N1,8000))
    FROM cteStart s;

Personally I created and use a SQLCLR UDF.

Another option is to avoid splitting altogether and pass table-valued parameters from the client to the server. Or use a micro-ORM like Dapper, which can construct an IN (...) clause from a list of values, e.g.:

var products=connection.Query<Product>("select * from products where id in @ids",new {ids=myIdArray});

An ORM like EF that supports LINQ can also generate an IN clause:

var products = from product in dbContext.Products
               where myIdArray.Contains(product.Id)
               select product;
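
For the table-valued parameter option mentioned above, the server side might look roughly like the following sketch. Table-valued parameters are available from SQL Server 2008. The type dbo.IdList and the procedure dbo.GetProductsByIds are hypothetical names, and the products table with an id column is assumed from the Dapper query above:

```sql
-- A table type describing the list of ids the client will send (hypothetical name).
CREATE TYPE dbo.IdList AS TABLE (id INT NOT NULL PRIMARY KEY);
GO

-- Table-valued parameters must be declared READONLY.
CREATE PROCEDURE dbo.GetProductsByIds   -- hypothetical name
    @ids dbo.IdList READONLY
AS
BEGIN
    SET NOCOUNT ON;

    SELECT p.*
    FROM dbo.products AS p
    JOIN @ids AS i ON i.id = p.id;
END
GO

-- Calling it from T-SQL; a client would pass the list as a structured parameter instead.
DECLARE @ids dbo.IdList;
INSERT INTO @ids (id) VALUES (1), (2), (3);
EXEC dbo.GetProductsByIds @ids = @ids;
```

Because the values arrive as rows rather than as delimited text, no splitting function is needed and special characters such as & are never parsed.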

#2


1  

Edit your function and replace every & with &amp; before building the XML. This will remove the error. It happens because & is a reserved character in XML: it starts an entity reference, so a bare & cannot be parsed.
