I have created a function to try to replicate the STRING_SPLIT function that was introduced in SQL Server 2016.
So far I have this:
CREATE FUNCTION MySplit
(@delimited NVARCHAR(MAX), @delimiter NVARCHAR(100))
RETURNS @t TABLE
(
    -- Id column can be commented out, not required for SQL splitting string
    id INT IDENTITY(1,1), -- I use this column for numbering split parts
    val NVARCHAR(MAX)
)
AS
BEGIN
    DECLARE @xml XML
    SET @xml = N'<root><r>' + replace(@delimited,@delimiter,'</r><r>') + '</r></root>'
    INSERT INTO @t(val)
    SELECT
        r.value('.','varchar(max)') AS item
    FROM
        @xml.nodes('//root/r') AS records(r)
    RETURN
END
GO
It does work, but it fails to split the string if any part of it contains an ampersand (&).
I have found hundreds of examples of splitting a string, but none seem to deal with special characters.
So using this:
select *
from MySplit('Test1,Test2,Test3', ',')
works ok, but
select *
from MySplit('Test1 & Test4,Test2,Test3', ',')
does not. It fails with
XML parsing: line 1, character 17, illegal name character.
What have I done wrong?
UPDATE
Firstly, thanks to @marcs for showing me the error of my ways in writing this question.
Secondly, thanks for all of the help below, especially from @PanagiotisKanavos and @MatBailie.
As this is throwaway code for migrating data from an old system to a new one, I have chosen to use @MatBailie's solution: quick and very dirty, but perfect for this task.
In the future, though, I will be going with @PanagiotisKanavos's solution.
2 Solutions
#1
0
First of all, SQL Server 2016 introduced the STRING_SPLIT table-valued function (TVF). You can write CROSS APPLY STRING_SPLIT(thatField, ',') AS items
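As a minimal sketch of how that call might look, assuming SQL Server 2016 or later with the database at compatibility level 130 or higher: the table variable @Orders and its columns OrderId and Tags are hypothetical names used only for illustration. STRING_SPLIT returns a single column named value.

```sql
-- Split a literal; the ampersand needs no special handling here.
SELECT value
FROM STRING_SPLIT(N'Test1 & Test4,Test2,Test3', N',');

-- Split a delimited column for every row of a table.
-- @Orders, OrderId and Tags are hypothetical names for this sketch.
DECLARE @Orders TABLE (OrderId INT, Tags NVARCHAR(200));
INSERT INTO @Orders (OrderId, Tags)
VALUES (1, N'Test1 & Test4,Test2'), (2, N'Test3');

SELECT o.OrderId, items.value
FROM @Orders AS o
CROSS APPLY STRING_SPLIT(o.Tags, N',') AS items;
```

Note that the separator has to be a single character, and the order of the returned rows is not guaranteed.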
In previous versions you still need to create a custom splitting function. There are various techniques. The fastest solution is to use a SQLCLR function.
In some cases, the second fastest is what you used - convert the text to XML and select the nodes. A well known problem with this splitting technique is that illegal XML characters will break it, as you found out. That's why Aaron Bertrand doesn't consider this a generic splitter.
You can replace invalid characters with their encoded values, e.g. & with &amp;, but you have to be certain that your text will never already contain such encodings.
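The escaping approach described above could be sketched roughly as follows. This is not a general-purpose splitter: the name dbo.MySplitEscaped is made up for this example, and the sketch assumes the delimiter does not occur inside any of the entity strings &amp;, &lt; or &gt; (a comma is fine, a semicolon is not). Characters that are not legal in XML at all, such as most control characters, would still break it.

```sql
-- Hypothetical variant of MySplit that escapes XML special characters first.
CREATE FUNCTION dbo.MySplitEscaped
(@delimited NVARCHAR(MAX), @delimiter NVARCHAR(100))
RETURNS @t TABLE
(
    id INT IDENTITY(1,1),
    val NVARCHAR(MAX)
)
AS
BEGIN
    -- Escape & first, so the ampersands introduced by &lt; and &gt; are not escaped again.
    DECLARE @escaped NVARCHAR(MAX) =
        REPLACE(REPLACE(REPLACE(@delimited, N'&', N'&amp;'), N'<', N'&lt;'), N'>', N'&gt;');

    -- Build the XML only after the data has been escaped.
    DECLARE @xml XML =
        CAST(N'<root><r>' + REPLACE(@escaped, @delimiter, N'</r><r>') + N'</r></root>' AS XML);

    -- value() decodes the entities again, so callers see the original text.
    INSERT INTO @t(val)
    SELECT r.value('.', 'nvarchar(max)')
    FROM @xml.nodes('/root/r') AS records(r);

    RETURN;
END
GO

-- The call that failed in the question should now return three rows.
SELECT *
FROM dbo.MySplitEscaped(N'Test1 & Test4,Test2,Test3', N',');
```

The sketch also reads the values back as nvarchar(max) rather than varchar(max), so that Unicode text survives the round trip.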
Perhaps you should investigate different techniques, like the Moden function, which can be faster in many situations:
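CREATE FUNCTION dbo.SplitStrings_Moden
(
    @List NVARCHAR(MAX),
    @Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
    -- E1 = 10 rows, E2 = 100, E4 = 10,000, E42 = 1,000,000 rows
    WITH E1(N)        AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                           UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                           UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
         E2(N)        AS (SELECT 1 FROM E1 a, E1 b),
         E4(N)        AS (SELECT 1 FROM E2 a, E2 b),
         E42(N)       AS (SELECT 1 FROM E4 a, E2 b),
         -- Tally of 0 plus 1..DATALENGTH(@List); DATALENGTH is a byte count
         cteTally(N)  AS (SELECT 0 UNION ALL SELECT TOP (DATALENGTH(ISNULL(@List,1)))
                           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E42),
         -- Start position of each item: position 1 and every position right after a delimiter.
         -- Each position is compared with a 1-character substring, so only a
         -- single-character delimiter will match.
         cteStart(N1) AS (SELECT t.N+1 FROM cteTally t
                           WHERE (SUBSTRING(@List,t.N,1) = @Delimiter OR t.N = 0))
    -- Each item runs from its start to the next delimiter, or 8000 characters if there is none
    SELECT Item = SUBSTRING(@List, s.N1, ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,s.N1),0)-s.N1,8000))
    FROM cteStart s;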
Personally I created and use a SQLCLR UDF.
Another option is to avoid splitting altogether and pass table-valued parameters from the client to the server. Alternatively, use a micro-ORM like Dapper, which can construct an IN (...) clause from a list of values, e.g.:
var products=connection.Query<Product>("select * from products where id in @ids",new {ids=myIdArray});
An ORM like EF that supports LINQ can also generate an IN clause:
var products = from product in dbContext.Products
               where myIdArray.Contains(product.Id)
               select product;
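On the server side, the table-valued parameter option mentioned earlier in this answer might look something like the sketch below. The type name dbo.IdList and the procedure name dbo.GetProductsByIds are hypothetical; the products table and its id column are taken from the Dapper query above and are assumed to exist.

```sql
-- Hypothetical table type that carries the list of ids.
CREATE TYPE dbo.IdList AS TABLE (Id INT NOT NULL PRIMARY KEY);
GO

-- Hypothetical procedure; table-valued parameters must be declared READONLY.
CREATE PROCEDURE dbo.GetProductsByIds
    @ids dbo.IdList READONLY
AS
BEGIN
    SET NOCOUNT ON;

    -- Join against the parameter instead of splitting a delimited string.
    SELECT p.*
    FROM dbo.products AS p
    JOIN @ids AS i ON i.Id = p.id;
END
GO

-- Calling it from T-SQL; a client would fill the parameter from its own list.
DECLARE @ids dbo.IdList;
INSERT INTO @ids (Id) VALUES (1), (2), (3);
EXEC dbo.GetProductsByIds @ids = @ids;
```

Because the values arrive as typed rows, there is no delimiter to choose and no escaping problem to work around.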
#2
1
Edit your function and replace every & with &amp; before building the XML. This will remove the error. It happens because XML cannot parse a bare &, as it is a reserved character that starts an entity reference.