I have a table with 4 columns and data looks like
我有一个有4个列和数据的表。
`cust_id firstname lastname value`
`1 a b ct;ct;ct;dir`
`2 c a ct;ct;ct;ct;ct;ct`
`3 d e ct;ct;ct;dir;st`
i want output like
我希望输出像
`cust_id firstname lastname value`
`1 a b ct;dir`
`2 c a ct`
`3 d e ct;dir;st`
Have lots of rows with different number of repetitive words in each.
有很多行,每行都有不同数量的重复单词。
Any help much appreciated.
感谢任何帮助。
4 个解决方案
#1
3
With the help of a Parse/Split Function and a CROSS apply. I should add that the logic of the PARSE can easily be migrated into the CROSS APPLY if you can't use a UDF
在解析/拆分函数和交叉应用的帮助下。我应该补充一点,如果不能使用UDF,解析的逻辑可以很容易地迁移到CROSS APPLY中
Also, the sequence will be preserved if that matters.
而且,如果这一点很重要的话,序列也会被保留下来。
Declare @YourTable table (cust_id int,value varchar(50))
Insert Into @YourTable values
(1,'ct;ct;ct;dir'),
(2,'ct;ct;ct;ct;ct;ct'),
(3,'ct;ct;ct;dir;st')
Select A.*
,B.*
From @YourTable A
Cross Apply (
Select String=Stuff((Select Distinct ';' +RetVal From (
Select RetVal,RetSeq=min(RetSeq)
From [dbo].[udf-Str-Parse](A.Value,';') C
Group By RetVal
) X For XML Path ('')),1,1,'')
) B
Returns
返回
cust_id value String
1 ct;ct;ct;dir ct;dir
2 ct;ct;ct;ct;ct;ct ct
3 ct;ct;ct;dir;st ct;dir;st
The UDF if Needed
UDF如果需要
CREATE FUNCTION [dbo].[udf-Str-Parse] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>'+ replace((Select @String as [*] For XML Path('')),@Delimiter,'</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
);
--Select * from [dbo].[udf-Str-Parse]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')
--Select * from [dbo].[udf-Str-Parse]('this,is,<test>,for,< & >',',')
Another Parse/Spit function (Returns same results as XML version)
另一个解析/吐痰函数(返回与XML版本相同的结果)
CREATE FUNCTION [dbo].[udf-Str-Parse-8K] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(N) As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 a,cte1 b,cte1 c,cte1 d) A ),
cte3(N) As (Select 1 Union All Select t.N+DataLength(@Delimiter) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter)) = @Delimiter),
cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter,@String,s.N),0)-S.N,8000) From cte3 S)
Select RetSeq = Row_Number() over (Order By A.N)
,RetVal = Substring(@String, A.N, A.L)
From cte4 A
);
--Orginal Source http://www.sqlservercentral.com/articles/Tally+Table/72993/
--Much faster than str-Parse, but limited to 8K
--Select * from [dbo].[udf-Str-Parse-8K]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse-8K]('John||Cappelletti||was||here','||')
#2
0
If you want repeated, adjacent instances of ct;
to be turned into a single one, you can do:
如果你想重复,相邻的ct实例;要变成一个单一的,你可以做:
select replace(replace(replace(col, 'ct;', '><'), '<>', ''), '><', 'ct;')
This assumes that "<" and ">" do not appear in the column. Any two characters can be used for this purpose.
这假设“<”和“>”不出现在列中。任何两个字符都可以用于此目的。
You can also readily put this into an update
.
您也可以很容易地将其放入更新中。
Note: If these are codes of some sort, then you should normalize the data. This problem does appear in other contexts where normalization is not appropriate (for instance, removing consecutive spaces in a string).
注意:如果这些代码是某种类型的,那么应该对数据进行规范化。这个问题确实出现在其他不适合规范化的上下文中(例如,删除字符串中的连续空格)。
#3
0
The first thing to point out is that if you stored your data in a normalised fashion then you would not have as much of an issue, the best way would be a separate one to many table, e.g.
首先要指出的是,如果您以一种标准化的方式存储数据,那么您就不会有那么多的问题,最好的方法是将数据单独存储到多个表中,例如。
CustomerValues
CustomerValues
Cust_ID Value
-------------------
1 ct
1 ct
1 ct
1 dir
2 ct
2 ct
.....
Your query would then become something like:
您的查询将变成如下内容:
--SAMPLE DATA
WITH Customers AS
( SELECT *
FROM (VALUES
(1, 'a', 'b'),
(2, 'c', 'a'),
(3, 'd', 'e')
) AS t (cust_id, firstname, lastname)
), CustomerValues AS
( SELECT *
FROM (VALUES
(1, 'ct'), (1, 'ct'), (1, 'ct'), (1, 'dir'),
(2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'),
(3, 'ct'), (3, 'ct'), (3, 'ct'), (3, 'dir'), (3, 'st')
) AS t (cust_id, value)
)
-- SAMPLE DATA END
SELECT c.cust_id,
c.firstname,
c.lastname,
value = STUFF(CustomerValues.value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM Customers AS c
CROSS APPLY
( SELECT DISTINCT ';' + value
FROM CustomerValues AS cv
WHERE cv.cust_id = c.cust_id
FOR XML PATH(''), TYPE
) AS cv (CustomerValues);
For more reading on how the rows are concatenated see Grouped Concatenation in SQL Server
有关行如何连接的更多阅读,请参见SQL Server中的分组连接
Without your data in this format, you would need to perform some kind of split. For more see Split strings the right way – or the next best way
如果没有这种格式的数据,您将需要执行某种分割。更多信息,请参阅拆分字符串的正确方式——或下一个最佳方式
WITH Customers AS
( SELECT *
FROM (VALUES
(1, 'a', 'b', 'ct;ct;ct;dir'),
(2, 'c', 'a', 'ct;ct;ct;ct;ct;ct'),
(3, 'd', 'e', 'ct;ct;ct;dir;st')
) AS t (cust_id, firstname, lastname, value)
), Numbers (Number) AS
( SELECT ROW_NUMBER() OVER(ORDER BY N1.N)
FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n1 (N)
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n2 (N)
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n3 (N)
), CustomerValues AS
( SELECT c.cust_id,
value = SUBSTRING(c.value, Number, CHARINDEX(';', c.value + ';', n.Number) - n.Number)
FROM Customers AS c
INNER JOIN Numbers AS n
ON N.Number <= CONVERT(INT, LEN(c.value))
AND SUBSTRING(';' + c.value, n.Number, 1) = ';'
)
SELECT c.cust_id,
c.firstname,
c.lastname,
value = STUFF(CustomerValues.value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM Customers AS c
CROSS APPLY
( SELECT DISTINCT ';' + value
FROM CustomerValues AS cv
WHERE cv.cust_id = c.cust_id
FOR XML PATH(''), TYPE
) AS cv (CustomerValues);
#4
0
Here is how you can do:
你可以这样做:
WITH
CTE_Sample AS
(
SELECT 1 AS cust_id, 'a' AS firstname, 'b' AS lastname, 'ct;ct;ct;dir' AS YourValue UNION ALL
SELECT 2 AS cust_id, 'c' AS firstname, 'a' AS lastname, 'ct;ct;ct;ct;ct;ct' AS YourValue UNION ALL
SELECT 3 AS cust_id, 'd' AS firstname, 'e' AS lastname, 'ct;ct;ct;dir;st' AS YourValue
),
--
-- Split your values into lines (Distinct values)
CTE_Split AS
(
SELECT DISTINCT
YourValue
,value AS Val
FROM CTE_Sample SS
CROSS APPLY STRING_SPLIT(YourValue, ';')
)
SELECT
cust_id
,firstname
,lastname
-- Aggregate your different value into one column
,STUFF((
SELECT ';'+ Val
FROM CTE_Split SP
WHERE SP.YourValue = SA.YourValue
FOR XML PATH('')
), 1, 1, ''
) AS Val
FROM CTE_Sample SA
#1
3
With the help of a Parse/Split Function and a CROSS apply. I should add that the logic of the PARSE can easily be migrated into the CROSS APPLY if you can't use a UDF
在解析/拆分函数和交叉应用的帮助下。我应该补充一点,如果不能使用UDF,解析的逻辑可以很容易地迁移到CROSS APPLY中
Also, the sequence will be preserved if that matters.
而且,如果这一点很重要的话,序列也会被保留下来。
Declare @YourTable table (cust_id int,value varchar(50))
Insert Into @YourTable values
(1,'ct;ct;ct;dir'),
(2,'ct;ct;ct;ct;ct;ct'),
(3,'ct;ct;ct;dir;st')
Select A.*
,B.*
From @YourTable A
Cross Apply (
Select String=Stuff((Select Distinct ';' +RetVal From (
Select RetVal,RetSeq=min(RetSeq)
From [dbo].[udf-Str-Parse](A.Value,';') C
Group By RetVal
) X For XML Path ('')),1,1,'')
) B
Returns
返回
cust_id value String
1 ct;ct;ct;dir ct;dir
2 ct;ct;ct;ct;ct;ct ct
3 ct;ct;ct;dir;st ct;dir;st
The UDF if Needed
UDF如果需要
CREATE FUNCTION [dbo].[udf-Str-Parse] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>'+ replace((Select @String as [*] For XML Path('')),@Delimiter,'</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
);
--Select * from [dbo].[udf-Str-Parse]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')
--Select * from [dbo].[udf-Str-Parse]('this,is,<test>,for,< & >',',')
Another Parse/Spit function (Returns same results as XML version)
另一个解析/吐痰函数(返回与XML版本相同的结果)
CREATE FUNCTION [dbo].[udf-Str-Parse-8K] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(N) As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 a,cte1 b,cte1 c,cte1 d) A ),
cte3(N) As (Select 1 Union All Select t.N+DataLength(@Delimiter) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter)) = @Delimiter),
cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter,@String,s.N),0)-S.N,8000) From cte3 S)
Select RetSeq = Row_Number() over (Order By A.N)
,RetVal = Substring(@String, A.N, A.L)
From cte4 A
);
--Orginal Source http://www.sqlservercentral.com/articles/Tally+Table/72993/
--Much faster than str-Parse, but limited to 8K
--Select * from [dbo].[udf-Str-Parse-8K]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse-8K]('John||Cappelletti||was||here','||')
#2
0
If you want repeated, adjacent instances of ct;
to be turned into a single one, you can do:
如果你想重复,相邻的ct实例;要变成一个单一的,你可以做:
select replace(replace(replace(col, 'ct;', '><'), '<>', ''), '><', 'ct;')
This assumes that "<" and ">" do not appear in the column. Any two characters can be used for this purpose.
这假设“<”和“>”不出现在列中。任何两个字符都可以用于此目的。
You can also readily put this into an update
.
您也可以很容易地将其放入更新中。
Note: If these are codes of some sort, then you should normalize the data. This problem does appear in other contexts where normalization is not appropriate (for instance, removing consecutive spaces in a string).
注意:如果这些代码是某种类型的,那么应该对数据进行规范化。这个问题确实出现在其他不适合规范化的上下文中(例如,删除字符串中的连续空格)。
#3
0
The first thing to point out is that if you stored your data in a normalised fashion then you would not have as much of an issue, the best way would be a separate one to many table, e.g.
首先要指出的是,如果您以一种标准化的方式存储数据,那么您就不会有那么多的问题,最好的方法是将数据单独存储到多个表中,例如。
CustomerValues
CustomerValues
Cust_ID Value
-------------------
1 ct
1 ct
1 ct
1 dir
2 ct
2 ct
.....
Your query would then become something like:
您的查询将变成如下内容:
--SAMPLE DATA
WITH Customers AS
( SELECT *
FROM (VALUES
(1, 'a', 'b'),
(2, 'c', 'a'),
(3, 'd', 'e')
) AS t (cust_id, firstname, lastname)
), CustomerValues AS
( SELECT *
FROM (VALUES
(1, 'ct'), (1, 'ct'), (1, 'ct'), (1, 'dir'),
(2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'),
(3, 'ct'), (3, 'ct'), (3, 'ct'), (3, 'dir'), (3, 'st')
) AS t (cust_id, value)
)
-- SAMPLE DATA END
SELECT c.cust_id,
c.firstname,
c.lastname,
value = STUFF(CustomerValues.value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM Customers AS c
CROSS APPLY
( SELECT DISTINCT ';' + value
FROM CustomerValues AS cv
WHERE cv.cust_id = c.cust_id
FOR XML PATH(''), TYPE
) AS cv (CustomerValues);
For more reading on how the rows are concatenated see Grouped Concatenation in SQL Server
有关行如何连接的更多阅读,请参见SQL Server中的分组连接
Without your data in this format, you would need to perform some kind of split. For more see Split strings the right way – or the next best way
如果没有这种格式的数据,您将需要执行某种分割。更多信息,请参阅拆分字符串的正确方式——或下一个最佳方式
WITH Customers AS
( SELECT *
FROM (VALUES
(1, 'a', 'b', 'ct;ct;ct;dir'),
(2, 'c', 'a', 'ct;ct;ct;ct;ct;ct'),
(3, 'd', 'e', 'ct;ct;ct;dir;st')
) AS t (cust_id, firstname, lastname, value)
), Numbers (Number) AS
( SELECT ROW_NUMBER() OVER(ORDER BY N1.N)
FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n1 (N)
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n2 (N)
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n3 (N)
), CustomerValues AS
( SELECT c.cust_id,
value = SUBSTRING(c.value, Number, CHARINDEX(';', c.value + ';', n.Number) - n.Number)
FROM Customers AS c
INNER JOIN Numbers AS n
ON N.Number <= CONVERT(INT, LEN(c.value))
AND SUBSTRING(';' + c.value, n.Number, 1) = ';'
)
SELECT c.cust_id,
c.firstname,
c.lastname,
value = STUFF(CustomerValues.value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM Customers AS c
CROSS APPLY
( SELECT DISTINCT ';' + value
FROM CustomerValues AS cv
WHERE cv.cust_id = c.cust_id
FOR XML PATH(''), TYPE
) AS cv (CustomerValues);
#4
0
Here is how you can do:
你可以这样做:
WITH
CTE_Sample AS
(
SELECT 1 AS cust_id, 'a' AS firstname, 'b' AS lastname, 'ct;ct;ct;dir' AS YourValue UNION ALL
SELECT 2 AS cust_id, 'c' AS firstname, 'a' AS lastname, 'ct;ct;ct;ct;ct;ct' AS YourValue UNION ALL
SELECT 3 AS cust_id, 'd' AS firstname, 'e' AS lastname, 'ct;ct;ct;dir;st' AS YourValue
),
--
-- Split your values into lines (Distinct values)
CTE_Split AS
(
SELECT DISTINCT
YourValue
,value AS Val
FROM CTE_Sample SS
CROSS APPLY STRING_SPLIT(YourValue, ';')
)
SELECT
cust_id
,firstname
,lastname
-- Aggregate your different value into one column
,STUFF((
SELECT ';'+ Val
FROM CTE_Split SP
WHERE SP.YourValue = SA.YourValue
FOR XML PATH('')
), 1, 1, ''
) AS Val
FROM CTE_Sample SA