SQL Server——用单个单词替换列中的重复单词

时间:2021-12-05 22:50:50

I have a table with 4 columns and data looks like

我有一个有4个列和数据的表。

`cust_id firstname lastname  value`

`1       a            b      ct;ct;ct;dir`
`2       c            a      ct;ct;ct;ct;ct;ct`
`3       d            e      ct;ct;ct;dir;st`

i want output like

我希望输出像

`cust_id firstname lastname value`

`1       a            b      ct;dir`
`2       c            a      ct`
`3       d            e      ct;dir;st`

Have lots of rows with different number of repetitive words in each.

有很多行,每行都有不同数量的重复单词。

Any help much appreciated.

感谢任何帮助。

4 个解决方案

#1


3  

With the help of a Parse/Split Function and a CROSS apply. I should add that the logic of the PARSE can easily be migrated into the CROSS APPLY if you can't use a UDF

在解析/拆分函数和交叉应用的帮助下。我应该补充一点,如果不能使用UDF,解析的逻辑可以很容易地迁移到CROSS APPLY中

Also, the sequence will be preserved if that matters.

而且,如果这一点很重要的话,序列也会被保留下来。

Declare @YourTable table (cust_id int,value varchar(50))
Insert Into @YourTable values
(1,'ct;ct;ct;dir'),
(2,'ct;ct;ct;ct;ct;ct'),
(3,'ct;ct;ct;dir;st')

Select A.*
      ,B.*
 From  @YourTable A
 Cross Apply (
    Select String=Stuff((Select Distinct ';' +RetVal From (
        Select RetVal,RetSeq=min(RetSeq)
          From [dbo].[udf-Str-Parse](A.Value,';') C
          Group By RetVal
    ) X For XML Path ('')),1,1,'') 
  ) B

Returns

返回

cust_id value                String
1       ct;ct;ct;dir         ct;dir
2       ct;ct;ct;ct;ct;ct    ct
3       ct;ct;ct;dir;st      ct;dir;st

The UDF if Needed

UDF如果需要

CREATE FUNCTION [dbo].[udf-Str-Parse] (@String varchar(max),@Delimiter varchar(10))
Returns Table 
As
Return (  
    Select RetSeq = Row_Number() over (Order By (Select null))
          ,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
    From  (Select x = Cast('<x>'+ replace((Select @String as [*] For XML Path('')),@Delimiter,'</x><x>')+'</x>' as xml).query('.')) as A 
    Cross Apply x.nodes('x') AS B(i)
);
--Select * from [dbo].[udf-Str-Parse]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')
--Select * from [dbo].[udf-Str-Parse]('this,is,<test>,for,< & >',',')

Another Parse/Spit function (Returns same results as XML version)

另一个解析/吐痰函数(返回与XML版本相同的结果)

CREATE FUNCTION [dbo].[udf-Str-Parse-8K] (@String varchar(max),@Delimiter varchar(10))
Returns Table 
As
Return (  
    with   cte1(N)   As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
           cte2(N)   As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 a,cte1 b,cte1 c,cte1 d) A ),
           cte3(N)   As (Select 1 Union All Select t.N+DataLength(@Delimiter) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter)) = @Delimiter),
           cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter,@String,s.N),0)-S.N,8000) From cte3 S)

    Select RetSeq = Row_Number() over (Order By A.N)
          ,RetVal = Substring(@String, A.N, A.L) 
    From   cte4 A
);
--Orginal Source http://www.sqlservercentral.com/articles/Tally+Table/72993/
--Much faster than str-Parse, but limited to 8K
--Select * from [dbo].[udf-Str-Parse-8K]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse-8K]('John||Cappelletti||was||here','||')

#2


0  

If you want repeated, adjacent instances of ct; to be turned into a single one, you can do:

如果你想重复,相邻的ct实例;要变成一个单一的,你可以做:

select replace(replace(replace(col, 'ct;', '><'), '<>', ''), '><', 'ct;')

This assumes that "<" and ">" do not appear in the column. Any two characters can be used for this purpose.

这假设“<”和“>”不出现在列中。任何两个字符都可以用于此目的。

You can also readily put this into an update.

您也可以很容易地将其放入更新中。

Note: If these are codes of some sort, then you should normalize the data. This problem does appear in other contexts where normalization is not appropriate (for instance, removing consecutive spaces in a string).

注意:如果这些代码是某种类型的,那么应该对数据进行规范化。这个问题确实出现在其他不适合规范化的上下文中(例如,删除字符串中的连续空格)。

#3


0  

The first thing to point out is that if you stored your data in a normalised fashion then you would not have as much of an issue, the best way would be a separate one to many table, e.g.

首先要指出的是,如果您以一种标准化的方式存储数据,那么您就不会有那么多的问题,最好的方法是将数据单独存储到多个表中,例如。

CustomerValues

CustomerValues

Cust_ID     Value
-------------------
1           ct
1           ct
1           ct
1           dir
2           ct
2           ct
.....

Your query would then become something like:

您的查询将变成如下内容:

--SAMPLE DATA
WITH Customers AS
(   SELECT  *
    FROM    (VALUES
                (1, 'a', 'b'),
                (2, 'c', 'a'),
                (3, 'd', 'e')
            ) AS t (cust_id, firstname, lastname)
), CustomerValues AS
(   SELECT  *
    FROM    (VALUES
                (1, 'ct'), (1, 'ct'), (1, 'ct'), (1, 'dir'),
                (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'),
                (3, 'ct'), (3, 'ct'), (3, 'ct'), (3, 'dir'), (3, 'st')
            ) AS t (cust_id, value)
)
-- SAMPLE DATA END
SELECT  c.cust_id,
        c.firstname,
        c.lastname,
        value = STUFF(CustomerValues.value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM    Customers AS c
        CROSS APPLY
        (   SELECT  DISTINCT ';' + value
            FROM    CustomerValues AS cv
            WHERE   cv.cust_id = c.cust_id
            FOR XML PATH(''), TYPE
        ) AS cv (CustomerValues);

For more reading on how the rows are concatenated see Grouped Concatenation in SQL Server

有关行如何连接的更多阅读,请参见SQL Server中的分组连接

Without your data in this format, you would need to perform some kind of split. For more see Split strings the right way – or the next best way

如果没有这种格式的数据,您将需要执行某种分割。更多信息,请参阅拆分字符串的正确方式——或下一个最佳方式

WITH Customers AS
(   SELECT  *
    FROM    (VALUES
                (1, 'a', 'b', 'ct;ct;ct;dir'),
                (2, 'c', 'a', 'ct;ct;ct;ct;ct;ct'),
                (3, 'd', 'e', 'ct;ct;ct;dir;st')
            ) AS t (cust_id, firstname, lastname, value)
), Numbers (Number) AS
(   SELECT  ROW_NUMBER() OVER(ORDER BY N1.N)
    FROM    (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n1 (N)
    CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n2 (N)
    CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n3 (N)
), CustomerValues AS
(   SELECT  c.cust_id,
            value = SUBSTRING(c.value, Number,  CHARINDEX(';', c.value + ';', n.Number) - n.Number)
    FROM    Customers AS c
            INNER JOIN Numbers AS n
                ON N.Number <= CONVERT(INT, LEN(c.value))
                AND SUBSTRING(';' + c.value, n.Number, 1) = ';'
)
SELECT  c.cust_id,
        c.firstname,
        c.lastname,
        value = STUFF(CustomerValues.value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM    Customers AS c
        CROSS APPLY
        (   SELECT  DISTINCT ';' + value
            FROM    CustomerValues AS cv
            WHERE   cv.cust_id = c.cust_id
            FOR XML PATH(''), TYPE
        ) AS cv (CustomerValues);

#4


0  

Here is how you can do:

你可以这样做:

WITH 
CTE_Sample AS
(
    SELECT 1 AS cust_id, 'a' AS firstname, 'b' AS lastname, 'ct;ct;ct;dir'      AS YourValue UNION ALL
    SELECT 2 AS cust_id, 'c' AS firstname, 'a' AS lastname, 'ct;ct;ct;ct;ct;ct' AS YourValue UNION ALL
    SELECT 3 AS cust_id, 'd' AS firstname, 'e' AS lastname, 'ct;ct;ct;dir;st'   AS YourValue 
),

--
-- Split your values into lines (Distinct values)
CTE_Split AS
(
    SELECT DISTINCT
             YourValue  
            ,value AS Val
          FROM CTE_Sample SS
          CROSS APPLY STRING_SPLIT(YourValue, ';')
)


SELECT 
     cust_id
    ,firstname
    ,lastname
    -- Aggregate your different value into one column
    ,STUFF((
        SELECT ';'+ Val
          FROM CTE_Split SP       
          WHERE SP.YourValue = SA.YourValue
        FOR XML PATH('')
                        ), 1, 1, '' 
    ) AS Val
 FROM CTE_Sample SA

#1


3  

With the help of a Parse/Split Function and a CROSS apply. I should add that the logic of the PARSE can easily be migrated into the CROSS APPLY if you can't use a UDF

在解析/拆分函数和交叉应用的帮助下。我应该补充一点,如果不能使用UDF,解析的逻辑可以很容易地迁移到CROSS APPLY中

Also, the sequence will be preserved if that matters.

而且,如果这一点很重要的话,序列也会被保留下来。

Declare @YourTable table (cust_id int,value varchar(50))
Insert Into @YourTable values
(1,'ct;ct;ct;dir'),
(2,'ct;ct;ct;ct;ct;ct'),
(3,'ct;ct;ct;dir;st')

Select A.*
      ,B.*
 From  @YourTable A
 Cross Apply (
    Select String=Stuff((Select Distinct ';' +RetVal From (
        Select RetVal,RetSeq=min(RetSeq)
          From [dbo].[udf-Str-Parse](A.Value,';') C
          Group By RetVal
    ) X For XML Path ('')),1,1,'') 
  ) B

Returns

返回

cust_id value                String
1       ct;ct;ct;dir         ct;dir
2       ct;ct;ct;ct;ct;ct    ct
3       ct;ct;ct;dir;st      ct;dir;st

The UDF if Needed

UDF如果需要

CREATE FUNCTION [dbo].[udf-Str-Parse] (@String varchar(max),@Delimiter varchar(10))
Returns Table 
As
Return (  
    Select RetSeq = Row_Number() over (Order By (Select null))
          ,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
    From  (Select x = Cast('<x>'+ replace((Select @String as [*] For XML Path('')),@Delimiter,'</x><x>')+'</x>' as xml).query('.')) as A 
    Cross Apply x.nodes('x') AS B(i)
);
--Select * from [dbo].[udf-Str-Parse]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')
--Select * from [dbo].[udf-Str-Parse]('this,is,<test>,for,< & >',',')

Another Parse/Spit function (Returns same results as XML version)

另一个解析/吐痰函数(返回与XML版本相同的结果)

CREATE FUNCTION [dbo].[udf-Str-Parse-8K] (@String varchar(max),@Delimiter varchar(10))
Returns Table 
As
Return (  
    with   cte1(N)   As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
           cte2(N)   As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 a,cte1 b,cte1 c,cte1 d) A ),
           cte3(N)   As (Select 1 Union All Select t.N+DataLength(@Delimiter) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter)) = @Delimiter),
           cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter,@String,s.N),0)-S.N,8000) From cte3 S)

    Select RetSeq = Row_Number() over (Order By A.N)
          ,RetVal = Substring(@String, A.N, A.L) 
    From   cte4 A
);
--Orginal Source http://www.sqlservercentral.com/articles/Tally+Table/72993/
--Much faster than str-Parse, but limited to 8K
--Select * from [dbo].[udf-Str-Parse-8K]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse-8K]('John||Cappelletti||was||here','||')

#2


0  

If you want repeated, adjacent instances of ct; to be turned into a single one, you can do:

如果你想重复,相邻的ct实例;要变成一个单一的,你可以做:

select replace(replace(replace(col, 'ct;', '><'), '<>', ''), '><', 'ct;')

This assumes that "<" and ">" do not appear in the column. Any two characters can be used for this purpose.

这假设“<”和“>”不出现在列中。任何两个字符都可以用于此目的。

You can also readily put this into an update.

您也可以很容易地将其放入更新中。

Note: If these are codes of some sort, then you should normalize the data. This problem does appear in other contexts where normalization is not appropriate (for instance, removing consecutive spaces in a string).

注意:如果这些代码是某种类型的,那么应该对数据进行规范化。这个问题确实出现在其他不适合规范化的上下文中(例如,删除字符串中的连续空格)。

#3


0  

The first thing to point out is that if you stored your data in a normalised fashion then you would not have as much of an issue, the best way would be a separate one to many table, e.g.

首先要指出的是,如果您以一种标准化的方式存储数据,那么您就不会有那么多的问题,最好的方法是将数据单独存储到多个表中,例如。

CustomerValues

CustomerValues

Cust_ID     Value
-------------------
1           ct
1           ct
1           ct
1           dir
2           ct
2           ct
.....

Your query would then become something like:

您的查询将变成如下内容:

--SAMPLE DATA
WITH Customers AS
(   SELECT  *
    FROM    (VALUES
                (1, 'a', 'b'),
                (2, 'c', 'a'),
                (3, 'd', 'e')
            ) AS t (cust_id, firstname, lastname)
), CustomerValues AS
(   SELECT  *
    FROM    (VALUES
                (1, 'ct'), (1, 'ct'), (1, 'ct'), (1, 'dir'),
                (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'),
                (3, 'ct'), (3, 'ct'), (3, 'ct'), (3, 'dir'), (3, 'st')
            ) AS t (cust_id, value)
)
-- SAMPLE DATA END
SELECT  c.cust_id,
        c.firstname,
        c.lastname,
        value = STUFF(CustomerValues.value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM    Customers AS c
        CROSS APPLY
        (   SELECT  DISTINCT ';' + value
            FROM    CustomerValues AS cv
            WHERE   cv.cust_id = c.cust_id
            FOR XML PATH(''), TYPE
        ) AS cv (CustomerValues);

For more reading on how the rows are concatenated see Grouped Concatenation in SQL Server

有关行如何连接的更多阅读,请参见SQL Server中的分组连接

Without your data in this format, you would need to perform some kind of split. For more see Split strings the right way – or the next best way

如果没有这种格式的数据,您将需要执行某种分割。更多信息,请参阅拆分字符串的正确方式——或下一个最佳方式

WITH Customers AS
(   SELECT  *
    FROM    (VALUES
                (1, 'a', 'b', 'ct;ct;ct;dir'),
                (2, 'c', 'a', 'ct;ct;ct;ct;ct;ct'),
                (3, 'd', 'e', 'ct;ct;ct;dir;st')
            ) AS t (cust_id, firstname, lastname, value)
), Numbers (Number) AS
(   SELECT  ROW_NUMBER() OVER(ORDER BY N1.N)
    FROM    (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n1 (N)
    CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n2 (N)
    CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n3 (N)
), CustomerValues AS
(   SELECT  c.cust_id,
            value = SUBSTRING(c.value, Number,  CHARINDEX(';', c.value + ';', n.Number) - n.Number)
    FROM    Customers AS c
            INNER JOIN Numbers AS n
                ON N.Number <= CONVERT(INT, LEN(c.value))
                AND SUBSTRING(';' + c.value, n.Number, 1) = ';'
)
SELECT  c.cust_id,
        c.firstname,
        c.lastname,
        value = STUFF(CustomerValues.value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM    Customers AS c
        CROSS APPLY
        (   SELECT  DISTINCT ';' + value
            FROM    CustomerValues AS cv
            WHERE   cv.cust_id = c.cust_id
            FOR XML PATH(''), TYPE
        ) AS cv (CustomerValues);

#4


0  

Here is how you can do:

你可以这样做:

WITH 
CTE_Sample AS
(
    SELECT 1 AS cust_id, 'a' AS firstname, 'b' AS lastname, 'ct;ct;ct;dir'      AS YourValue UNION ALL
    SELECT 2 AS cust_id, 'c' AS firstname, 'a' AS lastname, 'ct;ct;ct;ct;ct;ct' AS YourValue UNION ALL
    SELECT 3 AS cust_id, 'd' AS firstname, 'e' AS lastname, 'ct;ct;ct;dir;st'   AS YourValue 
),

--
-- Split your values into lines (Distinct values)
CTE_Split AS
(
    SELECT DISTINCT
             YourValue  
            ,value AS Val
          FROM CTE_Sample SS
          CROSS APPLY STRING_SPLIT(YourValue, ';')
)


SELECT 
     cust_id
    ,firstname
    ,lastname
    -- Aggregate your different value into one column
    ,STUFF((
        SELECT ';'+ Val
          FROM CTE_Split SP       
          WHERE SP.YourValue = SA.YourValue
        FOR XML PATH('')
                        ), 1, 1, '' 
    ) AS Val
 FROM CTE_Sample SA