Updating a column with different aggregate values

Time: 2022-11-22 22:44:08

I am creating a script for "merging" and deleting duplicate rows from a table. The table contains address information and uses an integer field to store information about the email as bit flags (column name lngValue). For example, lngValue & 1 == 1 means it's the primary address.

There are instances of the same email being entered twice, but sometimes with different lngValues. To resolve this, I need to take the lngValue from all duplicates and assign them to one surviving record and delete the rest.

My biggest headache so far has been the "merging" of the records. What I want to do is bitwise OR the lngValues of all duplicate records together. Here is what I have so far, which only tries to compute the value of all lngValues bitwise OR'ed together.

Warning: messy code ahead

declare @duplicates table
(
    lngInternetPK int,
    lngContactFK int,
    lngValue int
)

insert into @duplicates (lngInternetPK, lngContactFK, lngValue)
(
    select tblminternet.lngInternetPK, tblminternet.lngContactFK, tblminternet.lngValue
    from tblminternet
    inner join
        (select strAddress, lngcontactfk, count(*) as count
         from tblminternet
         where lngValue & 256 <> 256
         group by strAddress, lngcontactfk) secondemail
        on tblminternet.strAddress = secondemail.strAddress
        and tblminternet.lngcontactfk = secondemail.lngcontactfk
    where count > 1
        and tblminternet.strAddress is not null
        and tblminternet.lngValue & 256 <> 256 --order by lngContactFK, strAddress
)

update @duplicates set lngValue = t.val
from
    (select (sum(dupes.lngValue) & 65535) as val
     from
        (select here.lngInternetPK, here.lngContactFK, here.lngValue
         from tblminternet here
         inner join
            (select strAddress, lngcontactfk, count(*) as count
             from tblminternet
             where lngValue & 256 <> 256
             group by strAddress, lngcontactfk) secondemail
            on here.strAddress = secondemail.strAddress
            and here.lngcontactfk = secondemail.lngcontactfk
         where count > 1
            and here.strAddress is not null
            and here.lngValue & 256 <> 256) dupes,
        tblminternet this
     where this.lngContactFK = dupes.lngContactFK
    ) t
where lngInternetPK in (select lngInternetPK from @duplicates)

Edit:
As requested, here is some sample data:

Table Name: tblminternet
Column Names:
lngInternetPK
lngContactFK
lngValue
strAddress

Example row 1:
lngInternetPK: 1
lngContactFK: 1
lngValue: 33
strAddress: "me@myaddress.com"

Example row 2:
lngInternetPK: 2
lngContactFK: 1
lngValue: 40
strAddress: "me@myaddress.com"

If these two were merged, this is the desired result:
lngInternetPK: 1
lngContactFK: 1
lngValue: 41
strAddress: "me@myaddress.com"
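
To make the arithmetic behind the desired result concrete, here is a small Python sketch (Python is used only for illustration, it is not part of the original script). It uses the two sample values above and shows why summing the values, as the update statement above does, is not the same as OR-ing them: the shared bit 32 gets counted twice.

```python
# Sample values from the two example rows above.
a, b = 33, 40          # binary 100001 and 101000

merged = a | b         # bitwise OR keeps each set bit exactly once
summed = a + b         # addition counts the shared bit (32) twice

print(f"{a:06b} | {b:06b} = {merged:06b} ({merged})")
print(f"sum = {summed}, sum & 65535 = {summed & 65535}")
```

The OR gives 41, while the masked sum gives 73, so the mask alone does not repair the double counting.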

Other necessary rules:
Each contact can have multiple emails, but each email row must be distinct (each email can only appear as one row).

3 solutions

#1


SQL Server lacks native bitwise aggregates, so we need to emulate them.

The main idea here is to generate a set of bit positions from 0 to 15; for each bit, apply the bitmask to the value and select the MAX (which gives us the OR for that bit), then select the SUM (which merges the bit masks).
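
The MAX-then-SUM idea can be sketched outside SQL as well. The following Python function is only an illustration of the technique; the name `bitwise_or_via_max_sum` and the `bit_count` parameter are made up for this sketch and assume a non-empty list of non-negative integers.

```python
def bitwise_or_via_max_sum(values, bit_count=16):
    """Emulate a bitwise-OR aggregate using only MAX and SUM, one bit at a time."""
    total = 0
    for b in range(bit_count):      # bit positions 0..15
        mask = 1 << b               # the mask for this bit, i.e. 2 to the power b
        # MAX over (value & mask) is `mask` if any value has the bit set, else 0.
        total += max(v & mask for v in values)
    return total


print(bitwise_or_via_max_sum([33, 40]))
```

For the sample values 33 and 40 this prints 41. Note that any bit above position 15 is silently dropped, so the range of bit positions has to cover every flag actually in use.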

Then we just update the row with the first lngInternetPK for any given (lngContactFK, strAddress) with the new value of lngValue, and delete all duplicates.

;WITH   bits AS
        (
        SELECT  0 AS b
        UNION ALL
        SELECT  b + 1
        FROM    bits
        WHERE   b < 15
        ),
        v AS
        (
        SELECT  i.*,
                (
                SELECT  SUM(value)
                FROM    (
                        SELECT  MAX(lngValue & POWER(2, b)) AS value
                        FROM    tblmInternet ii
                        CROSS JOIN
                                bits
                        WHERE   ii.lngContactFK = i.lngContactFK
                                AND ii.strAddress = i.strAddress
                        GROUP BY
                                b
                        ) q
                ) AS lngNewValue
        FROM    (
                SELECT  ii.*, ROW_NUMBER() OVER (PARTITION BY lngContactFK, strAddress ORDER BY lngInternetPK) AS rn
                FROM    tblmInternet ii
                ) i
        WHERE   rn = 1
        )
UPDATE  v
SET     lngValue = lngNewValue;

;WITH    v AS
        (
        SELECT  ii.*, ROW_NUMBER() OVER (PARTITION BY lngContactFK, strAddress ORDER BY lngInternetPK) AS rn
        FROM    tblmInternet ii
        )
DELETE  v
WHERE   rn > 1
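
As a rough cross-check of what the two statements above are meant to achieve, here is a Python sketch of the same merge on in-memory rows. The function name `merge_duplicates` and the dict-based row layout are assumptions made for illustration; string comparison here is exact, which may differ from the database's collation.

```python
def merge_duplicates(rows):
    """Keep the row with the lowest lngInternetPK per (lngContactFK, strAddress)
    and OR the lngValue of every duplicate into it."""
    survivors = {}
    for row in sorted(rows, key=lambda r: r["lngInternetPK"]):
        key = (row["lngContactFK"], row["strAddress"])
        if key not in survivors:
            survivors[key] = dict(row)                     # first PK survives (rn = 1)
        else:
            survivors[key]["lngValue"] |= row["lngValue"]  # fold duplicate flags in
    return list(survivors.values())


rows = [
    {"lngInternetPK": 1, "lngContactFK": 1, "lngValue": 33, "strAddress": "me@myaddress.com"},
    {"lngInternetPK": 2, "lngContactFK": 1, "lngValue": 40, "strAddress": "me@myaddress.com"},
]
print(merge_duplicates(rows))
```

With the sample data from the question, a single row survives (lngInternetPK 1) with lngValue 41.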

See this article in my blog for more detailed explanations:

#2


I believe the following query gets you what you want. This routine assumes a maximum of two rows with the same address per contact. If a contact has more than two rows with the same address, the query will have to be modified. I hope this helps.

Declare @tblminternet 
Table 
( lngInternetPK int,   
  lngContactFK int,  
  lngValue int, 
  strAddress varchar(255)
)

Insert Into @tblminternet 
select 1, 1, 33, 'me@myaddress.com' 
union
select 2, 1, 40, 'me@myaddress.com'
union 
select 3, 2, 33, 'me@myaddress2.com'
union 
select 4, 2, 40, 'me@myaddress2.com'
union 
select 5, 3, 2, 'me@myaddress3.com'

--Select * from @tblminternet

Select  Distinct   
    A.lngContactFK , 
    A.lngValue | B.lngValue as 'Bitwise OR', 
    A.strAddress
From @tblminternet A, @tblminternet B
Where A.lngContactFK = B.lngContactFK
And A.strAddress = B.strAddress
And A.lngInternetPK != B.lngInternetPK
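
To see why the two-row assumption matters, here is a small Python sketch of what a pairwise self-join computes. The helper name `pairwise_or` is made up for this illustration; it ORs every ordered pair of different rows and keeps the distinct results, mirroring the Select Distinct above for a single contact and address.

```python
from itertools import permutations


def pairwise_or(values):
    """Distinct results of OR-ing every ordered pair of different rows (a self-join)."""
    return sorted({a | b for a, b in permutations(values, 2)})


print(pairwise_or([33, 40]))    # two duplicates: a single merged value
print(pairwise_or([1, 2, 4]))   # three duplicates: several partial results
```

With two duplicates the result is the single value 41. With three duplicates (1, 2, 4) the pairs give 3, 5 and 6, and none of them is the full OR of 7.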

#3


You can create SQL Server aggregate functions in .NET and then call them inline in SQL Server. I think this requires a minimum of SQL Server 2005 and Visual Studio 2010. I did one using Visual Studio 2013 Community Edition (free even for commercial use) for use with .NET 2 and SQL Server 2005.

See the MSDN article: https://msdn.microsoft.com/en-us/library/91e6taax(v=vs.90).aspx

First you'll need to enable the CLR feature in SQL Server: https://msdn.microsoft.com/en-us/library/ms131048.aspx

sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'clr enabled', 1;
GO
RECONFIGURE;
GO

  1. Create a SQL Server -> SQL Server Database Project

  2. Right-click on the new project and select Properties

  3. Configure the targeted SQL Server version under Project Settings

  4. Configure the targeted CLR language under SQL CLR (such as VB)

  5. Right-click on the new project and select Add -> New Item...

  6. When the dialog pops up, select SQL Server -> SQL CLR VB -> SQL CLR VB Aggregate

Now you can write your bitwise code in VB:

Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server


<Serializable()> _
<Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)> _
Public Structure AggregateBitwiseOR

    Private CurrentAggregate As SqlTypes.SqlInt32

    Public Sub Init()
        CurrentAggregate = 0
    End Sub

    Public Sub Accumulate(ByVal value As SqlTypes.SqlInt32)
        'Perform Bitwise OR against aggregate memory
        CurrentAggregate = CurrentAggregate OR value
    End Sub

    Public Sub Merge(ByVal value as AggregateBitwiseOR)
        Accumulate(value.Terminate())
    End Sub

    Public Function Terminate() As SqlInt32
        Return CurrentAggregate
    End Function

End Structure
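
For readers who do not use VB, the four-method contract of the structure above can be mirrored in a few lines of Python. This is only an analogue for illustration (the class and method names are made up here), not something SQL Server can load. Merge exists because the engine may compute partial aggregates separately and then combine them.

```python
class AggregateBitwiseOr:
    """Python analogue of the Init / Accumulate / Merge / Terminate contract."""

    def init(self):
        self.current = 0                  # start with no bits set

    def accumulate(self, value):
        self.current |= value             # fold one row's value into the aggregate

    def merge(self, other):
        # Combine a partial aggregate that was computed separately.
        self.accumulate(other.terminate())

    def terminate(self):
        return self.current               # final result for the group


left, right = AggregateBitwiseOr(), AggregateBitwiseOr()
left.init()
right.init()
left.accumulate(33)
right.accumulate(40)
left.merge(right)
print(left.terminate())
```

Run on the sample values 33 and 40 split across two partial aggregates, this prints 41.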

Now deploy it: https://msdn.microsoft.com/en-us/library/dahcx0ww(v=vs.90).aspx

  1. Build the project using the menu bar: Build -> Build ProjectName (if the build fails with error 04018, download a new version of the data tools from http://msdn.microsoft.com/en-US/data/hh297027, or go to the menu bar: Tools -> Extensions And Updates, then under Updates select the update for Microsoft SQL Server Update For Database Tooling)

  2. Copy your compiled DLL to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn and to C:\

  3. Register the DLL:

    CREATE ASSEMBLY [CLRTools] FROM 'C:\CLRTools.dll' WITH PERMISSION_SET = SAFE

  4. Create the aggregate in SQL:

    CREATE AGGREGATE [dbo].[AggregateBitwiseOR](@value INT) RETURNS INT EXTERNAL NAME [CLRTools].[CLRTools.AggregateBitwiseOR];

If you get the error "Incorrect syntax near 'EXTERNAL'", change the database compatibility level using one of the following commands:

For SQL Server 2005: EXEC sp_dbcmptlevel 'DatabaseName', 90

For SQL Server 2008: EXEC sp_dbcmptlevel 'DatabaseName', 100

  5. Test your code:

    SELECT dbo.AggregateBitwiseOR(Foo) AS Foo FROM Bar

I found this article helpful: http://www.codeproject.com/Articles/37377/SQL-Server-CLR-Functions
