I am creating a script for "merging" and deleting duplicate rows from a table. The table contains address information and uses an integer field (column name lngValue) to store information about each email as bit flags. For example, lngValue & 1 == 1 means it's the primary address.
In some cases the same email has been entered twice, sometimes with different lngValues. To resolve this, I need to take the lngValue from all duplicates, assign the combined value to one surviving record, and delete the rest.
My biggest headache so far has been the "merging" of the records. What I want to do is bitwise-OR together the lngValues of all duplicate records. Here is what I have so far; it only finds the value of all lngValues bitwise OR'ed together.
Warning: messy code ahead
declare @duplicates table
(
    lngInternetPK int,
    lngContactFK int,
    lngValue int
)
insert into @duplicates (lngInternetPK, lngContactFK, lngValue)
(
    select tblminternet.lngInternetPK, tblminternet.lngContactFK, tblminternet.lngValue
    from tblminternet
    inner join
        (select strAddress, lngcontactfk, count(*) as count
         from tblminternet
         where lngValue & 256 <> 256
         group by strAddress, lngcontactfk) secondemail
        On tblminternet.strAddress = secondemail.strAddress
        and tblminternet.lngcontactfk = secondemail.lngcontactfk
    where count > 1
        and tblminternet.strAddress is not null
        and tblminternet.lngValue & 256 <> 256
    --order by lngContactFK, strAddress
)
update @duplicates set lngValue = t.val
from
    (select (sum(dupes.lngValue) & 65535) as val
     from
        (select here.lngInternetPK, here.lngContactFK, here.lngValue
         from tblminternet here
         inner join
            (select strAddress, lngcontactfk, count(*) as count
             from tblminternet
             where lngValue & 256 <> 256
             group by strAddress, lngcontactfk) secondemail
            On here.strAddress = secondemail.strAddress
            and here.lngcontactfk = secondemail.lngcontactfk
         where count > 1 and here.strAddress is not null and here.lngValue & 256 <> 256) dupes,
        tblminternet this
     where this.lngContactFK = dupes.lngContactFK
    ) t
where lngInternetPK in (select lngInternetPK from @duplicates)
Edit:
As requested, here is some sample data:
Table Name: tblminternet
Column Names:
lngInternetPK
lngContactFK
lngValue
strAddress
Example row 1:
lngInternetPK: 1
lngContactFK: 1
lngValue: 33
strAddress: "me@myaddress.com"
Example row 2:
lngInternetPK: 2
lngContactFK: 1
lngValue: 40
strAddress: "me@myaddress.com"
If these two rows were merged, this is the desired result:
lngInternetPK: 1
lngContactFK: 1
lngValue: 41
strAddress: "me@myaddress.com"
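The desired lngValue of 41 is exactly the bitwise OR of 33 and 40. The short Python check below is only there to show the arithmetic and is not part of any SQL solution. It also shows why adding the values, as `sum(dupes.lngValue) & 65535` does in the attempt above, gives a different number: both rows have bit 5 (value 32) set, so adding them counts that bit twice.

```python
# Sample lngValue values from the two duplicate rows above.
row1, row2 = 33, 40

print(f"{row1:06b}")  # 100001 -> bits 0 and 5 set
print(f"{row2:06b}")  # 101000 -> bits 3 and 5 set

merged_or = row1 | row2               # a bit is set if it is set in either row
merged_sum = (row1 + row2) & 0xFFFF   # what SUM(...) & 65535 computes

print(merged_or)   # 41, the desired result
print(merged_sum)  # 73: the shared bit 5 is added twice and carries into bit 6
```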
Other necessary rules:
Each contact can have multiple emails, but each email row must be distinct (each email can appear in only one row).
3 answers
#1
SQL Server lacks native bitwise aggregates, which is why we need to emulate them.
The main idea here is to generate the set of bit positions from 0 to 15, apply each bit's mask to the value and take the MAX (which gives us the OR for that bit), then take the SUM of those per-bit results (which merges the bit masks).
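As a check on the per-bit idea, here is a small Python model of it. The function name `bitwise_or_via_max_sum` is hypothetical, and the 16-bit width mirrors the 0 to 15 range used in the query. It is a sketch of the logic, not a replacement for the SQL.

```python
def bitwise_or_via_max_sum(values, bits=16):
    """OR together a non-empty list of ints the way the query does:
    per-bit MAX, then SUM. Bits at position `bits` and above are ignored,
    just as the query only looks at bits 0..15."""
    per_bit = []
    for b in range(bits):            # like the recursive CTE: b = 0 .. 15
        mask = 2 ** b                # like POWER(2, b)
        # MAX(value & mask) is `mask` if any value has bit b set, else 0
        per_bit.append(max(v & mask for v in values))
    # Each entry is 0 or a distinct power of two, so adding them never
    # carries: the SUM equals the OR of the entries.
    return sum(per_bit)


print(bitwise_or_via_max_sum([33, 40]))   # 41
print(bitwise_or_via_max_sum([1, 2, 4]))  # 7
```

Taking the MAX per bit works because, for a single bit, `value & mask` is either 0 or `mask`. The maximum is therefore `mask` exactly when at least one row has that bit set.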
Then we just update the row with the lowest lngInternetPK in each (lngContactFK, strAddress) group with the new lngValue, and delete all the duplicates.
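;WITH bits AS
(
    -- bit positions 0 to 15
    SELECT 0 AS b
    UNION ALL
    SELECT b + 1
    FROM bits
    WHERE b < 15
),
v AS
(
    SELECT i.*,
        (
            -- SUM of the per-bit MAX values = bitwise OR of the whole group
            SELECT SUM(value)
            FROM (
                -- MAX(lngValue & 2^b) is 2^b if any row in the group has bit b set, else 0
                SELECT MAX(lngValue & POWER(2, b)) AS value
                FROM tblmInternet ii
                CROSS JOIN
                    bits
                WHERE ii.lngContactFK = i.lngContactFK
                    AND ii.strAddress = i.strAddress
                GROUP BY
                    b
            ) q
        ) AS lngNewValue
    FROM (
        SELECT ii.*, ROW_NUMBER() OVER (PARTITION BY lngContactFK, strAddress ORDER BY lngInternetPK) AS rn
        FROM tblmInternet ii
        -- rows without an address match nothing in the subquery above and
        -- would get a NULL lngValue, so leave them alone
        WHERE ii.strAddress IS NOT NULL
    ) i
    -- only the first row (lowest lngInternetPK) of each group is updated
    WHERE rn = 1
)
UPDATE v
SET lngValue = lngNewValue;

;WITH v AS
(
    SELECT ii.*, ROW_NUMBER() OVER (PARTITION BY lngContactFK, strAddress ORDER BY lngInternetPK) AS rn
    FROM tblmInternet ii
    WHERE ii.strAddress IS NOT NULL
)
-- delete every row except the first of each group
DELETE v
WHERE rn > 1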
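The two statements above are T-SQL. To try the same approach outside SQL Server, here is a sketch ported to SQLite through Python's built-in `sqlite3` module. It runs on the question's two sample rows plus one hypothetical non-duplicate row (pk 3).

The port makes two substitutions:

- SQLite cannot UPDATE or DELETE through a CTE, so the `ROW_NUMBER()` filtering is replaced with `MIN(lngInternetPK)` subqueries. These pick the same surviving row.
- `1 << b` stands in for `POWER(2, b)`.

Treat it as an illustration of the technique, not a drop-in script for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tblminternet (
        lngInternetPK INTEGER PRIMARY KEY,
        lngContactFK  INTEGER,
        lngValue      INTEGER,
        strAddress    TEXT
    )""")
conn.executemany(
    "INSERT INTO tblminternet VALUES (?, ?, ?, ?)",
    [(1, 1, 33, "me@myaddress.com"),
     (2, 1, 40, "me@myaddress.com"),
     (3, 2, 2, "me@myaddress2.com")],   # hypothetical non-duplicate row
)

# Step 1: give the lowest-PK row of each (lngContactFK, strAddress) group the
# OR of the group's lngValues, computed as per-bit MAX followed by SUM.
conn.execute("""
    WITH RECURSIVE bits(b) AS (
        SELECT 0 UNION ALL SELECT b + 1 FROM bits WHERE b < 15
    ),
    per_bit AS (
        SELECT lngContactFK, strAddress, b,
               MAX(lngValue & (1 << b)) AS v   -- 1 << b instead of POWER(2, b)
        FROM tblminternet CROSS JOIN bits
        GROUP BY lngContactFK, strAddress, b
    ),
    merged AS (
        SELECT lngContactFK, strAddress, SUM(v) AS val
        FROM per_bit
        GROUP BY lngContactFK, strAddress
    )
    UPDATE tblminternet
    SET lngValue = (SELECT m.val FROM merged m
                    WHERE m.lngContactFK = tblminternet.lngContactFK
                      AND m.strAddress = tblminternet.strAddress)
    WHERE lngInternetPK IN (SELECT MIN(lngInternetPK) FROM tblminternet
                            GROUP BY lngContactFK, strAddress)
""")

# Step 2: delete every row that is not the survivor of its group.
conn.execute("""
    DELETE FROM tblminternet
    WHERE lngInternetPK NOT IN (SELECT MIN(lngInternetPK) FROM tblminternet
                                GROUP BY lngContactFK, strAddress)
""")
conn.commit()

result = conn.execute("SELECT * FROM tblminternet ORDER BY lngInternetPK").fetchall()
print(result)  # [(1, 1, 41, 'me@myaddress.com'), (3, 2, 2, 'me@myaddress2.com')]
```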
See this article on my blog for a more detailed explanation:
- SQL Server: aggregate bitwise OR
#2
I believe the following query gets you what you want. It assumes that each address appears at most twice per contact; if there can be more duplicates than that, the query will have to be modified. I hope this helps.
Declare @tblminternet Table
(
    lngInternetPK int,
    lngContactFK int,
    lngValue int,
    strAddress varchar(255)
)
Insert Into @tblminternet
select 1, 1, 33, 'me@myaddress.com'
union
select 2, 1, 40, 'me@myaddress.com'
union
select 3, 2, 33, 'me@myaddress2.com'
union
select 4, 2, 40, 'me@myaddress2.com'
union
select 5, 3, 2, 'me@myaddress3.com'
--Select * from @tblminternet
Select Distinct
    A.lngContactFK,
    A.lngValue | B.lngValue as [Bitwise OR],
    A.strAddress
From @tblminternet A, @tblminternet B
Where A.lngContactFK = B.lngContactFK
    And A.strAddress = B.strAddress
    And A.lngInternetPK != B.lngInternetPK
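The two-duplicate limitation is easy to see with three rows. The sketch below runs the same self-join in SQLite via Python's `sqlite3` module, using a hypothetical table `t` with values 1, 2 and 4 stored for the same contact and address, and selects only the OR column. Because the join ORs the rows in pairs, it returns three partial results instead of the single merged value 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (lngInternetPK INTEGER, lngContactFK INTEGER, "
             "lngValue INTEGER, strAddress TEXT)")
# Hypothetical data: three rows for the same contact and address.
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, "me@myaddress.com"),
                  (2, 1, 2, "me@myaddress.com"),
                  (3, 1, 4, "me@myaddress.com")])

# Same self-join as the answer: OR each row with each *other* row.
pairwise = conn.execute("""
    SELECT DISTINCT A.lngValue | B.lngValue
    FROM t A, t B
    WHERE A.lngContactFK = B.lngContactFK
      AND A.strAddress = B.strAddress
      AND A.lngInternetPK != B.lngInternetPK
    ORDER BY 1
""").fetchall()
print(pairwise)  # [(3,), (5,), (6,)]: three partial ORs, none equal to 7
```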
#3
You can write SQL Server aggregate functions in .NET and then call them inline from your SQL queries. I think this requires at least SQL Server 2005 and Visual Studio 2010. I built one using Visual Studio 2013 Community Edition (free even for commercial use), targeting .NET 2 and SQL Server 2005.
See the MSDN article: https://msdn.microsoft.com/en-us/library/91e6taax(v=vs.90).aspx
First you'll need to enable the CLR feature in SQL Server: https://msdn.microsoft.com/en-us/library/ms131048.aspx
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'clr enabled', 1;
GO
RECONFIGURE;
GO
- Create a SQL Server -> SQL Server Database Project
- Right-click on the new project and select Properties
- Configure the targeted SQL Server version under Project Settings
- Configure the targeted CLR language under SQL CLR (such as VB)
- Right-click on the new project and select Add -> New Item...
- When the dialog pops up, select SQL Server -> SQL CLR VB -> SQL CLR VB Aggregate
Now you can write your bitwise code in VB:
Imports System
Imports System.Data
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
<Serializable()> _
<Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)> _
Public Structure AggregateBitwiseOR
    ' Running OR of every non-NULL value seen so far
    Private CurrentAggregate As SqlInt32
    Public Sub Init()
        CurrentAggregate = 0
    End Sub
    Public Sub Accumulate(ByVal value As SqlInt32)
        ' Skip NULL inputs: OR-ing with a null SqlInt32 returns Null,
        ' which would make the whole result NULL
        If Not value.IsNull Then
            ' Perform bitwise OR against the aggregate memory
            CurrentAggregate = CurrentAggregate Or value
        End If
    End Sub
    Public Sub Merge(ByVal value As AggregateBitwiseOR)
        ' Fold in a partial aggregate computed by another instance
        Accumulate(value.Terminate())
    End Sub
    Public Function Terminate() As SqlInt32
        Return CurrentAggregate
    End Function
End Structure
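To see how the four methods cooperate, here is a Python model of the Init/Accumulate/Merge/Terminate contract. The class and method names are hypothetical, and this is not the SQL CLR API. The model aggregates two halves of the input in separate instances and then merges them, as SQL Server may do when it combines partial aggregates. It also skips NULLs (`None`), the way built-in aggregates such as MAX ignore NULL inputs.

```python
class BitwiseOrAggregate:
    """Hypothetical Python model of an Init/Accumulate/Merge/Terminate
    aggregate that ORs integers together (not the SQL CLR API)."""

    def init(self):
        self.current = 0

    def accumulate(self, value):
        # Skip NULLs (None) instead of letting them poison the result
        if value is not None:
            self.current |= value

    def merge(self, other):
        # Fold in a partial result computed by another instance
        self.accumulate(other.terminate())

    def terminate(self):
        return self.current


def run(values):
    """Aggregate a whole list with a single instance."""
    agg = BitwiseOrAggregate()
    agg.init()
    for v in values:
        agg.accumulate(v)
    return agg.terminate()


# Aggregate two halves of the input separately, then merge the partials.
left = BitwiseOrAggregate()
right = BitwiseOrAggregate()
left.init()
right.init()
for v in [33, None]:
    left.accumulate(v)
for v in [40]:
    right.accumulate(v)
left.merge(right)

print(left.terminate())     # 41
print(run([33, None, 40]))  # 41
```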
Now deploy it: https://msdn.microsoft.com/en-us/library/dahcx0ww(v=vs.90).aspx
- Build the project from the menu bar: Build -> Build ProjectName. (If the build fails with error 04018, download a newer version of the data tools from http://msdn.microsoft.com/en-US/data/hh297027, or go to Tools -> Extensions And Updates on the menu bar and, under Updates, select the Microsoft SQL Server Update For Database Tooling.)
- Copy your compiled DLL to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn and to C:\
- Register the DLL:
CREATE ASSEMBLY [CLRTools] FROM 'C:\CLRTools.dll' WITH PERMISSION_SET = SAFE;
- Create the aggregate in SQL:
CREATE AGGREGATE [dbo].[AggregateBitwiseOR](@value INT) RETURNS INT EXTERNAL NAME [CLRTools].[CLRTools.AggregateBitwiseOR];
If you get the error "Incorrect syntax near 'EXTERNAL'", change the database compatibility level using the following commands:
For SQL Server 2005: EXEC sp_dbcmptlevel 'DatabaseName', 90
For SQL Server 2008: EXEC sp_dbcmptlevel 'DatabaseName', 100
- Test your code:
SELECT dbo.AggregateBitwiseOR(Foo) AS Foo FROM Bar
I found this article helpful: http://www.codeproject.com/Articles/37377/SQL-Server-CLR-Functions