I would really like some advice here. For background, I am inserting Message Tracking logs from Exchange 2007 into SQL. As we have millions upon millions of rows per day, I am using a Bulk Insert statement to insert the data into a SQL table.
In fact, I Bulk Insert into a temp table and then MERGE the data from there into the live table. This is to deal with parsing issues, as certain fields otherwise have quotes and such around the values.
This works well, except that the recipient-address column is a delimited field separated by a ; character, and it can sometimes be incredibly long, as there can be many email recipients.
I would like to take this column and split the values into multiple rows, which would then be inserted into another table. The problem is that everything I have tried either takes too long or does not work the way I want.
Take this example data:
message-id                                             recipient-address
2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5@Server.com  user1@domain1.com
E52F650C53A275488552FFD49F98E9A6BEA1262E@Server.com    user2@domain2.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com             user3@domain3.com;user4@domain4.com;user5@domain5.com
I would like this to be formatted as follows in my Recipients table:
message-id                                             recipient-address
2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5@Server.com  user1@domain1.com
E52F650C53A275488552FFD49F98E9A6BEA1262E@Server.com    user2@domain2.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com             user3@domain3.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com             user4@domain4.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com             user5@domain5.com
Does anyone have any ideas about how I can go about doing this?
I know PowerShell pretty well, so I tried it there, but a foreach loop took forever to process even 28K records. I need something that will run as quickly and efficiently as possible.
Thanks!
3 Answers
#1
41
First, create a split function:
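CREATE FUNCTION dbo.SplitStrings
(
    @List      NVARCHAR(MAX),
    @Delimiter NVARCHAR(255)
)
RETURNS TABLE
AS
    -- Inline table-valued function; the logic assumes a single-character delimiter.
    RETURN
    (
        SELECT Number = ROW_NUMBER() OVER (ORDER BY Number), Item
        FROM (
            -- Each qualifying Number is the start position of an item;
            -- the item runs up to the next delimiter (or the end of the list).
            SELECT Number,
                   Item = LTRIM(RTRIM(SUBSTRING(@List, Number,
                          CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
            FROM (
                -- Numbers table generated on the fly from sys.all_objects.
                SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
                FROM sys.all_objects AS s1
                CROSS APPLY sys.all_objects
            ) AS n(Number)
            WHERE Number <= CONVERT(INT, LEN(@List))
              AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
        ) AS y
    );
GO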
Now you can expand the delimited column into rows simply by:
SELECT s.[message-id], f.Item
FROM dbo.SourceData AS s
CROSS APPLY dbo.SplitStrings(s.[recipient-address], ';') as f;
Also, I suggest not putting dashes in column names. It means you always have to put them in [square brackets].
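To tie this back to the question, here is a minimal sketch of loading the split rows into a separate recipients table. It assumes dbo.SplitStrings has already been created as shown above; the temp tables #SourceData and #Recipients and their column types are hypothetical stand-ins for the real staging and target tables.

```sql
-- Hypothetical staging table holding the example rows from the question.
CREATE TABLE #SourceData
(
    [message-id]        NVARCHAR(255),
    [recipient-address] NVARCHAR(MAX)
);

INSERT INTO #SourceData ([message-id], [recipient-address])
VALUES
    (N'2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5@Server.com', N'user1@domain1.com'),
    (N'E52F650C53A275488552FFD49F98E9A6BEA1262E@Server.com',   N'user2@domain2.com'),
    (N'4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com',
     N'user3@domain3.com;user4@domain4.com;user5@domain5.com');

-- Hypothetical target table: one row per message/recipient pair.
CREATE TABLE #Recipients
(
    [message-id]        NVARCHAR(255),
    [recipient-address] NVARCHAR(MAX)
);

-- Set-based insert: the split function is applied once per source row,
-- with no explicit loop.
INSERT INTO #Recipients ([message-id], [recipient-address])
SELECT s.[message-id], f.Item
FROM #SourceData AS s
CROSS APPLY dbo.SplitStrings(s.[recipient-address], N';') AS f;

SELECT [message-id], [recipient-address]
FROM #Recipients;

DROP TABLE #Recipients;
DROP TABLE #SourceData;
```

Because the whole operation is a single INSERT ... SELECT, it can run as part of the same batch that MERGEs the staging data into the live table.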
#2
1
SQL Server 2016 includes a new table-valued function, STRING_SPLIT(), similar to the previous solution.
The only requirement is that the database compatibility level is set to 130 (SQL Server 2016) or higher.
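As a rough sketch of what that looks like in practice: the statements below check the compatibility level, raise it if necessary, and then split the column from the question with STRING_SPLIT. The database name MailLogs is hypothetical, and dbo.SourceData is borrowed from the previous answer as a stand-in for the real table.

```sql
-- Check the compatibility level of the current database.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- Raise it to 130 if it is lower (MailLogs is a hypothetical database name).
ALTER DATABASE MailLogs SET COMPATIBILITY_LEVEL = 130;

-- STRING_SPLIT returns a single column named value, one row per item.
SELECT s.[message-id],
       Recipient = LTRIM(RTRIM(f.value))
FROM dbo.SourceData AS s
CROSS APPLY STRING_SPLIT(s.[recipient-address], ';') AS f;
```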
#3
0
You can use CROSS APPLY (available in SQL Server 2005 and above) together with the STRING_SPLIT function (available in SQL Server 2016 and above):
DECLARE @delimiter nvarchar(255) = ';';
-- create tables
CREATE TABLE MessageRecipients (MessageId int, Recipients nvarchar(max));
CREATE TABLE MessageRecipient (MessageId int, Recipient nvarchar(max));
-- insert data
INSERT INTO MessageRecipients VALUES (1, 'user1@domain.com; user2@domain.com; user3@domain.com');
INSERT INTO MessageRecipients VALUES (2, 'user@domain1.com; user@domain2.com');
-- insert into MessageRecipient
INSERT INTO MessageRecipient
SELECT MessageId, ltrim(rtrim(value))
FROM MessageRecipients
CROSS APPLY STRING_SPLIT(Recipients, @delimiter);
-- output results
SELECT * FROM MessageRecipients;
SELECT * FROM MessageRecipient;
-- delete tables
DROP TABLE MessageRecipients;
DROP TABLE MessageRecipient;
Results:
MessageId Recipients
----------- ----------------------------------------------------
1 user1@domain.com; user2@domain.com; user3@domain.com
2 user@domain1.com; user@domain2.com
and
MessageId Recipient
----------- ----------------
1 user1@domain.com
1 user2@domain.com
1 user3@domain.com
2 user@domain1.com
2 user@domain2.com
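One thing to watch for: STRING_SPLIT returns an empty string for a trailing delimiter or for two delimiters in a row. Whether the Exchange logs ever contain such values is an assumption, but if they do, a filter like the one sketched below keeps blank recipients out of the target table. The table variable and sample row are made up for illustration.

```sql
-- Hypothetical sample row with a doubled and a trailing delimiter.
DECLARE @MessageRecipients TABLE (MessageId int, Recipients nvarchar(max));

INSERT INTO @MessageRecipients
VALUES (1, N'user1@domain.com;;user2@domain.com;');

-- Trim each item and discard the empty ones produced by the extra delimiters.
SELECT m.MessageId,
       Recipient = LTRIM(RTRIM(f.value))
FROM @MessageRecipients AS m
CROSS APPLY STRING_SPLIT(m.Recipients, N';') AS f
WHERE LTRIM(RTRIM(f.value)) <> N'';
```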