SQL Server - 如何插入记录并确保它是唯一的

时间:2021-09-18 04:16:59

I'm trying to figure out the best way to insert a record into a single table but only if the item doesn't already exist. The KEY in this case is an NVARCHAR(400) field. For this example, lets pretend it's the name of a word in the Oxford English Dictionary / insert your fav dictionary here. Also, i'm guessing i will need to make the Word field a primary key. (the table will also have a unique identifier PK also).

我试图找出将记录插入单个表的最佳方法,但前提是该项目尚不存在。在这种情况下,KEY是NVARCHAR(400)字段。对于此示例,我们假装它是牛津英语词典中单词的名称/在此处插入您的fav词典。另外,我猜我需要将Word字段作为主键。 (该表也将具有唯一的标识符PK)。

So .. i might get these words that i need to add to the table...

所以...我可能会得到这些我需要添加到表格中的单词...

eg.

例如。

  • Cat
  • Dog
  • Foo
  • Bar
  • 酒吧
  • PewPew
  • PewPew
  • etc...
  • 等等...

So traditionally, i would try the following (pseudo code)

所以传统上,我会尝试以下(伪代码)

SELECT WordID FROM Words WHERE Word = @Word
IF WordID IS NULL OR WordID <= 0
    INSERT INTO Words VALUES (@Word)

ie. If the word doesn't exist, then insert it.

即。如果该单词不存在,则插入它。

Now .. the problem i'm worried about is that we're getting LOTS of hits .. so is it possible that the word could be inserted from another process in between the SELECT and the INSERT .. which would then throw a constraint error? (ie. a Race Condition).

现在..我担心的问题是我们得到了大量的命中..所以有可能这个词可以从SELECT和INSERT之间的另一个进程插入...这会引发一个约束错误? (即比赛条件)。

I then thought that i might be able to do the following ...

然后我想我可能会做以下事情......

INSERT INTO Words (Word)
SELECT @Word
WHERE NOT EXISTS (SELECT WordID FROM Words WHERE Word = @Word)

basically, insert a word when it doesn't exist.

基本上,当一个单词不存在时插入一个单词。

Bad syntax aside, i'm not sure if this is bad or good because of how it locks down the table (if it does) and is not that performant on a table that it getting massive reads and plenty of writes.

除了错误的语法之外,我不确定这是坏还是好因为它如何锁定表(如果它确实)并且不是表上的高效读取和大量写入的性能。

So - what do you Sql gurus think / do?

所以 - 你是什么Sql大师想/做的?

I was hoping to have a simple insert and 'catch' that for any errors thrown.

我希望有一个简单的插入和“捕获”,任何错误抛出。

7 个解决方案

#1


24  

Your solution:

你的解决方案

INSERT INTO Words (Word)
    SELECT @Word
WHERE NOT EXISTS (SELECT WordID FROM Words WHERE Word = @Word)

...is about as good as it gets. You could simplify it to this:

......几乎和它一样好。您可以将其简化为:

INSERT INTO Words (Word)
    SELECT @Word
WHERE NOT EXISTS (SELECT * FROM Words WHERE Word = @Word)

...because EXISTS doesn't actually need to return any records, so the query optimiser won't bother looking at which fields you asked for.

...因为EXISTS实际上不需要返回任何记录,因此查询优化器不会费心查看您要求的字段。

As you mention, however, this isn't particularly performant, because it'll lock the whole table during the INSERT. Except that, if you add a unique index (it doesn't need to be the primary key) to Word, then it'll only need to lock the relevant pages.

但是,正如您所提到的,这不是特别高效,因为它会在INSERT期间锁定整个表。除此之外,如果您向Word添加唯一索引(它不需要是主键),那么它只需要锁定相关页面。

Your best option is to simulate the expected load and look at the performance with SQL Server Profiler. As with any other field, premature optimisation is a bad thing. Define acceptable performance metrics, and then measure before doing anything else.

您最好的选择是模拟预期的负载并使用SQL Server Profiler查看性能。与任何其他领域一样,过早优化是一件坏事。定义可接受的性能指标,然后在执行任何其他操作之前进

If that's still not giving you adequate performance, then there's a bunch of techniques from the data warehousing field that could help.

如果仍然没有给你足够的性能,那么数据仓库领域的一系列技术可能会有所帮助。

#2


4  

I think I've found a better (or at least faster) answer to this. Create an index like:

我想我已经找到了一个更好(或至少更快)的答案。创建一个索引,如:

CREATE UNIQUE NONCLUSTERED INDEX [IndexTableUniqueRows] ON [dbo].[table] 
(
    [Col1] ASC,
    [Col2] ASC,

)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = ON, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Include all the columns that define uniqueness. The important part is IGNORE_DUP_KEY = ON. That turns non unique inserts into warnings. SSIS ignores these warnings and you can still use fastload too.

包括定义唯一性的所有列。重要的部分是IGNORE_DUP_KEY = ON。这会将非唯一插入转换为警告。 SSIS忽略了这些警告,你仍然可以使用fastload。

#3


3  

If you are using MS SQL Server, you can create a unique index on your table's columns that need to be unique (documented here):

如果您使用的是MS SQL Server,则可以在表的列上创建一个唯一的索引,该索引必须是唯一的(在此处记录):

CREATE UNIQUE [ CLUSTERED | NONCLUSTERED ] INDEX <index_name>
    ON Words ( word [ ASC | DESC ])

Specify Clustered or NonClustered, depending on your case. Also, if you want it sorted (to enable faster seeking), specify ASC or DESC for the sort order.

指定Clustered或NonClustered,具体取决于您的情况。此外,如果您希望对其进行排序(以便更快地搜索),请为排序顺序指定ASC或DESC。

See here, if you want to learn more about indexes architecture.

如果您想了解有关索引体系结构的更多信息,请参见此处。

Otherwise, you could use UNIQUE CONSTRAINTS like documented here:

否则,您可以使用UNIQUE CONSTRAINTS,如下所示:

ALTER TABLE Words
ADD CONSTRAINT UniqueWord
UNIQUE (Word); 

#4


2  

I had similar problem and this is how I solved it

我有类似的问题,这就是我解决它的方式

insert into Words
( selectWord , Fixword)
SELECT word,'theFixword'
FROM   OldWordsTable
WHERE 
(
    (word LIKE 'junk%') OR
     (word LIKE 'orSomthing') 

)
and word not in 
    (
        SELECT selectWord FROM words WHERE selectWord = word
    ) 

#5


1  

while unique constraint is certaily one way to go you can also use this for your insert logic: http://www.sqlteam.com/article/application-locks-or-mutexes-in-sql-server-2005

虽然唯一约束是一种可行的方法,但您也可以将它用于插入逻辑:http://www.sqlteam.com/article/application-locks-or-mutexes-in-sql-server-2005

basicaly you don't put any locks on the table below thus not worrying about the reads while your existance checks will be performed ok.

基本上你没有在下面的桌子上放任何锁,因此在你的存在检查将被执行时不会担心读取。

it's a mutex in sql code.

它是sql代码中的互斥锁。

#6


0  

I can't speak to the particulars of MS SQL, but one point of a primary key in SQL is to ensure uniqueness. So by definition in generic SQL terms, a primary key is one or more fields that is unique to a table. While there are different ways to enforce this behavior (replace the old entry with the new one vs. reject the new one) I would be surprised if MS SQL both didn't have a mechanism for enforcing this behavior and that it wasn't to reject the new entry. Just make sure you set the primary key to the Word field and it should work.

我不能谈论MS SQL的细节,但SQL中主键的一点是确保唯一性。因此,根据通用SQL术语的定义,主键是一个或多个对表而言唯一的字段。虽然有不同的方法来强制执行此行为(将旧条目替换为新条目与拒绝新条目),如果MS SQL既没有强制执行此行为的机制又不是拒绝新条目。只需确保将主键设置为Word字段,它应该可以正常工作。

Once again though, I disclaim this is all from my knowledge from MySQL programming and my databases class, so apologies if I'm off on the intricacies of MS SQL.

再一次,我不赞成这完全来自我对MySQL编程和我的数据库类的了解,所以如果我对MS SQL的复杂性感到抱歉,那就道歉了。

#7


-1  

declare @Error int

begin transaction
  INSERT INTO Words (Word) values(@word)
  set @Error = @@ERROR
  if @Error <> 0 --if error is raised
  begin
      goto LogError
  end
commit transaction
goto ProcEnd

LogError:
rollback transaction

#1


24  

Your solution:

你的解决方案

INSERT INTO Words (Word)
    SELECT @Word
WHERE NOT EXISTS (SELECT WordID FROM Words WHERE Word = @Word)

...is about as good as it gets. You could simplify it to this:

......几乎和它一样好。您可以将其简化为:

INSERT INTO Words (Word)
    SELECT @Word
WHERE NOT EXISTS (SELECT * FROM Words WHERE Word = @Word)

...because EXISTS doesn't actually need to return any records, so the query optimiser won't bother looking at which fields you asked for.

...因为EXISTS实际上不需要返回任何记录,因此查询优化器不会费心查看您要求的字段。

As you mention, however, this isn't particularly performant, because it'll lock the whole table during the INSERT. Except that, if you add a unique index (it doesn't need to be the primary key) to Word, then it'll only need to lock the relevant pages.

但是,正如您所提到的,这不是特别高效,因为它会在INSERT期间锁定整个表。除此之外,如果您向Word添加唯一索引(它不需要是主键),那么它只需要锁定相关页面。

Your best option is to simulate the expected load and look at the performance with SQL Server Profiler. As with any other field, premature optimisation is a bad thing. Define acceptable performance metrics, and then measure before doing anything else.

您最好的选择是模拟预期的负载并使用SQL Server Profiler查看性能。与任何其他领域一样,过早优化是一件坏事。定义可接受的性能指标,然后在执行任何其他操作之前进

If that's still not giving you adequate performance, then there's a bunch of techniques from the data warehousing field that could help.

如果仍然没有给你足够的性能,那么数据仓库领域的一系列技术可能会有所帮助。

#2


4  

I think I've found a better (or at least faster) answer to this. Create an index like:

我想我已经找到了一个更好(或至少更快)的答案。创建一个索引,如:

CREATE UNIQUE NONCLUSTERED INDEX [IndexTableUniqueRows] ON [dbo].[table] 
(
    [Col1] ASC,
    [Col2] ASC,

)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = ON, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Include all the columns that define uniqueness. The important part is IGNORE_DUP_KEY = ON. That turns non unique inserts into warnings. SSIS ignores these warnings and you can still use fastload too.

包括定义唯一性的所有列。重要的部分是IGNORE_DUP_KEY = ON。这会将非唯一插入转换为警告。 SSIS忽略了这些警告,你仍然可以使用fastload。

#3


3  

If you are using MS SQL Server, you can create a unique index on your table's columns that need to be unique (documented here):

如果您使用的是MS SQL Server,则可以在表的列上创建一个唯一的索引,该索引必须是唯一的(在此处记录):

CREATE UNIQUE [ CLUSTERED | NONCLUSTERED ] INDEX <index_name>
    ON Words ( word [ ASC | DESC ])

Specify Clustered or NonClustered, depending on your case. Also, if you want it sorted (to enable faster seeking), specify ASC or DESC for the sort order.

指定Clustered或NonClustered,具体取决于您的情况。此外,如果您希望对其进行排序(以便更快地搜索),请为排序顺序指定ASC或DESC。

See here, if you want to learn more about indexes architecture.

如果您想了解有关索引体系结构的更多信息,请参见此处。

Otherwise, you could use UNIQUE CONSTRAINTS like documented here:

否则,您可以使用UNIQUE CONSTRAINTS,如下所示:

ALTER TABLE Words
ADD CONSTRAINT UniqueWord
UNIQUE (Word); 

#4


2  

I had similar problem and this is how I solved it

我有类似的问题,这就是我解决它的方式

insert into Words
( selectWord , Fixword)
SELECT word,'theFixword'
FROM   OldWordsTable
WHERE 
(
    (word LIKE 'junk%') OR
     (word LIKE 'orSomthing') 

)
and word not in 
    (
        SELECT selectWord FROM words WHERE selectWord = word
    ) 

#5


1  

while unique constraint is certaily one way to go you can also use this for your insert logic: http://www.sqlteam.com/article/application-locks-or-mutexes-in-sql-server-2005

虽然唯一约束是一种可行的方法,但您也可以将它用于插入逻辑:http://www.sqlteam.com/article/application-locks-or-mutexes-in-sql-server-2005

basicaly you don't put any locks on the table below thus not worrying about the reads while your existance checks will be performed ok.

基本上你没有在下面的桌子上放任何锁,因此在你的存在检查将被执行时不会担心读取。

it's a mutex in sql code.

它是sql代码中的互斥锁。

#6


0  

I can't speak to the particulars of MS SQL, but one point of a primary key in SQL is to ensure uniqueness. So by definition in generic SQL terms, a primary key is one or more fields that is unique to a table. While there are different ways to enforce this behavior (replace the old entry with the new one vs. reject the new one) I would be surprised if MS SQL both didn't have a mechanism for enforcing this behavior and that it wasn't to reject the new entry. Just make sure you set the primary key to the Word field and it should work.

我不能谈论MS SQL的细节,但SQL中主键的一点是确保唯一性。因此,根据通用SQL术语的定义,主键是一个或多个对表而言唯一的字段。虽然有不同的方法来强制执行此行为(将旧条目替换为新条目与拒绝新条目),如果MS SQL既没有强制执行此行为的机制又不是拒绝新条目。只需确保将主键设置为Word字段,它应该可以正常工作。

Once again though, I disclaim this is all from my knowledge from MySQL programming and my databases class, so apologies if I'm off on the intricacies of MS SQL.

再一次,我不赞成这完全来自我对MySQL编程和我的数据库类的了解,所以如果我对MS SQL的复杂性感到抱歉,那就道歉了。

#7


-1  

declare @Error int

begin transaction
  INSERT INTO Words (Word) values(@word)
  set @Error = @@ERROR
  if @Error <> 0 --if error is raised
  begin
      goto LogError
  end
commit transaction
goto ProcEnd

LogError:
rollback transaction