I'm facing the following problem:
I'm trying to keep a table in SQL Server synchronized with multiple external databases. These external databases do not share a unique primary key, so the local table has a simple integer PK.
To keep the local table up to date, the following is done:
- The external databases are queried.
- The data is converted into valid data for the local table.
- An insert is used to try to write the data to the local table.
- If the insert throws a duplicate-entry exception, the PK is looked up with a select query and the data is written to the table with an update query.
- Another table is modified using the PK of the inserted or updated row.
This works fine, but it seems very inefficient to me. Most of the time the data is already in the local table, so the insert results in a duplicate key exception. That means lots of exceptions to handle, which is expensive. Also, because the PKs are managed by the DB, a select query must be used to find the row to update.
How can I avoid this? I do not want to use a stored procedure, as I like to keep the query manageable in code and included in version control.
I've looked at MERGE, but I've seen too many people reporting issues with it.
I think I need some form of upsert, but I'm not sure how to do this with the PKs being managed by the DB.
tl;dr: What I need is a query that either inserts or updates a row (depending on whether the key is a duplicate) and always returns the PK of the row.
1 Solution
#1
I have an implementation I've used in the past that I like. You may or may not find it useful.
This is how it works: I load both the external and the local data into memory using a model object that works for both. For example:
    using System.Collections.Generic;

    public class Person
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public string PhoneNumber { get; set; }
        public string Address { get; set; }

        // This comparer will be used to find records that exist or don't exist.
        public class KeyFieldComparer : IEqualityComparer<Person>
        {
            public bool Equals(Person p1, Person p2)
            {
                return p1.FirstName == p2.FirstName && p1.LastName == p2.LastName;
            }

            public int GetHashCode(Person p)
            {
                return p.FirstName.GetHashCode() ^ p.LastName.GetHashCode();
            }
        }

        // This comparer will be used to find records that are outdated and need to be updated.
        // It treats two records as "equal" when the key fields match but the data differs.
        public class OutdatedComparer : IEqualityComparer<Person>
        {
            public bool Equals(Person p1, Person p2)
            {
                return p1.FirstName == p2.FirstName && p1.LastName == p2.LastName
                    && (p1.PhoneNumber != p2.PhoneNumber || p1.Address != p2.Address);
            }

            public int GetHashCode(Person p)
            {
                return p.FirstName.GetHashCode() ^ p.LastName.GetHashCode();
            }
        }
    }
We need some way to uniquely identify the records, which I assume you have. In this example it's FirstName and LastName (I know that's not very unique, but for simplicity let's pretend it works well). The IEqualityComparer<> implementations do the work of finding the outdated and new records once the lists are loaded into memory.
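One detail worth illustrating: the comparers above call GetHashCode() on FirstName and LastName directly, which throws if either field is null. The following is a minimal sketch of a null-safe key comparer, not part of the original implementation. The class names NullSafeKeyComparer and ComparerDemo and the sample data are hypothetical, and the Person class is trimmed to the key fields for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Hypothetical null-safe variant of the key comparer: equality and the hash
// code are both derived from the same (FirstName, LastName) pair.
public class NullSafeKeyComparer : IEqualityComparer<Person>
{
    public bool Equals(Person a, Person b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a is null || b is null) return false;
        return a.FirstName == b.FirstName && a.LastName == b.LastName;
    }

    // Hashing a value tuple tolerates null components, unlike calling
    // GetHashCode() on a null string.
    public int GetHashCode(Person p) => (p.FirstName, p.LastName).GetHashCode();
}

public static class ComparerDemo
{
    public static void Main()
    {
        var local = new List<Person>
        {
            new Person { FirstName = "Ann", LastName = "Lee" },
        };
        var external = new List<Person>
        {
            new Person { FirstName = "Ann", LastName = "Lee" },
            new Person { FirstName = "Bob", LastName = null },
        };

        // Ann already exists locally, so only Bob is reported as missing.
        var missing = external.Except(local, new NullSafeKeyComparer()).ToList();
        Console.WriteLine(missing.Count);        // prints 1
        Console.WriteLine(missing[0].FirstName); // prints Bob
    }
}
```

Whatever fields make up the key, Equals and GetHashCode have to agree on them: two records that compare equal must produce the same hash code, otherwise the set-based LINQ operators will miss matches.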
Now we simply separate the existing outdated records from the brand new records, like this:
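    using System.Linq;

    List<Person> local = loadLocalRecords();
    List<Person> external = loadExternalRecords();

    // External records whose key does not exist locally yet.
    var newRecordsToInsert = external.Except(local, new Person.KeyFieldComparer());

    // Intersect yields elements from the first sequence, so this holds the stale
    // local records; swap the operands to get the external records with the new values.
    var outdatedRecordsToUpdate = local.Intersect(external, new Person.OutdatedComparer());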
I hope that makes sense; I can answer questions if you have them. The good thing about this method is that it does the job with the fewest hits to the database (I think). The bad thing is that it has to load everything into memory, which may not be practical for you. But your table has to be large for that to be a problem: somewhere above a few million records, depending on how many columns it has.
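Since the question is about always knowing the PK, here is a rough sketch of how the in-memory comparison could carry the local integer PK along, so that updates can target rows by PK without a separate select. This is an assumption layered on top of the approach above, not the original implementation: LocalPerson, ExternalPerson, SyncPlanner and the Id property are hypothetical names, and the key is assumed to be unique in the local table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical external model: the fields read from the external databases.
public class ExternalPerson
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string PhoneNumber { get; set; }
    public string Address { get; set; }
}

// Hypothetical local model: the same fields plus the DB-generated integer PK.
public class LocalPerson : ExternalPerson
{
    public int Id { get; set; }
}

public static class SyncPlanner
{
    // Splits the external records into rows to insert and rows to update,
    // pairing each update with the PK of the existing local row.
    public static (List<ExternalPerson> Inserts, List<(int Id, ExternalPerson Source)> Updates)
        Plan(IEnumerable<LocalPerson> local, IEnumerable<ExternalPerson> external)
    {
        // Index local rows by key; ToDictionary throws if the key is not unique.
        var byKey = local.ToDictionary(p => (p.FirstName, p.LastName));
        var inserts = new List<ExternalPerson>();
        var updates = new List<(int Id, ExternalPerson Source)>();

        foreach (var e in external)
        {
            if (!byKey.TryGetValue((e.FirstName, e.LastName), out var row))
                inserts.Add(e);                 // key not present locally
            else if (row.PhoneNumber != e.PhoneNumber || row.Address != e.Address)
                updates.Add((row.Id, e));       // key present, data differs
            // Otherwise the row is already up to date and is skipped.
        }
        return (inserts, updates);
    }
}

public static class Program
{
    public static void Main()
    {
        var local = new[]
        {
            new LocalPerson { Id = 7, FirstName = "Ann", LastName = "Lee", PhoneNumber = "111" },
        };
        var external = new[]
        {
            new ExternalPerson { FirstName = "Ann", LastName = "Lee", PhoneNumber = "222" },
            new ExternalPerson { FirstName = "Bob", LastName = "Ray", PhoneNumber = "333" },
        };

        var plan = SyncPlanner.Plan(local, external);
        Console.WriteLine(plan.Inserts.Count);  // prints 1
        Console.WriteLine(plan.Updates[0].Id);  // prints 7
    }
}
```

With this shape, rows that are already up to date cause no database traffic at all, and each update already knows its PK. Newly inserted rows still have to get their PK from the database, for example through an OUTPUT clause on the INSERT statement.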