I have an application that is made up of the following:
- A central database containing 100k+ records
- A number of "client" databases, each containing around 10-20k records
The client databases contain details of contacts that each have a unique ID (contactID).
The central database contains some of these contacts identified by the same contactID.
Overnight we need to iterate through the client databases, querying the central database for updates to each contact and then bringing them into the client database.
The central database is held by a third party, so we cannot change anything.
The company holding the central database wants to do this over web services, iterating through each contact.
My concern is that this would be very slow to do over web services, given the number of records.
Currently I am thinking of generating a file on each client which contains a list of all the contacts for that client. This would then be sent to the central database. The central database would then process this file and send back another file containing all updates.
How would you create this so it runs as quickly as possible?
5 solutions
#1
3
I would not go through each record in the scenario you describe. Web services are fine, but you could just as easily use them for bulk updates.
Off the top of my head, something like this would work a little better:
- Get a 'diff' containing all the changes in the master database. This could be done before the synchronisation starts and only has to be done once per day.
- Send a full list of contacts from the client to the server. With a list that big it does not really matter how it is sent, but if you don't like SOAP web services, think about JSON. I would go with web services for convenience. Again, the list of contacts could be prepared in advance.
- Create a list of records which need to be updated by comparing the 'diff' with the list from the client.
- Get all the updated details in one go (if you use MS SQL you could do that with XML, other SQL services offer different paths)
- Send the details to the client
The same mechanism could be used for updating only one record; you just need to use a different list (containing one ID) in the second step.
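The steps above can be sketched roughly as follows. This is only an illustration: the in-memory `master` dictionary, the sample IDs, and the function names are assumptions standing in for the real databases and the web service calls.

```python
from dataclasses import dataclass


@dataclass
class ContactUpdate:
    contact_id: int
    details: dict


def ids_needing_update(changed_ids, client_ids):
    """Step 3: contacts that changed centrally AND exist on this client."""
    return set(changed_ids) & set(client_ids)


def fetch_updates(master, wanted_ids):
    """Step 4: collect the details for all wanted IDs in one pass."""
    return [ContactUpdate(cid, master[cid]) for cid in sorted(wanted_ids)]


# Hypothetical data standing in for the real databases.
master = {1: {"name": "Ann"}, 2: {"name": "Bob"}, 3: {"name": "Cy"}}
changed_today = [2, 3, 99]    # step 1: the daily 'diff'
client_contacts = [1, 2, 50]  # step 2: the full list sent by the client

wanted = ids_needing_update(changed_today, client_contacts)
updates = fetch_updates(master, wanted)  # step 5 would send these back
```

Updating a single record is the same call with a one-element client list, for example `ids_needing_update(changed_today, [2])`.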
#2
1
As you're relatively constrained by the third party, you might want to do one of the following:
- Have the third party create a web service which returns a list of all IDs whose details have changed in the last day.
- You then iterate through this reduced list (which is hopefully much smaller than the entire list) using a web service, and update your contacts as appropriate.
or
- Have the third party create a web service which returns an XML document containing a dump of the contact details in their system which have changed in a certain period. You would then process this document, updating your details.
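As a sketch of the second option, processing such a dump might look like the code below. The XML layout (`<contacts>`, `<contact id="...">`, `<name>`, `<email>`) is purely an assumption, since the third party's actual format is unknown, and a dictionary stands in for the client database.

```python
import xml.etree.ElementTree as ET

# Hypothetical dump format; the real one is up to the third party.
SAMPLE_DUMP = """
<contacts>
  <contact id="2"><name>Bob Smith</name><email>bob@example.com</email></contact>
  <contact id="99"><name>Zed</name><email>zed@example.com</email></contact>
</contacts>
""".strip()


def apply_dump(xml_text, client_db):
    """Update contacts already present in client_db; return the updated IDs."""
    updated = []
    for node in ET.fromstring(xml_text).iter("contact"):
        contact_id = int(node.get("id"))
        if contact_id not in client_db:
            continue  # changed centrally, but this client does not hold it
        client_db[contact_id] = {child.tag: child.text for child in node}
        updated.append(contact_id)
    return updated


client_db = {
    1: {"name": "Ann", "email": "ann@example.com"},
    2: {"name": "Bob", "email": "old@example.com"},
}
updated_ids = apply_dump(SAMPLE_DUMP, client_db)
```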
#3
1
Go with bulk transfers wherever possible.
Opening a TCP/IP connection is a VERY slow process. Transferring 20K contact IDs (even if the ID is huge) in a single WS request is a lot faster than 20K web service requests.
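To make the difference in request counts concrete, here is a small sketch that splits an ID list into batches. The `chunked` helper and the 5,000-ID cap are assumptions for illustration; the source does not say whether the service limits payload size.

```python
def chunked(ids, batch_size):
    """Yield successive slices of ids, each at most batch_size long."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


contact_ids = list(range(1, 20_001))  # 20K IDs, as in the answer

per_contact_requests = len(contact_ids)  # one call per ID
single_bulk_requests = len(list(chunked(contact_ids, 20_000)))
# Assumed cap, in case the service limits how many IDs fit in one request.
capped_bulk_requests = len(list(chunked(contact_ids, 5_000)))
```

Even with a cap, the number of round trips drops from 20,000 to a handful.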
You don't want to wait for them to reply. You have two kinds of potential workflows.
- Busy Waiting. You send a batch. You poll every few minutes until the batch is done. This is rather icky, but totally one-sided: you do all the work, and they just respond with "not done yet" or "done". Then you request your batch results.
- Notification. You send a batch including some address at which you can be notified. It might be an email address, or it might be a URL (IP address, port and path) at which you want notification.
  - If you elect to use email (or similar) notification, you can then do a proper WS request to retrieve the batch.
  - If you elect to use WS notification, you have a small web service that either gets a simple notification and then does a WS request to get the result, or receives the entire result from them.
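The busy-waiting workflow could look something like the sketch below. `FakeBatchService` is a made-up stand-in for the third party's service (it reports "done" on the third poll), and the method names `submit`, `status`, and `results` are assumptions rather than a real API.

```python
import time


class FakeBatchService:
    """Stand-in for the remote service; reports 'done' on the third poll."""

    def __init__(self):
        self.polls = 0
        self._ids = []

    def submit(self, contact_ids):
        self._ids = list(contact_ids)
        return "batch-1"

    def status(self, batch_id):
        self.polls += 1
        return "done" if self.polls >= 3 else "not done yet"

    def results(self, batch_id):
        return {cid: {"updated": True} for cid in self._ids}


def run_batch(service, contact_ids, poll_interval_seconds=300, max_polls=100):
    """Send one batch, poll until it is done, then fetch the results."""
    batch_id = service.submit(contact_ids)
    for _ in range(max_polls):
        if service.status(batch_id) == "done":
            return service.results(batch_id)
        time.sleep(poll_interval_seconds)
    raise TimeoutError(f"batch {batch_id} not done after {max_polls} polls")


service = FakeBatchService()
# The interval is set to 0 here only so the example finishes immediately.
results = run_batch(service, [101, 102], poll_interval_seconds=0)
```

The `max_polls` limit is there so that a batch that never completes does not keep the overnight job waiting forever.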
If you want to be rude, open several hundred threads and make several hundred concurrent requests. This will take less elapsed time, but will swamp their server.
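For illustration, concurrent requests can be issued from a thread pool, where the worker count is the knob that decides how hard the remote server gets hit. `fetch_contact` is a placeholder that does no network I/O, and the default of 8 workers is an arbitrary assumption; several hundred would be the "rude" setting described above.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_contact(contact_id):
    """Placeholder for one web service call; a real version would do network I/O."""
    return {"contactID": contact_id, "status": "fetched"}


def fetch_all(contact_ids, max_workers=8):
    """Run fetch_contact concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_contact, contact_ids))


results = fetch_all(range(1, 51), max_workers=8)
```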
#4
0
You can ask for a file (XML) containing the details for all the IDs from the central DB and convert it into objects on your side. Then, with the object list at your end, you can compare and update accordingly.
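The compare step might be sketched as follows, assuming the file has already been parsed into objects. The `Contact` class and its `name` and `email` fields are hypothetical; only `contactID` comes from the question.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    contact_id: int
    name: str
    email: str


def changed_contacts(central, local):
    """Return central contacts that exist locally but differ from the local copy."""
    local_by_id = {c.contact_id: c for c in local}
    return [
        c for c in central
        if c.contact_id in local_by_id and c != local_by_id[c.contact_id]
    ]


central = [
    Contact(1, "Ann", "ann@example.com"),
    Contact(2, "Bob", "bob@new.example.com"),
    Contact(3, "Cy", "cy@example.com"),
]
local = [
    Contact(1, "Ann", "ann@example.com"),
    Contact(2, "Bob", "bob@old.example.com"),
]
to_update = changed_contacts(central, local)
```

Dataclasses compare field by field, so `!=` is enough to detect a changed record here.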
#5
0
How about messaging? On the central database side, a message queue system can be created to store record-update events, with a topic for each client database so that each client subscribes only to the events it is interested in. The queue is then responsible for notifying clients of record-update events.
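A toy in-memory version of this idea is sketched below, just to show the routing. A real deployment would use an actual message broker on the central side; the `UpdateBroker` class and its method names are invented for this example.

```python
from collections import defaultdict, deque


class UpdateBroker:
    """Toy in-memory broker: one queue (topic) per client database."""

    def __init__(self):
        self._topics = defaultdict(deque)       # client name -> pending events
        self._subscriptions = defaultdict(set)  # contact ID -> client names

    def subscribe(self, client, contact_ids):
        """Register which contacts a client database holds."""
        for cid in contact_ids:
            self._subscriptions[cid].add(client)

    def publish(self, contact_id, details):
        """Route an update event only to clients that hold this contact."""
        for client in self._subscriptions.get(contact_id, ()):
            self._topics[client].append((contact_id, details))

    def drain(self, client):
        """Return and clear all pending events for one client."""
        events = list(self._topics[client])
        self._topics[client].clear()
        return events


broker = UpdateBroker()
broker.subscribe("client_a", [1, 2])
broker.subscribe("client_b", [2, 3])

broker.publish(2, {"email": "bob@example.com"})
broker.publish(3, {"name": "Cy"})
broker.publish(99, {"name": "Nobody holds this contact"})

events_a = broker.drain("client_a")
events_b = broker.drain("client_b")
```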