Best practices for synchronizing common distributed data

I have an internet application that supports an offline mode, in which users can create data that is synchronized with the server when they come back online. Because of this, I'm using UUIDs for identity in my database, so disconnected clients can generate new objects without fear of reusing an ID issued by another client. This works great for objects owned by a single user, but some objects are shared by multiple users. For example, tags used by a user might be global, and there's no possible way the remote database could hold all possible tags in the universe.

Suppose an offline user creates an object and adds some tags to it, and those tags don't exist in the user's local database, so the software generates a UUID for each of them. When those tags are synchronized, there needs to be a resolution process to resolve any overlap: some way to match existing tags in the remote database with the local versions.

One way is to resolve global objects by a natural key (the name, in the case of a tag), with the local database replacing its existing object with the one from the global database. This can be messy when there are many connections to other objects. Something tells me to avoid this.
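
To make the natural-key option concrete, here is a minimal sketch. It assumes tags are stored as plain dictionaries of UUID to name and that tag names are compared case-insensitively; the function name `resolve_tags_by_name` and the data layout are hypothetical and not taken from any particular library.

```python
def resolve_tags_by_name(local_tags, global_tags, items):
    """Replace local tag UUIDs with the server's UUIDs where names match.

    local_tags / global_tags: dict mapping tag UUID -> tag name.
    items: dict mapping item UUID -> set of tag UUIDs (references to rewrite).
    Returns the mapping {local UUID: global UUID} that was applied.
    """
    # Assumption: the natural key is the case-folded tag name.
    global_by_name = {name.casefold(): tag_id for tag_id, name in global_tags.items()}

    remap = {}
    for local_id, name in local_tags.items():
        global_id = global_by_name.get(name.casefold())
        if global_id is not None and global_id != local_id:
            remap[local_id] = global_id

    # Rewrite every reference held by other objects.
    for item_id, tag_ids in items.items():
        items[item_id] = {remap.get(tag_id, tag_id) for tag_id in tag_ids}

    # Swap the superseded local rows for the server's rows.
    for local_id, global_id in remap.items():
        del local_tags[local_id]
        local_tags[global_id] = global_tags[global_id]
    return remap
```

The messy part is the reference-rewriting loop: every table that points at a tag needs the same treatment, which is why this gets harder as the number of connections grows.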

Another way to handle this is to use two IDs. One global ID and one local ID. I was hoping using UUIDs would help avoid this, but I keep going back and forth between using a single UUID and using two split IDs. Using this option makes me wonder if I've let the problem get out of hand.
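
A rough sketch of what the two-ID option might look like, assuming each local row keeps a permanent `local_id` and a nullable `global_id` that is filled in once the server is consulted. The `Tag` class and `apply_server_ids` helper are hypothetical names for illustration only.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tag:
    name: str
    # Generated offline; local references always point at this value.
    local_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Unknown until the server assigns or confirms an ID.
    global_id: Optional[str] = None


def apply_server_ids(tags, server_ids_by_name):
    """Fill in global_id for tags the server knows; return those still unsynced."""
    pending = []
    for tag in tags:
        tag.global_id = server_ids_by_name.get(tag.name, tag.global_id)
        if tag.global_id is None:
            pending.append(tag)
    return pending
```

Because local references use `local_id`, nothing has to be rewritten when the global ID arrives; the cost is that every message to the server has to translate between the two.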

Another approach is to track all changes through the non-shared objects. In this example, that is the object to which the user assigned the tags. When the user synchronizes their offline changes, the server might replace the local tag with the global one. The next time this client synchronizes with the server, it detects a change in the non-shared object. When the client pulls down that object, it receives the global tag. The software then simply resaves the non-shared object, pointing it to the server's tag and orphaning the local version. Some issues with this are the extra round trips needed to fully synchronize, and the extra data in the local database that is just orphaned. Are there other issues or bugs that could happen when the system is in between synchronization states (e.g. trying to talk to the server and sending it local UUIDs for objects)?

Another alternative is to avoid common objects. In my software that could be an acceptable answer. I'm not doing a lot of sharing of objects across users, but that doesn't mean I'd NOT be doing it in the future. Which means choosing this option could paralyze my software in the future should I need to add these types of features. There are consequences to this choice, and I'm not sure if I've completely explored them.

So I'm looking for any sort of best practice, existing algorithms for handling this type of system, guidance on choices, etc.

3 Solutions

#1

Depending on the application semantics you want to offer to users, you may pick different solutions. For example, if you are actually talking about tagging objects created by an offline user with a keyword, and want to share the tags across multiple objects created by different users, then using the "text" of the tag as its identity is fine, as you suggested. Once everyone's changes are merged, tags with the same "text", say "THIS IS AWESOME", will be shared.

There are other ways to handle disconnected updates to shared objects. SVN, CVS, and other version control systems try to resolve conflicts automatically and, when they cannot, simply tell the user there is a conflict. You can do the same: tell the user there have been concurrent updates and let the user handle the resolution.

Alternatively, you can log updates as units of change and try to compose the changes together. For example, if your shared object is a canvas, and your application semantics allow shared drawing on the same canvas, then a disconnected update that draws a line from point A to point B and another disconnected update that draws a line from point C to point D can be composed. If you keep those two updates as two separate operations, you can order them: on reconnection, each user uploads all of their disconnected operations and applies the operations they are missing from other users. You probably want some kind of ordering rule, perhaps based on version number.
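
The canvas example could be sketched roughly as follows. This assumes every operation carries a unique ID and the canvas version the client last saw, and that ties are broken by author name; the `DrawLine`, `merge_logs`, and `replay` names and the ordering rule itself are illustrative assumptions, not a prescribed scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawLine:
    op_id: str         # unique per operation, e.g. a UUID
    base_version: int  # canvas version the client saw when drawing
    author: str
    start: tuple
    end: tuple


def merge_logs(*logs):
    """Union several operation logs, dropping duplicates, in a deterministic order."""
    unique = {op.op_id: op for log in logs for op in log}
    # Assumed ordering rule: version first, then author, then operation ID.
    return sorted(unique.values(), key=lambda op: (op.base_version, op.author, op.op_id))


def replay(ops):
    """Rebuild the canvas (here just a list of line segments) from the log."""
    return [(op.start, op.end) for op in ops]
```

Since the merge is a deduplicated union with a fixed sort key, every client that ends up with the same set of operations replays them in the same order, regardless of who synchronized first.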

Another alternative: if updates to shared objects cannot be automatically reconciled, and your application semantics do not support notifying the user and asking them to resolve conflicts caused by disconnected updates, then you can use a version tree. Each update to a shared object creates a new version, with the previous version as its parent. When two different users make disconnected updates to a shared object, the same parent version ends up with two separate child versions (leaf nodes). If your application's internal representation of state is this version tree, then the internal state remains consistent despite disconnected updates, and you can handle the two branches of the version tree in some other way (e.g. letting the user know about the branches and providing tools to merge them, as in source control systems).
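
A minimal sketch of such a version tree, assuming versions are identified by strings and stored in memory. The `VersionTree` class and its `commit` and `heads` methods are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    version_id: str
    parent_id: Optional[str]  # None for the root version
    payload: str


class VersionTree:
    def __init__(self):
        self.versions = {}

    def commit(self, version_id, parent_id, payload):
        """Record a new version as a child of an existing one."""
        if parent_id is not None and parent_id not in self.versions:
            raise KeyError(f"unknown parent {parent_id}")
        self.versions[version_id] = Version(version_id, parent_id, payload)

    def heads(self):
        """Leaf versions; more than one means there are unmerged branches."""
        parents = {v.parent_id for v in self.versions.values()}
        return sorted(vid for vid in self.versions if vid not in parents)
```

For instance, committing `v2a` and `v2b` on top of the same parent `v1` leaves two heads, which is the signal that two disconnected updates still need to be merged or surfaced to the user.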

Just a few options. Hope this helps.

#2

As a suggestion totally out of left field, I'm wondering if using something like CouchDB might work for your situation. Its replication features could handle a lot of your online/offline synchronisation problems for you, including mechanisms that let the application handle conflict resolution when conflicts arise.

#3

Your problem is quite similar to the one solved by version control systems like SVN. You could take those as an example.

Each user would have a set of personal objects plus any shared objects that they need. Locally, they will work as if they own all the objects.

During sync, the client would first download any changes in the objects, and automatically synchronize what is obvious. In your example, if there is a new tag coming from the server with the same name, then it would update the UUID correspondingly on the local system.

This would also be a nice place in which to detect and handle cases like data committed from another client, but by the same user.

Once the client has an updated and merged version of the data, you can do an upload.
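
The download, merge, upload sequence might look something like this. It is only a sketch: `InMemoryServer` is a stand-in for the remote database, tags are reduced to a name-to-UUID dictionary, and the rule that the server keeps whichever ID was registered first is an assumption made for the example.

```python
class InMemoryServer:
    """Stand-in for the remote database: tag name -> tag UUID."""

    def __init__(self, tags=None):
        self.tags = dict(tags or {})

    def pull(self):
        return dict(self.tags)

    def push(self, new_tags):
        # Keep the server's ID if another client registered the name first.
        for name, tag_id in new_tags.items():
            self.tags.setdefault(name, tag_id)
        return dict(self.tags)


def sync(local_tags, server):
    """Round trip 1: pull and adopt server IDs. Round trip 2: upload the rest."""
    remote = server.pull()
    # Merge step: a tag with the same name takes the server's UUID.
    merged = {name: remote.get(name, tag_id) for name, tag_id in local_tags.items()}
    new_only = {name: tag_id for name, tag_id in merged.items() if name not in remote}
    confirmed = server.push(new_only)
    return {name: confirmed[name] for name in merged}
```

Reading back the server's response after the push covers the case where another client created the same tag between the two round trips.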

There will be two round trips, but I see no way of avoiding this without overcomplicating the data structure and introducing potential pitfalls in the way you do the sync.
