Core Data multi-threaded import (duplicate objects)

Time: 2021-09-05 13:02:54

I have an NSOperationQueue that imports objects into Core Data that I get from a web API. Each operation has a private child managedObjectContext of my app's main managedObjectContext. Each operation takes the object to be imported and checks whether the object already exists, in which case it updates the existing object. If the object doesn't exist, it creates it. These changes on the private child contexts are then propagated up to the main managed object context.

This setup has worked very well for me, but there is a duplicates issue.

When the same object is imported in two different concurrent operations, I get duplicate objects that have exactly the same data. (Both operations check whether the object exists, and to each of them it appears not to exist yet.) The reason I'll have two copies of the same object importing at around the same time is that I'm often processing a "new" API call as well as a "get" API call. Because my setup is both concurrent and asynchronous, it's hard to guarantee that duplicate objects will never be imported at the same time.

So my question is: what is the best way to solve this particular issue? I thought about limiting imports to a max concurrent operation count of 1 (I don't like this because of performance). Similarly, I've considered requiring a save after every import operation and trying to handle merging of contexts. I've also considered grooming the data afterwards to clean up duplicates occasionally. Finally, I've considered just handling the duplicates on all fetch requests. But none of these solutions seems great to me, and perhaps there is an easy solution I've overlooked.
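The "groom the data afterwards" option amounts to a periodic de-duplication pass keyed on the identifier the web API assigns. The core of such a pass is sketched below in plain C (which Objective-C compiles unchanged) rather than as real Core Data fetch-and-delete code. `struct record`, its `remote_id` field and `remove_duplicate_records` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an imported object: only the server-side
   identifier matters for duplicate detection. */
struct record {
    long remote_id;   /* identifier assigned by the web API (assumed) */
    int payload;      /* whatever else was imported */
};

static int compare_by_remote_id(const void *a, const void *b)
{
    long lhs = ((const struct record *)a)->remote_id;
    long rhs = ((const struct record *)b)->remote_id;
    return (lhs > rhs) - (lhs < rhs);
}

/* Sorts the records by remote_id, then compacts the array so that exactly
   one record per remote_id remains. Returns the new record count. */
size_t remove_duplicate_records(struct record *records, size_t count)
{
    if (count == 0)
        return 0;
    qsort(records, count, sizeof records[0], compare_by_remote_id);

    size_t kept = 1;
    for (size_t i = 1; i < count; i++) {
        /* Keep a record only if its id differs from the last kept one. */
        if (records[i].remote_id != records[kept - 1].remote_id)
            records[kept++] = records[i];
    }
    assert(kept <= count);
    return kept;
}
```

Because `qsort` is not stable, which copy of a duplicate survives is unspecified. That only matters if duplicates can differ, and here they carry exactly the same data. Note that a pass like this cleans up after the race rather than preventing it.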

3 solutions

#1


4  

So the problem is:

  • contexts are a scratchpad — unless and until you save, changes you make in them are not pushed to the persistent store;
  • you want one context to be aware of changes made on another that hasn't yet been pushed.
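To make the two bullets concrete, here is a deterministic toy model in plain C (which Objective-C compiles unchanged). It is not Core Data. `struct store`, `struct context`, `find_or_create`, `save` and `count_saved` are hypothetical stand-ins in which each context sees only saved objects plus its own unsaved inserts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJECTS 16

/* Hypothetical model of the persistent store: only saved objects live here. */
struct store {
    long ids[MAX_OBJECTS];
    size_t count;
};

/* Hypothetical model of a context: a private scratchpad of unsaved inserts
   layered over the shared store. */
struct context {
    struct store *store;
    long pending[MAX_OBJECTS];
    size_t pending_count;
};

static bool contains(const long *ids, size_t count, long id)
{
    for (size_t i = 0; i < count; i++)
        if (ids[i] == id)
            return true;
    return false;
}

/* Find-or-create as one context sees it: it can consult its own unsaved
   inserts and whatever has already been saved, but never another
   context's unsaved inserts. */
void find_or_create(struct context *ctx, long id)
{
    if (contains(ctx->pending, ctx->pending_count, id) ||
        contains(ctx->store->ids, ctx->store->count, id))
        return;                               /* found: would update here */
    assert(ctx->pending_count < MAX_OBJECTS);
    ctx->pending[ctx->pending_count++] = id;  /* not found: create */
}

/* Saving pushes the scratchpad into the store and empties it. */
void save(struct context *ctx)
{
    for (size_t i = 0; i < ctx->pending_count; i++) {
        assert(ctx->store->count < MAX_OBJECTS);
        ctx->store->ids[ctx->store->count++] = ctx->pending[i];
    }
    ctx->pending_count = 0;
}

/* Counts how many saved objects carry the given id. */
size_t count_saved(const struct store *store, long id)
{
    size_t n = 0;
    for (size_t i = 0; i < store->count; i++)
        if (store->ids[i] == id)
            n++;
    return n;
}
```

Suppose two contexts both call `find_or_create` for ID 42 before either one saves. In that case, the store ends up with two objects for ID 42. If one context runs find-or-create and saves before the other touches ID 42, the store ends up with one.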

To me it doesn't sound like merging between contexts is going to work, because contexts are not thread safe. For a merge to occur, nothing else can be in progress on the other context's thread/queue. You're therefore never going to be able to eliminate the risk that a new object is inserted while another context is partway through its own insertion process.

Additional observations:

  • SQLite is not thread safe in any practical sense;
  • hence all trips to the persistent store are serialised regardless of how you issue them.

Bearing in mind the problem and the SQLite limitations, in my app we've adopted a framework in which the web calls are naturally concurrent (as NSURLConnection provides) and the subsequent parsing of the results (JSON parsing plus some digging into the result) also happens concurrently. Only the find-or-create step is then channelled into a serial queue.
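Here is a rough sketch of that split, using plain C and POSIX threads in place of NSURLConnection and Core Data. Each (hypothetical) batch is parsed on its own thread with no locking, while a single mutex stands in for the serial queue, so only one batch's find-or-create step runs at a time. `struct batch`, `parse_ids`, `find_or_create_batch`, `import_all` and the in-memory store array are all illustrative assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define STORE_CAPACITY 64
#define MAX_BATCHES 8

/* In-memory stand-in for the persistent store (hypothetical). */
static long store_ids[STORE_CAPACITY];
static size_t store_count;
/* One mutex plays the role of the serial queue: only one batch's
   find-or-create step can run at a time. */
static pthread_mutex_t find_or_create_queue = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical web-call result: a comma-separated list of object IDs. */
struct batch {
    const char *payload;
};

/* Concurrent stage: parsing touches no shared state, so it needs no lock. */
static size_t parse_ids(const char *payload, long *out, size_t max)
{
    size_t n = 0;
    const char *p = payload;
    while (n < max) {
        char *end;
        long value = strtol(p, &end, 10);
        if (end == p)
            break;                          /* no more numbers */
        out[n++] = value;
        p = (*end == ',') ? end + 1 : end;
    }
    return n;
}

/* Serial stage: find-or-create for a whole batch while holding the lock. */
static void find_or_create_batch(const long *ids, size_t n)
{
    pthread_mutex_lock(&find_or_create_queue);
    for (size_t i = 0; i < n; i++) {
        bool found = false;
        for (size_t j = 0; j < store_count && !found; j++)
            found = (store_ids[j] == ids[i]);
        if (!found) {
            assert(store_count < STORE_CAPACITY);
            store_ids[store_count++] = ids[i];   /* create */
        }
        /* else: this is where the existing object would be updated */
    }
    pthread_mutex_unlock(&find_or_create_queue);
}

static void *import_batch(void *arg)
{
    const struct batch *b = arg;
    long ids[STORE_CAPACITY];
    size_t n = parse_ids(b->payload, ids, STORE_CAPACITY);
    find_or_create_batch(ids, n);
    return NULL;
}

/* Imports every batch on its own thread, waits for all of them, and
   returns the number of objects in the store. */
size_t import_all(const struct batch *batches, size_t count)
{
    pthread_t threads[MAX_BATCHES];
    assert(count <= MAX_BATCHES);
    for (size_t i = 0; i < count; i++) {
        int rc = pthread_create(&threads[i], NULL, import_batch, (void *)&batches[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (size_t i = 0; i < count; i++)
        pthread_join(threads[i], NULL);
    return store_count;
}
```

For example, suppose the batches "1,2,3", "3,4" and "2,3,5" are imported concurrently. The store always ends up with exactly five objects, however the threads are scheduled, because each batch's existence checks and inserts happen together under the lock.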

Very little processing time is lost to the serialisation, because the SQLite trips would be serialised anyway, and they make up the overwhelming majority of the serialised work.

#2


2  

Start by creating dependencies between your operations, so that an operation can't start until the operation it depends on has finished.

Check out http://developer.apple.com/library/mac/documentation/Cocoa/Reference/NSOperation_class/Reference/Reference.html#//apple_ref/occ/instm/NSOperation/addDependency:

Each operation should call save when it finishes. Next, I would try the Find-Or-Create methodology suggested here:

https://developer.apple.com/library/ios/documentation/Cocoa/Conceptual/CoreData/Articles/cdImporting.html

It'll solve your duplicates problem, and will probably let you do fewer fetches (fetches are expensive and slow, and therefore drain the battery quickly).
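The linked guide's efficient find-or-create pattern works in three steps. First, it sorts the incoming IDs. Next, it fetches the existing objects whose IDs are in that list, using a single request sorted the same way. Finally, it walks the two lists in parallel to decide which objects to update and which to create. Only the walk is sketched here, in plain C (which Objective-C compiles unchanged). The single fetch is represented by the sorted `existing` array, and `plan_find_or_create` and `struct import_plan` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of walking one import batch against the existing objects. */
struct import_plan {
    size_t updates;   /* incoming IDs that matched an existing object */
    size_t creates;   /* incoming IDs that need a new object */
};

/* Both arrays must be sorted ascending and free of duplicates.
   `incoming` holds the IDs in the downloaded batch; `existing` holds the
   IDs returned by one fetch for those IDs, sorted by ID. Walking them in
   parallel decides update-vs-create for every incoming ID without issuing
   one fetch per object. */
struct import_plan plan_find_or_create(const long *incoming, size_t n_incoming,
                                       const long *existing, size_t n_existing)
{
    struct import_plan plan = {0, 0};
    size_t j = 0;
    for (size_t i = 0; i < n_incoming; i++) {
        /* Skip existing IDs smaller than the current incoming one. */
        while (j < n_existing && existing[j] < incoming[i])
            j++;
        if (j < n_existing && existing[j] == incoming[i]) {
            plan.updates++;   /* match: update the existing object */
            j++;
        } else {
            plan.creates++;   /* no match: insert a new object */
        }
    }
    assert(plan.updates + plan.creates == n_incoming);
    return plan;
}
```

With incoming IDs {1, 3, 5, 7} and existing IDs {3, 7}, the walk plans two updates and two creates, at the cost of a single fetch for the whole batch.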

You could also create a single global child context to handle all of your imports, then save (merge) everything in one go at the end. Whether that's viable really comes down to how big the data set is and to your memory constraints.

#3


2  

I've been struggling with the same issue for a while now. The discussion on this question so far has given me a few ideas, which I will share now.

Please note that this is essentially untested: in my case I see this duplicate issue only very rarely during testing, and I have no obvious way to reproduce it.

I have the same Core Data stack setup: a master MOC on a private queue, which has a child on the main queue that is used as the app's main context. Finally, bulk import operations (find-or-create) are handed off to a third MOC on a background queue. Once an operation is complete, its saves are propagated up to the PSC.

I've moved my entire Core Data stack from the AppDelegate to a separate class (AppModel). This class gives the app access to the aggregate root object of the domain (the Player), plus a helper method for performing background operations on the model (performBlock:onSuccess:onError:).

Luckily for me, all the major Core Data operations are funnelled through this method, so if I can ensure that these operations run serially, the duplicate problem should be solved.

- (void) performBlock: (void(^)(Player *player, NSManagedObjectContext *managedObjectContext)) operation onSuccess: (void(^)(void)) successCallback onError:(void(^)(id error)) errorCallback
{
    //Add this operation to the serial NSOperationQueue to ensure that
    //duplicate records are not created in a multi-threaded environment
    [self.operationQueue addOperationWithBlock:^{

        NSManagedObjectContext *managedObjectContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
        [managedObjectContext setUndoManager:nil];
        [managedObjectContext setParentContext:self.mainManagedObjectContext];

        [managedObjectContext performBlockAndWait:^{

            //Retrieve a copy of the Player object attached to the new context
            Player *player = (Player *)[managedObjectContext objectWithID:[self.player objectID]];
            //Execute the block operation
            operation(player, managedObjectContext);

            NSError *error = nil;
            if (![managedObjectContext save:&error])
            {
                //Call the error handler and stop here
                dispatch_async(dispatch_get_main_queue(), ^{
                    NSLog(@"%@", error);
                    if (errorCallback) errorCallback(error);
                });
                return;
            }

            //Save the parent MOC (mainManagedObjectContext) - WILL BLOCK MAIN THREAD BRIEFLY.
            //A return inside the inner block only leaves that block, so record the outcome
            //and check it afterwards; otherwise both handlers would be called on failure.
            NSManagedObjectContext *parentContext = managedObjectContext.parentContext;
            __block BOOL parentSaved = NO;
            __block NSError *parentError = nil;
            [parentContext performBlockAndWait:^{
                NSError *saveError = nil;
                parentSaved = [parentContext save:&saveError];
                parentError = saveError;
            }];
            if (!parentSaved)
            {
                //Call the error handler and skip the success handler
                NSError *failure = parentError;
                dispatch_async(dispatch_get_main_queue(), ^{
                    NSLog(@"%@", failure);
                    if (errorCallback) errorCallback(failure);
                });
                return;
            }
            //NOTE: this pushes changes only as far as mainManagedObjectContext; its own
            //parent (the master MOC) still has to be saved for them to reach the PSC

            //Attempt to clear any retain cycles created during operation
            [managedObjectContext reset];

            //Call the success handler
            dispatch_async(dispatch_get_main_queue(), ^{
                if (successCallback) successCallback();
            });
        }];
    }];
}

What I've added here, which I hope will resolve the issue for me, is wrapping the whole thing in addOperationWithBlock:. My operation queue is simply configured as a serial queue:

//A max concurrent operation count of 1 makes the queue run one operation at a time
single.operationQueue = [[NSOperationQueue alloc] init];
[single.operationQueue setMaxConcurrentOperationCount:1];
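The property that matters here is that only one import runs at a time. It can be sketched in plain C with POSIX threads as a queue drained by a single worker thread. This is not NSOperationQueue's implementation. `struct serial_queue`, its functions and `example_import_job` are hypothetical names, and the capacity is fixed with no slot reuse. Unlike this FIFO sketch, NSOperationQueue with a count of 1 still orders ready operations by factors such as priority.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAPACITY 32

typedef void (*job_fn)(int arg);

/* FIFO job list drained by exactly one worker thread, so jobs never overlap.
   Slots are never reused: at most QUEUE_CAPACITY jobs per queue lifetime. */
struct serial_queue {
    job_fn fns[QUEUE_CAPACITY];
    int args[QUEUE_CAPACITY];
    size_t head, tail;
    int closed;
    pthread_mutex_t lock;
    pthread_cond_t ready;
    pthread_t worker;
};

static void *worker_main(void *arg)
{
    struct serial_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->head == q->tail && !q->closed)
            pthread_cond_wait(&q->ready, &q->lock);
        if (q->head == q->tail) {         /* closed and fully drained */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        job_fn fn = q->fns[q->head];
        int job_arg = q->args[q->head];
        q->head++;
        pthread_mutex_unlock(&q->lock);
        fn(job_arg);                      /* only this thread ever runs jobs */
    }
}

void serial_queue_start(struct serial_queue *q)
{
    q->head = q->tail = 0;
    q->closed = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->ready, NULL);
    int rc = pthread_create(&q->worker, NULL, worker_main, q);
    assert(rc == 0);
    (void)rc;
}

void serial_queue_add(struct serial_queue *q, job_fn fn, int arg)
{
    pthread_mutex_lock(&q->lock);
    assert(q->tail < QUEUE_CAPACITY);
    q->fns[q->tail] = fn;
    q->args[q->tail] = arg;
    q->tail++;
    pthread_cond_signal(&q->ready);
    pthread_mutex_unlock(&q->lock);
}

/* Lets the worker finish every queued job, then stops it. */
void serial_queue_finish(struct serial_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_signal(&q->ready);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->worker, NULL);
    pthread_mutex_destroy(&q->lock);
    pthread_cond_destroy(&q->ready);
}

/* Example job standing in for one batch import (hypothetical): records the
   order in which batches ran. Only the worker thread writes these. */
static int run_log[QUEUE_CAPACITY];
static size_t run_count;

static void example_import_job(int batch_number)
{
    run_log[run_count++] = batch_number;
}
```

Any number of threads may call `serial_queue_add`, but every job runs on the one worker thread, one after another. That is why a find-or-create step submitted this way cannot interleave with another.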

In my API class, I might perform an import through this method as follows:

- (void) importUpdates: (id) methodResult onSuccess: (void (^)(void)) successCallback onError: (void (^)(id error)) errorCallback
{
    [_model performBlock:^(Player *player, NSManagedObjectContext *managedObjectContext) {
        //Perform bulk import for data in methodResult using the provided managedObjectContext
    } onSuccess:successCallback   //performBlock:onSuccess:onError: already calls this on the main queue
      onError:errorCallback];
}

Now, with the serial NSOperationQueue in place, it should no longer be possible for more than one batch operation to run at the same time.
