iPad - Parsing a huge JSON file (50 to 100 MB)

Time: 2021-04-09 13:47:05

I'm trying to parse an extremely big JSON file on an iPad. The file size will vary between 50 and 100 MB (there is an initial file, and there will be one new full set of data every month, which will be downloaded, parsed, and saved into Core Data).


I'm building this app for a company as an Enterprise solution - the JSON file contains sensitive customer data, and it needs to be saved locally on the iPad so it will work even offline. It worked when the file was below 20 MB, but now the set of data has become bigger and I really need to parse it. I'm receiving memory warnings during parsing, and after the third warning it just crashes. I have several different Core Data entities; I'm just setting all the values coming from the JSON file (when the app is launched for the first time), and after everything is done, I'm doing the [context save].


I was hoping somebody could give me some advice on how to handle such huge files. I was thinking about splitting the JSON file into several smaller JSON files and maybe parsing them in multiple threads, but I don't know if that's the right approach. I guess one big problem is that the whole file is being held in memory - maybe there's some way of "streaming" it into memory or something like that?


I'm using JSONKit (https://github.com/johnezang/JSONKit) for parsing the file, since I have read that it's the fastest one (maybe there's a slower one that goes easier on memory?).


Thanks in advance.


2 Answers

#1


18  

1) Write your data to a file, then use NSData's dataWithContentsOfFile:options:error: and specify the NSDataReadingMappedAlways and NSDataReadingUncached flags. This will tell the system to use mmap() to reduce the memory footprint, and not to burden the file system cache with blocks of memory (that makes it slower, but much less of a burden to iOS).


2) You can use the YAJL SAX style JSON parser to get objects as they decode.


Note: I have not done 2) but have used the techniques embodied in 1).


3) I ended up needing such a thing myself, and wrote SAX-JSON-Parser-ForStreamingData, which can be tied to any asynchronous downloader (including my own).


#2


2  

Given the current memory constraints on a mobile device, it's likely impossible to parse 100 MB of JSON text and then create the Foundation object representation, which itself will take roughly 10 times as much RAM as the size of the source JSON text.


That is, your JSON result would take about 1 GByte RAM in order to allocate the space required for the foundation objects.


So, there is likely no way to create one gigantic JSON representation - no matter how you get and read and parse the input. You need to split it into many smaller ones. This may require a modification on the server side, though.


Another solution is this, though it is much more elaborate:


Use a SAX style parser, which takes the huge JSON as input via a streaming API and outputs several smaller JSON texts (the inner parts). The SAX style parser may use a Blocks API (dispatch lib) to pass its results - the smaller JSONs - asynchronously to another JSON parser. That is, the smaller JSONs are fed to a usual JSON parser, which produces the JSON representations, which in turn are fed to your Core Data model generator.


You can even download the huge JSON and parse it with the SAX style parser as it arrives, simultaneously creating the smaller JSONs and storing them into Core Data.


What you need is a JSON parser with a SAX style API that can parse chunks of input text, is fast, and can create representations of Foundation objects.


I know only one JSON library which has this feature set, and there are even examples given which can partly show how you can accomplish exactly this: JPJson on GitHub. The parser is also very fast - on ARM it's faster than JSONKit. Caveat: its implementation is in C++ and requires a few steps to install it on a developer machine. It has a well documented Objective-C API, though.


I'd like to add that I'm the author ;) An update will be available soon, which utilizes the latest C++11 compiler and C++11 library features, resulting in even faster code (25% faster on ARM than JSONKit and twice as fast as NSJSONSerialization).


To give you some facts about the speed: the parser is able to download (over WiFi) and parse 25 MByte of data containing 1000 JSONs (25 kByte each) in 7 seconds on WiFi 802.11g and 4 seconds on WiFi 802.11n, including creating and releasing the 1000 representations, on an iPad 2.

