I was wondering if there is a "correct" way to parse a JSON file using Jackson when the file contains one property that is huge, without loading the entire stream into memory. I need to keep memory usage low since it's an Android app. I'm not asking how to parse a large JSON file in general (as in Android: Parsing large JSON file); rather, one property is really large and the others don't matter.
For instance, let's say I have the following:
{
"filename": "afilename.jpg",
"data": "**Huge data here, about 20Mb base64 string**",
"mime": "mimeType",
"otherProperties": "..."
}
The data property could be extracted to a new file if needed (via an OutputStream or other means), but I haven't managed to achieve this using Jackson. I'm open to using other libraries; I just thought Jackson would be ideal thanks to its streaming API.
Thanks
2 Answers
#1
3
Finally I managed to recover my huge data like this, where in is an InputStream over the JSON file I want to parse the data from, and out is the file I'm going to write my data to:
public boolean extrationContenuDocument(FileInputStream in, FileOutputStream out, FileInfo info)
        throws JsonParseException, IOException {
    SerializedString keyDocContent = new SerializedString("data");
    boolean isDone = false;
    JsonParser jp = this.jsonFactory.createJsonParser(in);
    // Move the inputstream cursor forward until the 'data' property is found
    while (!jp.nextFieldName(keyDocContent)) {
        Log.v("Traitement JSON", "Searching for 'data' property ...");
    }
    // Found it? OK, move the inputstream cursor to the beginning of its content
    JsonToken current = jp.nextToken();
    // If the current token is not a String value, the 'data' property wasn't
    // found or its content is not valid => stop
    if (current == JsonToken.VALUE_STRING) {
        Log.v("Traitement JSON", "Property 'data' found");
        // Here it gets a little tricky: if the file is small enough, the
        // whole content of the 'data' property may already have been read
        // directly, so buffered reading is only needed for big files
        if (info.getSize() > TAILLE_MIN_PETIT_FICHER) {
            Log.v("Traitement JSON", "the content of 'data' is too big to be read directly -> using buffered reading");
            // JsonParser reads through its own buffer, so part of the data
            // may already have been consumed from the stream; fetch it back
            ByteArrayOutputStream debutDocStream = new ByteArrayOutputStream();
            int premierePartieRead = jp.releaseBuffered(debutDocStream);
            byte[] debutDoc = debutDocStream.toByteArray();
            // Write the head of the content of the 'data' property: this is
            // what the JsonParser had already read from the inputstream when
            // jp.nextToken() was called
            Log.v("Traitement JSON", "Write the head");
            out.write(debutDoc);
            // Now write the rest until the tail of the content of the
            // 'data' property is found
            Log.v("Traitement JSON", "Write the middle");
            // Prepare a buffer to continue reading the inputstream
            byte[] buffer = new byte[TAILLE_BUFFER_GROS_FICHER];
            // The escape char that determines where to stop reading is "
            byte endChar = (byte) '"';
            // Fetch some bytes from the inputstream
            int bytesRead = in.read(buffer);
            int bytesBeforeEndChar = 0;
            int deuxiemePartieRead = 0;
            boolean isDocContentFin = false;
            // Are we at the end of the 'data' property? Keep writing its
            // content while that's not the case
            while ((bytesRead > 0) && !isDocContentFin) {
                bytesBeforeEndChar = 0;
                // The escape char could be anywhere in the buffer, so scan
                // only the bytes actually read on this pass (scanning the
                // whole buffer would hit stale data from a previous read)
                for (int i = 0; i < bytesRead; i++) {
                    if (buffer[i] != endChar) {
                        bytesBeforeEndChar++;
                    } else {
                        isDocContentFin = true;
                        break;
                    }
                }
                if (bytesRead > bytesBeforeEndChar) {
                    Log.v("Traitement JSON", "Write the tail");
                    out.write(buffer, 0, bytesBeforeEndChar);
                    deuxiemePartieRead += bytesBeforeEndChar;
                } else {
                    out.write(buffer, 0, bytesRead);
                    deuxiemePartieRead += bytesRead;
                }
                bytesRead = in.read(buffer);
            }
            Log.v("Traitement JSON", "Bytes read: " + (premierePartieRead + deuxiemePartieRead)
                    + " (" + premierePartieRead + " head, " + deuxiemePartieRead + " tail)");
            isDone = true;
        } else {
            Log.v("Traitement JSON", "File is small enough to be read directly");
            String contenuFichier = jp.getText();
            out.write(contenuFichier.getBytes());
            isDone = true;
        }
    } else {
        throw new JsonParseException("The property " + keyDocContent.getValue()
                + " couldn't be found in the Json Stream.", null);
    }
    jp.close();
    return isDone;
}
It's not pretty, but it works like a charm! @staxman, let me know what you think.
Edit: This is now an implemented feature; see https://github.com/FasterXML/jackson-core/issues/14 and JsonParser.readBinaryValue().
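For reference, here is a minimal sketch of that newer API (jackson-core 2.1+), with placeholder file names; note that, unlike the code above which copies the raw base64 text, readBinaryValue() decodes the base64 and streams the resulting bytes to the output:
JsonFactory factory = new JsonFactory();
JsonParser jp = factory.createParser(new File("input.json"));
OutputStream out = new FileOutputStream("data.bin");
while (jp.nextToken() != null) {
    // Look for the FIELD_NAME token of the 'data' property
    if (jp.getCurrentToken() == JsonToken.FIELD_NAME && "data".equals(jp.getCurrentName())) {
        jp.nextToken(); // advance to the VALUE_STRING token
        jp.readBinaryValue(out); // decodes base64 and writes the bytes incrementally
        break;
    }
}
out.close();
jp.close();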
#2
1
EDIT: This is not a good answer for this question -- it would work if sub-trees were objects to bind, but NOT when the issue is a single big Base64-encoded String.
If I understand the question correctly, yes, you can read the file incrementally while still using data binding, if your input consists of a sequence of JSON Objects or arrays.
If so, you can use JsonParser to advance the stream to point to the first object (its START_OBJECT token), and then use data-binding methods on either JsonParser (JsonParser.readValueAs()) or ObjectMapper (ObjectMapper.readValue(JsonParser, type)).
Something like:
ObjectMapper mapper = new ObjectMapper();
JsonParser jp = mapper.getJsonFactory().createJsonParser(new File("file.json"));
while (jp.nextToken() != null) {
    MyPojo pojo = jp.readValueAs(MyPojo.class);
    // do something
}
(Note: depending on the exact structure of the JSON, you may need to skip some elements -- when calling readValueAs(), the parser must be positioned at the START_OBJECT token that starts the JSON Object to bind.)
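For instance, a small sketch of that skipping step (reusing jp and MyPojo from the snippet above):
JsonToken t = jp.nextToken();
// Skip tokens until the parser sits on the start of an object
while (t != null && t != JsonToken.START_OBJECT) {
    t = jp.nextToken();
}
if (t == JsonToken.START_OBJECT) {
    MyPojo pojo = jp.readValueAs(MyPojo.class);
}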
Or, even simpler, you may be able to use the readValues method in ObjectReader:
ObjectReader r = mapper.reader(MyPojo.class);
MappingIterator<MyPojo> it = r.readValues(new File("file.json"));
while (it.hasNextValue()) {
    MyPojo pojo = it.nextValue();
    // do something with it
}
In both cases the Jackson data binder only reads as many JSON tokens as necessary to produce a single object (MyPojo or whatever type you have). JsonParser itself only needs enough memory to hold information on a single JSON token.