java.lang.OutOfMemoryError when reading a file into a byte[] array

Time: 2022-08-26 15:25:16

Is there a cleaner and faster way to do this:

BufferedReader inputReader = new BufferedReader(new InputStreamReader(context.openFileInput("data.txt")));
String inputString;
StringBuilder stringBuffer = new StringBuilder();
while ((inputString = inputReader.readLine()) != null) {
    stringBuffer.append(inputString + "\n");
}
text = stringBuffer.toString();
byte[] data = text.getBytes();

Basically, I'm trying to read a file into a byte[], but when the file is large enough I run into an OutOfMemoryError. I've been looking around SO for a solution and tried the approach suggested here, but it didn't work. Any help would be appreciated.

7 answers

#1


5  

A few suggestions:

  1. You don't need to create a StringBuilder. You can read bytes directly from the file.
  2. If you read multiple files, check whether byte[] arrays that are no longer needed are still being held in memory.
  3. Lastly, increase the maximum memory for your Java process using the -Xmx option.

#2


3  

Since we know the size of the file, roughly half of the memory can be saved by allocating a byte array of that size directly rather than growing it:

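byte[] data = new byte[(int) file.length()];
try (FileInputStream fin = new FileInputStream(file)) {
    int offset = 0, n;
    // read() may return fewer bytes than requested: advance the offset until the array is full
    while (offset < data.length && (n = fin.read(data, offset, data.length - offset)) > 0)
        offset += n;
}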

This avoids allocating unnecessary additional structures. The byte array is allocated only once and has the correct size from the beginning. The while loop ensures that all data is loaded (read(byte[], offset, length) may read only part of the file, but it returns the number of bytes read).

Clarification: When the StringBuilder runs out of space, it allocates a new buffer that is roughly twice the size of the current one. At that moment, we are using about twice the amount of memory that would be minimally required. In the most degenerate case (one last byte does not fit into an already big buffer), nearly three times the minimal amount of RAM may be required.
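
The growth rule can be observed directly. The Javadoc for StringBuilder.ensureCapacity describes the new capacity as the larger of the requested minimum and "twice the old capacity plus 2". A minimal sketch, assuming that documented behaviour (the class and method names below are made up for illustration):

```java
class CapacityGrowth {
    // Returns the capacity after asking a StringBuilder of the given initial
    // capacity to hold one more char than it currently can.
    static int grownCapacity(int initialCapacity) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        sb.ensureCapacity(initialCapacity + 1);
        return sb.capacity();
    }

    public static void main(String[] args) {
        int before = 16;
        int after = grownCapacity(before);
        // While the old contents are copied over, both buffers are alive at once,
        // so the peak is the sum of the two capacities.
        System.out.println("old=" + before + " new=" + after + " peak=" + (before + after));
    }
}
```

The same effect applies to the question's code on a much larger scale: the old and new buffers coexist during the copy, and toString() and getBytes() then each create yet another copy.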

#3


2  

If you don't have enough memory to hold the whole file, you can try rethinking your algorithm so that it processes the file data while reading it, without constructing a large byte[] array.

If you have already tried increasing the Java heap with the -Xmx parameter, then there is no solution that will let you keep data in memory when it is simply too large to fit there.
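
As a rough sketch of processing data while reading it: the example below folds each chunk into a CRC32 checksum, which is only a stand-in for whatever the real per-chunk work would be. The class name, method name, and the 8 KB buffer size are assumptions chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class StreamingChecksum {
    // Reads the stream in fixed-size chunks and folds each chunk into a CRC32,
    // so memory use stays at one 8 KB buffer regardless of the input size.
    static long checksum(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        int len;
        while ((len = in.read(buffer)) != -1) {
            crc.update(buffer, 0, len); // only the bytes actually read
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "hello world".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            System.out.println(Long.toHexString(checksum(in)));
        }
    }
}
```

In the question's setting, the stream passed in would be the one returned by context.openFileInput("data.txt").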

#4


0  

This is similar to "File to byte[] in Java".

You're currently reading in bytes, converting them to characters, and then trying to turn them back into bytes. From the InputStreamReader class in the Java API:

An InputStreamReader is a bridge from byte streams to character streams: It reads bytes and decodes them into characters...

It would be way more efficient to just read in bytes.

One way would be to copy context.openFileInput() directly into a ByteArrayOutputStream, or to use the Jakarta Commons (now Apache Commons IO) IOUtils.toByteArray(InputStream), or, if you're using JDK 7, Files.readAllBytes(Path).
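
A minimal sketch of the Files.readAllBytes(Path) option, assuming Java 7 or later (on Android, java.nio.file is, as far as I know, only available from API level 26). The class name, the load helper, and the temporary file are illustrative only. Note that this still loads the entire file onto the heap, so it removes the char round trip but not the OutOfMemoryError risk for very large files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadAllBytesExample {
    // Reads the whole file into a byte[] in one call, with no intermediate
    // String or StringBuilder.
    static byte[] load(Path path) throws IOException {
        return Files.readAllBytes(path);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("data", ".txt");
        try {
            Files.write(tmp, new byte[] {1, 2, 3});
            byte[] data = load(tmp);
            System.out.println(data.length + " bytes read");
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```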

#5


0  

The 'cleaner and faster way' is not to do it at all. It doesn't scale. Process the file a piece at a time.

#6


0  

You are copying bytes into chars (which use twice the space) and then back into bytes again.

InputStream in = context.openFileInput("data.txt");
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] bytes = new byte[8192];
// copy in 8 KB chunks until read() signals the end of the stream
for (int len; (len = in.read(bytes)) > 0;)
    baos.write(bytes, 0, len);
in.close();
return baos.toByteArray();

This will halve your memory requirement, but you can still run out of memory. In that case you have to either

  • increase your maximum heap size
  • process the file progressively instead of all at once
  • use memory-mapped files, which allow you to "load" a file without using much heap.
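
For the memory-mapped option, a hedged sketch: the file is mapped read-only and scanned in place, so its contents are not copied into a heap byte[]. Counting newline bytes is just a placeholder for real processing, and the class and method names are assumptions. A single mapping is limited to Integer.MAX_VALUE bytes, which is why the sketch maps larger files in windows.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedRead {
    // Counts '\n' bytes by reading through read-only mappings of the file.
    static long countNewlines(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            long count = 0;
            long position = 0;
            while (position < size) {
                // Map at most Integer.MAX_VALUE bytes per window.
                long chunk = Math.min(size - position, Integer.MAX_VALUE);
                MappedByteBuffer buf =
                        channel.map(FileChannel.MapMode.READ_ONLY, position, chunk);
                while (buf.hasRemaining()) {
                    if (buf.get() == '\n') {
                        count++;
                    }
                }
                position += chunk;
            }
            return count;
        }
    }
}
```

The mapped pages live outside the Java heap and are paged in by the operating system on demand, so heap usage stays small even for large files.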

#7


-1  

This solution will test the free memory before loading...

File test = new File("c:/tmp/example.txt");

long freeMemory = Runtime.getRuntime().freeMemory();
if (test.length() < freeMemory) {
    byte[] bytes = new byte[(int) test.length()];
    // try-with-resources closes the stream and its channel even if mapping fails
    try (FileInputStream fis = new FileInputStream(test);
         FileChannel fc = fis.getChannel()) {
        MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        mbb.get(bytes); // copy the mapped contents into the heap array
    }
}
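
One caveat about the check above: Runtime.freeMemory() only reports the free space inside the heap the JVM has currently claimed, and the heap may still grow up to Runtime.maxMemory(). A hedged sketch of a less pessimistic estimate follows; the helper name availableHeap is hypothetical, and the result is only an estimate because garbage collection and other threads can change it at any moment.

```java
class HeapHeadroom {
    // Estimates how many more bytes the heap could hold: the configured
    // maximum minus what is currently in use.
    static long availableHeap() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    public static void main(String[] args) {
        System.out.println("approx. available heap: " + availableHeap() + " bytes");
    }
}
```

Even when such a check passes, a single large array can still fail to allocate if the heap is fragmented, so the allocation should not be treated as guaranteed.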
