How to avoid OutOfMemoryError when using ByteBuffers and NIO?

Date: 2022-02-11 06:56:47

I'm using ByteBuffers and FileChannels to write binary data to a file. When I do that for big files, or successively for multiple files, I get an OutOfMemoryError. I've read elsewhere that using ByteBuffers with NIO is broken and should be avoided. Have any of you already faced this kind of problem and found a solution for efficiently saving large amounts of binary data to a file in Java?

Is the JVM option -XX:MaxDirectMemorySize the way to go?

6 answers

#1


6  

I would say: don't create one huge ByteBuffer that contains ALL of the data at once. Create a much smaller ByteBuffer, fill it with data, then write that data to the FileChannel. Then clear the ByteBuffer (with clear(), not reset(), which only rewinds to a previously set mark) and repeat until all the data is written.
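
The approach described above could look like the following minimal sketch. It assumes the data can be produced incrementally: a hypothetical sequence of longs stands in for the real data, the class and method names are made up for illustration, and the 64 KiB buffer size is an arbitrary assumption rather than a tuned value.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkedWriter {
    // Assumed buffer size: 64 KiB is an arbitrary choice, not a tuned value.
    static final int BUFFER_SIZE = 64 * 1024;

    /** Writes the longs 0, 1, ..., count - 1 (hypothetical data) holding at most one buffer in memory. */
    static void writeLongs(Path file, long count) throws IOException {
        // One small buffer, allocated once and reused for every chunk.
        ByteBuffer buf = ByteBuffer.allocateDirect(BUFFER_SIZE);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (long i = 0; i < count; i++) {
                if (buf.remaining() < Long.BYTES) {
                    drain(ch, buf); // buffer full: write it out before adding more
                }
                buf.putLong(i);
            }
            drain(ch, buf); // write the final, partially filled chunk
        }
    }

    // flip() switches to read mode; write() may be partial, so loop; clear() readies the buffer for refilling.
    private static void drain(FileChannel ch, ByteBuffer buf) throws IOException {
        buf.flip();
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
        buf.clear();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunked", ".bin");
        writeLongs(tmp, 1_000_000);
        System.out.println(Files.size(tmp)); // 8000000 (1,000,000 longs x 8 bytes)
        Files.delete(tmp);
    }
}
```

Memory use stays at the size of the one buffer no matter how large the file grows; a heap buffer from ByteBuffer.allocate would work the same way here.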

#2


4  

Check out Java's mapped byte buffers (MappedByteBuffer, obtained from FileChannel.map()), which are a kind of direct buffer. Basically, this mechanism uses the OS's virtual memory paging system to map a region of the file directly into memory. The OS manages moving the bytes between disk and memory automatically and very quickly, and you won't have to worry about changing your virtual machine options. This also lets you take advantage of NIO's improved performance over traditional stream-based Java I/O, without any weird hacks.

The only two catches that I can think of are:

  1. On a 32-bit system, you are limited to just under 4GB total for all mapped byte buffers. (That was actually a limiting factor for my application, and I now run on 64-bit architectures.) Independently of the architecture, a single mapping cannot exceed Integer.MAX_VALUE bytes (about 2GB), because ByteBuffer indices are ints.

  2. The implementation is JVM-specific, and much of its behavior is not required by the specification. I use Sun's JVM and have had no problems, but YMMV.
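
A minimal sketch of writing a large file through mapped byte buffers, assuming the total size is known up front. Because each mapping is limited to Integer.MAX_VALUE bytes, the file is mapped in windows; the 16 MiB window size, the class name, and the int payload are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedWriter {
    // Assumed window size; any single mapping must be at most Integer.MAX_VALUE bytes.
    static final long WINDOW = 16L * 1024 * 1024;

    /** Writes the ints 0, 1, ..., count - 1 (hypothetical data) by mapping the file one window at a time. */
    static void writeInts(Path file, int count) throws IOException {
        long total = (long) count * Integer.BYTES;
        // READ_WRITE mappings require a channel opened for both reading and writing.
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            int next = 0;
            for (long offset = 0; offset < total; offset += WINDOW) {
                long size = Math.min(WINDOW, total - offset);
                // The spec leaves mapping past end of file unspecified; the JDK's implementation grows the file.
                MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_WRITE, offset, size);
                while (mbb.remaining() >= Integer.BYTES) {
                    mbb.putInt(next++);
                }
                mbb.force(); // ask the OS to write this window's changes to the storage device
            }
        }
        // Each mapping stays valid until its buffer is garbage-collected, even after the channel is closed.
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped", ".bin");
        writeInts(tmp, 10_000_000); // 40,000,000 bytes across three 16 MiB windows
        System.out.println(Files.size(tmp)); // 40000000
        tmp.toFile().deleteOnExit(); // deleting a still-mapped file can fail on Windows
    }
}
```

Mapped memory is only released when the MappedByteBuffer is garbage-collected; the JDK has no standard public API to unmap it explicitly.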

Kirk Pepperdine (a somewhat famous Java performance guru) is involved with a website, www.JavaPerformanceTuning.com, that has more details on mapped byte buffers: NIO Performance Tips

#3


1  

If you access files in a random fashion (read here, skip, write there, move back) then you have a problem ;-)

But if you only write big files sequentially, you should seriously consider using streams. java.io.FileOutputStream can be used directly to write a file byte by byte, or wrapped in another stream (e.g. DataOutputStream, ObjectOutputStream) for the convenience of writing floats, ints, Strings, or even serializable objects. Similar classes exist for reading files.

Streams let you manipulate arbitrarily large files in (almost) arbitrarily small memory. They are the preferred way of accessing the file system in the vast majority of cases.
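
A minimal sketch of the stream approach, assuming a hypothetical record layout of one int followed by one double; the class and method names are illustrative. The BufferedOutputStream between DataOutputStream and FileOutputStream is an addition not mentioned in the answer: it batches the many small writes so they do not each go straight to the OS.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class StreamRecords {
    /** Writes count (int, double) records; memory use is bounded by the 64 KiB buffer, not the file size. */
    static void writeRecords(File file, int count) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file), 64 * 1024))) {
            for (int i = 0; i < count; i++) {
                out.writeInt(i);          // 4 bytes, big-endian
                out.writeDouble(i * 0.5); // 8 bytes, big-endian
            }
        } // close() flushes the buffer and closes the underlying FileOutputStream
    }

    /** Reads the records back and returns the sum of the double fields. */
    static double sumDoubles(File file, int count) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            double sum = 0;
            for (int i = 0; i < count; i++) {
                in.readInt(); // skip the int field
                sum += in.readDouble();
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("records", ".bin");
        writeRecords(tmp, 1_000_000);
        System.out.println(tmp.length()); // 12000000 (1,000,000 records x 12 bytes)
        System.out.println(sumDoubles(tmp, 1_000_000));
        tmp.delete();
    }
}
```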

#4


0  

The previous two responses seem pretty reasonable. As for whether the command-line switch will work, it depends on how quickly your memory usage hits the limit. If you don't have enough RAM and virtual memory available to at least triple the available memory, then you will need to use one of the alternative suggestions given.
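
Before raising -XX:MaxDirectMemorySize, it can help to check how quickly direct memory actually grows. A minimal sketch, assuming Java 7 or later (where java.lang.management.BufferPoolMXBean exists) and a HotSpot-based JVM, whose buffer pools are named "direct" and "mapped"; other JVMs may use different names, in which case the hypothetical helper below returns -1.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectMemoryProbe {
    /** Returns the JVM's estimate of memory used by the named buffer pool, or -1 if no such pool is exposed. */
    static long poolBytesUsed(String name) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals(name)) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("direct before: " + poolBytesUsed("direct"));
        ByteBuffer buf = ByteBuffer.allocateDirect(8 * 1024 * 1024); // counted against the direct limit
        System.out.println("direct after:  " + poolBytesUsed("direct"));
        System.out.println("mapped:        " + poolBytesUsed("mapped"));
        System.out.println("buffer capacity: " + buf.capacity());
    }
}
```

Running it as, for example, java -XX:MaxDirectMemorySize=256m DirectMemoryProbe (256m is an arbitrary value) and logging the "direct" figure periodically in the real application shows whether usage levels off or keeps climbing; in HotSpot, exceeding the limit throws an OutOfMemoryError with the message "Direct buffer memory".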

#5


0  

Using FileChannel's transferFrom method should help with this, assuming you write to the channel incrementally and not all at once, as previous answers also point out.
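
A sketch of that idea, assuming the data arrives as an InputStream (a hypothetical source; the answer does not say where the data comes from). FileChannel.transferFrom may move fewer bytes than requested, so it is called in a loop with a bounded count per call; the 1 MiB chunk size and the class name are arbitrary assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferFromSketch {
    // Assumed chunk size: each transferFrom call moves at most this many bytes.
    static final long CHUNK = 1 << 20;

    /** Copies the stream into the file incrementally and returns the number of bytes written. */
    static long copy(InputStream in, Path target) throws IOException {
        try (ReadableByteChannel src = Channels.newChannel(in);
             FileChannel dst = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            while (true) {
                long n = dst.transferFrom(src, position, CHUNK);
                // Assumes a blocking source: 0 then means end of stream.
                // A non-blocking source could also return 0 when no data is ready yet.
                if (n == 0) {
                    break;
                }
                position += n; // append the next chunk right after the previous one
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[3_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Path tmp = Files.createTempFile("transfer", ".bin");
        long written = copy(new ByteArrayInputStream(data), tmp);
        System.out.println(written + " " + Files.size(tmp)); // 3000000 3000000
        Files.delete(tmp);
    }
}
```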

#6


0  

This can depend on the particular JDK vendor and version.

There is a GC bug in some Sun JVMs. A shortage of direct memory does not trigger a GC of the main heap, yet the direct memory is pinned by garbage direct ByteBuffer objects that live in the main heap. If the main heap is mostly empty, those objects may not be collected for a long time.

This can burn you even if you aren't using direct buffers on your own, because the JVM may be creating direct buffers on your behalf. For instance, writing a non-direct ByteBuffer to a SocketChannel creates a direct buffer under the covers to use for the actual I/O operation.

The workaround is to use a small number of direct buffers yourself, and keep them around for reuse.
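
A minimal sketch of that workaround, assuming a fixed-size pool backed by a BlockingQueue; the DirectBufferPool class is hypothetical, not a JDK API, and the pool and buffer sizes are arbitrary. Heap data is copied into a pooled direct buffer before each write, so in the JDK's implementation the channel is handed a direct buffer and does not need a temporary one, and no new direct memory is allocated per write.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DirectBufferPool {
    private final BlockingQueue<ByteBuffer> free;

    /** Allocates all direct buffers up front; the pool never allocates more later. */
    DirectBufferPool(int buffers, int bufferSize) {
        free = new ArrayBlockingQueue<>(buffers);
        for (int i = 0; i < buffers; i++) {
            free.add(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    /** Copies a heap array to the channel through a pooled direct buffer, one buffer-sized chunk at a time. */
    void write(FileChannel ch, byte[] data) throws IOException, InterruptedException {
        ByteBuffer buf = free.take(); // blocks if every buffer is currently in use
        try {
            int off = 0;
            while (off < data.length) {
                int n = Math.min(buf.remaining(), data.length - off);
                buf.put(data, off, n);
                off += n;
                buf.flip();
                while (buf.hasRemaining()) {
                    ch.write(buf); // write() may be partial, so loop until the chunk is drained
                }
                buf.clear();
            }
        } finally {
            buf.clear();
            free.add(buf); // return the buffer so it is reused rather than reallocated
        }
    }

    public static void main(String[] args) throws Exception {
        DirectBufferPool pool = new DirectBufferPool(2, 64 * 1024);
        byte[] data = new byte[1_000_000];
        for (int i = 0; i < 3; i++) { // several files, same two direct buffers
            Path tmp = Files.createTempFile("pooled" + i, ".bin");
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                pool.write(ch, data);
            }
            System.out.println(Files.size(tmp)); // 1000000
            Files.delete(tmp);
        }
    }
}
```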
