Loading PNGs into OpenGL performance issues - Java & JOGL much slower than C# & Tao.OpenGL

Time: 2022-04-05 17:22:30

I am noticing a large performance difference between Java & JOGL and C# & Tao.OpenGL, both when loading PNGs from storage into memory and when loading the resulting BufferedImage (Java) or Bitmap (C#) into OpenGL. In both cases the source is the same PNG on the hard drive.

This difference is quite large, so I assumed I was doing something wrong. However, after quite a lot of searching and trying different loading techniques, I've been unable to reduce it.

With Java I get an image loaded in 248ms and loaded into OpenGL in 728ms. The same on C# takes 54ms to load the image, and 34ms to load/create the texture.

The image in question above is a PNG containing transparency, sized 7200x255, used for a 2D animated sprite. I realise the size is really quite ridiculous and am considering cutting up the sprite; however, the large difference is still there (and confusing).

On the Java side the code looks like this:

BufferedImage image = ImageIO.read(new File(fileName));
texture = TextureIO.newTexture(image, false);
texture.setTexParameteri(GL.GL_TEXTURE_MIN_FILTER, GL.GL_LINEAR);
texture.setTexParameteri(GL.GL_TEXTURE_MAG_FILTER, GL.GL_LINEAR);

The C# code uses:

Bitmap t = new Bitmap(fileName);

t.RotateFlip(RotateFlipType.RotateNoneFlipY);
Rectangle r = new Rectangle(0, 0, t.Width, t.Height);

BitmapData bd = t.LockBits(r, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);

Gl.glBindTexture(Gl.GL_TEXTURE_2D, tID);
Gl.glTexImage2D(Gl.GL_TEXTURE_2D, 0, Gl.GL_RGBA, t.Width, t.Height, 0, Gl.GL_BGRA, Gl.GL_UNSIGNED_BYTE, bd.Scan0);
Gl.glTexParameteri(Gl.GL_TEXTURE_2D, Gl.GL_TEXTURE_MIN_FILTER, Gl.GL_LINEAR);
Gl.glTexParameteri(Gl.GL_TEXTURE_2D, Gl.GL_TEXTURE_MAG_FILTER, Gl.GL_LINEAR);

t.UnlockBits(bd);
t.Dispose();

After quite a lot of testing I can only come to the conclusion that Java/JOGL is just slower here - PNG reading might not be as quick, or that I'm still doing something wrong.

Thanks.

Edit2:

I have found that creating a new BufferedImage with format TYPE_INT_ARGB_PRE decreases OpenGL texture load time by almost half - this includes having to create the new BufferedImage, getting the Graphics2D from it and then rendering the previously loaded image to it.

Edit3: Benchmark results for 5 variations. I wrote a small benchmarking tool; the following results come from loading a set of 33 PNGs (most of them very wide) 5 times.

testStart: ImageIO.read(file) -> TextureIO.newTexture(image)  
result: avg = 10250ms, total = 51251  
testStart: ImageIO.read(bis) -> TextureIO.newTexture(image)  
result: avg = 10029ms, total = 50147  
testStart: ImageIO.read(file) -> TextureIO.newTexture(argbImage)  
result: avg = 5343ms, total = 26717  
testStart: ImageIO.read(bis) -> TextureIO.newTexture(argbImage)  
result: avg = 5534ms, total = 27673  
testStart: TextureIO.newTexture(file)  
result: avg = 10395ms, total = 51979  

ImageIO.read(bis) refers to the technique described in James Branigan's answer below. argbImage refers to the technique described in my previous edit:

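// Decode the PNG, then redraw it into a premultiplied ARGB image before upload.
BufferedImage img = ImageIO.read(file);
BufferedImage argbImg = new BufferedImage(img.getWidth(), img.getHeight(), BufferedImage.TYPE_INT_ARGB_PRE);
Graphics2D g = argbImg.createGraphics();
g.drawImage(img, 0, 0, null);
g.dispose();   // release the graphics context once the copy is done
texture = TextureIO.newTexture(argbImg, false);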

Any more methods of loading (either images from file, or images into OpenGL) would be appreciated; I will update these benchmarks as they come in.
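
The benchmarking tool itself is not shown in the question. As a rough sketch of how avg/total figures of this kind could be collected, the following Java harness times a task over several runs with System.nanoTime(). The class name LoadBenchmark, the Result type and the placeholder workload are assumptions made for illustration; a real run would call the image and texture loading code inside the task.

```java
import java.util.concurrent.Callable;

public class LoadBenchmark {
    /** Total and average wall-clock time of a benchmark, in milliseconds. */
    public static final class Result {
        public final long totalMs;
        public final long avgMs;
        public final int runs;

        Result(long totalMs, int runs) {
            this.totalMs = totalMs;
            this.runs = runs;
            this.avgMs = totalMs / runs;
        }
    }

    /** Runs the task the given number of times and sums the elapsed time. */
    public static Result time(String label, int runs, Callable<?> task) throws Exception {
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.call();
            totalNanos += System.nanoTime() - start;
        }
        Result result = new Result(totalNanos / 1_000_000, runs);
        System.out.println("testStart: " + label);
        System.out.println("result: avg = " + result.avgMs + "ms, total = " + result.totalMs);
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload (hypothetical); replace with the PNG loading under test.
        time("sleep(5)", 5, () -> { Thread.sleep(5); return null; });
    }
}
```

Timing the decode step and the texture upload as two separate tasks would show which of the two dominates for a given loading variation.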

5 solutions

#1


7  

Short Answer: The JOGL texture classes do quite a bit more than necessary, and I guess that's why they are slow. I ran into the same problem a few days ago, and have now fixed it by loading the texture with the low-level API (glGenTextures, glBindTexture, glTexParameterf, and glTexImage2D). The loading time decreased from about 1 second to "no noticeable delay", but I haven't done any systematic profiling.

Long Answer: If you look into the documentation and source code of the JOGL TextureIO, TextureData and Texture classes, you will notice that they do quite a bit more than just uploading the texture onto the GPU:

  • Handling of different image formats
  • Alpha premultiplication

I'm not sure which one of these is taking more time. But in many cases you know what kind of image data you have available, and don't need to do any premultiplication.

In any case, the alpha premultiplication feature is misplaced in this class (from a software architecture perspective), and I didn't find any way to disable it. Even though the documentation claims that this is the "mathematically correct way" (I'm actually not convinced about that), there are plenty of cases in which you don't want to use alpha premultiplication, or have already done it beforehand (e.g. for performance reasons).
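
For readers unfamiliar with the term, premultiplication scales each colour channel by the pixel's alpha. The Java sketch below shows the per-pixel arithmetic only; it is not taken from the JOGL source, and the class and method names are made up for illustration. On a 7200x255 image this work is repeated for 1,836,000 pixels, which is why doing it ahead of time (or skipping it) can matter.

```java
public class AlphaPremultiply {
    /**
     * Premultiplies one non-premultiplied ARGB pixel (0xAARRGGBB):
     * each colour channel becomes channel * alpha / 255, rounded to nearest.
     */
    public static int premultiply(int argb) {
        int a = (argb >>> 24) & 0xFF;
        int r = (argb >>> 16) & 0xFF;
        int g = (argb >>> 8) & 0xFF;
        int b = argb & 0xFF;
        r = (r * a + 127) / 255;
        g = (g * a + 127) / 255;
        b = (b * a + 127) / 255;
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    public static void main(String[] args) {
        int halfTransparentRed = 0x80FF0000;
        // Red is scaled by 128/255, so the pixel becomes 0x80800000.
        System.out.printf("%08X -> %08X%n", halfTransparentRed, premultiply(halfTransparentRed));
    }
}
```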

After all, loading a texture with the low-level API is quite simple unless you need it to handle different image formats. Here is some Scala code which works nicely for all my RGBA texture images:

val textureIDList = new Array[Int](1)
gl.glGenTextures(1, textureIDList, 0)
gl.glBindTexture(GL.GL_TEXTURE_2D, textureIDList(0))
gl.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_MIN_FILTER, GL.GL_LINEAR)
gl.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_MAG_FILTER, GL.GL_LINEAR)
val dataBuffer = image.getRaster.getDataBuffer   // image is a java.awt.image.BufferedImage (loaded from a PNG file)
val buffer: Buffer = dataBuffer match {
  case b: DataBufferByte => ByteBuffer.wrap(b.getData)
  case _ => null
}
gl.glTexImage2D(GL.GL_TEXTURE_2D, 0, GL.GL_RGBA, image.getWidth, image.getHeight, 0, GL.GL_RGBA, GL.GL_UNSIGNED_BYTE, buffer)

...

gl.glDeleteTextures(1, textureIDList, 0)
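
Note that the `case _ => null` branch above hands glTexImage2D no pixel data when the image is not byte-backed, and that wrapping the raw bytes relies on them already being in R, G, B, A order. One possible fallback, sketched here in Java, is to repack any BufferedImage into a direct buffer through getRGB. The class name RgbaPacker is an assumption for illustration, and this route is slower than wrapping the existing array because every pixel is converted and copied; the GL upload is left as a comment since it needs a live context.

```java
import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;

public class RgbaPacker {
    /** Copies any BufferedImage into tightly packed R, G, B, A bytes, top row first. */
    public static ByteBuffer toRgba(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        int[] row = new int[width];
        ByteBuffer buffer = ByteBuffer.allocateDirect(width * height * 4);
        for (int y = 0; y < height; y++) {
            // getRGB returns non-premultiplied ARGB ints whatever the image type is.
            image.getRGB(0, y, width, 1, row, 0, width);
            for (int argb : row) {
                buffer.put((byte) (argb >> 16));  // R
                buffer.put((byte) (argb >> 8));   // G
                buffer.put((byte) argb);          // B
                buffer.put((byte) (argb >>> 24)); // A
            }
        }
        buffer.flip(); // make the buffer readable from position 0
        return buffer;
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(2, 1, BufferedImage.TYPE_INT_ARGB);
        image.setRGB(0, 0, 0x80FF0000);
        ByteBuffer rgba = toRgba(image);
        System.out.println(rgba.remaining() + " bytes ready for upload");
        // With a current GL context, the buffer could then be passed as the last
        // argument of glTexImage2D(..., GL_RGBA, GL_UNSIGNED_BYTE, rgba).
    }
}
```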

#2


1  

I'm not sure that it will completely close the performance gap, but you should be able to use the ImageIO.read method that takes an InputStream and pass in a BufferedInputStream wrapping a FileInputStream. This should greatly reduce the number of native file I/O calls that the JVM has to perform. It would look like this:

File file = new File(fileName);
FileInputStream fis = new FileInputStream(file);
BufferedInputStream bis = new BufferedInputStream(fis, 8192); //8K reads
BufferedImage image = ImageIO.read(bis);
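
ImageIO.read(InputStream) does not close the stream it is given, so the snippet above leaves both streams open. A self-contained variant using try-with-resources might look like the following; the class name BufferedPngRead and the generated temporary PNG are assumptions so that the sketch runs without external files.

```java
import java.awt.image.BufferedImage;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

public class BufferedPngRead {
    /** Reads an image through an 8K BufferedInputStream and closes the streams afterwards. */
    public static BufferedImage read(File file) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(file), 8192)) {
            BufferedImage image = ImageIO.read(in);
            if (image == null) {
                // ImageIO.read returns null when no registered reader can decode the data.
                throw new IOException("No ImageIO reader could decode " + file);
            }
            return image;
        }
    }

    public static void main(String[] args) throws IOException {
        // Round trip on a small generated PNG (hypothetical test data).
        File tmp = File.createTempFile("sprite", ".png");
        tmp.deleteOnExit();
        ImageIO.write(new BufferedImage(64, 8, BufferedImage.TYPE_INT_ARGB), "png", tmp);
        BufferedImage image = read(tmp);
        System.out.println(image.getWidth() + "x" + image.getHeight());
    }
}
```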

#3


1  

Have you looked into JAI (Java Advanced Imaging) by any chance? It implements native acceleration for tasks such as PNG compression/decompression. The Java implementation of PNG decompression may be the issue here. Which version of the JVM are you using?

I work with applications which load and render thousands of textures. For this I use a pure Java implementation of the DDS format, available with NASA WorldWind. DDS textures load into GL faster since the format is understood by the graphics card.

I appreciate your benchmarking and would like to use your experiments to test out DDS load times. Also, try tweaking the memory available to JAI and the JVM to allow loading of more segments and decompression.

#4


1  

Actually, I load my textures in JOGL like this:

TextureData data = TextureIO.newTextureData(stream, false, fileFormat);
Texture2D tex = new Texture2D(...);   // contains glTexImage2D
tex.bind(g);
tex.uploadData(g, 0, data);  // contains glTexSubImage2D

Loading textures this way bypasses the extra work of constructing a BufferedImage and interpreting it. It's pretty fast for me. You can profile it; I'm waiting for your results.

#5


0  

You can also try loading the Texture directly from a BufferedImage. There is an example here.

Using this you can see whether the time is going into the image load, or into creating the texture and writing it to video memory.

You may also want to think about making the image dimensions a power of 2 (i.e. 16, 32, 64, 128, 256, 1024...). Some graphics cards cannot process non-power-of-2 sizes, and you will get blank textures when using them on those cards.
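
As a small illustration of the power-of-2 check, the Java sketch below tests a dimension and rounds it up to the next power of two. The class and method names are made up for this example. For the 7200x255 sprite from the question it gives 8192x256.

```java
public class PowerOfTwo {
    /** True if n is a positive power of two (1, 2, 4, 8, ...). */
    public static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /** Smallest power of two >= n; valid for n up to 2^30. */
    public static int nextPowerOfTwo(int n) {
        if (n <= 1) {
            return 1;
        }
        return Integer.highestOneBit(n - 1) << 1;
    }

    public static void main(String[] args) {
        int width = 7200;
        int height = 255;
        System.out.println(width + " is power of two: " + isPowerOfTwo(width));
        System.out.println("padded size: " + nextPowerOfTwo(width) + "x" + nextPowerOfTwo(height));
    }
}
```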
