Performance Issues with VBOs

Posted by bgroenks96 on Dec 31, 2013; 5:34am
URL: https://forum.jogamp.org/Performance-Issues-with-VBOs-tp4031089.html

I'm sure this issue has come up before, and I'm sure I'm doing something horrendously wrong.  With VBOs, I can't match, let alone exceed, the performance of immediate-mode drawing.

For context, my library is set up as a higher-level abstraction around direct JOGL function calls.

The problem is in the code that draws textured quads with VBOs and GL_TRIANGLE_STRIP.  Everything draws correctly, but the frame rate is roughly 40% lower than with the immediate-mode functions (numbers below).

The process is broken down into three main components: allocation, modification, and drawing.  I'll show each one separately to make things clearer.  Two distinct things are rendered: a static background image colored with glColor4f(0.5f, 0.5f, 0.6f, 1), and a quad with a texture that displays a semi-transparent circle (blending is enabled).  The background quad is drawn once per frame; the textured quad is drawn 100 times per frame, with its location offset by a random number each frame (this is just for testing).

The buffers are initialized with this code:

                // rectBuffInd is an int[] to hold the buffer IDs for rectangle VBOs
                // ind is just an int specifying the current index to store the id in
                gl.glGenBuffers(1, rectBuffInd, ind);
                gl.glBindBuffer(GL_ARRAY_BUFFER, rectBuffInd[ind]);
                if(textured) {
                        // 8 floats of positions followed by 8 floats of tex coords
                        gl.glBufferData(GL_ARRAY_BUFFER, 16 * Buffers.SIZEOF_FLOAT,
                                        null, storeType.usage);
                } else {
                        // 8 floats of positions only
                        gl.glBufferData(GL_ARRAY_BUFFER, 8 * Buffers.SIZEOF_FLOAT,
                                        null, storeType.usage);
                }

                gl.glBindBuffer(GL_ARRAY_BUFFER, 0);

                // this probably looks weird but it only runs once, and I need it to map IDs properly
                int id = rectBuffInd[ind];
                Arrays.sort(rectBuffInd);
                ind = Arrays.binarySearch(rectBuffInd, id);
                return ind;
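
Since that sort-and-search step looks odd, here is what it does on a plain int[], as a stand-alone sketch with no JOGL involved.  The class name, the method name, and the sample IDs are made up for illustration, and I'm assuming unused slots in the array hold 0.

```java
import java.util.Arrays;

public class BufferIdIndexSketch {
    // Hypothetical stand-in for the sort-and-search step; ids plays the role of rectBuffInd.
    static int indexAfterSort(int[] ids, int slot) {
        int id = ids[slot];                  // remember the ID that was just stored
        Arrays.sort(ids);                    // ascending order; unused 0 slots move to the front
        return Arrays.binarySearch(ids, id); // position of that ID in the sorted array
    }

    public static void main(String[] args) {
        int[] ids = {7, 3, 0, 0};            // two sample IDs, two unused slots (assumed to be 0)
        int idx = indexAfterSort(ids, 0);
        System.out.println(Arrays.toString(ids) + " -> index " + idx);
    }
}
```

In this sketch the returned index is only guaranteed to point at the same ID until the next sort, because a later sort can shift existing entries.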

"storeType" is an enum value whose "usage" field holds the GL usage flag.  This is GL_STREAM_DRAW for the textured quads that are updated each frame and GL_STATIC_DRAW for the background image.  Likewise, texBound and texEnabled (used below) are only true for the textured quads.

The next part actually stores the vertex data.  This runs every frame for each textured quad and only once for the background image:

                int buffSize = 8 * Buffers.SIZEOF_FLOAT;
                if(texBound && texEnabled)
                        buffSize *= 2;
                gl.glBindBuffer(GL_ARRAY_BUFFER, rectBuffId);
                gl.glBufferData(GL_ARRAY_BUFFER, buffSize, null, storeType.usage); // discard previous data
                ByteBuffer buff = gl.glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY); // upload via write-only mapping
                FloatBuffer floatBuff = buff.order(ByteOrder.nativeOrder()).asFloatBuffer();
                floatBuff.put(x); floatBuff.put(y);
                floatBuff.put(x); floatBuff.put(y + ht);
                floatBuff.put(x + wt); floatBuff.put(y);
                floatBuff.put(x + wt); floatBuff.put(y + ht);
                if(texBound && texEnabled)
                        floatBuff.put(texCoords);
                gl.glUnmapBuffer(GL_ARRAY_BUFFER);
                gl.glBindBuffer(GL_ARRAY_BUFFER, 0);
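
To make the resulting layout explicit: the buffer is not interleaved.  It holds the four strip vertices first, then the eight texture coordinates, which is why the drawing code uses a tex coord offset of 8 * Buffers.SIZEOF_FLOAT.  Here is a minimal stand-alone sketch of the same packing using only java.nio.  The class and method names and the sample texture coordinates are hypothetical, and a plain direct buffer stands in for the mapped VBO.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class QuadLayoutSketch {
    static final int SIZEOF_FLOAT = 4; // same value as Buffers.SIZEOF_FLOAT

    // Packs 4 triangle-strip vertices (x, y) followed by 8 texture coordinate floats.
    static FloatBuffer pack(float x, float y, float wt, float ht, float[] texCoords) {
        FloatBuffer fb = ByteBuffer.allocateDirect(16 * SIZEOF_FLOAT)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        fb.put(x).put(y);           // vertex 0
        fb.put(x).put(y + ht);      // vertex 1
        fb.put(x + wt).put(y);      // vertex 2
        fb.put(x + wt).put(y + ht); // vertex 3
        fb.put(texCoords);          // floats 8..15
        fb.flip();                  // make the written floats readable from index 0
        return fb;
    }

    public static void main(String[] args) {
        float[] tex = {0, 0, 0, 1, 1, 0, 1, 1}; // hypothetical tex coords, one pair per vertex
        FloatBuffer fb = pack(10f, 20f, 4f, 2f, tex);
        int texOffsetBytes = 8 * SIZEOF_FLOAT;  // byte offset of the first tex coord
        System.out.println("floats: " + fb.remaining() + ", tex coord offset: " + texOffsetBytes + " bytes");
    }
}
```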

Finally, the drawing code:

                // draw all quads in vertex buffer
                gl.glBindBuffer(GL_ARRAY_BUFFER, buffId);
                gl.glEnableClientState(GL_VERTEX_ARRAY);
                gl.glVertexPointer(2, GL_FLOAT, 0, 0);
                if(texBound && texEnabled) {
                        gl.glEnableClientState(GL_TEXTURE_COORD_ARRAY);
                        gl.glTexCoordPointer(2, GL_FLOAT, 0, 8 * Buffers.SIZEOF_FLOAT);
                }
                gl.glDrawArrays(GL_TRIANGLE_STRIP, 0, nverts);

                gl.glBindBuffer(GL_ARRAY_BUFFER, 0);
                gl.glDisableClientState(GL_VERTEX_ARRAY);
                if(texBound && texEnabled)
                        gl.glDisableClientState(GL_TEXTURE_COORD_ARRAY);

Again, everything looks correct (it's a blob of randomly moving circles), but the performance is the issue.

With this procedure I get ~650 FPS.
With the equivalent immediate-mode functions I get ~1100 FPS.
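
To put those numbers in per-frame terms, here is a quick stand-alone calculation (plain Java; the class and method names are made up for illustration).  It assumes 101 draws per frame, i.e. one background quad plus 100 textured quads, and simply spreads the whole difference evenly across them, which is a rough simplification.

```java
public class FrameTimeSketch {
    // Converts a frame rate to milliseconds per frame.
    static double msPerFrame(double fps) {
        return 1000.0 / fps;
    }

    // Extra microseconds per draw when running at slowFps instead of fastFps.
    static double extraMicrosPerDraw(double slowFps, double fastFps, int drawsPerFrame) {
        return (msPerFrame(slowFps) - msPerFrame(fastFps)) * 1000.0 / drawsPerFrame;
    }

    public static void main(String[] args) {
        System.out.printf("VBO path:       %.2f ms/frame%n", msPerFrame(650));
        System.out.printf("immediate mode: %.2f ms/frame%n", msPerFrame(1100));
        System.out.printf("extra per draw: %.1f us%n", extraMicrosPerDraw(650, 1100, 101));
    }
}
```

That works out to about 1.54 ms versus 0.91 ms per frame, so the VBO path costs on the order of 6 microseconds more per quad.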

Now this I find REALLY weird.  If I change GL_WRITE_ONLY to GL_READ_WRITE, I get ~900 FPS.  Much better (although still not matching immediate mode :/ )

How does that make sense?  Shouldn't GL_READ_WRITE be slower, since it sets up the mapping for both reading and writing?  I never read anything back from the VBO; I just need to keep uploading new data.

I would greatly appreciate any suggestions!