NOTE: Rapidly creating and deleting textures should be avoided. It'll work, but you have extra overhead. Instead, try to batch up all your font letters into a single texture (your font glyph cache), and then draw in one operation. You can upload new glyphs to the texture via glTexSubImage2D() instead of allocating a new texture. Even then, it's best to do that as few times as possible.
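To illustrate the glyph cache idea, here is a minimal sketch of a "shelf" packer that decides where each new glyph goes inside one big texture. The names GlyphAtlas, atlas_init and atlas_alloc are made up for this example and are not part of ogles2.library or Warp3D Nova; the packing strategy is just one simple option, not what any particular game uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical glyph cache: one texture, filled row by row ("shelves"). */
typedef struct {
    int width, height;  /* size of the atlas texture in texels */
    int pen_x, pen_y;   /* top-left corner of the next free slot */
    int row_height;     /* tallest glyph placed on the current shelf */
} GlyphAtlas;

static void atlas_init(GlyphAtlas *a, int width, int height)
{
    assert(a && width > 0 && height > 0);
    a->width = width;
    a->height = height;
    a->pen_x = 0;
    a->pen_y = 0;
    a->row_height = 0;
}

/* Reserve a w x h slot. Returns true and stores the slot position in
 * *x / *y, or returns false when the glyph does not fit any more.
 * After a successful call the glyph pixels would be uploaded with
 * glTexSubImage2D(GL_TEXTURE_2D, 0, *x, *y, w, h, format, type, pixels)
 * instead of creating a new texture. */
static bool atlas_alloc(GlyphAtlas *a, int w, int h, int *x, int *y)
{
    assert(a && x && y);
    if (w <= 0 || h <= 0 || w > a->width)
        return false;

    if (a->pen_x + w > a->width) {
        /* Current shelf is full: open a new one below it. */
        a->pen_y += a->row_height;
        a->pen_x = 0;
        a->row_height = 0;
    }
    if (a->pen_y + h > a->height)
        return false;

    *x = a->pen_x;
    *y = a->pen_y;
    a->pen_x += w;
    if (h > a->row_height)
        a->row_height = h;
    return true;
}
```

Collecting several new glyphs and uploading them together keeps the number of glTexSubImage2D() calls low, which is the point of the note above.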
One of my ideas for Warp3D Nova would be to have debug wrappers that could validate the parameters and warn you about common errors, or log the calls like PVRTrace does. That would make developing Warp3D Nova apps/games easier.
I'm too busy working on other stuff, but this is something that another developer could build.
@All Hans fixed the issue which we found in lettersfall. Thanks!
@Hans Quote:
NOTE: Rapidly creating and deleting textures should be avoided. It'll work, but you have extra overhead. Instead, try to batch up all your font letters into a single texture (your font glyph cache), and then draw in one operation. You can upload new glyphs to the texture via glTexSubImage2D() instead of allocating a new texture. Even then, it's best to do that as few times as possible.
Yeah, it's just that some games are written like this, and it's good if they can work as well without a total rewrite :)
"Ok, let's start with the buggy ones (as finding the root causes and fixing problems is more important than re-releasing quake3 yet again :) ).
So, all the tests should be done on the latest ogles2.library and warp3dnova.library. ogles2.library should be 1.22, and warp3dnova.library should be 1.58. It is very important to have those versions; if not, the tests may not help.
This is how it looks on my setup when I choose "options", "howtoplay" or "highscores", or start a game (you can see distorted text textures; they look doubled and randomly shifted):"
On my X1000 all screens look and work normal...screen resizing too went well
my setup:Corsair GS800 800w PSU, Western Digital 250 GB SATA HDD (model:WD2500YS), 4GB Ram, Radeon HD7950-3GB GDDR5- Dual Slot, Lite-On RW DVD/CD, On-Board Nemo sound,8139B NIC, Catweasel MK4+
On my X1000 all screens look and work normal...screen resizing too went well
That's because you didn't play with the settings much. Sometimes you need to go back and forth many times to get the trashing, and that is probably related to the amount of memory installed on the gfx card.
But that's not important anymore anyway, as Hans has already fixed that issue.
Now we are fighting the same issues that happened in quake3 and the irrlicht engine before, for which Daniel added a workaround. But we want it to work without the workaround, with all the necessary conversion code in Nova (so maybe it will give another speed-up, if all goes well).
@all Hans has now added the necessary conversion code, so almost everything works without Daniel's workaround. It just needs some tweaks on the ogles2 side, so we are waiting for Daniel to get back from his trip.
@Hans,Daniel Some new facts worth discussing about the speed of q3 over gl4es/ogles2/warp3dnova.
Surprisingly, we thought before that the "compiled vertex arrays" extension gave us some fps in gl4es, but it doesn't. I mean not at all, it's a pure placeholder. This is what the gl4es author says:
Quote:
Really, that compiled vertex arrays extension is not usable by gl4es. What it does is tell the OpenGL driver that the vertex data (and only the vertex data) are set and will not change between glLockArrays(...) and glUnlockArrays(), so an OpenGL driver that doesn't have hardware transform can transform the vertices... But as gl4es uses hardware T&L (in shaders), it's just useless. And quake3 makes changes to other arrays (colors, texture UVs) in between, so I cannot really build anything stable...
What it means is that it's just a placeholder and does nothing for us.
All the speed we gain when enabling GL extensions in the gl4es version of q3 comes from the "multitexture" extension: it gives 10+ fps. We get another speed-up just from using glDrawElements() instead of the glBegin/glEnd route (another 10+ fps only).
compiled vertex arrays extension - does nothing, as the author says.
texture_env_add - dunno if it does anything, but it adds nothing in q3 either.
Soo.. The problem is that the author of gl4es can't come up with anything good to do with that glLockArrays (compiled vertex arrays) extension. He has already tried a few things, but it's really only good when you have to transform the vertices fully in software, and he has not been able to do anything useful with it.
Maybe one of us can give him some ideas that he could try to implement?
We see the call to glLockArrays(...) and then the engine enables GL_TEXTURE_COORD_ARRAY and GL_COLOR_ARRAY. That means those 2 arrays are not locked, but are still used for drawing (and the values of those 2 arrays will be changed between the Lock and Unlock)...
So the client software can enable arrays that are not locked, and that makes the Lock/Unlock mechanism useless (for gl4es at least), because a drawing command works partly on locked arrays (like vertex coordinates) and partly on unlocked arrays (like vertex colors or UVs).
If anyone has any ideas how to do anything useful in gl4es with that extension, any idea can help us.
Maybe one of us can give him some ideas that he could try to implement?
A few possible ideas:
1. Put the vertex position data into non-interleaved arrays in a VBO. Then you can update the colour & texture coordinates more efficiently (better for the caches, and no need to upload the positions again).
2. Put the vertex positions in a separate VBO. Again, you'll be able to update the colour & texture coordinates more efficiently.
1. Create a VBO with vertex position, color, texcoords, normals, but non interleaved. And only update changed color / texcoords.
2. Create a VBO with only vertex position. Color and Texcoords and Normals out of the VBO.
?
Well, (1) I can probably try to implement, but that seems to be quite some work, and I'm unsure of the performance gain. Plus I don't know which vertex attributes (color, how many texcoords, normals) will be needed for drawing.
For (2) I'm unsure how standard this is: some vertex attributes in one VBO and some in another VBO. While I can probably try to implement that, again, I'm unsure of the performance gain (as you still need to transfer colors and other vertex attributes) and unsure how various GLESv2 drivers will accept this kind of thing.
As this optimisation is only for old engines, and those engines are probably running pretty well on most hardware already, I don't think it's worth the risk of slowing other stuff down, and not worth the added complexity in the code.
1. Create a VBO with vertex position, color, texcoords, normals, but non interleaved. And only update changed color / texcoords.
Yes. Something like glBufferSubData() can be used to upload just the changed attributes.
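As a rough illustration of option 1, the sketch below computes the byte layout of a non-interleaved ("planar") VBO: all positions first, then all colours, then all texture coordinates. The PlanarLayout struct and planar_layout() are names invented for this example, and the attribute formats (3 floats, 4 unsigned bytes, 2 floats) are an assumption, not what gl4es or quake3 necessarily use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte layout of one non-interleaved vertex buffer. */
typedef struct {
    size_t pos_offset,   pos_size;    /* xyz as 3 floats per vertex */
    size_t color_offset, color_size;  /* rgba as 4 bytes per vertex */
    size_t uv_offset,    uv_size;     /* uv as 2 floats per vertex */
    size_t total_size;                /* size to allocate for the VBO */
} PlanarLayout;

static PlanarLayout planar_layout(size_t vertex_count)
{
    PlanarLayout l;
    assert(vertex_count > 0);

    l.pos_offset   = 0;
    l.pos_size     = vertex_count * 3 * sizeof(float);
    l.color_offset = l.pos_offset + l.pos_size;
    l.color_size   = vertex_count * 4 * sizeof(unsigned char);
    l.uv_offset    = l.color_offset + l.color_size;
    l.uv_size      = vertex_count * 2 * sizeof(float);
    l.total_size   = l.uv_offset + l.uv_size;
    return l;
}
```

With such a layout the buffer would be allocated once with glBufferData() using total_size, and a colour change would only need glBufferSubData(GL_ARRAY_BUFFER, color_offset, color_size, colors); the positions stay untouched on the GPU side. The same offsets would be passed as the pointer argument of glVertexAttribPointer() while the VBO is bound.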
Quote:
2. Create a VBO with only vertex position. Color and Texcoords and Normals out of the VBO.
The remaining attributes would go into a separate VBO in this variation. Basically, you'd have one VBO for static data, and one VBO for the dynamic ones (with appropriate usage hints: GL_STATIC_DRAW and GL_STREAM_DRAW). This option would probably work best on OpenGL implementations where the driver can put GL_STREAM_DRAW buffers in GART space (which is on the to-do list for Nova).
This is an entirely valid and normal way to use VBOs, see this link.
Quote:
As this optimisation is only for old engines, and those engines are probably running pretty well on most hardware already, I don't think it's worth the risk of slowing other stuff down, and not worth the added complexity in the code.
Your call ptitSeb. I doubt it'll slow stuff down because drivers for GLES2 level hardware use VBOs internally for the data, anyway. With option #2, you're actually giving the driver the hints to optimize each VBO for the data being sent.
Mmm ok. I'll look at the separate VBO then, to replace the current (quite ineffective) VBO code I implemented some time ago, and see if this can be implemented. It should be easier to do (a VBO for all vertex attributes active at the time of glLock; other vertex attributes will remain outside).
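The rule described above (attributes enabled at lock time go into the VBO, anything enabled later stays outside) can be sketched with a simple bitmask. This is only an illustration of the idea, not gl4es code; the ATTR_* flags, LockState and the cva_* functions are hypothetical names.

```c
#include <assert.h>

/* Hypothetical flags for the client arrays an old engine may enable. */
enum {
    ATTR_POSITION  = 1 << 0,
    ATTR_COLOR     = 1 << 1,
    ATTR_TEXCOORD0 = 1 << 2,
    ATTR_NORMAL    = 1 << 3
};

typedef struct {
    unsigned locked_mask;  /* arrays that were enabled at lock time */
    int      locked;       /* non-zero between lock and unlock */
} LockState;

/* Called where the wrapper sees glLockArrays(): remember what is enabled. */
static void cva_lock(LockState *s, unsigned enabled_mask)
{
    assert(s);
    s->locked_mask = enabled_mask;
    s->locked = 1;
}

/* Called where the wrapper sees glUnlockArrays(). */
static void cva_unlock(LockState *s)
{
    assert(s);
    s->locked_mask = 0;
    s->locked = 0;
}

/* Arrays that may live in the static VBO for the current draw call. */
static unsigned cva_static_attrs(const LockState *s, unsigned enabled_now)
{
    assert(s);
    return s->locked ? (enabled_now & s->locked_mask) : 0u;
}

/* Arrays that still have to be sent with every draw call. */
static unsigned cva_streamed_attrs(const LockState *s, unsigned enabled_now)
{
    return enabled_now & ~cva_static_attrs(s, enabled_now);
}
```

In the quake3 case described earlier, only the position array would be enabled at lock time, so positions would end up in the static set while colours and texture coordinates would be streamed.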
>We see the call to glLockArrays(...) and then the engine enables GL_TEXTURE_COORD_ARRAY and GL_COLOR_ARRAY. That means those 2 arrays are not locked, but are still used for drawing
What does that mean? I mean, what does the game want to do?
Does it mean that the game wants to leave the vertices alone and just update the texcoords and colors? Perhaps in this case the VBO will do the vertex transform on the GPU side, but the lighting (colors) will be done by the CPU? And what about the texcoords?
It would look more logical to have the whole array not updated, with lighting & transform done on the GPU side, no?
Anyway, from my own experiments: updating a VBO is truly the bottleneck in Nova (so also in GL-ES and GL4ES).
Dunno about that, at least ogles2 works fine over Nova. Can you bring some test cases to Hans so we can see if anything is wrong at all?
I assume he's simply talking about the fact that updating VBOs in general is a comparably slow process. Which is of course simply in the nature of things. And until Nova gets DMA/GART support, things here are even slower on AOS4 compared to e.g. PC. Those are all well known facts, nothing new at all. And of course Hans knows that. So I guess this isn't something anybody needs to come up with test cases for.
I assume he's simply talking about the fact that updating VBOs in general is a comparably slow process. Which is of course simply in the nature of things. And until Nova gets DMA/GART support, things here are even slower on AOS4 compared to e.g. PC. Those are all well known facts, nothing new at all. And of course Hans knows that. So I guess this isn't something anybody needs to come up with test cases for.
Correct on all counts. Vertex uploads are relatively slow because we don't have DMA/GART yet. Hence avoiding uploading data more than necessary is a good idea.
Is it necessary to update graphics.library as well, or can it all be done outside of graphics.library (so as not to depend on when Hyperion releases updates)?
I mean, is it _that_ DMA which we are missing in graphics.library for the x5k?
1) Object & VBO have exactly the same point format, so you can memcpy the whole XYZUVWRGBA list in one call = slow, but it can't be faster
2) Object & VBO don't have the same point format, so you must pick the xyz from one place and copy it elsewhere in the VBO, ditto for uvw, etc.... and so it becomes slower
A partial glLockArrays is case 2), because even if XYZ is in a separate static VBO you will (certainly) have to pick the UVW RGBA out of the original point list to copy them to their VBO
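The difference between the two cases can be sketched in plain C. The Vertex struct below is an assumed XYZUVWRGBA format (6 floats plus 4 bytes) chosen only for illustration; copy_same_format() and gather_dynamic() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed engine-side point format: XYZ UVW RGBA, interleaved. */
typedef struct {
    float x, y, z;
    float u, v, w;
    unsigned char r, g, b, a;
} Vertex;

/* Case 1: source and VBO share the same format, one block copy. */
static void copy_same_format(Vertex *dst, const Vertex *src, size_t n)
{
    assert(dst && src);
    memcpy(dst, src, n * sizeof *src);
}

/* Case 2: XYZ lives in a static VBO, so only UVW and RGBA are picked
 * out of the interleaved list, one vertex at a time, into the staging
 * arrays that would be uploaded to the dynamic VBO. */
static void gather_dynamic(float *uvw_out, unsigned char *rgba_out,
                           const Vertex *src, size_t n)
{
    assert(uvw_out && rgba_out && src);
    for (size_t i = 0; i < n; i++) {
        uvw_out[3 * i + 0] = src[i].u;
        uvw_out[3 * i + 1] = src[i].v;
        uvw_out[3 * i + 2] = src[i].w;
        rgba_out[4 * i + 0] = src[i].r;
        rgba_out[4 * i + 1] = src[i].g;
        rgba_out[4 * i + 2] = src[i].b;
        rgba_out[4 * i + 3] = src[i].a;
    }
}
```

Case 2 moves fewer bytes (16 of the 28 bytes per vertex with this assumed format) but has to walk the list with a stride, so which one wins in practice would need measuring.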
I mean, is it _that_ DMA which we are missing in graphics.library for the x5k?
No. The graphics library's DMA is internal to it, so it's not available to external drivers.
I'm talking about using the graphics card's own GART/IOMMU and DMA. The Graphics Address Remapping Table, which these days is more of an I/O Memory Management Unit, maps main memory into the GPU's address space. Once that's set up, the GPU can read stuff from main memory using its own DMA engines. It's on the to-do list for the future, and we will get there at some point...
@all While waiting for the others to sort out their real-life matters, I have been hacking around and made some changes to our (Capehill's) SDL2, so it can use GL4ES as well.
The changes were almost the same as for SDL1, with some small differences (like some different names of structs and data, but the logic is about the same). And as almost all initialisation is done in the init stages of gl4es itself, I have no need to open/close libraries from sdl1/sdl2 and co, just createcontext/makeactivewindow. The whole "wrapper" code for the GL functions is inside gl4es too, so for sdl2 (and for sdl1 too) that file is empty. In other words, the changes are in just 1 file (SDL_os4opengl.c), and the other one, the wrapper (SDL_os4opengglwrapper.c), is empty. There they are:
And the result is:
What good is it? Authors of active projects are starting to move to SDL2, so it's good that we can try to build not only SDL1-based apps over OpenGL, but also SDL2 ones. At the moment it can be quake3, to see how its speed differs from the sdl1 version when both run over gl4es.