Quote:
Put more emphasis on the good news: the rendering output looks almost perfect.
Yes, on the first run I even thought "damn, did I run the MiniGL version?". But then I found a few little glitches, which you can probably see on your side too :) But yes, it really is almost perfect. I wasn't expecting this (especially since quake3 uses glDrawArrays, and we know about that vertexattrib4 problem).
Will pack all the stuff and make it all easy-peasy in an hour or so :)
Thanks for the help!
EDIT: Just for future reference: the gl4es author built a Pandora version (it is only a 1 GHz machine) and tested it against the GLES2 backend (the same one we use in gl4es on OS4):
Quote:
Working perfectly on GLES2 backend on the Pandora. Graphics are just fine, and the demo runs at 40fps, with some parts slowing down to 22, and sometimes up to 60fps...
So we should probably expect about 100 fps for real...
- MiniGL, like every other GL implementation, has functions to take care of this. If you use MGLResizeContext and glViewport on resize, then all is fine.
- ogles2.lib doesn't need something like an explicit MGLResizeContext.
That's just a different convention compared to MGL. In MiniGL you must call this function on window resize; in ogles2 it's done automatically. Both ways of doing it are equally good / bad / legal.
MiniGL also has a lower-level API with a function called MGLUpdateContextTags() that lets you set the front and back buffer bitmaps directly. Blender uses this so that it can create its own rastports for combined AmigaOS4 and OpenGL rendering.
@broadblues Yes, ogles2.library also has different ways to specify the render target. Besides supplying a window handle, as in this case, you can also supply your own bitmaps or tell the lib to manage a buffered screen for you. Since recently you can also switch between all those methods on the fly without losing your context.
BTW, Warp3D Nova supports anisotropic texture filtering, so this extension can be enabled (after querying whether the driver supports W3DN_Q_ANISOTROPICFILTER, of course): GL_EXT_texture_filter_anisotropic
Though the main problem is that the fps is 4 times lower than in MiniGL. That shows that something is really wrong somewhere...
Disabling GL_EXT_compiled_vertex_array will certainly have contributed to that, although it sounds like something else is wrong for it to be that much slower. Maybe something is flushing the pipeline like crazy? That'll give a performance hit, because there's a limit on how many draw operations/command-queues can be submitted per second.
I really noticed this when developing the W3D_SI Warp3D driver, as MiniGL is notoriously bad at cutting render ops into little bits. I initially got 10 fps with OpenArena due to this, and so the driver does its best to reassemble tiny render ops into larger batches.
Quote:
Disabling GL_EXT_compiled_vertex_array will certainly have contributed to that
Whether all extensions are disabled or all are enabled in the same MiniGL version makes no difference. Maybe 0.5-1 fps, not more.
The reason it doesn't work very well for now may be the "Shader attribute expansion to four components" feature we need :) The GL_EXT_compiled_vertex_array extension enables the use of glDraw...(...); without it, glBegin(...)/glEnd() blocks are used. So this is consistent with the vertex size issue.
The 5 fps problem is very visible in the first level, when we load it and just do nothing (so we look at the mirror). Once we move away from the mirror, the fps jumps a bit (maybe to 10-12), but it stays at a stable 5 fps while looking at the mirror (the MiniGL version gives 22 fps at the same place, built from the same code).
Quote:
I really noticed this when developing the W3D_SI Warp3D driver, as MiniGL is notoriously bad at cutting render ops into little bits. I initially got 10 fps with OpenArena due to this, and so the driver does its best to reassemble tiny render ops into larger batches.
The gl4es guy profiled the GLES2-backend version of gl4es yesterday, and all he found is that it uses a "clipplane" in the shaders, which, as he says, may not be implemented very optimally. So we tried building a version whose shaders do not have it, but no, it only gives a +3 fps difference, maybe less. That means some other problem is making things so slow. Yet on the Pandora (1 GHz) he gets more fps on the GLES2 backend than I do on the X5000.
Also he says that the GLES2 trace he made shows that when facing the mirror (i.e. in that first level when we look at the mirror, where we have 5 fps now), there are around 550 draw commands. So that "flushing the pipeline like crazy" seems reasonable.
On the other hand, we can probably rule out Nova, since MiniGL gives us a much higher fps (and works over Nova), even with software TCL and all that "bad at cutting render ops into little bits".
But also, ogles2 was already tested by the Entwickler-X guys, and if there were performance issues they probably would have noticed them... (though they may very well use features that do things right, while we with gl4es use everything possible, which can slow things down in places no one else has touched before). But let's see what Daniel says after profiling...
@Daniel Does the MiniGL version of quake3 run on MGLReloaded? Just so we can rule out some more things.
Edited by kas1e on 2018/3/1 7:15:43 Edited by kas1e on 2018/3/1 7:17:22 Edited by kas1e on 2018/3/1 7:19:12 Edited by kas1e on 2018/3/1 7:33:11
Quote:
Whether all extensions are disabled or all are enabled in the same MiniGL version makes no difference. Maybe 0.5-1 fps, not more.
That sounds a bit suspect. I vaguely remember looking into this, and I thought it had a bigger impact than that.
Quote:
The reason it doesn't work very well for now may be the "Shader attribute expansion to four components" feature we need :) The GL_EXT_compiled_vertex_array extension enables the use of glDraw...(...); without it, glBegin(...)/glEnd() blocks are used. So this is consistent with the vertex size issue.
I'm not convinced. If a vertex and fragment shader use variables of different sizes, then Warp3D Nova currently rejects them outright. Basically, it won't render anything.
When used well, GL_EXT_compiled_vertex_array allows the driver to shift vertices into VRAM, which should result in a decent speed up (if it's used a lot).
Quote:
Also he says that the GLES2 trace he made shows that when facing the mirror (i.e. in that first level when we look at the mirror, where we have 5 fps now), there are around 550 draw commands. So that "flushing the pipeline like crazy" seems reasonable.
5 * 550 = 2750 draw calls/s. We can manage a lot more than that, so something must be getting in the way.
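To put the numbers from this exchange side by side, here is a minimal sketch of the arithmetic. The 550 draws per frame and the 5 fps and 22 fps figures come from the posts above; that the MiniGL build issues the same number of draws per frame at that spot is only an assumption made for the comparison.

```python
def draw_calls_per_second(fps: int, draws_per_frame: int) -> int:
    """Draw calls submitted per second at a given frame rate."""
    return fps * draws_per_frame


# Figure reported from the GLES2 trace when facing the mirror.
DRAWS_FACING_MIRROR = 550

# gl4es + ogles2 build: about 5 fps at that spot.
ogles2_rate = draw_calls_per_second(5, DRAWS_FACING_MIRROR)

# MiniGL build: about 22 fps at the same spot.
# ASSUMPTION: it issues the same number of draws per frame.
minigl_rate = draw_calls_per_second(22, DRAWS_FACING_MIRROR)

print(ogles2_rate)  # 2750
print(minigl_rate)  # 12100
```

Under that assumption, the MiniGL path already pushes several times more draw calls per second through the same hardware, which fits the point that the raw draw-call rate alone should not be the limit.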
Quote:
That sounds a bit suspect. I vaguely remember looking into this, and I thought it had a bigger impact than that.
Dunno how it sounds, it's just the truth: I tried it on the X5000 again just now. Maybe on some other machines in the past it made a noticeable difference, but not on the X5000 at least.
Quote:
I'm not convinced.
?:) Sadly I need to convince you that "Shader attribute expansion to four components" needs a fast fix :( It's not like we ask for features or file BZ tickets just because :)
It's not about quake3 of course, it's about gl4es as a whole, many parts of which we can't use at all because of that missing feature. Many glDrawArrays calls fail / render badly / render nothing because of it, and as a result 5 or 7 games I tried fail to render, as they use the "old school" way of doing things (glDrawArrays etc.), which in the end leads to problems. Daniel could implement it in ogles2, but that would cause performance issues, so it needs to be implemented on the GPU side, but you know all that yourself. So please, pretty please, make that feature available, or gl4es can't progress.
You may think "how then does the Cadog game work?". It works only because the author of gl4es made a monster workaround for that, so we could test it. Later he removed it altogether, as it should be done at our driver level. In other words, gl4es can't progress without that feature.
It also can't progress without a fix for the "OpSelect not implemented" thing :)
While the latter, the "OpSelect not implemented" thing, gives us fewer problems, the issues with "Shader attribute expansion to four components" make it almost impossible to get gl4es working. Please do it if it's easy enough. We do not create BZ tickets just for the sake of it :)
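For readers wondering what expanding an attribute to four components means in practice, here is a minimal sketch of the rule OpenGL applies when an application supplies fewer components than the shader's vec4 expects: missing components are taken from (0, 0, 0, 1). The function name `expand_to_vec4` is made up for illustration, and whether the requested Warp3D Nova feature works exactly like this is an assumption, not something confirmed in this thread.

```python
from typing import Sequence, Tuple

# OpenGL fills missing attribute components from (0, 0, 0, 1):
# y and z default to 0, w defaults to 1.
GL_DEFAULTS = (0.0, 0.0, 0.0, 1.0)


def expand_to_vec4(attr: Sequence[float]) -> Tuple[float, ...]:
    """Pad a 1..4 component attribute to four components (hypothetical helper)."""
    n = len(attr)
    if not 1 <= n <= 4:
        raise ValueError("attribute must have 1 to 4 components")
    return tuple(float(c) for c in attr) + GL_DEFAULTS[n:]


print(expand_to_vec4((1.0, 2.0)))         # (1.0, 2.0, 0.0, 1.0)
print(expand_to_vec4((0.5, 0.25, 0.75)))  # (0.5, 0.25, 0.75, 1.0)
```

Doing this padding on the CPU for every vertex is the kind of workaround that costs performance, which is why the posts above ask for it at the driver/GPU level instead.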
Quote:
5 * 550 = 2750 draw calls/s. We can manage a lot more than that, so something must be getting in the way.
The author of gl4es says that a lot of GLES2 hardware also doesn't like having many draw commands, so gl4es tries to group them as much as it can. So, since gl4es already tries to do as well as possible here, maybe the issue is elsewhere and not in the "crazy flush".
Finished just in time. Half an hour later my FTP etc. would have gone to bed
Quote:
as for "content resizing", maybe add it as an option when the context is created?
Yes, adding such an option is no problem and doesn't hurt.
@Hans
Quote:
Warp3D Nova supports anisotropic texture filtering, so this extension can be enabled
Yes, I saw this in the logs too and already added that missing extension string.
Quote:
Maybe something is flushing the pipeline like crazy?
ogles2 contains logic to only flush if necessary, because yes: I also found too many flushes to be the main performance killer back then. But never say never: we'll see once I find the time to check it out.
@kas1e
Quote:
there are around 550 draw commands. So that "flushing the pipeline like crazy" seems reasonable.
The number of draw commands being issued is not directly coupled to any flushes (e.g. my boing-ball test prog issues 1024 glDraw-calls and performs well even without using VBOs).
Quote:
ogles2 was already tested by the Entwickler-X guys, and if there were performance issues they probably would have noticed them.
Yes, but they also use VBOs a lot, so I'm actually wondering why those would cause any problems. On the other hand they use hand-crafted shaders that are designed with current Nova limitations in mind and for optimal performance, so maybe it's also just something in that area.
Quote:
we can probably rule out Nova, since MiniGL gives us a much higher fps (and works over Nova)
Does it? Since when?
Quote:
Does the MiniGL version of quake3 run on MGLReloaded? Just so we can rule out some more things.
Of course it does *not*. Don't you think I would have told you? MGLReloaded is currently good enough for about 80% of the demos that ship with MiniGL, that's it.
Quote:
Do you mean changing "lighting" from lightmap to vertex in the quake3 options? If so, it gives 10 fps then.
I actually find this observation here rather interesting. Sounds as if the shaders in use / number of textures being used has a very strong impact on performance here; actually one I wouldn't expect in that strength.
But well, as I said: we'll know more / for sure what's happening once I've checked it out! This may take some time though.
Quote:
Of course it does *not*. Don't you think I would have told you? MGLReloaded is currently good enough for about 80% of the demos that ship with MiniGL, that's it.
What I mean is: you can try running the MiniGL version of q3 from my archive over MGLReloaded as well as over the usual minigl.library, to see the difference.
If over MGLReloaded everything is fine and faster than over pure MiniGL, then the problem is probably gl4es. If it is just as slow as the gl4es version, then it could be ogles2/Nova.
Just to rule out the gl4es layer (or at least point to it).
Quote:
You may think "how then does the Cadog game work?". It works only because the author of gl4es made a monster workaround for that, so we could test it. Later he removed it altogether, as it should be done at our driver level. In other words, gl4es can't progress without that feature.
Which is something we knew before that already. Really, why do you guys burn your time for a "monster workaround" which will be reverted anyway for sth. which we already found out and Hans already said he's taking care of?
@kas1e Rest assured (and I mean rest): Hans knows very well how important that is for gl4es. The point is simply: you should not modify gl4es. Just throw test-progs at us one by one, and we'll fix the issues step by step as they appear. The constant "monster workarounds" in gl4es are of exactly zero benefit; things won't get done faster that way and it won't reveal issues that wouldn't get revealed anyway. It's just a waste of time for you and the gl4es guy, and it may actually complicate things, as it may also reveal "pseudo-issues" that in fact turn out to be only side-effects of already known issues.
What is even stranger is that the other games that already work use similar stuff anyway. Still, there must be either something OGLES2/Warp3D doesn't like in the shaders or something in how the data is fed into the driver.
Just to note, gl4es doesn't use any VBOs for now (I plan to try using them, but for now all VBOs are emulated), and the arrays generated by glBegin(..) / glEnd() are not interleaved; they are separate arrays (I'll try to work on that also, it could help performance I think).
Just to note, VBOs are not used by Quake3, as in most OpenGL 1.x games. But yes, maybe the OGLES2 driver expects all its data in VBOs. Using actual VBOs in gl4es requires some work. It was not designed to use VBOs in the first place, so I need to alter many critical places. Using real VBOs is on my TODO list, as I expect some speed boost on some architectures (but not on the Pandora, according to some preliminary tests done with Doom3), but it's not a small change...
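As an illustration of the difference between separate and interleaved arrays mentioned above, here is a minimal sketch that packs per-attribute arrays into a single interleaved array. The helper name `interleave` and the sample data are made up for illustration; this is not how gl4es builds its arrays, only the general layout idea.

```python
from typing import List, Sequence, Tuple


def interleave(streams: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Pack separate attribute arrays into one interleaved array.

    Each stream is (flat_data, components_per_vertex). Hypothetical helper.
    """
    if not streams:
        return []
    counts = set()
    for data, size in streams:
        if size <= 0 or len(data) % size != 0:
            raise ValueError("stream length must be a multiple of its component count")
        counts.add(len(data) // size)
    if len(counts) != 1:
        raise ValueError("all streams must describe the same number of vertices")
    vertex_count = counts.pop()

    packed: List[float] = []
    for v in range(vertex_count):
        # Append every attribute of vertex v before moving to the next vertex.
        for data, size in streams:
            packed.extend(data[v * size:(v + 1) * size])
    return packed


# Three vertices: xyz positions and uv texture coordinates in separate arrays.
positions = [0.0, 0.0, 0.0,  1.0, 0.0, 0.0,  0.0, 1.0, 0.0]
texcoords = [0.0, 0.0,  1.0, 0.0,  0.0, 1.0]

packed = interleave([(positions, 3), (texcoords, 2)])
stride = 3 + 2  # floats per vertex in the interleaved layout
print(packed[:stride])  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

With the interleaved layout, all data for one vertex sits next to each other, so a single buffer with one stride can describe every attribute.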
@kas1e The reason for Q3 being so slow is that the game does practically zero batching. ogles2 is flooded by glDraw-calls of practically always at most 10 triangles. If I artificially limit ogles2 to ignore any draw-calls with more than 10 triangles, then everything looks like before.
@Daytona675x Not that I understand why MiniGL is faster there (I thought before that MiniGL does everything wrong), but is it something that can be fixed in the drivers, so that we get 50 fps instead of MiniGL's 20? :)
@kas1e Drawing a scene like that is the most inefficient way possible to do things and one of the big "don'ts" in terms of GL.
I suppose W3DSI does extra batching internally. I however am not certain yet if I shall add sth. like that to ogles2.lib as it's such an insane API abuse.
If MiniGL does it, then maybe it's sth. to add to gl4es, since it's the MiniGL equivalent here, right?
@Daytona675x All in all it's quake3: it works on all possible platforms and drivers. I assume that if quake3 does something, then almost all other games do the same.
What makes me curious is why that "bad" MiniGL is 4 times faster, even with quake3, which does no batching. And of course we don't want "the same as MiniGL" speed (then why would we need all this?), but something faster. Otherwise there is already MiniGL (which is bad, but faster even with TCL in software :)) )
Maybe it's not quake3 but gl4es that splits it all up like this? Though that can't explain why on the Pandora, with a 1 GHz CPU and some not-so-good graphics hardware, he gets results that beat even MiniGL on the X5000 (leaving aside the gl4es version, which on the Pandora just crashes)
ogles2 is of course not a MiniGL equivalent, but IMHO we all expect it to be so much better that it will crush MiniGL in comparison.. But..:)
Quote:
I assume that if quake3 does something, then almost all other games do the same.
Drawing single triangles per glDraw-call? Certainly not. And if a game does so, bad luck, at least with ogles2.
Quote:
What makes me curious is why that "bad" MiniGL is 4 times faster, even with quake3, which does no batching.
I already told you my guess: probably W3DSI does some batching internally. Or MiniGL itself, dunno.
But I know one thing for sure: ogles2 is not optimized / designed for thousands of single-triangles-from-client-mem-draw-calls per frame, which is why we got that low performance here. ogles2 is well optimized for what people usually do with it, while considering the way Nova likes it: mostly use VBOs and if using client-mem-arrays at all, then those are usually not just a handful of triangles. It is not optimized for what Q3+gl4es deliver right now and I probably won't optimize it for that kind of stuff.
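To make the batching idea discussed here concrete, this is a minimal sketch of merging consecutive tiny draws that share the same render state into larger batches. It is not how W3D_SI, MiniGL, gl4es or ogles2 actually do it; the `state` key, the function name and the sample frame are all made up for illustration, and it assumes draws with identical state are plain triangle lists that can simply be concatenated.

```python
from typing import List, Tuple

# A draw is (state_key, triangle_count). The state key is a hypothetical
# stand-in for texture / shader / blend settings.
Draw = Tuple[str, int]


def merge_small_draws(draws: List[Draw]) -> List[Draw]:
    """Merge consecutive draws that share the same state into one batch."""
    batches: List[Draw] = []
    for state, triangles in draws:
        if batches and batches[-1][0] == state:
            # Same state as the previous batch: grow it instead of
            # submitting another draw call.
            batches[-1] = (state, batches[-1][1] + triangles)
        else:
            # State changed: the order must be preserved, so start a new batch.
            batches.append((state, triangles))
    return batches


frame = [("wall", 2), ("wall", 4), ("wall", 1),
         ("mirror", 2),
         ("wall", 3), ("wall", 3)]

merged = merge_small_draws(frame)
print(len(frame), "draws ->", len(merged), "batches")  # 6 draws -> 3 batches
print(merged)  # [('wall', 7), ('mirror', 2), ('wall', 6)]
```

Only neighbouring draws are merged, so the rendering order stays unchanged; how much this helps depends entirely on how often the state actually changes between the small draws.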
Quote:
Maybe it's not quake3 but gl4es that splits it all up like this?
Don't know.
Quote:
ogles2 is of course not a MiniGL equivalent, but IMHO we all expect it to be so much better that it will crush MiniGL in comparison..
It will, as soon as you start to feed it with something other than single triangles.