@All In the meantime, a new optimisation from the gl4es author:
Quote:
I have made a new optimization to the glBegin/glEnd merger, to reduce CPU load during merge (in fact, to do the merge on the fly instead of after the glEnd). It also reduces the number of malloc/free calls used. Overall, it gives a nice boost in some cases (and Quake3 with r_primitive=1 is one).
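The "merge on the fly" idea from the quote can be sketched roughly as below. This is only an illustration under assumptions, not gl4es's actual code: the `Batch` struct and the `batch_*` functions are hypothetical names, only independent triangles are handled (strips and fans cannot simply be concatenated), and "issuing a draw call" is reduced to a counter. Vertices from consecutive begin/end pairs with the same primitive mode are appended straight into one growing buffer, so there is no separate merge pass after the end call and far fewer allocations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pending batch: consecutive begin/end pairs append here. */
typedef struct {
    float *verts;     /* packed xyz triples */
    size_t count;     /* vertices currently stored */
    size_t capacity;  /* vertices allocated */
    int    mode;      /* primitive mode of the pending batch, -1 if none */
    int    flushes;   /* draw calls "issued" so far (illustration only) */
} Batch;

static void batch_init(Batch *b)
{
    b->verts = NULL;
    b->count = 0;
    b->capacity = 0;
    b->mode = -1;
    b->flushes = 0;
}

/* Stand-in for submitting the accumulated vertices in one draw call. */
static void batch_flush(Batch *b)
{
    if (b->count > 0) {
        b->flushes++;
        b->count = 0;   /* keep the allocation for reuse: no free/malloc */
    }
    b->mode = -1;
}

/* A begin with a different mode ends the pending batch; same mode merges. */
static void batch_begin(Batch *b, int mode)
{
    if (b->mode != mode)
        batch_flush(b);
    b->mode = mode;
}

/* Append one vertex; the buffer grows geometrically, so reallocs are rare. */
static int batch_vertex(Batch *b, float x, float y, float z)
{
    assert(b->mode != -1);  /* must be called between begin and end */
    if (b->count == b->capacity) {
        size_t ncap = b->capacity ? b->capacity * 2 : 64;
        float *nv = (float *)realloc(b->verts, ncap * 3 * sizeof(float));
        if (nv == NULL)
            return 0;
        b->verts = nv;
        b->capacity = ncap;
    }
    b->verts[b->count * 3 + 0] = x;
    b->verts[b->count * 3 + 1] = y;
    b->verts[b->count * 3 + 2] = z;
    b->count++;
    return 1;
}

/* End does nothing: the batch stays open so the next begin can merge. */
static void batch_end(Batch *b)
{
    (void)b;
}
```

In a real wrapper a GL state change (blend, texture, etc.) would also have to call the flush, not only a change of primitive mode.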
And it gives a huge boost. We still don't beat MiniGL, but we are pretty close to it at least. I am not posting results with GL extensions enabled, because GL_EXT_compiled_vertex_array doesn't work for us currently, so we can't compare. What we can compare is the plain, non-extensions version. So:
After the new optimisation we gain 10-11 fps everywhere in timedemo 1 / demo four:
@Raziel I have not tested it, but you just need to download the source (exactly that "update-ks" directory) and run "make" on the makefile to get a ready-to-use library. Did you already try, and did it fail for some reason?
I'm asking because Daytona enhanced/created some missing functions in a beta of his, which fixed some obvious problems in a port of mine when using OpenGL/MiniGL.
But that "fix" was never released and I wonder if it ever will be...
Changes in v2.21
----------------
- glGet*() now returns the value of GL_BLEND_DST, GL_BLEND_SRC, GL_COLOR_CLEAR_VALUE, GL_STENCIL_BITS,
- GL_ALPHA8 and GL_LUMINANCE8_ALPHA8 are now recognized as valid internal formats
- Added missing stub for glGetPointerv()
- Added fake VBO support. For source code compatibility only, the buffers are stored in the system memory. New functions: glGenBuffers(), glDeleteBuffers(), glBindBuffer(), glBufferSubData(), glBufferData(), glGetBufferSubData(), glMapBuffer(), glUnmapBuffer(), glGetBufferParameteriv()
- glTexEnvi no longer sets GL_INVALID_ENUM unnecessarily when multitexturing is used
- Fixed glCopyTexImage2D and glCopyTexSubImage2D with GL_RGB textures
- glTexGenfv() now supports GL_EYE_PLANE
- Added S3TC texture compression using the S2TC compressor. Supports all of the S3TC, DTXn and ARB formats
- Fixed glCopyTexImage2D()
- Added GL_DEPTH_COMPONENT support to glReadPixels(). Only supports GL_FLOAT type
- Added support for the GL_INCR_WRAP and GL_DECR_WRAP stencil ops
- glTexSubImage2D and glCopyTexSubImage2D now work on mip levels larger than 0
- Added support for the GL_ARB_vertex_array_bgra extension
I don't know whether those changes were made by Daytona, or whether there is any practical difference to the current version, but those are the differences to version 2.20, which is on os4depot.net.
Broadblues said that those fixes are "tiny".
I wonder whether there is another beta lying around on Daytona's hard drive.
If some of you think that those would make a difference, could you please compile it and upload it to OS4 depot?
The release version is 2.21 and, AFAIK, it was already bundled with the latest update of AmigaOS4, so I think there is no need. The repo has some minor fixes up to 2.22, but really nothing special. And IMHO there is no need to compile even that one, as Huno is still finalizing a new update that should be released soon.
@samo I only hope that Huno's branch will not replace classic MiniGL, as it is a totally different thing that does things MiniGL IMHO does not need. It is mostly like taking MiniGL and adding more mess to it.
It's as if we now added high-level things like wget or curl to low-level drivers, with unneeded "installers" on top and so on. Drivers are drivers; they have no need to be extended like that. Making prefs or a GUI for drivers is also not the drivers' job.
You can install Subversion and use "svn log" command if you want to check deeper.
It would probably be better to move the classic MiniGL project to GitHub because the current repository doesn't have an issue tracker, for example. GitHub would also allow pull requests, which would make contributing easier.
I don't know whether those changes were made by Daytona, or whether there is any practical difference to the current version, but those are the differences to version 2.20, which is on os4depot.net.
Most were made by BSzili, some were made by me, some we made hand in hand (I fixed some stuff in R200, he added the corresponding features to MiniGL). 2.21 is really a fat update and it is key to exploiting some of the R100/R200 improvements made back then (e.g. texture compression, stencil effects). The one most interesting to Raziel was some rather experimental support for glDrawPixels(GL_DEPTH_COMPONENT), which makes some ScummVM games work well. BSzili committed that glDrawPixels(GL_DEPTH_COMPONENT) thing in March 2016 (I had no write access back then). It's just that we forgot to increase the version number / update the readme back then.
Quote:
I wonder whether there is another beta lying around on Daytona's hard drive.
No.
@samo79 @broadblues While 2.22 is a real micro fix, it's not as unimportant as it might look, because, as mentioned, the bug probably caused side effects in the mipmapping code (I didn't check it out in depth, I just saw that it got called from there).
@kas1e Back on topic: great, a very nice boost with gl4es! And best of all, it was achieved without touching ogles2 or Nova; you just needed to feed it correctly and get rid of some slow code in gl4es. And from what you said, the Pandora comparison was incorrect: it was even slower (2 fps) than the Amiga version when using the same code path.
From my quick tests it looks as if the one eating most frame-time here is still gl4es though. E.g. if I actually cancel out every draw call with more than 20 triangles now (so that just enough gets rendered to see the fps counter and being able to get through the menus etc.) then performance doesn't change for me significantly at all. But the thing is that practically all potentially time-consuming stuff inside ogles2 is only triggered if such a draw-call is actually executed. I don't know what the concrete issues in gl4es are. More such malloc/free stuff could cause it. Or very cache-unfriendly behaviour. Both are things which can easily have much less impact on other systems.
I don't say that there isn't any optimization potential inside ogles2, but the by far major time-eater seems to be gl4es at the moment, so before I concentrate on squeezing out some more performance out of ogles2 (which is kind of very hard as long as the client is the major bottleneck) this should be accelerated.
That messed-up display when using Q3's ext-drawing looks to me like corrupt memory somewhere. I will probably take a deep look at ogles2 today; maybe it's not "corrupt" memory but rather "misinterpreted" memory and the internal client-RAM-emulation VBOs are copied together incorrectly. That's rather unlikely but not too implausible given the symptoms. Or maybe the index VBO is set up incorrectly (considering that there was a recent change in that area in ogles2, maybe something got messed up there).
What speaks against all this is the fact that gl4es uses ogles2's glDrawElements all the time (and glDrawArrays uses pretty identical code too), e. g. when it translates glBegin / glEnd, but well, there's more to glDrawElements than the pure function call. It's mostly about pointer setup / data conversion and such. And this may look totally different in one render-path compared to the other. And there may well be an until now undetected issue inside ogles2.
I'll get back once I have ruled this in or out. Cheers, Daniel
That messed-up display when using Q3's ext-drawing looks to me like corrupt memory somewhere. ...
I suggest you hold off on investigating the graphics corruption with glDrawElements() problem until I've fixed the vertex attrib padding issue, because I've spotted this in Quake 3's code:
qglVertexPointer (3, GL_FLOAT, 16, input->xyz);
Assuming that the code expects the fourth element to be 1 (very likely), that's probably causing utter chaos.
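To make the stride point concrete: with a size of 3, type GL_FLOAT and a stride of 16, each vertex occupies 16 bytes, of which only the first 12 are position data, and the last 4 are padding that must not be read as w. A minimal sketch, assuming a hypothetical helper `expand_xyz_to_xyzw` (not taken from Quake 3, gl4es or ogles2), of how such an array could be repacked into tight 4-component vertices with w forced to 1:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 * Reads `count` vertices of 3 floats each from `src`, where consecutive
 * vertices are `stride` bytes apart, and writes tightly packed xyzw
 * vertices to `dst` (4 floats per vertex) with w set to 1.0f.
 * Whatever sits in the padding bytes of the source is ignored.
 */
static void expand_xyz_to_xyzw(const void *src, size_t stride,
                               size_t count, float *dst)
{
    const unsigned char *p = (const unsigned char *)src;

    /* A stride smaller than one xyz triple would overlap vertices. */
    assert(stride >= 3 * sizeof(float));

    for (size_t i = 0; i < count; ++i) {
        memcpy(&dst[i * 4], p + i * stride, 3 * sizeof(float));
        dst[i * 4 + 3] = 1.0f;  /* default w for 3-component positions */
    }
}
```

If a driver instead treated the padded array as 4-component data, the fourth float of every vertex (arbitrary padding) would end up as w, which would match the kind of chaos described above.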
IIRC hunno just "added" waZp3d_lib usage to minigl_lib, so users who don't have a waRp3d gfx card can use programs/games that use minigl_lib in a transparent way (so they don't need to rename/substitute the real waRp3d_lib with waZp3d_lib in LIBS:)
@jabirulo Adding wazp3d to minigl is IMHO wrong. And here is why: wazp3d is good for tests, for playing around with systems a bit, for running apps on WinUAE, etc. But those chaotic "configs", which are different for every game, with tons of options (which a casual user just doesn't need), make a really unfriendly mess. And huno wants to put it inside minigl, with all those configs? :) And add installers with wget :)
Minigl wasn't meant to support systems where there is no working warp3d. It's exactly the opposite. If anyone doesn't have a working warp3d, they can just play with wazp3d separately. No need to mess with the drivers by adding wazp3d to them. Just swap the libs as the wazp3d doc says; there is no need to add it all to minigl and create more mess in the code.
It's like: we have no problems with minigl, but we need to add wazp3d to it, then an installer (as if you can't put a library into LIBS: yourself), and whatever else.
The only thing minigl needs is more GL functions and fixes, just like BSzili and Daniel have been doing lately, not a software emulator with configs, prefs GUI, downloaders, installers, and whatever else of that kind.
@All Some more improvements from the gl4es author; this time it's "more improvements to the glbegin/glend memory merger". It doesn't add a lot, but 3-4 fps everywhere:
Resolution   MGL/SDL1   MGL/SDL2   GL4ES/SDL1
640x480      56.2       60.6       48.4
1024x768     50.2       54.9       45.0
1600x1200    45.2       47.1       40.4
And when looking at the mirror in the first level at 1600x1200, minigl gives us 19-20 fps and the latest gl4es version gives us 18-19 fps. Still a little slower, as we can see from all the tests, but a lot better compared with the 5 fps of a few days ago.
@Daniel I also asked about more batching (i.e. about those calls, most of which have 20 triangles or fewer), and the author says that sadly there is a limit to batching: it stops when some GL state changes (blend, texture, etc.).
But we will see what the speed is once we have a working glDrawElements version, as it will mean:
1). no glBegin/glEnd overhead and memory merger
2). GL_EXT_compiled_vertex_array
3). no batching needed, everything will be in one call (will it? Or will it still be the same calls with fewer than 20 triangles anyway, just without the glBegin/glEnd overhead?)
And everything Hans listed:
Quote:
- The overhead of analysing the index array and copying vertices to a new buffer (increases the chance of the data not fitting the CPU's cache)
- It's cutting up the vertex array into smaller fragments (more batches)
- Vertices in a mesh that are shared between tri-strips are duplicated, meaning more vertices need to be sent to the GPU (more data == more transfer time, and it increases the chance of the data not fitting the CPU's cache even more)
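The duplication cost in the last point can be illustrated with a small sketch. It assumes a hypothetical `deindex` function (not code from Quake 3, gl4es or ogles2) that expands indexed triangles into a flat vertex stream, which is roughly what any path that cannot pass the index array through has to do:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical illustration: expand an indexed mesh into a flat stream.
 * `verts` holds packed xyz triples, `idx` holds `nidx` indices into it.
 * `out` must have room for nidx * 3 floats.
 * Returns the number of vertices written, which is always nidx, no matter
 * how few unique vertices the mesh actually has.
 */
static size_t deindex(const float *verts, const unsigned short *idx,
                      size_t nidx, float *out)
{
    assert(verts != NULL && idx != NULL && out != NULL);

    for (size_t i = 0; i < nidx; ++i) {
        /* Every reference to a shared vertex copies it again. */
        memcpy(&out[i * 3], &verts[(size_t)idx[i] * 3], 3 * sizeof(float));
    }
    return nidx;
}
```

For a quad made of two triangles with indices {0, 1, 2, 2, 1, 3}, this turns 4 unique vertices into 6 transmitted ones; with an indexed draw call the 4 vertices and the 6 small indices could be sent as they are.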
So, taking into account that the minigl version gives ~100 fps with glDrawElements at 640x480, we can probably have all 200, if, of course, no other bottlenecks are waiting for us anywhere (but they will be, for sure :)) ).
I also asked about more batching (i.e. about those calls, most of which have 20 triangles or fewer),...
Sorry, you completely misunderstood and mixed up what I wrote. I said that I artificially modified ogles2 for testing to cancel out every draw call with more than 20 triangles. I did so to see if there is a performance difference. If ogles2 were the bottleneck in this scenario, there should be a significant difference, because the draw commands are where most of the time-consuming stuff inside ogles2 happens. Since this is not the case, gl4es is very, very likely the major bottleneck. Where this bottleneck is, I have no idea. But I also told you that the batching is fine now, at least in terms of the number of triangles sent.
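The kind of test filter described here could look roughly like the sketch below. It is a guess at the idea, not the actual ogles2 change: `triangle_count` and `should_skip_draw` are hypothetical names, and the primitive constants simply mirror the numeric values of GL_TRIANGLES, GL_TRIANGLE_STRIP and GL_TRIANGLE_FAN so that the example compiles without GL headers.

```c
#include <assert.h>

/* Same numeric values as the corresponding OpenGL enums. */
enum {
    PRIM_TRIANGLES      = 0x0004,
    PRIM_TRIANGLE_STRIP = 0x0005,
    PRIM_TRIANGLE_FAN   = 0x0006
};

/* Number of triangles produced by `n` vertices/indices of a given mode. */
static int triangle_count(int mode, int n)
{
    assert(n >= 0);
    switch (mode) {
    case PRIM_TRIANGLES:
        return n / 3;
    case PRIM_TRIANGLE_STRIP:
    case PRIM_TRIANGLE_FAN:
        return n >= 3 ? n - 2 : 0;
    default:
        return 0;   /* points and lines produce no triangles */
    }
}

/*
 * Hypothetical test hook: returns 1 if the draw call should be dropped
 * because it would render more than `max_triangles` triangles.
 */
static int should_skip_draw(int mode, int n, int max_triangles)
{
    return triangle_count(mode, n) > max_triangles;
}
```

Calling such a check at the top of the draw entry points and returning early keeps small calls (fps counter, menus) visible while removing nearly all of the library's own per-draw work, so an unchanged frame rate points at the caller rather than at the library.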
Quote:
... and the author says that sadly there is a limit to batching: it stops when some GL state changes (blend, texture, etc.).
I know, that's the very nature of things. Glad I didn't ask such a question...
@Daniel Yeah, it seems I messed it all up... I was under the impression that if you make ogles2 cancel out every draw call with more than 20 triangles and you can still see the fps counter, menus etc., then it means that all calls are still small ones with fewer than 20 triangles each -> bad -> need batching :))
And yeah, he also explained to me that batching is fine now, as "A draw scene of game as complex as Quake3 with 326 draw call is completely normal, and is even quite low when you take the mirror into account (mirror is a redraw of everything, it's not an image)."