@BSzili It's all somewhat sarcastic and rhetorical on my part. I know how well it all works for other platforms.
I just remember how everyone kept saying that Quake 3 performs slowly on MiniGL/Warp3D because it didn't have hardware TCL. Now we have it, and there is no difference.
To tell the truth, I'm just a bit disappointed by the results we end up with, even in pure glBegin/glEnd with _hardware_ TCL. As if the only thing that matters is how glBegin/glEnd is handled, and MiniGL handles it better, so it's faster (or the same, which hardly matters).
Everyone keeps saying and explaining things, but the result is: GL4ES Quake 3 using glBegin/glEnd with HARDWARE TCL is slower than, or the same as, MiniGL Quake 3 using glBegin/glEnd with SOFTWARE TCL. And that's after the GL4ES author made lots of optimisations (yeah, we all remember the "Regal! Regal will help us!" talk, and it turned out to be twice as slow as GL4ES :) ).
Just a bit disappointed, that's all. I expected 150-200 fps compared with MiniGL's 50, even on pure glBegin/glEnd, and even with a worse glBegin/glEnd implementation than MiniGL's, simply because we have shaders and hardware TCL in the GL4ES version. But it just doesn't make much difference on the glBegin/glEnd route, at least. I expected more: not the same 50 fps, but 150, 250, 350.
I feel your pain, but it's difficult to compare things this way, so everybody keeps guessing. You could take Hieronymus and collect some profiling data to see what takes up the CPU time.
If vertex arrays were available, Jedi Outcast and Jedi Academy would be better test cases, since they have high-poly models. When running MOS and OS4 on the same machine, the performance gap gets smaller if you lower the model detail. With glBegin/glEnd you would end up with a gazillion function calls, again more work for the CPU while the GPU sits idle :(
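To put rough numbers on that (just a sketch with made-up mesh data, not the real engine code), here is what the two submission paths look like; the immediate-mode variant makes a couple of GL calls per vertex every frame, while the vertex-array variant hands the whole mesh to the driver in a single glDrawElements call:

#include <GL/gl.h>

/* immediate mode: roughly 2 calls per vertex, plus glBegin/glEnd */
static void draw_immediate(const float *xyz, const float *uv, int nverts)
{
    int i;
    glBegin(GL_TRIANGLES);
    for(i = 0; i < nverts; i++)
    {
        glTexCoord2fv(&uv[i * 2]);
        glVertex3fv(&xyz[i * 3]);
    }
    glEnd();
}

/* vertex arrays: the whole mesh goes out in one draw call */
static void draw_arrays(const float *xyz, const float *uv,
                        const unsigned short *idx, int nindices)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, xyz);
    glTexCoordPointer(2, GL_FLOAT, 0, uv);
    glDrawElements(GL_TRIANGLES, nindices, GL_UNSIGNED_SHORT, idx);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}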
This is just like television, only you can see much further.
I'm not sure that the guys who get 130 fps on the Radeon 9250 have GART support. Do they? :)
Yes, Radeon 9250s have GART support, and yes, it's used on other OSes.
Quote:
If GART is so cool and nice, why have Amigans kept saying for the last few years that it's software TCL that causes the speed problems? :)
Because software-TCL is a major bottleneck even if you refuse to believe us.
Quote:
Probably, if it's _that_ nice and good, then instead of Polaris drivers, wouldn't adding GART support to the drivers we already have be a much, much better course? :) But it's a rhetorical question :)
Supply of Southern Islands cards is drying up globally, which makes Polaris a high priority. We're well aware of GART's potential, and have more data to base priorities on than you...
Quote:
Yeah, and don't forget the sky overlapping, the z-fighting (or whatever it is, I just call it that) when we look in the mirror, and the mess of textures in LettersFall (which can be a side effect of those glDrawElements issues). I also found another bug, but I'll keep it to myself until Hans is done with the vertex attribs, as it may be a side effect of that as well.
I wonder if it really is z-fighting. We're using a 32-bit floating-point depth buffer and MiniGL has no issues using the same models, data, and graphics engine. So, depth-buffer precision is *not* an issue. Something weird seems to be going on. Is GL4ES converting data to a lower-precision format (which some mobile devices prefer) internally? In that case the GLES2 lib would have to convert it back. Such a two-step conversion could introduce errors (not to mention add unnecessary overhead). Or, is it rescaling depth to better fit finite precision depth-buffers in a way that doesn't work well with a floating-point buffer?
There could be a bug at our end, but this issue is suspicious.
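If it helps to picture the concern, here is a quick sketch (the format choice is purely hypothetical; it is not established that gl4es converts anything) of how a round-trip through a lower-precision intermediate representation shifts a vertex coordinate slightly, which is exactly the kind of error that could break otherwise shared vertices:

#include <stdio.h>

int main(void)
{
    /* hypothetical: squeeze a coordinate into a normalized 16-bit int
       (range assumed to be +/-1024 map units) and convert it back */
    float original = 123.456789f;
    float range = 1024.0f;

    short packed = (short)(original / range * 32767.0f);
    float restored = (float)packed / 32767.0f * range;

    printf("original=%f restored=%f error=%f\n",
           original, restored, original - restored);
    return 0;
}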
Quote:
Everyone keeps saying and explaining things, but the result is: GL4ES Quake 3 using glBegin/glEnd with HARDWARE TCL is slower than, or the same as, MiniGL Quake 3 using glBegin/glEnd with SOFTWARE TCL. And that's after the GL4ES author made lots of optimisations (yeah, we all remember the "Regal! Regal will help us!" talk, and it turned out to be twice as slow as GL4ES :) ).
I guess you missed my point in my earlier post: MiniGL's glBegin()/glEnd() pipeline is pretty decent; better than GL4ES' even with the new optimizations.
BSzili is right that Q3 has a low polygon count per mesh, so testing with something more demanding like OpenJK could deliver different results. And, Hieronymus profiling could tell more about where CPU time is being used up.
Either way, you need to take a big step back. You've done one test using archaic glBegin()/glEnd() and are then jumping to all sorts of conclusions that you don't have the data to make. Commercial games/apps stopped using glBegin()/glEnd() a long time ago for a reason...
I wonder if it really is z-fighting. We're using a 32-bit floating-point depth buffer and MiniGL has no issues using the same models, data, and graphics engine. So, depth-buffer precision is *not* an issue. Something weird seems to be going on. Is GL4ES converting data to a lower-precision format (which some mobile devices prefer) internally? In that case the GLES2 lib would have to convert it back. Such a two-step conversion could introduce errors (not to mention add unnecessary overhead). Or, is it rescaling depth to better fit finite precision depth-buffers in a way that doesn't work well with a floating-point buffer?
That's what the gl4es author says:
---- gl4es does as little conversion as possible, and it does no conversion for the depth buffer. So if OGLES2 provides a 32-bit float depth buffer, it will use it. But I'm not sure the artefacts you have are really a precision problem, at least not the sky issue. FYI, the sky is drawn very early in the frame, like the 2nd or 3rd draw command IIRC.
As for the mirror issues, I'll check later (tonight) on my capture whether there is something special in the draw commands. ----
I checked that mirror issue in another run, and now I can see that there is just one diagonal line in the banner, not two as in the other screenshot.
Here is a screenshot where I look into the mirror and tilt the view up towards the sky. You can see all those triangles in the sky there. They are also more visible on the walls, compared with the first screenshot:
---- gl4es does as little conversion as possible, and it does no conversion for the depth buffer. So if OGLES2 provides a 32-bit float depth buffer, it will use it. But I'm not sure the artefacts you have are really a precision problem, at least not the sky issue. FYI, the sky is drawn very early in the frame, like the 2nd or 3rd draw command IIRC.
As for the mirror issues, I'll check later (tonight) on my capture whether there is something special in the draw commands. ----
I meant conversion of the vertex data. Anyway, that still wouldn't explain why triangles that are supposed to be sharing vertices would end up with seams (which normally only occurs when vertices aren't exactly aligned). Your screenshot looks more like it's deliberately drawing lines at the edges (like the /r_showtris 1 option would, except r_showtris renders the wireframe differently).
For whatever reason, unknown to me, it's an artefact of the mipmap filter, which obviously only happens when Q3 renders its mirrors; the same geometry with the same texture parameters is rendered correctly when not viewed through the mirror. If the game is forced to W3DN_LINEAR instead of a mipmap filter, the artefacts vanish.
I don't know how Q3 does its mirrors; apparently it's neither render-to-texture nor via stencil, so it's probably done with some funny depth-buffer setup and/or via clip planes (the latter would have to be implemented in some fancy gl4es shader); others will know better.
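For reference, the post above forces W3DN_LINEAR at the Nova level; the rough GL-side equivalent of the same experiment (with assumed filter constants, applied to whatever texture is currently bound) would be switching the minification filter from a mipmap mode to plain linear:

#include <GL/gl.h>

/* assumed to run with the suspect texture bound to GL_TEXTURE_2D */
static void force_linear_filtering(void)
{
    /* instead of e.g. GL_LINEAR_MIPMAP_NEAREST */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
}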
If anyone is curious why, in the fragment shader, he uses min(0., clippedvertex_0) < 0. instead of just clippedvertex_0 < 0.: it's because if there is more than one clip plane, he does min(0., clippedvertex_0) + min(0., clippedvertex_1) < 0., so only one "if" is needed.
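Restated in plain C (this only mirrors the arithmetic of that shader logic, it is not the actual gl4es shader code): summing min(0, distance) over all planes goes negative as soon as any single plane clips the fragment, so one comparison is enough no matter how many clip planes are active:

#include <stdio.h>

static float min0(float d) { return d < 0.0f ? d : 0.0f; }

/* dist[i] = signed distance of the fragment to clip plane i,
   negative meaning "on the clipped side" */
static int is_clipped(const float *dist, int nplanes)
{
    float acc = 0.0f;
    int i;
    for(i = 0; i < nplanes; i++)
        acc += min0(dist[i]);
    return acc < 0.0f; /* the single "if" the fragment shader performs */
}

int main(void)
{
    float d[2] = { 0.5f, -0.1f }; /* plane 1 keeps the fragment, plane 2 clips it */
    printf("clipped: %d\n", is_clipped(d, 2));
    return 0;
}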
For whatever reason, unknown to me, it's an artefact of the mipmap filter, which obviously only happens when Q3 renders its mirrors; the same geometry with the same texture parameters is rendered correctly when not viewed through the mirror. If the game is forced to W3DN_LINEAR instead of a mipmap filter, the artefacts vanish.
Sounds like a driver bug. It looks like something I saw early on with anisotropic filtering.
@kas1e Please submit a bug report for this, and send me a link to download the files I need to test it.
Just out of curiosity, what version of Quake 3 are you using for your gl4es tests? There is an optimized version from m3x/hunoppc somewhere .. it's a bit faster and maybe could be used for your tests.
@Capehill Yeah, why not. Though that Quake 3 port was mostly of interest for getting GL4ES bug-free.
@Daniel I don't know if you received my last mail about the hack we added to gl4es to fix those weird issues in Quake 3 with the "vertex attribute of GL_UNSIGNED_BYTE that needs normalisation", where I wrote that it also fixes the same kind of issues Quake 3 has, in the Irrlicht engine (which is heavy enough as well). All the examples were broken in the same way as Quake 3 before, but with the hack everything starts to work. Could that mean it's maybe not undefined behaviour?
If it's of any help, I can upload a few Irrlicht engine example binaries with and without the hack.
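For anyone following along, the attribute setup being talked about looks roughly like this (index, stride and offset are illustrative, not the exact gl4es call): a per-vertex colour stored as four unsigned bytes that GLES2 is asked to normalise to the 0.0-1.0 range:

#include <GLES2/gl2.h>

/* illustrative helper: bind a colour attribute stored as 4 unsigned bytes,
   letting the GL normalise 0..255 to 0.0..1.0 via the "normalized" flag */
static void bind_color_attribute(GLuint index, GLsizei stride, const void *offset)
{
    glVertexAttribPointer(index, 4, GL_UNSIGNED_BYTE,
                          GL_TRUE, /* normalized */
                          stride, offset);
    glEnableVertexAttribArray(index);
}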
@kas1e Just like the other ~100, I have received your mail, but since I covered pretty much everything in my previous mail answers and because I had other stuff to do, I haven't answered that one yet. I said before "the more the merrier", so no need to ask me, just upload your test builds to my FTP (btw. I never received your hacked Q3), but:
Quote:
vertex attribute of GL_UNSIGNED_BYTE that needs normalisation
I already told you and will repeat it once more: this is definitely not the issue! As I said, I had already checked this out in depth before you did (simply because it was the very obvious difference in the other render path) and found it to be all good.
Quote:
All the examples were broken in the same way as Quake 3 before, but with the hack everything starts to work. Could that mean it's maybe not undefined behaviour?
Let me repeat this one too:
"Don't draw false conclusions only because your hack seems to work around the problem. ... The fact that some changes change that undefined behaviour doesn't conclude that those changes point directly at the true culprit, unfortunately."
The problem simply is not what you and the gl4es guy think it is. You are only changing the symptoms with your hacks. Therefore, again a repetition: better revert that hack again, it was interesting but it turned out to not be the problem, so the only effect it has now is that it complicates things because it tends to "hide" the still existing real issue.
As with that gl4es build of that letter game (which is broken for you but not for me), we simply have (semi)random / undefined behaviour here, caused by something else somewhere, but definitely caused neither by glDrawElements nor by the vertex attribute's type / normalization. Just let it go, man!
The root of the problem is somewhere else, earlier. It's most likely some memory corruption somewhere; where it is is still unknown. It can be in gl4es, it can be in ogles2, it can be in Nova (none of them can be ruled out yet).
As I told you, I am investigating to make sure that it gets fixed - if it is an ogles2 issue. So far my investigations have not revealed a problem in ogles2; that's the current status. This does not mean that there is no problem in ogles2. It only means that I haven't found one yet. But I can say for sure that it is not directly related to the vertex attributes being of type uint8 or normalized.
And also let me repeat something somebody else told you already: being pushy won't help
I don't think he's being pushy. I think he's just so excited to be making progress that he doesn't want to slow down. That's understandable given a platform where the progress has historically been measured in years....
@ferrels Yes, I know that kas1e is very excited, which is good of course. And he's of great help. After all, it's thanks to his efforts with first Regal and now gl4es that ogles2.lib and Nova are being tested in real life rather intensely now and have already got some nice improvements / fixes in a short time.
@kas1e Sometimes I just have the feeling that you need to be grounded a bit. Step by step we'll for sure find and fix that and any upcoming issues. Sometimes it simply takes more than a couple of hours.
@Daniel I don't know if you follow the SDL1 thread, but if not: I found an interesting issue which may well be related to the memory-trashing issues you mention.
So, while Capehill is working on fixing and updating SDL1, I have tried every new version with GL4ES added to it. The way I do it on the SDL1 side is there:
As you can see, there I only create the context and pass all the stuff on to SDL's internal code (as MiniGL did before). I only create the context there, because the opening of the ogles2 interface/library happens in GL4ES itself (it needs it inside gl4es), there:
And there are helper functions, named the same as the ogles2 ones but called without the interface (so as to have the same names, but with some helper code before running the originals):
Now, with SDL1, I found that once I put that "printf" line you can see in the first link before the context creation, and build Cadog with it, I have no background title picture in the game! Just all white! But once I limit that string to printing only 1, 2 or 3 bytes, the title picture is back. As soon as I printf 4 or more bytes, it all breaks again.
Then, for the sake of testing, I tried using IOGLES2->aglCreateContext() for the context creation in SDL1 instead of IOGLES2->aglCreateContextTags() (preparing the TagList structure myself). The issues were still there. But once I called it without IOGLES2, i.e. plain hidden->IGL=aglCreateContext(0, tags); (so, using the helper function from gl4es, which at the moment does nothing but call the real IOGLES2->aglCreateContext), the title picture was back even with the big printfs!
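To make the two call styles clearer, here is a rough sketch of both (context tags are omitted and only the terminator is shown; the proto header name is an assumption on my side, the call signatures are as used above):

#include <utility/tagitem.h>
#include <proto/ogles2.h> /* assumed header for struct OGLES2IFace */

/* varargs flavour: tags go straight into the call */
static void *create_ctx_varargs(struct OGLES2IFace *iface)
{
    return iface->aglCreateContextTags(0, TAG_DONE);
}

/* explicit TagItem-array flavour, as used for the test above */
static void *create_ctx_taglist(struct OGLES2IFace *iface)
{
    struct TagItem tags[] = { { TAG_DONE, 0 } };
    return iface->aglCreateContext(0, tags);
}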
Then I also rebuilt LettersFall with that new SDL1, where I swapped IOGLES2->aglCreateContextTags() for aglCreateContext(), and the problems I had before are _almost gone_! They are still there, but the behaviour has clearly changed.
All of that can again mean the same thing: memory trashing somewhere. With those printfs of a few bytes before creating the context, or by swapping the way I create the context, I only shift the memory trashing somewhere else (not fix it, of course, as the issues in LettersFall are still there. Less visible, but still there).
Sadly, it's again the same 3 parts in the combo: SDL1, GL4ES and OGLES2/Warp3D Nova, and I can't create any test case which shows the memory trashing without involving those 3 parts. Well, in the case of SDL it also uses a lot of 3rd-party libs (sdl_image, sdl_mixer, png, jpeg, etc., etc.), any of which could be a memory trasher as well, but with SDL1/MiniGL I never noticed such issues, so the problem probably comes from our usual combo.
Now the question is: how to detect where it is and what causes it.
In the SDL code, do you want to call the wrapper function in gl4es, or the OGLES2 library function?
I didn't realize that there are similarly named wrapper functions (agl*) inside gl4es code. This could be the key. I don't know much about libraries, but I checked AHX.library for reference to see how parameters are passed from a varargs function:
In the SDL code, do you want to call the wrapper function in gl4es, or the OGLES2 library function?
I do not know how to do it better. It should probably make no difference whether I call them via IOGLES2->, or call them as helpers from agl.c (taking into account that the helpers do nothing at the moment). After all, in agl.c we have "extern struct OGLES2IFace *IOGLES2;", so it should all work either way.
Before, I just used plain IOGLES2-> from the SDL code, and where code from the helpers was needed (like for aglSwapBuffer), I just added it myself. Then I also tried to use everything as helpers, and found that the helper function for aglCreateContextTags simply does not work (I can see how the SDL window is created and then closes immediately).
But anyway, even if I don't use the helper functions but IOGLES2->, the problem with Cadog is still there. Only when I replace IOGLES2->aglCreateContextTags with the version without Tags does the problem with Cadog disappear, but that means nothing, as the trashed-memory issue is still there, just shifted.
Quote:
I didn't realize that there are similarly named wrapper functions (agl*) inside gl4es code. This could be the key.
Probably I need to ask him to rename them all to something like gl4es_aglXXXX, etc., to avoid confusion.
I do not know: is it possible that a wrongly written helper for aglCreateContextTags (with all those varargs) can cause memory trashing even if it isn't used anywhere? Probably not? Sure, on AmigaOS everything is possible, but if I don't call a badly written function, it shouldn't trash memory, IMHO.
You cannot have a vararg wrapper function calling a vararg wrapped function (passing varargs from one to the other). That's not possible.
A wrapper function with vararg params only works if the wrapped function does *not* have vararg params, i.e. there is an alternative version of the wrapped function which, instead of ..., takes for example a "va_list" param. Or a "struct TagItem *" param.
It may look like it sort of works (but things like adding variables or adding dprintfs() change the behaviour), but it really doesn't. It only looks like that because the vararg params that were passed to wrapperfunction() are still there on the stack (and on PPC maybe partly in registers), so the wrapped function may pick them up, but incorrectly/half wrong/half right/completely wrong/not at all.
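A minimal sketch of the pattern that does work (names are made up, this is not the ogles2/gl4es code): the wrapper keeps the ..., but the function doing the real work takes a va_list, so the arguments can be forwarded legally:

#include <stdio.h>
#include <stdarg.h>

/* the worker takes a va_list instead of ... */
static void wrapped_va(int a, va_list args)
{
    printf("a=%d, first vararg=%d\n", a, va_arg(args, int));
}

/* so a vararg wrapper can forward its arguments to it */
static void wrapper(int a, ...)
{
    va_list args;
    va_start(args, a);
    wrapped_va(a, args);
    va_end(args);
}

int main(void)
{
    wrapper(1, 42);
    return 0;
}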
Try playing with this test program (try also on 68k, x86 if possible):
#include <stdio.h>
#include <stdarg.h>

void wrappedfunction(int a, ...)
{
    int i;
    va_list args;

    va_start(args, a);

    printf("\nwrappedfunction:\n");
    for(i = 1; i <= 10; i++)
    {
        /* read whatever the caller left on the stack as the next int vararg */
        printf(" ARG %2d: %d\n", i, va_arg(args, int));
    }

    va_end(args); /* must come after the last va_arg() use */
}