Do you really want the responsibility of doing all the endianness conversion?
No, the driver knows best which endianness is needed.
Quote:
If there's enough interest, I could add the ability to check the GPU's endianness and disable the endianness conversion, making it the app's/game's responsibility (or the GLES2 wrapper's).
The only thing that comes to mind where it could in theory result in somewhat better performance is if you are updating a VBO every frame and have direct access to the VRAM (if you don't, then another internal copy is required anyway, and then e.g. a tight stwbrx loop is likely better), and if you can do the conversion without polluting your code. IMHO it's not worth the trouble. Although... if you can offer that feature for free and optional, and if you don't have more important things to do, then go ahead.
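For reference, a tight byte-swapping copy loop like the one mentioned above could look like the sketch below. This is just a minimal illustration assuming 32-bit elements; it uses GCC's portable `__builtin_bswap32`, which on PPC may compile down to lwbrx/stwbrx-style instructions:

```c
#include <stdint.h>
#include <stddef.h>

/* Copy 'count' 32-bit words from src to dst, swapping the byte
 * order of each word on the way (big-endian <-> little-endian). */
static void copy_swap32(uint32_t *dst, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = __builtin_bswap32(src[i]);
    }
}
```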
Quote:
There's no bug in the way; it's all about optimization. However, I'm not sure why it's 30% slower, or how to optimize it. We don't have tools that could identify the bottlenecks (e.g., cache misses, etc.), so it's more guesswork than anything else. I suppose I could try inserting some cache prefetching instructions to see if that helps.
Maybe the opposite is better. Did you by accident call dcbz on the (VRAM) destination? This could probably result in such a dramatic slowdown. Also: are you 100% sure that there aren't any debugging artefacts remaining?
Other than that: we are talking about Roman's performance numbers here. He has an X5000 and from my experience (!) its automatic cache prefetching works pretty well, in contrast to other PPCs. So, unless you have any weird access pattern (which you should not have?) manual prefetching shouldn't change the picture toooo much in this case here.
But it's hard to come up with concrete hints without source, of course
@Hans I feel your pain, but there is another shader issue: I'm just porting another game, and it fails right in the face: this time it's a fragment shader (for fogging), and a pretty small one. I attached all the stuff to the ticket (all info, the shader, plain output of w3dnshaderinfo, and the verbose one). There is:
Quote:
Maybe the opposite is better. Did you by accident call dcbz on the (VRAM) destination? This could probably result in such a dramatic slowdown. Also: are you 100% sure that there aren't any debugging artefacts remaining?
No dcbz in the copy to VRAM routines. However, I just realized that the endianness handler is C++ code, and all C++ code is currently compiled with optimization disabled because the optimizer somehow trashes shared pointer reference counts.**
So it's probably generating crappy code right now.
Hans
** NOTE: This seems to only happen with shared libraries, and *not* with general application code.
@Hans Optimizations turned off is most likely the culprit for the performance penalty here, at least it could easily explain a penalty of that order of magnitude.
Weird with your smart pointers though. STL? For ogles2 I use my own templated ref-counting smart pointers without any issues (I like to write my own stuff and avoid the STL whenever I can). Crap.
Found an interesting issue, dunno if it's related to SDL2, ogles2, or Nova, but: when I compile quake3 via gl4es (so ogles2/warp3dnova) and SDL2, then, when I choose a window size equal to or larger than my Workbench size, I get a black screen rendered in the window (in full screen all is fine).
With the same version of SDL2 and minigl I have no problems with that: the window opens, rendering happens, I just can't see all of it, but it renders.
For example, if I set Workbench to 1024x768, then I have rendering in the window and no black screen at all resolutions below 1024x768, but at 1024x768 or above, nothing renders to the window. Everything else works, etc., I can exit the game, just nothing can be seen.
Or, if I set Workbench to 1920x1080 but then choose a resolution of 1600x1200 in q3: it also renders a black window, but anything lower renders fine.
As minigl works OK there, and so SDL2 is fine, I can only think of:
1) I do something wrong when adding gl4es support to SDL2.
2) Something with ogles2/warp3dnova (or the way I create the context, maybe?).
3) gl4es is probably ruled out, as this is too related to the resolution of the main Workbench screen.
Quote:
Optimizations turned off is most likely the culprit for the performance penalty here, at least it could easily explain a penalty of that order of magnitude.
I hope so. Haven't had a chance to test the theory, though.
Quote:
Weird with your smart pointers though. STL? For ogles2 I use my own templated ref-counting smart pointers without any issues (I like to write my own stuff and avoid the STL whenever I can). Crap.
The STL ones weren't available on AmigaOS when I started, so it's the BOOST shared_ptr.
@kas1e Quote:
Found an interesting issue, dunno if it's related to SDL2, ogles2, or Nova, but: when I compile quake3 via gl4es (so ogles2/warp3dnova) and SDL2, then, when I choose a window size equal to or larger than my Workbench size, I get a black screen rendered in the window (in full screen all is fine).
Which module is responsible for blitting the final rendered image to the window? That's most likely where the problem lies.
Since this involves a width > screen size, it's possible that something like unsigned arithmetic is mucking it up. For example:

uint32 width1 = 1024;
uint32 width2 = 1280;
uint32 offsetX = width1 - width2; // Oops! Going negative with a uint32 will give you a very large number

This can be even more subtle:

float offsetX = width1 - width2; // Still bad, because the subtraction is done in uint32 before converting to float (i.e., the end result is a really big and incorrect number)
@Hans Checked v1.63 of Nova, and yes, the bug with stencil and checking the first pointer three times is fixed, thanks!
I found another issue which I reported to Daniel a month or so ago, but it seems he's too busy for now to even answer my annoying mails about it :) (and maybe everything is fine on the ogles2 side and it all comes from w3dnova, dunno). In other words, maybe you will have a clue just from seeing a crashlog. The issue is:
In the Friking Shark game, there is an ability to draw fps on screen by pressing a button (currently disabled in the release version), which causes a skippable DSI error.
And it crashes not when it shows up, but when you die (so some refresh of things happens, or dunno what). I.e., you start playing, press the key, have everything render fine, then you die, and when the screen does its "fade out", then it crashes; you skip the DSI and everything runs fine afterwards.
The code which causes that is simple:
SOpenGLSystemFont *pFont=GetSystemFontForHeight((unsigned int)dFontHeight);
if(pFont)
{
int nFinalY=(int)(y+pFont->nMetricDescent);
And the function which actually causes the DSI is: glRasterPos2d(x, nFinalY);
When I look at the crashlog, it points at ogles2.library, which is why I reported it to Daniel first, and he sent me an ogles2.library with debug symbols, so we could see exactly where and what causes it. And the crash is:
So it looks like Daniel guessed at the beginning: a crash in glDrawArrays, inside the vertex-attrib setup, which most likely means that the vertex memory / parameters are invalid.
As usual it crashes only on our side; on the Pandora there is no crash, so gl4es is probably ruled out, as usual.
We of course tried to debug it a bit with ptitSeb, but he says that crash is strange, because glRasterPos2d(x, nFinalY) internally in gl4es will just call glRasterPos3f(x, nFinalY, 0.0f), so there isn't any real call to the OGLES2 driver there, only matrix maths... Seems odd. He guesses the coordinates that are set up make the later call to bitmap_flush() crash, probably during the call to blitTexture().
@kas1e Sorry for the delay. I've been away, and typing replies on a tablet is pretty slow...
Judging from the crash log it's likely that it's trying to use a non-existent vertex attribute array. It's loading data from address 0xFFFFFFFF (i.e., the end of addressable memory).
Warp3D Nova is nowhere to be seen in that crash log, so the problem is most likely somewhere else.
@Hans Porting some stuff, which sadly works very, very slowly. I started to debug why, and found that on serial I get lots (not _that_ many, but many) of errors of this kind:
Quote:
WARP3D_SI.library: ERROR: Interleaved arrays with different strides detected in VBO 0x00000000
What does it mean? I mean, what is the app doing that it shouldn't do? Maybe gl4es is doing something it shouldn't (or isn't doing something it should), but how do I figure out what/where/when?
Also a question: do cubemaps work on Nova without problems? I remember we discussed it somewhere, that while Nova says it has them, it still didn't, or something of that sort?
That error message means exactly what it says: the VBO has arrays that are interleaved, but the strides (how many bytes between one element in the array and the next) are different for different attributes. When vertex attribute arrays are interleaved, *all* of the interleaved attributes must use the same stride, or the arrays will interfere with one another.
There is one exception to this rule, and that is if one of the attribute arrays has a stride of 0 (which means that it'll read the same value for all vertices). The endianness conversion code doesn't know how to handle that case, though.
Does the app in question sometimes set a stride to 0? Please file a bug report for the missing case.
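To illustrate why interleaved attributes share one stride: an interleaved layout packs all attributes of one vertex together, so every attribute advances by the size of the whole vertex. A minimal sketch (the struct name and fields are hypothetical, not from any of the libraries discussed here):

```c
#include <stddef.h>

/* One vertex with interleaved attributes: position, normal, UV. */
typedef struct {
    float pos[3];
    float normal[3];
    float uv[2];
} Vertex;

/* In an interleaved VBO, every attribute must advance by the size
 * of the whole vertex, so all three strides are equal: */
enum { STRIDE = sizeof(Vertex) };               /* 32 bytes here   */
enum { POS_OFFSET    = offsetof(Vertex, pos)    }; /* byte 0       */
enum { NORMAL_OFFSET = offsetof(Vertex, normal) }; /* byte 12      */
enum { UV_OFFSET     = offsetof(Vertex, uv)     }; /* byte 24      */
```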
Quote:
Also a question: do cubemaps work on Nova without problems? I remember we discussed it somewhere, that while Nova says it has them, it still didn't, or something of that sort?
The API supports it, but none of the drivers do. Warp3D Nova correctly reports that cube-maps are unavailable when you query that feature.
Mmmm, I don't see how interleaved arrays with different strides can work.
So, either something is very wrong in some drawing command (but those drawing commands must then produce wrong results), or the warning is incorrect (and maybe there are interleaved arrays mixed with non-interleaved arrays, and the driver doesn't like that).
Not sure how to put in a check to identify the source of the issue.
Ok, will check this out and create a BZ if that is indeed the case.
Please create a BZ even if it's not the case. We should allow the special case where one array's stride is 0.
Quote:
Btw, can those errors be the cause of pauses, I mean, broken fps and co.?
The debug output will definitely slow things down. Especially if you have debug output redirected to serial.
Quote:
Mmmm, I don't see how interleaved arrays with different strides can work.
So, either something is very wrong in some drawing command (but those drawing commands must then produce wrong results), or the warning is incorrect (and maybe there are interleaved arrays mixed with non-interleaved arrays, and the driver doesn't like that).
It's pretty much impossible to stuff up if(stride != baseStride), and the driver can handle VBOs with both interleaved and non-interleaved arrays.
Another possibility is that the game/app is leaving an old array enabled that the shader doesn't use. That could trigger the error message.
@Hans I think that one array probably has a stride of 0 (and is not interleaved) while the other arrays are interleaved (with a stride != 0). But I haven't checked (c) ptitseb.
ptitSeb: So, either something is very wrong in some drawing command (but those drawing commands must then produce wrong results)
He's right. You'd immediately notice a wrong stride, at least if the respective VA is actually used.
Quote:
Mmmm, I don't see how interleaved array with different strides can work.
As Hans indicated there is a special case, namely stride 0. ogles2.lib uses such a stride 0 to implement non-varying vertex attributes, e.g. one single color for a whole mesh. To be concrete: whenever you do a glVertexAttribXf call, this is going to result in a VBO's data slot with a 0-stride. The alternative would be to "replicate" the VA-value N times where N is the number of vertices - not a good idea, of course.
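To sketch what a true 0-stride means for the attribute fetch: the data pointer simply never advances, so every vertex reads the same single value. This is a simplified model for illustration, not the actual ogles2.library internals:

```c
#include <stdint.h>
#include <stddef.h>

/* Fetch the float attribute of vertex 'i' from an array with the
 * given byte stride. With stride == 0 the pointer never advances,
 * so all vertices see the same single value - exactly how a
 * non-varying vertex attribute (e.g. one color for a whole mesh)
 * can be stored without replicating it N times. */
static float fetch_attrib(const uint8_t *base, size_t stride, size_t i)
{
    return *(const float *)(base + i * stride);
}
```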
If Nova really produces that warning for 0-strides, then it is extremely likely that this is the reason for it in your case.
@Hans This 0-stride thing is extremely common and vital for efficient handling of single VAs. It must not create a warning, it'll just pollute the warning stream.
Quote:
We should allow the special case where one array's stride is 0.
Also note that it's not just about one array's stride. Any number of arrays are allowed to have a 0-stride. In general you can say that a warning only makes sense if there are 2 or more arrays with a different stride != 0 inside one VBO.
@Hans,Daniel Btw, ptitseb says that it can be that some "strided" values are mixed with 0-stride arrays: this is possible if the base pointers are different, that's completely legal.
Also, all unused VAs are turned off by gl4es, so there can't be an unused VA.
Quote:
Btw, ptitseb says that it can be that some "strided" values are mixed with 0-stride arrays: this is possible if the base pointers are different, that's completely legal.
Of course. But he forgets / mixes up two things:
1. that glVertexAttribPointer with a stride of 0 actually means "stride = sizeof(attribute)". It is not the same as a true stride 0. It is not possible for him to actually define a true stride-0 array that way. So this is not what we're talking about.
2. that his array definition has nothing to do with the internals inside ogles2 if he uses old-school client memory instead of VBOs. So when he provides a packed (stride 0 -> sizeof(attrib)) client-memory vertex attribute array, this will usually be stuffed into an interleaved internal VBO.
So in short:
1. Only if he uses one of the glVertexAttribXf functions will this result in an internal VBO array slot with a true stride 0 (stride 0 essentially means "repeat that attribute for all vertices" by not incrementing the data pointer).
2. Especially if he creates and sets up his own VBOs, then he can of course build (and mess up) whatever he wants, with all sorts of funny and wrong strides - but still not a true stride 0!
3. if he doesn't use his own VBOs then ogles2 will internally create packed interleaved VBOs out of the different enabled vertex arrays.
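The distinction in point 1 can be sketched as a tiny helper: per the OpenGL (ES) spec, a stride of 0 passed to glVertexAttribPointer means "tightly packed", i.e. the effective stride is the attribute's own size. A simplified model; `effective_stride` is a hypothetical helper name, not an actual API:

```c
#include <stddef.h>

/* Per the GL(ES) spec, stride == 0 in glVertexAttribPointer means
 * "tightly packed": the effective stride is component_size *
 * num_components. A true 0-stride (one value repeated for every
 * vertex) cannot be expressed through this API at all; that only
 * comes from the glVertexAttribXf family. */
static size_t effective_stride(size_t stride, size_t component_size,
                               size_t num_components)
{
    return stride != 0 ? stride : component_size * num_components;
}
```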