@Hans I asked ptitSeb "does gl4es convert everything to 32-bit?", and yes, it does, except for this one (and technically it's only accepted when size=4, so an RGBA color fits into a 32-bit data space, but yeah, it's basically 8-bit data).
But GL_UNSIGNED_BYTE is not "endian-sensitive": it is the same in big endian and little endian, so it should not be an issue, unless GL_UNSIGNED_BYTE is indeed not supported at all by the Warp3D driver because only 32-bit data is implemented.
ptitSeb could add a workaround to gl4es, for example, to make sure no GL_UNSIGNED_BYTE is used (of course, that would convert quite some data, so speed-wise it would be better if Warp3D accepted 8-bit data).
By the way, there is also another case of non-32-bit data in gl4es: the element indices used in glDrawElements can be GL_UNSIGNED_SHORT (so 16-bit), but this seems to be supported (?), or nothing would work at all. Strange though, if, as you say, only 32-bit is supported.
So... since 8-bit data is the same on little and big endian, and if Warp3D Nova has no support for it, perhaps adding plain 8-bit support would be easy?
These are the questions to answer in order to understand what the next step should be:
Does OGLES2 transfer GL_UNSIGNED_BYTE as-is, or does it convert it to GL_FLOAT?
If NO: can GL_UNSIGNED_BYTE (with normalization) be implemented in Warp3D?
If YES: there is something odd in that conversion.
If NO: ptitSeb can make a workaround in gl4es (unless OGLES2 handles it, as GL_UNSIGNED_BYTE support is mandatory for a GLES2 driver).
If YES: let's wait for it.
PS: But any workaround means a loss of speed, of course.
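For reference, "with normalization" here simply means mapping the ubyte range 0..255 onto 0.0..1.0 (c / 255, per the GLES2 spec). A minimal C sketch of the per-component conversion any fallback would have to do (illustrative only, the function name is invented):

```c
#include <stdint.h>

/* Normalized GL_UNSIGNED_BYTE: map the integer range 0..255 onto the
 * float range 0.0..1.0, i.e. c / (2^8 - 1). This is what a software
 * fallback has to compute per color component. */
static float normalize_ubyte(uint8_t c)
{
    return (float)c / 255.0f;
}
```

So a packed RGBA8 color like {255, 128, 0, 255} becomes four such floats before the data reaches the GPU.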
EDIT: taking another close look at one of the two readmes, in its "known limitations" section, reveals that there is indeed information about such limitations. Namely, only 32-bit float vertex data is supported, which indeed renders 4x8-bit ubyte color VAs officially unsupported.
Okay, I have no idea how I could overlook this. Most likely because a) the reason mentioned regarding endianness makes no sense in this case, b) it (seemed to have) worked so far, c) it's something so basic, and d) we had lots of talk about ubyte colors and the idea of this not being supported at all didn't pop up until now.
So, since I overlooked it and did not take care of this limitation (if it exists for real; as said, so far it seemed to work), I will add a workaround for it to ogles2. This will probably result in a severe performance penalty, especially if using the lib in a non-VBO way.
However, now that I have become aware of this limitation: IMHO this should be the very, very first entry on the Nova todo-list! 4x8-bit color (or similar) VAs are used extremely often to keep vertex structures compact, so this can be considered an absolute basic building block.
Also, the doc should be edited so it no longer shows completely misleading / false information. The doc is what people use when coding, and this limitation requires a fat warning note in VBOSetArray: "only supports this and that for now".
Before EDIT: @Hans Quote:
However, the W3DN_SI driver currently can't handle anything other than 32-bit datatypes (it says so in the readme). So that excludes GL_UNSIGNED_BYTE.
The very latest Nova 1.53 readme from two days ago suddenly contains new information saying:
Quote:
Support 32-bit integer vertex attributes (both normalized and unnormalized). NOTE: Only 32-bit datatypes supported at present
So from this it now sounds as if e.g. VA colors composed of 4 unsigned chars / GL_UNSIGNED_BYTE suddenly became unsupported, excluded en passant by this readme entry.
But: this is real news. OGLES2 has used ubyte VA colors with normalization since day one without issues (well, at least I thought so...). And so far no readme or doc of Nova contained info saying that this was not supported! Also, although I'm not using it: until two days ago I also believed that 32-bit VA ints would work. The docs didn't give any reason to think otherwise. So what's presented as news in the latest readme is something everybody reading the docs until now expected to work anyway.
The only known Nova limitation of that style until two days ago regarding 32-bit concerned using the data as indices for vertex data. However, even that limitation was lifted long ago, so 16-bit indices are possible natively. And then there was / is the "recommendation" to internally align every VA to 32 bits.
That was it. There was zero hint until now that unsigned byte VAs, and thus the typical RGBA8 colors, won't work or won't work reliably.
Quote:
So the driver has to convert the endianness as it's copied to the GPU.
As kas1e correctly said, there is no endian problem with single unsigned bytes. There is also no endian problem with 4 RGBA unsigned bytes. The client application has to take care that the components are in the correct order; neither ogles2 nor Nova has to be "smart" here. Endian issues are no explanation for missing / broken unsigned byte VA support. Besides that (although not of interest in this case), the doc for VBOSetArray explicitly says that calling it before VBOLock would "allow the driver to know what endianness conversion to perform beforehand". It does not mention that such endian conversion is not implemented or broken.
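To illustrate the point in plain C (no GL involved; the helper names are invented for this sketch): the bytes of an RGBA8 color sit in memory in the same order on every CPU, and only reinterpreting the four bytes as a single 32-bit word is endian-dependent:

```c
#include <stdint.h>
#include <string.h>

/* Detect host byte order: big endian stores the most significant
 * byte of a multi-byte value first in memory. */
static int is_big_endian(void)
{
    const uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Four RGBA bytes in memory are R,G,B,A everywhere; only the *value*
 * obtained by reading them back as one 32-bit word differs per host. */
static uint32_t rgba_as_word(const uint8_t rgba[4])
{
    uint32_t w;
    memcpy(&w, rgba, 4);
    return w;
}
```

On a big-endian PPC the word reads 0xRRGGBBAA, on a little-endian x86 it reads 0xAABBGGRR; the byte array itself never changes, which is why plain ubyte VAs need no endian conversion.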
Quote:
Right now it assumes that everything is 32-bit, and it probably returns an error if you try to use 8/16-bit vertex data.
"Probably"? Until now Nova did not return an error. It simply swallowed such VA colors / VBO layouts, and it worked. Well, unless the semi-random vertex-trash that e.g. Q3-gl4es generates under certain circumstances, and which has kept us puzzled for a long time now, is such a "probable" effect... Which in turn would raise the question why we weren't informed that such VAs are potentially broken in Nova, despite the tons of discussions / reports on this topic.
Quote:
So long as you're feeding it with floats and have given it the right pointers, strides, etc., it works.
Now it's even floats-only, all of a sudden?
Sorry, man, clarification please! Apparently one cannot rely on the information in the docs at all? Simple question: which are the allowed / (not just probably) functional values for the following parameters of VBOSetArray?
W3DN_ElementFormat elementType: so far the doc didn't forbid any (the aforementioned index limitation aside). Nothing about 32-bit float only.
BOOL normalized: for which elementTypes does it work? So far the doc said that it worked for ALL signed / unsigned integer types; actually for everything but floats.
Hopefully this is all just some kind of weird misunderstanding.
Quote:
Does OGLES2 transfer GL_UNSIGNED_BYTE as-is, or does it convert it to GL_FLOAT? If NO: can GL_UNSIGNED_BYTE (with normalization) be implemented in Warp3D? If YES: there is something odd in that conversion. If NO: ptitSeb can make a workaround in gl4es (unless OGLES2 handles it, as GL_UNSIGNED_BYTE support is mandatory for a GLES2 driver).
ogles2 followed the Nova docs and what seemed to work until now... As such, it does not do any conversion of GL_UNSIGNED_BYTE VAs, of course.
If it turns out that Nova cannot handle this, then I am going to interpret it as a severe Nova bug. I won't add any workarounds in ogles2 for this but will wait until it gets fixed, sorry.
EDIT: see EDIT of previous post. In the meantime I found out that I really overlooked a limitation mentioned in one of the readmes. So in contrast to what I said above I will add an internal uchar8 converter to ogles2. This will stay active until this gets fixed inside Nova. Don't add such a conversion to gl4es, it really makes no sense to mess around there.
EDIT #2: The first workaround has been added; check it out on my FTP: ogles2_wtf_ubyte_1.zip. This version will internally convert every GL_UNSIGNED_BYTE VA for client-memory usage. For safety I am converting to float internally; I'm not relying on this new 32-bit integer support, because I prefer to do my own normalization and because that way it should work with previous Nova versions too. Note that I'm only patching GL_UNSIGNED_BYTE for now, since that's the one used in 99% of the non-32-bit use-cases.
Next to come is the patching of VBOs supplied by the client application. This will be really ugly, since internally a totally different-looking VBO has to be created and maintained as soon as there's at least one GL_UNSIGNED_BYTE VA. :P
Edited by Daytona675x on 2018/6/11 15:24:47 Edited by Daytona675x on 2018/6/11 17:03:00
@Daniel Tested the new library; results are: both the Irrlicht engine example and Q3 with enabled extensions start to show things and almost work! I.e. they work, just something happens with the colors.
That's how the Irrlicht engine example looks (our hacked binary at left, and the non-hacked one, running over the new patched ogles2.library, at right). It looks about correct, just the colors differ (looks like an endianness issue?). See, the patched version is more "black", and the "hello world" words at the top-left are purple:
(press RMB, open in a new tab for full size)
And that's how Q3 looks with enabled extensions:
@kas1e With extensions enabled? Note that this first workaround-version only tackles the situation where you don't use VBOs of any kind (by that I mean ogles2 VBOs)! So what this first workaround is supposed to patch is e.g. gl4es doing a call like glVertexAttribPointer(..., 4, GL_UNSIGNED_BYTE, GL_TRUE, ..., pointer_to_first_color), where "pointer_to_first_color" is really a pointer and no VBO is bound. So it's the typical old-school vertex-data-via-client-RAM-pointer setup.
Your statement about "enabled extensions" somehow sounds as if you're trying to use VBOs? That's not done yet.
Anyway, for the above situation ogles2 already has to manage an internal (hidden) Nova VBO. Therefore the workaround is not that hard (it only costs performance and wastes RAM): if such a glVertexAttribPointer call is made, then copy-convert the data to float (incl. eventual normalization) and tell Nova that it now gets standard floats in the internal VBO, instead of simply copying the data over.
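A rough sketch of such a copy-convert step (illustrative C only, not the actual ogles2 code; the function name and parameters are invented):

```c
#include <stddef.h>
#include <stdint.h>

/* Expand a strided GL_UNSIGNED_BYTE RGBA attribute from client memory
 * into a tightly packed, normalized float array while copying it into
 * the internal buffer. 'stride' is the byte distance between vertices. */
static void convert_ubyte4_to_float4(const uint8_t *src, size_t stride,
                                     size_t vertex_count, float *dst)
{
    for (size_t v = 0; v < vertex_count; ++v) {
        const uint8_t *c = src + v * stride;
        for (int i = 0; i < 4; ++i)
            dst[v * 4 + i] = (float)c[i] / 255.0f;  /* normalize 0..255 -> 0..1 */
    }
}
```

Done this way the data is only touched once, during the copy that has to happen anyway.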
I have no idea where endian-issues should come from (just as I don't know why not supporting uchar8 x 4 has anything to do with any endian-issues in the first place). For Nova those are simple floats now, just like e.g. the xyz coordinates.
Anyway, check out the fresh version on the FTP. I found a typo that probably was the reason for this ;)
Sorry for the confusion. I'll do my best to clarify, and also correct what I've been saying (got myself confused too).
1. The big endianness issue concerns VBOs containing data of mixed sizes. If you try mixing byte, 16-bit and/or 32-bit data in one VBO, then it will fail. However, stick a 16-bit index array in its own VBO (or a VBO with only 16-bit data), and it *will* work. I just checked the code, and that is handled correctly.
So the "32-bit data only" statement isn't strictly true any more, and hasn't been for a while. Will have to update that...
NOTE: DBOs are still 32-bit only.
2. The second limitation is that the driver used to treat all Vertex Attributes (VAs) as 32-bit floats. The latest beta sets the VA's attributes correctly now, so it'll treat ints as ints, uints as uints, etc.
This is where I got myself confused, because I thought the hardware would treat floats and ints differently. However, 32-bit ints and floats are handled the same way: they get passed on to the shader unchanged (i.e., int VAs must go to an int shader input). So, 32-bit int VAs have probably been working all along.
The latest beta also correctly sets the VA descriptor for 8 and 16-bit attributes, including whether it's normalized. So, it should work provided you restrict each VBO to having one data size only (8-bit data in one VBO, 16-bit data in another, etc.). I haven't tested that, though, because I'd completely forgotten that my endianness handler was a bit more sophisticated than "32-bits only." Let me know what happens if you try it...
Quick summary: restrict each VBO to data of one size each (8-bit data in one, 16-bit in another, etc.). The latest beta sets the VA descriptors correctly based on type, so VAs of all types might work provided the restriction above is observed (untested so far).
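To make that restriction concrete, here is a small C sketch (illustrative only, no real Nova calls; the names are invented): a typical compact vertex mixes 32-bit floats with 8-bit colors, so under this rule the two element sizes would have to be de-interleaved into their own buffers before upload:

```c
#include <stddef.h>
#include <stdint.h>

/* A typical compact vertex: 32-bit positions plus an 8-bit color. */
typedef struct {
    float   pos[3];    /* 32-bit elements */
    uint8_t rgba[4];   /* 8-bit elements  */
} MixedVertex;

/* De-interleave into one all-float and one all-byte buffer so that
 * each buffer ends up containing a single element size only. */
static void split_by_size(const MixedVertex *in, size_t n,
                          float *pos_out, uint8_t *col_out)
{
    for (size_t v = 0; v < n; ++v) {
        for (int i = 0; i < 3; ++i)
            pos_out[v * 3 + i] = in[v].pos[i];
        for (int i = 0; i < 4; ++i)
            col_out[v * 4 + i] = in[v].rgba[i];
    }
}
```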
My apologies for the confusion and gaps/errors in the docs. I hope that it's clear now, and will update the docs.
Quote:
Anyway, check out the fresh version on the FTP. I found a typo that probably was the reason for this ;)
Yes! Now the Irrlicht engine example and Q3 with enabled extensions (so using the glDrawElements route, not glBegin/glEnd) work fine! Here are the current results:
@kas1e Great! Did you check the performance impact of the workaround (by comparing it to a version where you did the ubyte-to-float conversion of the color inside gl4es)? Next to come is the workaround for VBOs. Do you have something for testing here too?
@Hans Quote:
Wish I'd thought of the existing Nova limitations sooner.
And I wish I had not overlooked this limitation info in this readme.
Quote:
Sorry for the confusion. I'll do my best to clarify, and also correct what I've been saying (got myself confused too ).
And sorry for my initial rant. Although it's a real hefty limitation that came as a big surprise to me, it was my fault to miss the note in the first place.
Quote:
NOTE: DBOs are still 32-bit only.
Luckily that's no problem. The respective commands of ogles2, namely glUniformXX, only operate with 32-bit floats and 32-bit ints. So this limitation "only" affects VAs, and of those only VAs that are defined via glBufferXXX and glVertexAttribPointer (the glVertexAttribXf functions only work with floats).
Quote:
Quick summary: restrict each VBO to data of one size each (8-bit data in one, 16-bit in another, etc.). The latest beta sets the VA descriptors correctly based on type, so VAs of all types might work provided the restriction above is observed (untested so far).
I could do this for the internal client-RAM emu-VBOs, since I have full control over those. However, I already made the workaround for that situation (which seems to work well, judging by kas1e's latest feedback): it patches the data while the emu-VBO is being built up internally, so no extra VBOs are needed here.
For "real" VBOs supplied by the user via glBufferXXX etc. this could be exploited, though. However, in my current WIP workaround I already took another route: when I detect that a VBO contains critical element-types, I create one additional internal "sub"-VBO and copy-convert the critical data over to it whenever a draw-call with that VBO is issued and there have been data modifications since the last sub-VBO refresh. So it's one sub-VBO with x float-arrays for all x critical arrays of the user-supplied VBO. Patching the original VBO is a no-go, of course. Just imagine the fun if somebody does a partial glBufferSubData... Appending the patched arrays at the end of the original VBO is no good idea either; that just complicates all the internal book-keeping. The solution I'm implementing now is the "easiest" in this regard.
In both cases interleaved and linear memory layouts are supported, of course.
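A hedged sketch of that sub-VBO bookkeeping (all names invented; the real ogles2 internals certainly look different): the user's ubyte data stays untouched, and a shadow float array is refreshed lazily at draw time, only when the data was modified since the last refresh:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Shadow-buffer bookkeeping for one critical (ubyte) array of a
 * user-supplied VBO. */
typedef struct {
    const uint8_t *user_data;  /* original GL_UNSIGNED_BYTE array  */
    size_t         count;      /* number of components             */
    float         *shadow;     /* converted copy handed to the GPU */
    bool           dirty;      /* set by glBufferData/SubData      */
} SubVBO;

/* Called whenever the client modifies the VBO contents. */
static void subvbo_mark_dirty(SubVBO *s) { s->dirty = true; }

/* Called at draw time: refresh the float copy only if needed. */
static const float *subvbo_for_draw(SubVBO *s)
{
    if (s->dirty) {
        for (size_t i = 0; i < s->count; ++i)
            s->shadow[i] = (float)s->user_data[i] / 255.0f;
        s->dirty = false;
    }
    return s->shadow;
}
```

With this layout, a true static VBO pays the conversion cost only once, no matter how many draw calls reference it.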
However, I won't put too much effort into optimizing those workarounds. I hope that this is only a temporary necessity and that it can be removed rather soon again.
Quote:
Great! Did you check the performance impact of the workaround (by comparing it to a version where you did the ubyte-float-conversion of the color inside gl4es)?
In the case of Q3 exactly, at 1600x1200, the difference is about 3 fps. I.e. the workaround done in ogles2 is better and gives 3+ fps compared to when we had the workaround in gl4es. Probably when it is all done in hardware, in Warp3D, without workarounds, it will give us another little speed increase of a few more fps?
Quote:
Next to come is the workaround for VBOs. Do you have sth. for testing here too?
Not sure, to be honest... ptitSeb added some initial VBO support some time ago, when we tested Q3, but at that time it gave no speed difference at all (and I rechecked now: it still makes no difference, in Quake 3 at least), so I just think it doesn't work in gl4es at the moment.
As I understand it, all VBO support in gl4es is emulated for now. And what was added as "initial support of VBO" either doesn't work or works wrong. I will ask him about it again; maybe now he can have a look at this more deeply.
Quote:
Luckily that's no problem. The respective commands of ogles2, namely glUniformXX, only operate with 32bit-floats and 32bit-ints. So this limitation "only" affects VAs, and out of those only VAs that are defined via glBufferXXX and glVertexAttribPointer (the glVertexAttribXf functions only work with floats).
That'll change when uniform booleans are finally supported.
Quote:
However, I won't put too much effort into optimizing those workarounds. I hope that this is only a temporary necessity and that it can be removed rather soon again
Yes, hopefully.
@kas1e Quote:
Not sure to be honest.. ptitSeb added some initial support of VBO some time ago, when we test q3, but at that time it give no speed differences at all (and i rechecked now, its still make no differences in quake3 at least), so i just think it didn't works at moment in gl4es.
I'm pretty sure that Quake III does NOT use VBOs. GL4ES might use VBOs when implementing the compiled vertex array extension, though.
@Hans Yeah, I meant the usage of real VBOs in gl4es when compiled vertex arrays are used; that should make a difference in Q3 as well when extensions are enabled, and that's what I tested when talking about VBOs.
Anyway, ptitSeb says it isn't done yet, so all VBO support in gl4es is only emulated for now, and what was added as initial VBO support was just a quick test-hack, so -> no VBOs at the moment.
Quote:
That'll change when uniform booleans are finally supported.
Yes, if Nova / the damn SI then expects 1-byte DBO data internally (and from what you told me it will), then this will become an issue. Luckily the respective DBO-update code is well isolated inside ogles2, and integrating a type conversion of this kind should be rather (!) easy.
@kas1e Quote:
In case with q3 exactly, in 1600x1200, differences is about 3fps. I.e. workaround done in ogles2 better, and give 3+ fps in compare when we has workaround in gl4es.
Hehe. But yes, something like that was actually to be expected after looking at the workaround in gl4es. ogles2.lib does the conversion on the fly while copying the data to the internal VBO. That this is faster than looping over the memory twice is no wonder.
Quote:
Probably when it all will be done in hardware, in warp3d, and without workarounds, it will give us another little speed increase for few more fps ?
Of course it will be faster not to do those workarounds, but I can't say by how much. It depends on how often the respective game or lib triggers it per frame and how big the data is (that's of course also of interest when it comes to sending that stuff to VRAM, e.g. 4 bytes vs. 16 bytes per color per vertex). And it depends on how the game works: if a game used true VBOs (which aren't modified all the time), then the impact would probably be near zero, because all the conversion and sending to VRAM would ideally be done only once per polygon-soup.
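The 4-vs-16-bytes figure in code form (just the arithmetic, nothing driver-specific; the constant names are invented): expanding a packed RGBA8 color to four floats quadruples the per-vertex color payload that has to travel to VRAM:

```c
#include <stdint.h>

/* One RGBA color per vertex: 4 bytes packed as ubytes, but 16 bytes
 * once a workaround has expanded it to four 32-bit floats. */
enum {
    UBYTE_COLOR_SIZE = 4 * sizeof(uint8_t),  /* packed:   4 bytes  */
    FLOAT_COLOR_SIZE = 4 * sizeof(float)     /* expanded: 16 bytes */
};
```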
Quote:
Anyway, ptitSeb say it isnt done yet, so all VBO in gl4es in whole only emulated for now, and that what was done as initial support of VBO, was just fast-test-hack, so -> no VBO at moment
Alright, then I'll stick to my boing-ball test for now.
Do you have in mind new progs or games that may now be ported to OS4 with a working GL4ES ?
In general, everything open-sourced which fits our dependencies (like no X Window, no Qt5, no heavy Boost usage, but which uses SDL1/SDL2 and OpenGL up to 3.x).
Probably everything which was done for MiniGL before and was slow could be re-ported for the sake of tests, to keep those which turn out faster. But in the MiniGL case it's probably better to wait until Daniel's minigl-reloaded is done; then everything will work over ogles2/Nova with no need to recompile (and a gl4es version probably would be no faster than Daniel's minigl-reloaded).
Off the top of my head, the new things which can be ported over gl4es are:
A new version of Blender (at least version 2.68 was tested with gl4es for sure, and it works. ptitSeb though didn't test the latest ones which need shaders, but they should work. Of course, only if those shaders don't use arrays, as arrays in shaders are not supported in Nova at the moment).
Everything done with the Irrlicht Engine. Almost all examples from it already work, and I even ported SuperTuxKart 0.8.1, which half works. That's not because of problems with gl4es, but because patching the paths in that version of SuperTuxKart is an ugly mess: somewhere they use "//", somewhere "\/", somewhere another way, and there is no single rule. But in terms of graphics rendering the game works for sure. Though without the game's shaders, as they use arrays, and Nova doesn't support them. The game can run without its shaders anyway.
These also work over gl4es: Foobillard++, Zyn-Fusion, OpenRA Tiberian Sun, AstroMenace, FreeOrion, Minecraft, OpenMW, Serious Sam (both the First and Second Encounters), RVGL (Re-Volt GL), parsec47, and so on.
In other words, there are almost no OpenGL restrictions now, except one: shaders in Nova don't support arrays, so any OpenGL stuff using shaders which use arrays will not work until array support is done in Nova. Otherwise: no restrictions anymore. Sure, some quirks can arise, but if they are gl4es-related, ptitSeb always fixes them fast.
All the work there was done by 3 persons: Hans, Daniel and ptitSeb. All I do is put the pieces together and report bugs, nothing more :)
You're undervaluing the work you're doing. Thanks to you we have a GL4ES port progressing, which will open up a lot of extra software. Plus, it's helped test GLES2 and Warp3D Nova more extensively, and that's resulted in both fixes and improvements.
* Warp3D Nova and OpenGLES 2 library have had major updates including significant work towards enabling the recent GL4ES wrapper and game ports to work **
@Spectre660 Yes, now that it's all in the hands of users, I can at least release the Quake 3 and Neverball/Neverputt ports :) While Quake 3 doesn't offer any speed improvements (it's even a little slower than the MiniGL one), it gives better rendering in a few places. But at the same time, Neverball/Neverputt are 2x faster, which is worth it of course.
In reality, what I hope for now is that I can just make the LettersFall game port public and fix its issues: distorted and buggy textures in the game on my setup, but not on Daniel's setup, which means it's something random and I dunno what it's related to. On other (non-Amiga) hardware all is OK, of course. Maybe with more tests from others we will be able to find the root of the issue.