@kas1e Those two examples trigger one of the many performance boosters in ogles2.library (without which things would crawl) to resize the internal multibuffer's VBOs to rather huge sizes. I have now added a simple safety mechanism to avoid such situations. This quick workaround comes at a rather big performance penalty, though (only) for such situations. I will improve that for the next official ogles2 version.
@Hans However, it's important to note that there is no memory leak or bug in ogles2.library! All those allocations are legal, and Nova always reports "success". What is causing the crash here is something else, pretty interesting actually:
As said, all VBO allocations are reported as successful by Nova. This is true even if the physical gfx memory is already practically depleted. The stuff still runs, albeit with a significant slowdown. Something else causes Nova to freeze the system then, namely a call to DestroyVertexBufferObject. This is what happened in ogles2: when gfx RAM was already in critical territory, yet another VBO was in the process of being resized - which involves destroying it before recreating it - bang. Note that the call to DestroyVertexBufferObject is fully legal and the respective VBO is not in use anywhere.
So, to sum it up: Nova freezes if you call DestroyVertexBufferObject when gfx RAM is low, at least.
EDIT: filed this as new W3D Nova bug report 0000447.
@All Ok, so for the Irrlicht example called TerrainRendering we are done: Daniel added a workaround for the Nova bug (freezing when you call DestroyVertexBufferObject while gfx RAM is low), as well as the safety mechanism to avoid such situations.
But now more news: PtitSeb added real use of VBOs for glBindBuffer() (at the moment only for that), so all the apps/games which use it a lot benefit from it. For example, in FrickingShark I got +10 fps (not a lot, but still), but the TerrainRendering example now runs at 550 fps instead of 50 (!) - ten times faster.
In the case of the TerrainRendering example, it was also slow on Linux with gl4es, so it was gl4es slowing things down this time. But now all Irrlicht examples on Linux with gl4es run at almost the same speed as on Linux with Mesa, so the layer doesn't add speed issues anymore.
Now, probably the last thing I want to understand (and maybe it's possible to speed it up somehow) with Irrlicht is why some Irrlicht examples are 2-3 times slower than on a 10-year-old 1.6 GHz AMD with some shitty built-in gfx card.
To explain it more: in Irrlicht we have a few different rendering modes: OpenGL and software rendering.
Now, let's see the software rendering figures on 3 different machines:
As we can see, the X5000 is, by hardware specs, on the level of the 10-year-old 1.6 GHz AMD notebook.
This time we didn't take into account OpenGL, Warp3D or anything of that sort. It is pure software rendering. The only things that have an impact there are the CPU, graphics.library and the RadeonHD driver.
Though if you think more about it, it's probably ok. Everyone knows the X5000 is 10 years behind the current computer world, so that's kind of ok.
Now, with those results in mind, and knowing that we are on the level of the 1.6 GHz AMD, we expect the X5000 with OpenGL rendering to be at least on the same level (of course, as we have the better gfx card, we would wish it to be faster, but it's ok to just be on the same level).
And here is the table with OpenGL. The same 3 machines are used:
I also added results from the i5 (the first config) under Linux with Mesa and under Linux with gl4es (to show that gl4es is ok and on the same level as Mesa, so we can't blame it for the issues I will point out now). I also added MiniGL results (I somehow managed to make Irrlicht work with MiniGL; it often crashes with it, has rendering bugs, and some examples didn't work either, but it's still something to compare with).
What can we say there? First of all, MiniGL sucks. Only 2 examples are at least on the same speed level (09.MeshViewer and 20.ManagedLights). The others are just too slow indeed.
The next thing we can notice is that the X5000 with gl4es is, in some examples, pretty much faster than the old AMD notebook (at least that's what we expect when we have a modern RadeonHD), like 03.CustomSceneNode and 04.Movement, and most of the others are, if not on the same level, a little bit faster here and there.
The issue I see now is that some examples show pretty degraded results, which I want to discuss and find out why: maybe it will again be possible to speed things up somehow/somewhere, or at least we can find out WHY.
The examples I'm talking about are:
02.Quake3Map (50% slower than even the 10-year-old AMD with its shitty Intel gfx card)
16.Quake3MapShader (again 50% slower than the 10-year-old notebook, but I assume it's the same issue as with the first example). With this example, though, DISABLING compositing makes it run at 90 fps, while with compositing ENABLED it is around 60-65 (maybe that points at something).
18.SplitScreen (3 times slower! This example again loads the quake3 map and splits the screen into 4 parts, with rendering happening independently in each).
So those 3 examples probably share the same single issue (I hope), as all of them use the quake3 map.
And, last one:
20.ManagedLights. That one runs at the same speed as the MiniGL one, nothing changes, so I assume this time it's the GART issue, or the non-DMA graphics.library on the X5000.
Now, to have something to discuss, I first uploaded all those 4 examples ready to run, so everyone can try:
I am interested in whether any X1000 user (i.e. with DMA in graphics.library) can run them and report the maximum fps they get in all of them (the FPS is written in the window title). Then we can rule DMA in or out. Quitting the examples can be done via the close gadget or via "alt+f4". Run them from the directory where they are (bin/amigaos4), as they want the root's "media" directory.
Next, I made a trace/profile for all those 4 examples via today's glSnoop, which catches almost everything now (Capehill, thank you very much for that!). The profilings for both warp3dnova and ogles2 are at the end of the files.
To be honest, the capture is very clean. Only 27 drawing calls in the entire frame! No suspicious state changes. Only the bare minimum per drawing call, and that's all. And most drawing calls use a quite simple shader that does multitexture rendering: Vertex shader:
His first guess was that the time difference is the conversion from big-endian to little-endian when sending the ever-changing vertex array data to the GPU.
But yeah, sure that conversion takes place, but it can't take THAT much. I mean, the old 1.6 GHz AMD with its slow Intel gfx card gives us ~350 fps, and our setup ~150 fps. I can't believe that such a conversion can take that much time. It's probably something else?
Then we checked the tracing/profiling logs, and we can see that it's the drawing itself that is the main bottleneck, not the VBO creation/handling, it seems (well, that takes some time too, but it's ok-ish).
And currently we've run out of ideas. It's pretty simple stuff, as can be seen from the log, yet there is a big difference compared to the 10-year-old AMD with its shitty gfx card. So something is wrong somewhere, and we can't see why and where.
@TearsOfMe Thanks. So it's not because of the missing DMA on the X5000. One thing less to check then..
But from the trace logs it's visible that the bottleneck is the drawing function itself. Why, and what makes it THAT much slower than on the old 1.6 GHz AMD with its crappy built-in gfx board, is still unknown.
Are you sure? (When I coded Wazp3D57 with Nova rendering) I tried different methods for updating a VBO, but it seems to be slow.
Perhaps a patched MiniGL that updates the VBO (say) 11 times would allow us to know how much time a VBO update REALLY takes in a REAL program (delta time / 10).
I mean, when I was testing Cow3D on Wazp3D57 -> Nova, it was (say) 80% of a real Warp3D (massive VBO update, but one time), so the bandwidth seems +- ok.
But when I was testing Quake2 (a real-life test) it was 1-2 FPS ... weird.
Another thing: when I was testing a simple raymarching test, I found that Nova GLSL seems to have very strange bugs: I mean, everything is fine when the GLSL code is compiled, but strange artefacts appear, as if the GLSL code computes badly at some pixels (like an FPU rounding bug) in the frag shader.
Quote:
Are you sure? (When I coded Wazp3D57 with Nova rendering) I tried different methods for updating a VBO, but it seems to be slow.
In the tracer logs I posted, you can see two tables at the end: the warp3dnova profile and the ogles2 profile. In the warp3d one you can see how much time each warp3dnova function takes. VBO creation there is ok-ish; it doesn't take all the time.
Besides, VBOs are used everywhere in other games/examples, but only those examples are that slow. So they seem to do something that makes our drivers that slow compared with even just the 1.6 GHz AMD with its built-in Intel. I'm not even talking about modern computers; being THAT slow shows that something is really wrong somewhere.
Quote:
Also when I was testing a simple raymarching test, I found that Nova GLSL seems to have very strange bugs: everything is fine when the GLSL code is compiled, but strange artefacts appear, as if the GLSL code computes badly at some pixels (like an FPU rounding bug) in the frag shader.
Nova's shader compiler is still WIP, but lately it's been getting better and better. Sadly it has optimisation disabled because of some bug, but visual bugs can be checked and reported, so Hans may fix them.
If you have that raymarching test, we can test it on the latest Nova, then test it with ogles2 on Windows (to be sure your code is correct), and then create a bug report. I can help with the tests (but let's create another topic about it?).
Dunno, though, what that can tell us.. I mean, this time it still doesn't explain why it is that much slower compared with the old 1.6 GHz AMD with its shitty gfx card.
@Capehill Talking with ptitSeb about all that, this is what he says:
--- VBOs are one thing: vertex data in VRAM, i.e. in the graphics card's memory, ready to use. So the main thing that consumes time is the transfer of the data to VRAM. If you reuse that data, you just activate the VBO; no need to transfer the data again.
So when a software/game uses a VBO, it creates one, fills it with data, and then simply uses the transferred data (sometimes it changes part of the VBO, to update some of the data).
The transfer of data traditionally happens at the "Unlock" part of the VBO (Lock gives you an address where to put the data, Unlock transfers from that address to VRAM).
On the Amiga, the transfer to VRAM can be slow if you don't have some kind of DMA for it (that's the first thing), and also, all data needs to be in little-endian, because the graphics card is little-endian (so you need to analyse the VBO to know which data needs swapping and which doesn't).
So yes, I think this VBO transfer can be a bottleneck. ---
But we also tested with working DMA in graphics.library on the X1000, and the results are still the same. Maybe by DMA he means something like GART there, dunno. That DMA in graphics.library is probably a different kind of DMA than what is expected for VRAM transfers?
Interestingly also, in the documentation about warp3dnova's BufferUnlock there is no mention of any big->little endian conversion ..
Even if graphics.library has DMA support on the X1000, it "only" has the capability to transfer bitmap data to VRAM (as far as I know). I suppose Nova needs a way to pull data from RAM using DMA, and this is what GART would provide.
@Capehill But isn't "bitmap" are the same data ? I mean , if there is DMA to transfer bitmap to VRAM (and bitmap are usuall data, or not ?) , then it should the same transfer any other data to VRAM ?
Or is the DMA in graphics.library only for transferring to VRAM some specific data which no one uses? :)
> the same data ? I mean , if there is DMA to transfer bitmap to VRAM
Yes. For Wazp3D57 I made some code that hacks a bitmap transfer so it can be used to copy vertices to a VBO. (Ouch, what a hack, but it works.)
It changed almost nothing speed-wise on the Sam460 & X5000, but I don't know if those machines use DMA for that.
>about BufferUnlock of warp3dnova, there is no mention about any big->little endian conversion
IMHO, what I understood: there are different methods for updating the VBO, but in fact Nova just reads and/or writes the data. When you Lock, it (can) read/reorder the VBO data into a buffer that you will access. When you Unlock, it (can) write/reorder the buffer back into the VBO data. As the reordering is done on Nova's side, you never access the real data in GPU VRAM, only a reordered buffer.
You can also do write-only (i.e. write all new vertex values from your buffer), read-only (i.e. read some GPU data), or read/write (i.e. change some vertices).
Certainly, letting Nova do the reordering was not a good idea, as the data is then accessed several times (vs. a CPU that writes the reordered data directly to real GPU VRAM).
See the Nova doc below:
// W3DN_STATIC_DRAW:  Written: (CPU) once         Read: rendered many times
// W3DN_STATIC_READ:  Written: (GPU) once         Read: CPU many times
// W3DN_STATIC_COPY:  Written: (GPU) once         Read: rendered many times
// W3DN_DYNAMIC_DRAW: Written: (CPU) occasionally Read: rendered many times
// W3DN_DYNAMIC_READ: Written: (GPU) occasionally Read: CPU many times
// W3DN_DYNAMIC_COPY: Written: (GPU) occasionally Read: rendered many times
// W3DN_STREAM_DRAW:  Written: (CPU) frequently   Read: rendered a few times
// W3DN_STREAM_READ:  Written: (GPU) frequently   Read: CPU a few times
// W3DN_STREAM_COPY:  Written: (GPU) very often   Read: rendered a few times
Hans explained those things a bit to me, and this is what I understand:
That "DMA support in graphics.library" in Sam440, Sam460 & x1000 is not real DMA , its just a hack, which used their CPU's DMA to speed up those ram->vram transfers , but only for internal use inside of graphics.library , and only for bitmaps. I.e. its "DMA", but CPU's DMA and not real DMA or GART of anything of that sort. Just some little "speed up".
In other words, it is of no use for drivers (like MiniGL, warp3dnova, the Radeon drivers, etc). Moreover, it only helps those apps which use graphics.library and rely on the parts which are "hacked" inside graphics.library to speed things up (namely those graphics.library bitmaps).
It probably helps somewhere, and not only in benchmarks (at least it wasn't added for no reason), but it doesn't help at all with the drivers, which explains why we didn't see a single difference between the X5000 (without that "CPU DMA hack") and the X1000 (with it) when testing the Irrlicht engine examples. Those examples are built on gl4es, which works over ogles2.library, which works on top of warp3dnova, which in turn doesn't have any kind of DMA acceleration for RAM->VRAM transfers. Only a proper implementation of GART can help there.
Quote:
Certainly let Nova do the reordering was not a good idea as datas are then acessed several times (vs a cpu that will write to real GPU vram directly the reordered datas)
As far as I'm aware now, warp3dnova's BufferUnlock() not only writes from RAM to VRAM, but also does endian conversion from big-endian to little-endian (as the gfx card is little-endian). So we have 2 stop factors there:
1. No real DMA (GART) is used, which means transfers from RAM->VRAM are slow. We should be happy we even have something usable without it. We at least have 100 fps in quake3 without GART, which is for sure not bad.
2. The endian conversion inside BufferUnlock() may slow everything down, especially if it wasn't compiled with -O3 optimisation enabled for whatever reason. As it also means buffers and working with them, etc., it's quite possible that it adds another bottleneck.
All of this probably explains well why some things work at almost the same speed on MiniGL and on gl4es, like quake3, lugaru and supertuxkart: all those games do a lot of drawing per frame, which is speed-limited by the missing GART, so both minigl.library (which works over warp3dnova) and gl4es are limited by the same thing.
But that code which writen "right", i.e. not thousands of draw calls per frame with lots of data, those ones speeduped well by usage of VBO and co.
Quote:
Interestingly also, in the documentation about warp3dnova's BufferUnlock there is no mention of any big->little endian conversion
That's right; that information is not in the Unlock function's description but in the Lock function's.
In VBOLock's description it is mentioned that you must call VBOSetArray first to tell Nova the data types and other info, so that it knows for which parts of the buffer's data it has to perform endian conversion. Although it's not explicitly stated, you may safely assume that the actual conversion is then done inside BufferUnlock (and then back again in VBOLock if a read is requested, unless the driver keeps a copy of the original data; no idea if it does). If the conversion took place later, internally, then this info in the docs wouldn't make sense, because then you could just as well tell Nova the data layout later (prior to first use).
@kas1e @thellier The fun part with that VBOSetArray convention: you can (ab)use it to trick Nova into not doing its slow endian conversion. Simply tell it beforehand that the VBO is just a package full of plain bytes.
But keep your seat: yes, it works, but unfortunately another Nova slowdown area kills the potential gain again. You may remember that ogles2 contains a workaround for plain byte stuff like RGBA8 data: I found the upload of such endian-free simple data to be so extremely dead-slow, for unknown reasons, that the lib converts it to RGBA float32 data... You'd expect that to be much slower (the 4x byte-to-float conversion, 4x as much data to transfer), but it's muuuuuuch faster than letting Nova do the simple job on the plain bytes.
So, unfortunately, avoiding Nova's endian conversion also means switching to that slow byte-data layout, so we end up with a net fps loss. Damn. I will experiment a bit more, but this one looks like a dead end. So we'll have to wait for Hans to improve the speed and eventually also implement that requested manual endian-conversion.
Quote:
The fun part with that VBOSetArray convention: you can (ab)use it to trick Nova into not doing its slow endian conversion. Simply tell it beforehand that the VBO is just a package full of plain bytes.
But keep your seat: yes, it works, but unfortunately another Nova slowdown area kills the potential gain again. You may remember that ogles2 contains a workaround for plain byte stuff like RGBA8 data: I found the upload of such endian-free simple data to be so extremely dead-slow, for unknown reasons, that the lib converts it to RGBA float32 data... You'd expect that to be much slower (the 4x byte-to-float conversion, 4x as much data to transfer), but it's muuuuuuch faster than letting Nova do the simple job on the plain bytes.
Huh? If a VBO contains only uint8 data, then it should be using a straight copy routine (one that uses doubles if possible).
You do need to make sure that *all* VBO arrays are 8-bit or disabled (W3DNEF_NONE), otherwise it'll fall through to the complex case of handling mixed data.
Quote:
You do need to make sure that *all* VBO arrays are 8-bit or disabled (W3DNEF_NONE), otherwise it'll fall through to the complex case of handling mixed data.
No, unfortunately it's not like this. Even setting all "unused" arrays to W3DNEF_NONE and size/stride 0 doesn't change anything. The only thing that helps is to create a simple 1-array VBO in the first place. Which I should have done and usually do for pure index VBOs, but which indeed wasn't enforced in this case here; thanks for pointing me at it.
And oh yes, that makes a difference indeed! However, not for the "own" vs. "Nova" endian conversion; there's no measurable difference in this simple 1-array scenario.
But, damnit, all this revealed again just how slow Nova's buffer copy becomes as soon as you don't have the most trivial 1-array VBO layout! Here it's the difference between 7 and 30 fps! And this happens for every VBO you create with 2 or more arrays inside.
Now the thing is: obviously there is huge optimization potential here. Whatever you do in your multi-array copy function, it's very bad. And if ogles2 could get rid of it, that would result in an incredible speedup for sure.
But unless you make VBOSetArray with W3DNEF_NONE work as you described above, I cannot implement it, because obviously a 1-array VBO is useless in that case. Or is any special parameter combination required for VBOSetArray with W3DNEF_NONE to make it work as promised?