Maybe this will interest you: I compiled the bench program for MorphOS 2.5 (on the same Peg2 where I have the latest AOS4), and with the latest PowerSDL (the MorphOS version of SDL) it gives these results for me:
So it looks like Slow Points and Fast Points give the same results on HW and SW surfaces, but rect fill and 32x32 blits on HW are faster on PowerSDL/MOS (on SWSURFACE the results are, again, the same in general). RectFill is noticeably faster, and 32x32 blits are much faster, almost twice as fast: for example, at HW/640x480 MorphOS gets 273.067 and AOS4 gets 146.286.
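To put a number on "almost twice", here is a minimal sketch in C that computes the ratio from the two HW/640x480 figures quoted above. It assumes a higher score means faster, as the comparison implies; `speedup` and `print_blit_ratio` are illustrative names, not part of the bench program.

```c
#include <stdio.h>

/* Ratio of two benchmark scores, assuming a higher score means faster.
   Illustrative helper, not part of the bench program. */
double speedup(double faster, double slower)
{
    return faster / slower;
}

/* Print the MorphOS vs AOS4 ratio for the 32x32 blit figures quoted in the post. */
void print_blit_ratio(void)
{
    const double mos  = 273.067;  /* PowerSDL/MorphOS, HW 640x480 */
    const double aos4 = 146.286;  /* AOS4 SDL, HW 640x480 */

    printf("MorphOS/AOS4 = %.2fx\n", speedup(mos, aos4));
}
```

Calling `print_blit_ratio()` reports a ratio of about 1.87x, so "almost twice" is a fair summary.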
But IMHO that is because of our implementation of OpenGL :( Does HWSURFACE use OpenGL at all?
I also noticed that the info the benchmark prints first is a bit different too:
I can't download that archive from the page you point to on the os4-sdl page; maybe you can upload it somewhere? Then I can try to compile it myself, check what the bug looks like, and strip the code down until only the buggy part is left, which should make it easy for AfxGroup to fix the problem.
Just to avoid creating a new thread about the same thing, I'm bringing this one back up.
As I see, you have started to work on SDL again, and your latest update (r42) was about adding a preliminary version of p96WritePixelArray replacement. I checked the differences and found that BltBitMapRastPort() was used before. That made me curious: which function is faster? As far as I remember, lately we have all blamed P96 quite a lot (bottleneck, old, badly designed), so I wonder which is better: BltBitMapRastPort from graphics.library, or p96WritePixelArray from P96? I checked the docs, but there is nothing about which one is better or faster. Maybe you have already done some tests with it and can say more?
Keep in mind that the changes are a work in progress and must not be spread around, since, for example, there are problems with SWSURFACE that freeze the machine. Also, that last commit has a problem with the first p96WritePixelArray, since I changed a variable name. So wait before doing anything with those changes. I hope Peter can work on it, since I'm really busy at the moment and may also have to change my real job, so I don't know when I can do something on the code.
I built that version just now, and the benchmarks are VERY GOOD! I also saw your benchmarks, and you are 100% right, I get the same BIG difference. Very cool. For example, if we compare the MorphOS SDL speed on the same hardware (in the post above) with what I have now:
That means the OS4 version is now 300% faster than the MorphOS one in SW mode, and, for example, in 32x32 blits at 640x480 in HW it is now also 2 times faster!
I can even see the difference between the old SDL and the new one visually while the bench test is running.
I have to say, this is what I call a "radical speedup"! Very good.
I tried to compile some stuff (just adding an option at the linking stage to build it with the shared SDL), and it looks like it freezes not only with SWSURFACE but with HWSURFACE as well.
After compiling SDLBench to use a shared libSDL, I got some amazing benchmarks too for software surfaces.
I have a shared object version of e-uae, and unfortunately the system freezes when switching from fullscreen to windowed (hardware surfaces to software surfaces).
Pity, I'd like to see if there's a noticeable difference.
@MickJT As afxgroup says, it's a work in progress. So stay tuned, it's not something for end users yet. I just noticed, by some luck, an update on the project page and got interested :)
But yes, I really want to compile a few projects to see the differences :) Even with that "bench" test you can visually see a big difference. I'm not sure you will notice it much with UAE (because IMHO everything there is about the CPU), but we will see :) The benchmarks are ultra good, though :)
@afxgroup I take it this SW/HW surface speedup has nothing to do with OpenGL support? (I tried LodePaint and saw no difference.) I also checked the bench.c code, and yes, it does not rely on sdl_opengl. Maybe you can explain in brief what the differences are between SWSURFACE and HWSURFACE from a programming point of view? (I mean what happens in the OS when you set SWSURFACE, how it operates on memory and which functions from which OS libraries are used, and what happens if you choose HWSURFACE.)
@AfxGroup Btw, one more thing I'm curious about: shouldn't BltBitMapRastPort() go through the P96 functions in the end? As I understand it, it should also end up calling P96WritePixelArray (because in the end P96 does all the work). Maybe BltBitMapRastPort() does some heavy checking on locking/unlocking screens before doing the P96 stuff, and that is why it is so slow?
And maybe our current lockups happen because we need more checking of when we can lock/unlock? (Which may be what BltBitMapRastPort() does, and why it is so slow.)
I'm just trying to understand how it is possible to have such a _very big_ difference, thousands of times...
I don't know why I get such big differences, and I don't know what BltBitMapRastPort does internally. I can only try to test every possibility to increase the SDL speed.
If you are interested, you can follow the thread on UB which I started today (about all that BltBitMapRastPort vs P96WritePixelArray stuff). So far the most interesting answer was from Georg:
Quote:
The changes for os4video_HWAccelBlit() func in this diff file look strange/buggy, as src_bm is basically ignored.
Maybe we do indeed lose src_bm somewhere? (If it is needed at all.)
That could explain such a "speedup" (as if it copies from nothing to memory).
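To illustrate why an ignored source would produce a bogus "speedup", here is a minimal sketch in plain C. It is not the os4video code: `blit_copy`, `blit_ignores_src`, `region_matches`, and the flat pixel arrays are hypothetical stand-ins. A blit that never reads its source does less work per call, so a benchmark that only measures time would rate it as faster even though the destination ends up wrong.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Correct blit: copy `rows` rows of `w` pixels from src to dst.
   Pitches are counted in pixels. Hypothetical stand-in, not os4video code. */
void blit_copy(uint32_t *dst, size_t dst_pitch,
               const uint32_t *src, size_t src_pitch,
               size_t w, size_t rows)
{
    for (size_t y = 0; y < rows; y++)
        memcpy(dst + y * dst_pitch, src + y * src_pitch,
               w * sizeof(uint32_t));
}

/* Buggy blit: same signature, but the source is never read,
   so the destination is left untouched. */
void blit_ignores_src(uint32_t *dst, size_t dst_pitch,
                      const uint32_t *src, size_t src_pitch,
                      size_t w, size_t rows)
{
    (void)dst; (void)dst_pitch;
    (void)src; (void)src_pitch;
    (void)w;   (void)rows;
}

/* Returns 1 if the w x rows region of dst equals the same region of src. */
int region_matches(const uint32_t *dst, size_t dst_pitch,
                   const uint32_t *src, size_t src_pitch,
                   size_t w, size_t rows)
{
    for (size_t y = 0; y < rows; y++)
        if (memcmp(dst + y * dst_pitch, src + y * src_pitch,
                   w * sizeof(uint32_t)) != 0)
            return 0;
    return 1;
}
```

Checking the result with something like `region_matches` after each timed run is one way a benchmark could tell a real speedup from a blit that simply skips the copy.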
As I said in my previous post, that problem is only a wrong copy & paste in my working SDL. Don't worry, the tests are real. I also have no time these days to fix that problem; I hope Peter can do it. And anyway, if someone wants to contribute to SDL, just email me or Peter.