My MiniGL experiments,recompilation,tips,etc...
Not too shy to talk
Hello

Last month I tried to recompile the latest MiniGL to optimize it.
For this I have a Cygwin-based cross compiler.
1) I have fixed all "deprecated" warnings about MsgPort, etc. in the gl & glut sources
2) I have fixed all "deprecated" warnings about AllocVec in the gl & glut sources
So no more "deprecated" warnings in the glut & minigl sources (except an AllocVec for chip memory that can't be removed)
==> clean compilation of gl & glut
(I didn't check the glu sources because glu is not so important)
3) Tried to rewrite the "transform" part to use registers ==> but it doesn't speed up Huno's Ioquake
4) Applied the functions transform, CodePoints (clip test), LightVertices and VerticesToScreen to the whole primitive instead of on a per-triangle basis
5) Rewrote all the "Draw a primitive" functions in draw.c to make them buffer the non-clipped triangles (or points or lines), which allows a (clipped) primitive to be drawn in a single pass
==> but it doesn't speed up Huno's Ioquake
6) Test whether the primitive is not clipped (fully on screen), and if so draw it immediately
==> but it doesn't speed up Huno's Ioquake
7) I have also tried to fully remove the "lighting" (i.e. each vertex is simply colored white) ==> this barely speeds up Huno's Ioquake ==> so the "lighting" has almost no influence on this program
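
Steps 4) to 6) can be illustrated with a small, self-contained sketch. This is not the code from draw.c: ClipVertex, clip_code(), code_primitive() and batch_unclipped() are hypothetical names, and the vertex layout is heavily simplified. It assumes the usual OpenGL clip volume (-w <= x, y, z <= w). The whole primitive is clip-coded once, and triangles that need no clipping are collected into one index buffer so they could be drawn in a single call.

```c
#include <stddef.h>

/* Hypothetical, simplified clip-space vertex; not MiniGL's MGLVertex. */
typedef struct { float x, y, z, w; unsigned outcode; } ClipVertex;

enum { CLIP_LEFT = 1, CLIP_RIGHT = 2, CLIP_BOTTOM = 4,
       CLIP_TOP = 8, CLIP_NEAR = 16, CLIP_FAR = 32 };

/* Outcode of one vertex against the clip volume -w <= x, y, z <= w. */
static unsigned clip_code(const ClipVertex *v)
{
    unsigned code = 0;
    if (v->x < -v->w) code |= CLIP_LEFT;
    if (v->x >  v->w) code |= CLIP_RIGHT;
    if (v->y < -v->w) code |= CLIP_BOTTOM;
    if (v->y >  v->w) code |= CLIP_TOP;
    if (v->z < -v->w) code |= CLIP_NEAR;
    if (v->z >  v->w) code |= CLIP_FAR;
    return code;
}

/* Code every vertex of the primitive once. *or_codes == 0 means the whole
   primitive is on screen; *and_codes != 0 means it is entirely off screen. */
static void code_primitive(ClipVertex *verts, size_t count,
                           unsigned *or_codes, unsigned *and_codes)
{
    unsigned o = 0, a = ~0u;
    for (size_t i = 0; i < count; i++) {
        verts[i].outcode = clip_code(&verts[i]);
        o |= verts[i].outcode;
        a &= verts[i].outcode;
    }
    *or_codes = o;
    *and_codes = count ? a : 0;
}

/* Collect the indices of all triangles that need no clipping into out
   (capacity >= 3 * tri_count). Triangles that would go to the clipper are
   only counted here. Returns the number of indices written. */
static size_t batch_unclipped(const ClipVertex *verts, const unsigned short *idx,
                              size_t tri_count, unsigned short *out,
                              size_t *needs_clip)
{
    size_t n = 0;
    *needs_clip = 0;
    for (size_t t = 0; t < tri_count; t++) {
        const unsigned short *tri = idx + 3 * t;
        unsigned c0 = verts[tri[0]].outcode;
        unsigned c1 = verts[tri[1]].outcode;
        unsigned c2 = verts[tri[2]].outcode;
        if (c0 & c1 & c2) continue;                     /* fully outside: drop */
        if (c0 | c1 | c2) { (*needs_clip)++; continue; } /* partially outside */
        out[n++] = tri[0];
        out[n++] = tri[1];
        out[n++] = tri[2];
    }
    return n;
}
```

In a real pipeline the clipped triangles would be appended to the same buffer after clipping, and the buffer handed to the driver once per primitive.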

All these changes have introduced new errors: Glexcess now crashes on line drawing. It is certainly an easy bug, but I am too tired to look into it.
Fortunately Ioquake, which serves as my test, still runs perfectly, but at almost the same speed (even slightly slower).

Now I am very tired: those who say "MiniGL can easily be faster" are wrong. I have spent a lot of time on this for zero progress.

Alain Thellier - Wazp3D





Re: My MiniGL experiments,recompilation,tips,etc...
Home away from home
@thellier

Quote:
2) I have fixed all "deprecated" warnings about AllocVec in the gl & glut sources
So no more "deprecated" warnings in the glut & minigl sources (except an AllocVec for chip memory that can't be removed)


Are you sure you can't remove it?
CHIP/FAST RAM is just emulated anyway, and it is normally limited to only 50 MB or less, so you run out of memory quickly. Not a good idea; CHIP has nothing to do with video memory.

You should be using MEMF_PRIVATE if possible; if not, use MEMF_SHARED. Possibly you should also ask for aligned memory and non-swappable memory (maybe), but AmigaOS should not swap memory unless absolutely necessary anyway, so you should not need to think about it unless you're writing a DMA or interrupt routine.

You should be using AllocVecTags(), not AllocVec().

If you're worried about backwards compatibility, you can always wrap the code in a macro, or use preprocessor directives: #if, #else, #elif, #endif.
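
A minimal sketch of the wrapper idea, under stated assumptions: mgl_alloc_private() and mgl_free() are hypothetical names, not part of MiniGL, and the header names and AVT_* tags are written as I recall them from the AmigaOS 4 SDK, so check them against your SDK before relying on this. The #else branch is only a stand-in (plain calloc/free) so the sketch also compiles on systems without AllocVecTags().

```c
#include <stddef.h>

#ifdef __amigaos4__
#include <proto/exec.h>
#include <exec/exectags.h>   /* AVT_* tags (assumed location) */

/* OS4 path: private, zero-filled memory via AllocVecTags(). */
static void *mgl_alloc_private(size_t size)
{
    return IExec->AllocVecTags((uint32)size,
                               AVT_Type, MEMF_PRIVATE,
                               AVT_ClearWithValue, 0,
                               TAG_DONE);
}

static void mgl_free(void *p)
{
    IExec->FreeVec(p);
}

#else
#include <stdlib.h>

/* Fallback stand-in for other targets: zero-filled heap memory. */
static void *mgl_alloc_private(size_t size)
{
    return calloc(1, size);
}

static void mgl_free(void *p)
{
    free(p);
}
#endif
```

Callers then use mgl_alloc_private() and mgl_free() everywhere, and the choice of allocator stays in one place.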

(NutsAboutAmiga)

Basilisk II for AmigaOS4
AmigaInputAnywhere
Excalibur
and other tools and apps.

Re: My MiniGL experiments,recompilation,tips,etc...
Not too shy to talk
I am using AllocVecTags() + MEMF_PRIVATE almost everywhere, and MEMF_SHARED only where needed.

The chip memory is needed here in video.c:

static void vid_Pointer(GLcontext context, struct Window *window)
{
    /* Allocate the pointer image once. The deprecated AllocVec() is still
       used here to obtain chip memory on real classic Amigas. */
    if (!context->MousePointer) {
        context->MousePointer = IExec->AllocVec(12, MEMF_CLEAR|MEMF_CHIP);
    }
    /* Only install the pointer if the allocation succeeded. */
    if (window && context->MousePointer) {
        IIntuition->SetPointer(window, context->MousePointer, 1, 16, 0, 0);
    }
}

Re: My MiniGL experiments,recompilation,tips,etc...
Home away from home
@thellier

MPlayer allocates EmptyPointer as 16 bytes of (MEMF_PUBLIC | MEMF_CLEAR)
I guess it can be MEMF_ANY, it does not matter.

(NutsAboutAmiga)

Basilisk II for AmigaOS4
AmigaInputAnywhere
Excalibur
and other tools and apps.

Re: My MiniGL experiments,recompilation,tips,etc...
Quite a regular
@LiveForIt
It does matter on OS4 classic, but I doubt anyone uses it without an RTG card.

This is just like television, only you can see much further.

Re: My MiniGL experiments,recompilation,tips,etc...
Quite a regular
@thellier
It's not easy to improve things, because the feature set of Warp3D is stuck in the late 90s. One tip I got from Hans was accumulating the clipped primitives into the vertex buffer, instead of drawing them one by one. I started implementing this for the vertex array, but couldn't finish it yet due to other commitments.

This is just like television, only you can see much further.

Re: My MiniGL experiments,recompilation,tips,etc...
Not too shy to talk
@BSzili
>accumulating the clipped primitives into the vertex buffer, instead of drawing them one by one
This is what I did too: it doesn't seem to have improved the speed (at least in Ioquake...)

Once the "glexcess lines bug" is fixed (when I find some courage), I will release the sources + binary


Alain Thellier


Re: My MiniGL experiments,recompilation,tips,etc...
Home away from home
@thellier

If you're using Ioquake as a benchmark for any improvements, then it's important to note that the Quake 3 engine almost exclusively renders GL_TRIANGLES in compiled vertex arrays via glDrawElements(). So it uses the GLDrawElementsTriangles() function in vertexarray.c. This bypasses some of the main transformation code (calling v_MaybeTransform() instead).

The bottom line is that only code in or called by GLDrawElementsTriangles() is likely to have any impact on the performance of Ioquake (or any Quake III port).

Quote:
Now I am very tired: those who say "MiniGL can easily be faster" are wrong. I have spent a lot of time on this for zero progress.

Thanks for trying anyway, and posting the details of your efforts. That way anyone else who wants to look at it knows what's already been tried.

Personally, I think that it would take a rewrite of MiniGL's rendering pipeline to boost performance, and that's definitely not easy. My thoughts are that the pipeline should write the transformed and clipped vertex data to a compact buffer, and deliver them to Warp3D in large blocks. This would do two things:
- Rendering whole vertex arrays in one go is more efficient than rendering one triangle at a time (with modern hardware)
- Having a dedicated and compact output buffer for the transformed vertices would make more efficient use of CPU caches, and would make it easier to avoid unnecessary copies. I have no idea how much slowdown we're getting from cache misses, but MGLVertex (which stores both input and output vertex data) is very fat, and it doesn't take many vertices to exceed the CPU's L1 cache size.
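
As a rough illustration of the compact output buffer idea (not MiniGL's actual layout): FatVertex, OutVertex, to_byte() and emit_vertices() are all hypothetical, and the field choices are assumptions. The sketch transforms positions with an OpenGL-style column-major 4x4 matrix and writes only the data needed for drawing into a tightly packed array. Clipping is left out to keep it short.

```c
#include <stddef.h>

/* Hypothetical "fat" vertex that mixes inputs with pipeline results. */
typedef struct {
    float obj[4];        /* object-space position */
    float normal[3];
    float color[4];      /* RGBA, 0..1 */
    float tex[2][4];     /* two texture coordinate sets */
    float eye[4];        /* intermediate results */
    float clip[4];
    unsigned outcode;
} FatVertex;

/* Hypothetical compact vertex: only what is needed for drawing. */
typedef struct {
    float x, y, z, w;
    float u, v;
    unsigned char rgba[4];
} OutVertex;

/* Convert a 0..1 colour channel to 0..255 with clamping. */
static unsigned char to_byte(float c)
{
    if (c <= 0.0f) return 0;
    if (c >= 1.0f) return 255;
    return (unsigned char)(c * 255.0f + 0.5f);
}

/* Transform count vertices by the column-major matrix m and write them
   consecutively into out. Returns the number of vertices written. */
static size_t emit_vertices(const FatVertex *in, size_t count,
                            const float m[16], OutVertex *out)
{
    for (size_t i = 0; i < count; i++) {
        const float *p = in[i].obj;
        out[i].x = m[0] * p[0] + m[4] * p[1] + m[8]  * p[2] + m[12] * p[3];
        out[i].y = m[1] * p[0] + m[5] * p[1] + m[9]  * p[2] + m[13] * p[3];
        out[i].z = m[2] * p[0] + m[6] * p[1] + m[10] * p[2] + m[14] * p[3];
        out[i].w = m[3] * p[0] + m[7] * p[1] + m[11] * p[2] + m[15] * p[3];
        out[i].u = in[i].tex[0][0];
        out[i].v = in[i].tex[0][1];
        for (int c = 0; c < 4; c++)
            out[i].rgba[c] = to_byte(in[i].color[c]);
    }
    return count;
}
```

With this layout the output stride is a fraction of the input stride, so more vertices fit in the cache while the driver reads them, and the whole block can be passed on in one go.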

Needless to say, such a large pipeline rewrite would be a lot of work, with no guarantee that we'd get much of a performance boost.

Hans

Join Kea Campus' Amiga Corner and support Amiga content creation
https://keasigmadelta.com/ - see more of my work

Re: My MiniGL experiments,recompilation,tips,etc...
Just popping in
Speeding up MiniGL was always going to be a "simple" task for someone as long as it was just talk.

The reality is that it's not so easy. I wrote some basic built-in profiling to try and identify slow or often called code and performed some optimisations based on that. But the problems are mostly not going to be solved that way. Your optimisations will probably help the slowest machines; on faster ones, other factors become more important. Cache utilisation is a much bigger issue there, I think. It is interesting to note that a lot of older MiniGL stuff is faster despite using theoretically* slower V3 Warp3D calls. This is probably due to having more compact MiniGL vertex structures back then (supporting fewer features), which leads to better cache usage.

*In practice, they are simpler and easier to write optimised driver code for than split/interleaved pointers, even if the legacy W3D_Vertex format is a bit silly.

Re: My MiniGL experiments,recompilation,tips,etc...
Quite a regular
I'll try to split MGLVertex into a rendering and management part. The structure used for rendering will only have a pointer to the management structure. This could minimize the changes required, most of which could be done with mass-replace.
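
One way such a split could look, purely as a sketch: VertexInfo, RenderVertex, link_vertices() and set_outcode() are hypothetical names, and the fields are assumptions rather than the real MGLVertex members.

```c
#include <stddef.h>

/* Hypothetical management part: state the pipeline needs while building
   a vertex, but that is never read when drawing. */
typedef struct VertexInfo {
    float object_pos[4];
    float normal[3];
    float eye_pos[4];
    unsigned outcode;
} VertexInfo;

/* Hypothetical rendering part: the data used for drawing, plus a pointer
   back to the matching management record. */
typedef struct RenderVertex {
    float x, y, z, w;
    float u, v;
    unsigned color;
    VertexInfo *info;
} RenderVertex;

/* Link two parallel arrays so that render[i].info points at info[i]. */
static void link_vertices(RenderVertex *render, VertexInfo *info, size_t count)
{
    for (size_t i = 0; i < count; i++)
        render[i].info = &info[i];
}

/* Code that used to write v->outcode becomes v->info->outcode, which is
   the kind of change a mass-replace can handle. */
static void set_outcode(RenderVertex *v, unsigned code)
{
    v->info->outcode = code;
}
```

The pointer costs a few bytes per rendering vertex, but the rarely used fields no longer sit between consecutive vertices in the array that gets drawn.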

This is just like television, only you can see much further.

Re: My MiniGL experiments,recompilation,tips,etc...
Just can't stay away
I won't pretend to understand what you are all doing, but I know that it is good!

Your efforts are appreciated.

Many thanks!

AmigaOne X1000.
Radeon RX550

http://www.tinylife.org.uk/

Re: My MiniGL experiments,recompilation,tips,etc...
Not too shy to talk
OK, my sources + binaries are here:
http://thellier.free.fr/src-02-jul-2015.zip
No changes in src/glu.
23 files were modified in src/, mostly just to fix warnings. In fact only hclip.c, light.c, texture.c and especially draw.c were truly modified.
include/mgl/context.h was modified too.

The bug in draw.c/DrawStoredLines() is still there: I give up.

Alain Thellier

Re: My MiniGL experiments,recompilation,tips,etc...
Home away from home
@Karlos

Avoid shuffling memory around if it's not needed.


Edited by LiveForIt on 2015/7/3 16:24:02
(NutsAboutAmiga)

Basilisk II for AmigaOS4
AmigaInputAnywhere
Excalibur
and other tools and apps.

Re: My MiniGL experiments,recompilation,tips,etc...
Just popping in
@thellier

I will try to find some time this weekend to merge your changes back into the svn repository. I will start with just the compiler warning fixes for now and examine what else can be incorporated without pulling in any new bugs, such as the line-drawing bug you mention.

Don't be too disheartened at the lack of apparent performance improvements at this stage. You have eliminated one of several potential areas and it might be that your changes further increase performance after other, more significant bottlenecks are eliminated.



Re: My MiniGL experiments,recompilation,tips,etc...
Just popping in
What is really needed is the ability to run parts of this code through a tool like cachegrind. I've done it for whole binaries on Linux but not sure what we can do here.

Re: My MiniGL experiments,recompilation,tips,etc...
Home away from home
@thellier

I tested your new MiniGL on my system, but in IoQuake3 (Huno version) I got a downgrade!
It is not better at all but worse, so I switched back to the official 2.20 for now.

Here is the difference. It is not much in comparison, but certainly not better.

Official MiniGL 2.20
HunoPPC R3 version of IoQuake3 1.36 (Sam440ep Flex 800 MHz + ATI Radeon 9250, 128 MB, 64 Bit)
640x480 - 21,6 FPS

Your version
HunoPPC R3 version of IoQuake3 1.36 (Sam440ep Flex 800 MHz + ATI Radeon 9250, 128 MB, 64 Bit)
640x480 - 21,0 FPS

Re: My MiniGL experiments,recompilation,tips,etc...
Home away from home
@Karlos
Quote:
What is really needed is the ability to run parts of this code through a tool like cachegrind. I've done it for whole binaries on Linux but not sure what we can do here.

I've never used cachegrind myself, but having some hard data on what's going on would be very useful.

The biggest difficulty would be in porting valgrind to AmigaOS. Sure, the source code is available, but something that works at such a low level is virtually guaranteed to be harder to port than the average *nix application.

Hans

Join Kea Campus' Amiga Corner and support Amiga content creation
https://keasigmadelta.com/ - see more of my work

Re: My MiniGL experiments,recompilation,tips,etc...
Just popping in
Even without cachegrind we aren't helpless. One way to test the fat vertex hypothesis would be to write synthetic tests for Warp3D directly:

1) Test W3D_DrawArray/Elements with a triangle list using a compact, minimal vertex structure.

2) Run the same tests again using a fat vertex structure (same size as MGLVertex) in which the W3D vertex structure is embedded. Initialise the extra space with any old crap to ensure the cache isn't hot for just the parts we care about.

Time both tests carefully for different sized vertex arrays.

I wouldn't be surprised if, once the vertex array exceeds some CPU-dependent magic number, the performance drops suddenly, especially for DrawElements, which generally makes random accesses into the vertex array. This would correspond to the point at which you keep having to refetch the data from RAM because you can't keep enough vertices in your cache.
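
The CPU side of such a test can be tried without Warp3D at all. The sketch below is only a stand-in for the real thing: CompactVertex, PaddedVertex and time_indexed_walk() are hypothetical, the 256 bytes of padding are an arbitrary assumption rather than the real MGLVertex size, and a plain read loop replaces the W3D_DrawArray/Elements call. The same function walks an index list over either layout, because the stride is passed in.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical minimal vertex layout. */
typedef struct { float x, y, z, w, u, v; unsigned argb; } CompactVertex;

/* The same vertex embedded in a fat structure. The padding size is an
   assumption standing in for the extra per-vertex fields. */
typedef struct {
    CompactVertex v;
    unsigned char padding[256];
} PaddedVertex;

/* Read idx[0..n-1] from an array of vertices spaced stride bytes apart,
   touching only the embedded CompactVertex. Returns elapsed CPU seconds;
   the sum is stored in *sink so the loop cannot be optimised away. */
static double time_indexed_walk(const unsigned char *base, size_t stride,
                                const unsigned *idx, size_t n, float *sink)
{
    float sum = 0.0f;
    clock_t t0 = clock();
    for (size_t i = 0; i < n; i++) {
        const CompactVertex *v =
            (const CompactVertex *)(base + (size_t)idx[i] * stride);
        sum += v->x + v->u;
    }
    clock_t t1 = clock();
    *sink = sum;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

To get usable numbers, call it many times in a loop with sizeof(CompactVertex) and then sizeof(PaddedVertex) as the stride, for vertex counts ranging from a few hundred to several hundred thousand, and compare the time per index.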

Re: My MiniGL experiments,recompilation,tips,etc...
Home away from home
@Karlos

I hadn't thought of testing it that way. Yes, that would give us some usable results. It might be challenging to create a test where the vertex array that's sent to the driver has zero gaps. The reason that I'm interested in that case is because CPUs tend to be very good at prefetching data into caches with consecutive accesses. Hence, any gaps in the data are likely to screw up that process and cause delays (via unnecessary reads and cache misses). That's in theory; I have no idea how much of an effect this actually has.

Quote:
I wouldn't be surprised if, once the vertex array exceeds some CPU-dependent magic number, the performance drops suddenly, especially for DrawElements, which generally makes random accesses into the vertex array.

The accesses don't have to be totally random. For example, this document about optimizing drivers for Quake 3 states that the vertices in the GL_TRIANGLES array are in tri-strip order. So, in that case the vertex accesses are always within a small window.
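
To make "tri-strip order" concrete, here is a small sketch. strip_to_triangles() and max_triangle_span() are hypothetical helpers and nothing here is taken from Quake 3 itself. It writes the index list that a triangle strip turns into when expressed as independent GL_TRIANGLES, then measures how far apart the indices within any one triangle are.

```c
#include <stddef.h>

/* Expand a strip starting at vertex 'first' into tri_count independent
   triangles. Odd triangles swap their first two indices to keep the
   winding consistent. Returns the number of indices written. */
static size_t strip_to_triangles(unsigned first, size_t tri_count, unsigned *out)
{
    for (size_t t = 0; t < tri_count; t++) {
        unsigned a = first + (unsigned)t;
        if (t & 1) {
            out[3 * t]     = a + 1;
            out[3 * t + 1] = a;
            out[3 * t + 2] = a + 2;
        } else {
            out[3 * t]     = a;
            out[3 * t + 1] = a + 1;
            out[3 * t + 2] = a + 2;
        }
    }
    return 3 * tri_count;
}

/* Largest (max index - min index) found within any single triangle. */
static unsigned max_triangle_span(const unsigned *idx, size_t index_count)
{
    unsigned worst = 0;
    for (size_t i = 0; i + 2 < index_count; i += 3) {
        unsigned lo = idx[i], hi = idx[i];
        for (size_t k = 1; k < 3; k++) {
            if (idx[i + k] < lo) lo = idx[i + k];
            if (idx[i + k] > hi) hi = idx[i + k];
        }
        if (hi - lo > worst) worst = hi - lo;
    }
    return worst;
}
```

For strip-ordered indices the span stays at 2 no matter how long the list gets, and each triangle shares two vertices with the previous one, so the accesses march through the vertex array almost sequentially.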

Hans

Join Kea Campus' Amiga Corner and support Amiga content creation
https://keasigmadelta.com/ - see more of my work