X5000 optimized code compile
It seems that code compiled with the flag "-mcpu=750" works fine on the X5000, but it is not optimal, since it targets the G3 CPU. Code compiled for the G4 7450 crashes, understandably, since the X5000 does not have an AltiVec unit.

Is there a way to compile optimized for the X5000, and what should the flags look like?

MacStudio ARM M1 Max Qemu//Pegasos2 AmigaOs4.1 FE / AmigaOne x5000/40 AmigaOs4.1 FE
Re: X5000 optimized code compile
-mcpu=e5500 maybe?

Re: X5000 optimized code compile
Did you try to add -mno-altivec?

i'm really tired...
Re: X5000 optimized code compile
-mcpu=e5500 crashed, so I am now trying -mcpu=e5500 -mno-altivec to see if that works. I just sent the test exe to Maijestroy (I don't have an X5000 myself, I have an X1000). We'll see if this works.

Thanks for your help.

Re: X5000 optimized code compile
TheMagicSN wrote: -mcpu=e5500 crashed, so I am now trying -mcpu=e5500 -mno-altivec to see if that works. I just sent the test exe to Maijestroy (I don't have an X5000 myself, I have an X1000). We'll see if this works.

The test exe unfortunately still crashes, but according to the Grim Reaper log it is "libgcc.so" that causes the crash. Does "libgcc.so" have to be compiled with the same flags?

Crash log for task "GemRB"
Generated by GrimReaper 53.19
Crash occured in module libgcc.so at address 0x7EA5BB54
Type of crash: DSI (Data Storage Interrupt) exception
Does anyone have any idea what is going wrong here?

Re: X5000 optimized code compile
I do not think libgcc.so is the problem; more likely the setting simply did not work. libgcc.so is not compiled by me, and it works fine for all the other versions (X1000/Sam440/Sam460). They all use the same libgcc.so (the shared library from the OS).

It sounds to me like we have to go back to -mcpu=750 for the X5000, which works.

Re: X5000 optimized code compile
I have no X5000, so I cannot test it.
You can also try -mcpu=e5500 -mno-powerpc64

AmigaOS3: Amiga 1200
AmigaOS4: Micro A1-C, AmigaOne XE, Pegasos II, Sam440ep, Sam440ep-flex, AmigaOne X1000
MorphOS: Efika 5200b, Pegasos I, Pegasos II, Powerbook, Mac Mini, iMac, Powermac Quad
Re: X5000 optimized code compile
IMHO, if you want to avoid separate binaries for specific CPUs, or dynamic runtime code paths for specific CPUs, the best practice is probably to generate code with the following gcc switches.

For general OS4 use, also covering classic Amiga 1200/4000 accelerators:
-mcpu=604e -mtune=604e

Or use the same base code with specific tuning for superscalar CPUs and their caches:
-mcpu=604e -mtune=604e/7400/8540/etc.

With a focus on Amiga systems, the PowerPC ISA is essentially the same across the whole PowerPC family; very few instructions differ, and none of them matter much for code size or speed.
Maybe the most relevant one is the ISEL instruction, which avoids branches, but it works only on e5500/440 CPUs (Power ISA v2.06).

For G4-specific code you can of course use
-mcpu=7400 -mtune=7400 -maltivec -mabi=altivec

but in this case your code will run only on a G4 if AltiVec code is generated.

One switch I suggest always using is
-fomit-frame-pointer
to free one register and use it for calculations.

Last, I always use -O3 to optimize code heavily, and -O2 if I want to read the generated assembler against the C code.

Edited by flash on 2025/2/25 15:44:13
Memento audere semper!
Re: X5000 optimized code compile
For the program I am working on, 604e code is too slow to run it. I already use -Ofast, which goes beyond -O3. Using -mcpu=750 improved speed on G3 systems (WarpOS exe, not OS4) a lot, so I wondered whether a similar speedup can be reached for other CPU targets. AltiVec-specific code is not used. I will try the no-64-bit option. In the worst case I will go back to -mcpu=750.

Re: X5000 optimized code compile
If you look at the asm code generated with -mcpu=750 and compare it with the code generated by -mcpu=604e, you'll see almost no change in the instructions used.
What should make a major difference is the -mtune parameter: it reschedules instructions for superscalar CPUs and their cache sizes, allowing more instructions to execute per clock cycle.

At least these are the results of my experiments.
So IMHO, for standard apps it should be better to use the common code base given by -mcpu=604e and adopt rescheduling for a specific target with, just as an example, -mtune=7400.

The best solution is to implement different runtime code paths for different CPUs, but I know that is quite complex.
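
A runtime code path does not have to be elaborate. The following is only a minimal sketch of the idea, not code from any real project: the names mix_fn, mix_generic, mix_tuned and select_mix are made up for illustration, and the CPU check is reduced to a plain flag passed in by the caller. On AmigaOS 4 that flag would typically come from exec's GetCPUInfoTags(), which is deliberately left out here.

```c
#include <stddef.h>

/* Signature shared by every implementation of the hot routine. */
typedef void (*mix_fn)(float *dst, const float *src, size_t n, float gain);

/* Baseline version: built with the common flags, runs on every CPU. */
void mix_generic(float *dst, const float *src, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i] * gain;
}

/* Stand-in for a tuned version (hypothetical). In a real project this would
   live in its own source file compiled with CPU-specific flags, for example
   -maltivec, so that no special instructions leak into the common code. */
void mix_tuned(float *dst, const float *src, size_t n, float gain)
{
    mix_generic(dst, src, n, gain);
}

/* Pick the implementation once at startup. has_vector_unit is supplied by
   the caller; how it is detected is outside this sketch. */
mix_fn select_mix(int has_vector_unit)
{
    return has_vector_unit ? mix_tuned : mix_generic;
}
```

The program would call select_mix() once, store the returned pointer, and use it everywhere the routine is needed, so only the files holding the tuned versions need the CPU-specific flags.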

Re: X5000 optimized code compile
Okay, -mcpu=e5500 -mno-altivec -mno-powerpc64 worked

According to my tester on the x5000 we are now at (though he also modified some settings) 29-30 fps in 1024x768 with Baldur's Gate 2 (30 fps is the max speed of the engine, so there might be potential for higher speed theoretically).
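
To double-check what a given flag set really switches on, the compiler's predefined macros can be queried. GCC's PowerPC back end defines __ALTIVEC__ when AltiVec code generation is enabled and _ARCH_PPC64 when 64-bit instructions are enabled (the e5500 is a 64-bit core, which is presumably why -mno-powerpc64 was needed here). The sketch below only reports those two macros; the function names are made up for illustration.

```c
#include <stdio.h>

/* Returns 1 if the compiler was allowed to emit AltiVec instructions. */
int built_with_altivec(void)
{
#ifdef __ALTIVEC__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if 64-bit PowerPC instructions were enabled for this build. */
int built_with_ppc64_insns(void)
{
#ifdef _ARCH_PPC64
    return 1;
#else
    return 0;
#endif
}

/* Prints both settings, e.g. at startup of a test build. */
void print_build_features(void)
{
    printf("AltiVec code generation: %s\n",
           built_with_altivec() ? "on" : "off");
    printf("64-bit instructions:     %s\n",
           built_with_ppc64_insns() ? "on" : "off");
}
```

With -mcpu=e5500 -mno-altivec -mno-powerpc64 both lines should read "off"; if one of them reads "on", the corresponding flag did not take effect for that file.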

Re: X5000 optimized code compile
It's a bit off-topic, but since the question has been answered for the X5000 and since you seem to have some experience with the subject, I've always wondered which would be the best option for the X1000. There's no "-mcpu=pa6t" option, so which of the options that are available would be the best?

Re: X5000 optimized code compile
To my knowledge that should be

-mcpu=7400 -maltivec

(AltiVec is only used if vectors are used; in the case of GemRB they aren't.)

Re: X5000 optimized code compile
TheMagicSN wrote: To my knowledge that should be

-mcpu=7400 -maltivec

(AltiVec is only used if vectors are used; in the case of GemRB they aren't.)

I also suggest trying -mcpu=G5 -mno-powerpc64, or simply -mcpu=G5?
I can test on the weekend.

AltiVec instructions are also used by GCC's auto-vectorization feature, so there is no need to program with vectors explicitly.
For example, -maltivec -O3 or -maltivec -ftree-vectorize automatically uses AltiVec if there are suitable loops.
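
As an illustration of what a "suitable loop" can look like, here is a small sketch (the function name saxpy is just the conventional name for this operation, not something from GemRB): a simple counted loop over plain float arrays, with no function calls or branches inside and with restrict-qualified pointers so the compiler knows the arrays do not overlap. Whether GCC actually vectorizes it still depends on the compiler version and the flags in use.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]
   restrict promises that x and y do not overlap, which removes one common
   obstacle to auto-vectorization. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

On GCC versions that support it, adding -fopt-info-vec to the command line makes the compiler report which loops it vectorized, which is easier than searching the assembler output.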

Re: X5000 optimized code compile
Should we make a pinned thread / guide for optimised gcc flags for the different targets?

If liberty means anything at all, it means the right to tell people what they do not want to hear.
George Orwell.
Re: X5000 optimized code compile
For X1000 you can go with this

-mcpu=7400 -mtune=7400 -maltivec -mabi=altivec -fomit-frame-pointer -O3

but if your code is executed on other CPUs (without AltiVec) it will crash.
So you can either use conditional compilation and build binaries for specific models, or use dynamic runtime code paths for highly optimized, unit-specific code such as AltiVec or SPE.

If you want to do a good job you need to go with assembler or low-level C code using SIMD intrinsics.
For SIMD you have to rethink the whole algorithm, and loops should advance in steps that are a multiple of 4 (for AltiVec).
Autovectorization can't work miracles; maybe with modern AI you can ask for help rearranging the code.
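
To make the "steps of 4" point concrete, here is a minimal sketch using the AltiVec intrinsics from altivec.h. The function name add_floats is made up, and the AltiVec path assumes that all three pointers are 16-byte aligned, because vec_ld and vec_st ignore the low four address bits. Without -maltivec the vector part is compiled out and only the scalar loop remains, so the same source still builds for CPUs without a vector unit.

```c
#include <stddef.h>
#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* dst[i] = a[i] + b[i]
   Assumption: dst, a and b are 16-byte aligned when the AltiVec path is
   compiled in. */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#ifdef __ALTIVEC__
    /* Main loop: one vector register holds four floats, so step by 4. */
    for (; i + 4 <= n; i += 4) {
        __vector float va = vec_ld(0, a + i);
        __vector float vb = vec_ld(0, b + i);
        vec_st(vec_add(va, vb), 0, dst + i);
    }
#endif
    /* Scalar tail for the last n % 4 elements (or the whole array when
       AltiVec is not enabled). */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The tail loop is what lets the main loop step by 4 without requiring the array length to be a multiple of 4.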

I don't know the -Ofast GCC switch and need to look into it further.

Re: X5000 optimized code compile
-Ofast is basically a shortcut for

-O3 -fomit-frame-pointer -ffast-math -fno-math-errno -fno-trapping-math

(Possibly some other optimization options as well; I am not sure what exactly is included, but the ones above are.)

Not all code works with -Ofast, some code breaks, but GemRB definitely works with it ;)
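
One well-known kind of code that can break is anything relying on NaN or infinity handling, because -ffast-math lets GCC assume such values never occur. A small sketch of the pattern (the function name looks_like_nan is made up for illustration):

```c
/* Classic NaN test: NaN is the only value that is not equal to itself.
   Built normally, this returns 1 for NaN and 0 for everything else.
   Under -ffast-math (and therefore -Ofast) GCC may assume NaNs never occur
   and fold the comparison to a constant 0. */
int looks_like_nan(double x)
{
    return x != x;
}
```

Code that never produces or tests for NaN/infinity and does not depend on exact rounding is usually unaffected, which would be consistent with GemRB running fine.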

What is this tune option, btw? I use most of the options you list above for the X1000, but not the tune one.

Re: X5000 optimized code compile
Just looked it up. Providing both the cpu and tune options with the same value does the same as the cpu option alone.

In principle, tune is an alternative to cpu, so that the exe still runs on other CPUs (but with less specific optimization).

Re: X5000 optimized code compile
The tune option is for instruction rescheduling, and yes, it doesn't break compatibility; it optimizes the code for superscalar CPUs and their caches so that multiple instructions can run at the same time.

Re: X5000 optimized code compile
flash wrote:@msteed
For X1000 you can go with this

-mcpu=7400 -mtune=7400 -maltivec -mabi=altivec -fomit-frame-pointer -O3

Has anybody tested whether -mtune=7400 is really the best for the X1000's PA6T CPU?

Because it is about scheduling instructions, i.e. how many execution pipelines the CPU has and how many instructions can be scheduled / dispatched / completed per cycle, and probably other things.

I am not sure whether the G4 is the closest to the PA6T in this respect; maybe -mtune=G5 is better?

If nobody knows the right value, I can test it a little. The PA6T is PowerISA 2.04, the same as Power5++, and the G5/970 (PowerISA 2.01) is derived from Power4 (PowerISA 2.00-2.01), so I think the possible values are among these:
-mtune= 7400 | 7450 | G4 | 970 | G5 | power4 | power5 | power5+

I will also look into the CPU docs to see which pipeline model is closest to the PA6T. Unfortunately, the PA6T is not an upgraded G5 but its own new design...

