rjd324 wrote: @walkero Should we make a pinned thread / guide for optimised gcc flags for the different targets?
No, please, we have to stop multiplying unnecessary tweaks that may cause problems or, in any case, will bring little or nothing. Let's be careful about supposed optimizations, or real micro-optimizations that may make things worse on other targets.
As usual, let's make things simple and it will be better.
Thinking about it, it might be more likely that this is an issue with GNU as. It is part of binutils that is rather old. Maybe the old version does not support all PPC assembler commands?
Thanks for the recommendations. I asked the question more for others than for myself, as my projects aren't ones that need the kind of speed that requires CPU-specific optimization. It's more important to me that my programs run on any PPC Amiga, so I stick with the default CPU selection (which I assume is the generic "-mcpu=powerpc"). And of course, no altivec.
The "-mtune=" option is potentially of use though, as it seems to reorder instructions to increase speed on a superscalar CPU while still working (with no change in speed) on one that isn't. If @sailor is able to run some tests to see which choice works best with the PA6T, that would be nice to know.
It would also be nice to know if there's a single "-mtune=" option that gives the best average performance increase across all the superscalar CPUs used in Amigas while not causing any problems with those that aren't superscalar; something that could be used as a default 'tune' for a program that wants to run on any PPC Amiga.
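As far as the GCC documentation goes, "-mtune=" only sets the instruction scheduling parameters, while "-mcpu=" (and switches like "-maltivec") decide which instructions may be emitted at all. A rough way to see that is to look at the compiler's predefined macros, which should stay the same when only "-mtune=" changes. This is just a sketch: the function names are made up for illustration, and I'm assuming GCC's usual PowerPC macros (`__powerpc__`, `__PPC__`, `_ARCH_PPC`, `__ALTIVEC__`).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: 1 if this file was compiled for a PowerPC target.
   Assumes the macros GCC normally predefines on PowerPC. */
int built_for_powerpc(void)
{
#if defined(__powerpc__) || defined(__PPC__) || defined(_ARCH_PPC)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if the compiler was allowed to emit AltiVec
   instructions (for example via -maltivec or an AltiVec-capable -mcpu). */
int built_with_altivec(void)
{
#if defined(__ALTIVEC__)
    return 1;
#else
    return 0;
#endif
}

/* Writes a one-line summary of the build settings into buf.
   Returns the value of snprintf (number of characters needed). */
int describe_build(char *buf, size_t len)
{
    return snprintf(buf, len, "powerpc=%d altivec=%d",
                    built_for_powerpc(), built_with_altivec());
}
```

Calling describe_build() from a program built once with "-mcpu=powerpc" and once with "-mcpu=powerpc -mtune=power5+" should give the same line both times, since only the scheduling differs.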
No, please, we have to stop multiplying unnecessary tweaks that may cause problems or, in any case, will bring little or nothing. Let's be careful about supposed optimizations, or real micro-optimizations that may make things worse on other targets.
Of course, someone who optimizes for a specific target has to expect that the program will run only on that target, and must be prepared to provide multiple versions of the program to run on each different target. For most programs you're right, they don't require that kind of optimization and should be kept generic, so they run on any Amiga.
But for some programs, such as action games and perhaps video players, where getting every possible bit of speed is important, it may be worth creating different versions for each different Amiga. In that case it would be useful to have an article that summarizes the best options for each different target, to use as a starting point.
Thinking about it, it might be more likely that this is an issue with GNU as. It is part of binutils that is rather old. Maybe the old version does not support all PPC assembler commands?
I did a quick search in the AS executable, and the string 'mcrxr' is present. Perhaps the error message means that the opcode isn't supported by the CPU option being passed to AS?
Maybe the e5500 target does not recognize the instruction opcode simply because it's not present. Try another CPU target, such as -mcpu=604e. Some instructions are specific to PowerPC CPU families; others are derived from the IBM Power ISA. In Amigaland we have to support a mix of CPUs with a shared common base ISA and some specific peculiarities that are incompatible with each other.
msteed wrote: It would also be nice to know if there's a single "-mtune=" option that gives the best average performance increase across all the superscalar CPUs used in Amigas while not causing any problems with those that aren't superscalar; something that could be used as a default 'tune' for a program that wants to run on any PPC Amiga.
All powerpc CPUs are superscalar. Even the M68060 is a superscalar CPU, the first one in the Amiga world.
But there are big differences in the number of execution units, pipeline parallelism and many other parameters. It varies from 5 execution pipelines (603e, 440ep and 460ex) to 10 (G5) and 11 (G4). As corto said, the best solution is probably to leave it at the powerpc default.
One thing that is important from my point of view: the extensions AltiVec, SPE and MAC. But use them only for critical tasks like coding/decoding etc., and of course SPE also for FPU-heavy tasks. isel should also be considered, but only together with the above three. Only for code using these extensions can we play with -mtune. In reality, every extension is tied to a certain CPU, so -mtune should be used only with the PA6T, for which gcc has no exact state machine (maybe also the P1022, but the e500v2 core is very close to the e500v1 core in the 8540, which is the default CPU for SPE).
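To illustrate keeping an extension confined to one critical routine, here is a minimal sketch of an AltiVec path guarded by `__ALTIVEC__`, with a plain C fallback for everything else. The function name add_arrays is hypothetical; the AltiVec calls (vec_ld, vec_add, vec_st from altivec.h) are only compiled when the compiler is allowed to use AltiVec, and the vector path is taken only when all three pointers are 16-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/* Hypothetical hot routine: dst[i] = a[i] + b[i] for i in [0, n).
   Uses AltiVec only when the build enables it; otherwise plain C. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

#if defined(__ALTIVEC__)
    /* vec_ld and vec_st work on 16-byte aligned addresses, so the
       vector loop is used only if all three pointers are aligned. */
    if ((((uintptr_t)dst | (uintptr_t)a | (uintptr_t)b) & 15u) == 0) {
        for (; i + 4 <= n; i += 4) {
            vector float va = vec_ld(0, a + i);
            vector float vb = vec_ld(0, b + i);
            vec_st(vec_add(va, vb), 0, dst + i);
        }
    }
#endif

    /* Scalar path: handles the tail, unaligned data and non-AltiVec builds. */
    for (; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

Only the file holding a routine like this would need to be built with AltiVec enabled. Note that a binary containing the AltiVec path will still not run on a CPU without AltiVec, so this goes together with providing separate builds per target.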
Edited by sailor on 2025/3/1 10:47:32
AmigaOS3: Amiga 1200
AmigaOS4: Micro A1-C, AmigaOne XE, Pegasos II, Sam440ep, Sam440ep-flex, AmigaOne X1000
MorphOS: Efika 5200b, Pegasos I, Pegasos II, Powerbook, Mac Mini, iMac, Powermac Quad
I tested the -mtune option on the X1000 (PA6T CPU) with -mtune= 7400 | 7450 | G4 | 970 | G5 | power4 | power5 | power5+
If AltiVec is not used, the best results are with -mcpu=powerpc -mtune=power5+, but the difference is <1%.
If AltiVec is used, the best results are with -mcpu=powerpc and no -mtune (i.e. equal to -mtune=powerpc).
Conclusions:
1. For the PA6T CPU the best option is -mcpu=powerpc.
2. The PA6T pipelines (except AltiVec) are closest to power5+.
3. The PA6T AltiVec pipelines are different from the G4, G5, 7400, 7450 and 970 pipelines, and of course different from power5+, as power5+ has no AltiVec.
P.S.: of course, results may vary with different algorithms
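For anyone who wants to repeat this kind of comparison with their own code, a minimal timing sketch could look like the following. This is not the benchmark used for the results above; the workload is an arbitrary integer loop chosen for illustration, and the names workload and time_workload are hypothetical. The idea is to build the same file once per "-mtune=" value and compare the reported times.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical workload: a simple xorshift mixing loop.
   Replace it with the routine you actually care about. */
uint32_t workload(uint32_t rounds)
{
    uint32_t x = 2463534242u;
    uint32_t sum = 0;
    for (uint32_t i = 0; i < rounds; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        sum += x;
    }
    return sum;
}

/* Runs the workload once and returns the CPU time used, in seconds.
   The checksum is stored in *result so the compiler cannot drop the loop
   and so different builds can be checked for identical output. */
double time_workload(uint32_t rounds, uint32_t *result)
{
    clock_t start = clock();
    *result = workload(rounds);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

With differences below 1%, as reported above, a single run says very little; it is worth repeating each measurement several times with a large round count and comparing the best or median times.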
Thanks for running the tests. Sounds like I've been giving up little if anything in terms of speed -- on the X1000, anyway -- by using the default cpu option.
Quote:
All powerpc CPUs are superscalar.
Good to know. So possibly even the '-mtune=powerpc' option performs some degree of superscalar instruction scheduling.