You could use a method similar to what has been used in m68k-amigaos C code to allocate long word aligned structures like FileInfoBlock from the stack:
@salass00 interesting idea - many thanks. This is the first time I have heard about the constructor attribute. Looks nice, I will test it on the weekend.
Before you posted this advice, I had planned a dirtier solution, something like: double Tarray[3]; double *pT, *pT1; pT = (double *)((long)(Tarray + 1) - ((long)Tarray % 8)); pT1 = pT + 1;
Thx!
Edited by sailor on 2025/1/24 20:47:35 Edited by sailor on 2025/1/24 20:48:45 Edited by sailor on 2025/1/24 20:53:08 Edited by sailor on 2025/1/24 20:53:41
AmigaOS3: Amiga 1200 AmigaOS4: Micro A1-C, AmigaOne XE, Pegasos II, Sam440ep, Sam440ep-flex, AmigaOne X1000 MorphOS: Efika 5200b, Pegasos I, Pegasos II, Powerbook, Mac Mini, iMac, Powermac Quad
You could use a method similar to what has been used in m68k-amigaos C code to allocate long word aligned structures like FileInfoBlock from the stack:
I could be wrong but I don't think defining a pointer to a multi-dimensional array will work. In my experience multi-dimensional arrays in C are quite limited in how they can be used, so I almost never use them in my code.
What should work however is something like (not quite as elegant):
@flash a native SPE version of the gcc libraries would be fine, of course. NXP had the CodeWarrior IDE for SPE in the past, but it used a compiler other than gcc. It would remove the need for workarounds with float / double calls.
But the global variable alignment error is probably directly in gcc - a linker script error, as salass00 said.
-1 ?? char does NOT add an extra \0, it's not a string, so why do you want one less char? A string is a class, not an array of chars.
This is code from salass00's hint (see above). If I understand it correctly, the "-1" is there to allocate only the memory that is needed, not more: allign_buffer should have a size of at least (nr. of variables + 1) * sizeof(variable), since we need to shift the variable to a correctly aligned address. If the variable is not correctly aligned, it is shifted by a certain number of bytes (i.e. at most var.size - 1). Shifting by the full var.size makes no sense, because the result has the same alignment as the original. Thus we allocate one byte less. Of course, if var.size is less than 8 bytes, this formula should be corrected a little. Quote:
GCC should automatically align data, and you can use compiler options for it, "packed".
Yes, in theory. Please see post nr. 53. Alignment does not always work for global variables. In this case I need 8-byte alignment for experiments with the SPE SIMD unit, and this workaround helps me a lot. The error is connected to the gcc linker script (see comment).
it's not a string, so why do you want one less char?
I'm not allocating one byte less, I'm allocating sizeof(double)-1 bytes more than needed. This is to ensure that a double (8-byte) aligned address of the required size can always be found within the aligned buffer no matter what the alignment of the array itself is.
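The over-allocation described above can be sketched roughly as follows. This is not the code from the earlier post, only a minimal illustration; the names raw_buffer, NUM_DOUBLES, round_up and aligned_doubles are made up for the example, and it assumes sizeof(double) is 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_DOUBLES 2  /* hypothetical payload size, chosen for illustration */

/* Raw storage: the payload plus sizeof(double)-1 spare bytes for shifting. */
static unsigned char raw_buffer[NUM_DOUBLES * sizeof(double) + sizeof(double) - 1];

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t round_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Return a sizeof(double)-aligned pointer to NUM_DOUBLES doubles in raw_buffer. */
static double *aligned_doubles(void)
{
    uintptr_t start = (uintptr_t)raw_buffer;
    uintptr_t aligned = round_up(start, sizeof(double));

    /* The shift is at most sizeof(double)-1 bytes, so the payload still fits. */
    assert(aligned - start <= sizeof(double) - 1);
    return (double *)(raw_buffer + (aligned - start));
}
```

Whatever address raw_buffer ends up at, the returned pointer is shifted by 0 to 7 bytes, which is exactly what the extra sizeof(double)-1 bytes pay for. Strictly speaking, accessing a char array through a double pointer is an aliasing grey area in ISO C, so treat this as a sketch of the idea rather than a portable recipe.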
Quote:
GCC should automatically align data, and you can use compiler options for it, "packed".
It should but it doesn't.
Using my own compiled test code as example:
While the .data section itself is 16-byte aligned to the start of the executable file:
Section Headers:
  [Nr] Name    Type      Addr      Off     Size    ES Flg Lk Inf Al
  [ 6] .ctors  PROGBITS  01011094  001094  000008  00  WA  0   0  4
  [ 7] .dtors  PROGBITS  0101109c  00109c  000008  00  WA  0   0  4
  [ 8] .data   PROGBITS  010110b0  0010b0  000010  00  WA  0   0 16
it is preceded in the segment by the .ctors and .dtors sections, which are only 4-byte aligned (see above):
The result is that when elf.library loads the entire segment 32-byte aligned the .data section gets loaded into it at offset 10b0-1094=1c which is only 4-byte aligned.
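The arithmetic can be checked with a few lines of C. This is only a sketch of the calculation above; the helper names are made up, and it assumes (as the calculation does) that the segment starts at the file offset of .ctors, 0x1094.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power-of-two alignment that value satisfies (value must be nonzero). */
static uint32_t natural_alignment(uint32_t value)
{
    assert(value != 0);
    return value & (~value + 1u);   /* isolates the lowest set bit */
}

/* Offset of a section inside its segment, taken from the header file offsets. */
static uint32_t offset_in_segment(uint32_t section_off, uint32_t segment_off)
{
    assert(section_off >= segment_off);
    return section_off - segment_off;
}
```

With the values from the headers, offset_in_segment(0x10b0, 0x1094) is 0x1c, and natural_alignment(0x1c) is 4. So even if the segment itself is loaded on a 32-byte boundary, .data ends up only 4-byte aligned in memory, although its file offset 0x10b0 is 16-byte aligned.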
GCC should automatically align data, and you can use compiler options for it, "packed".
"packed" (__attribute__, #pragma, etc.) can only be used to reduce alignment, for example from 4 bytes integer default on PPC to 2 bytes (for compatibility to m68k, as used in most exec, etc. includes), but not to increase alignment.
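A small GCC-specific sketch of the difference between the two attributes. The struct names are hypothetical and the numbers in the static_asserts assume a GCC-compatible compiler in C11 mode or later:

```c
#include <assert.h>
#include <stdint.h>

/* Default layout: the compiler pads after 'tag' so 'value' is naturally aligned. */
struct plain_pair {
    uint8_t  tag;
    uint32_t value;
};

/* packed removes the padding and lowers the struct's alignment to 1 byte. */
struct packed_pair {
    uint8_t  tag;
    uint32_t value;
} __attribute__((packed));

/* aligned raises the alignment, here to 8 bytes for the whole struct. */
struct aligned_pair {
    uint8_t  tag;
    uint32_t value;
} __attribute__((aligned(8)));

/* Compile-time checks of the claims in the comments above. */
static_assert(sizeof(struct packed_pair) == 5, "packed removes padding");
static_assert(__alignof__(struct packed_pair) == 1, "packed lowers alignment");
static_assert(__alignof__(struct aligned_pair) == 8, "aligned raises alignment");
```

So to get more alignment than the default, aligned is the attribute to reach for; packed only ever goes the other way.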
@salass00 Quote:
The result is that when elf.library loads the entire segment 32-byte aligned the .data section gets loaded into it at offset 10b0-1094=1c which is only 4-byte aligned.
Still sounds like a linker script bug to me, missing or wrong align for .ctors/.dtors, not a bug in gcc, binutils (ld) nor elf.library. However the last time I worked on it in the AmigaOS 4.x ports was IIRC gcc 2.95.x/binutils 2.14.x versions, i.e. about 20 years ago...
So someone could try to compile and assemble SPE code with native GCC 4 and link the objects with GCC 11. If the problem lies in the linker script, and if it's a text file, it could be easily fixed with a diff between the GCC 4 and GCC 11 linker config text files.
I installed them in my test SDK and compiled the Stream source. It links and seems to be 32-bit aligned. But even the prebuilt binary from os4depot is already aligned. So someone needs to try out the binutils and see if the alignment changes help.
If someone installs these binutils, I recommend backing up the old files. These binutils are still a work in progress.
Installation
Unzip the content into GCC:ppc-amigaos (back up the directory first, to be able to revert the installation).
Additionally, put a .unix file into GCC:ppc-amigaos/bin, because the binutils cannot handle native unix paths. And thus you need a reason released
Your work should be tested by A1222 owners and, if it is confirmed to be working OK, then it could be adopted to build A1222-specific binaries. IMHO it should also be OK for building standard OS4 binaries. Thanks a lot for your efforts!
@MigthyMax For the 405 CPUs with "external" FPU (440, 460) 32 bit alignment isn't enough, all FPU accesses have to be cache-line aligned, i.e. at least sizeof(double) * 8 bit = 64 bit alignment is required. If a FPU access is crossing a cache line boundary you get an alignment exception. It's handled by the 4x0 kernel (and my powerpc.library) exception handlers, but that's very slow.
CPUs with 64 byte cache line sizes like the ones in the X1000 and X5000 may even need larger alignment for max. performance, especially in case of the X1000 if AltiVec code is used.
I don't know what the alignment requirements for SPE code on the A1222+ are, but if you use 64 bytes = 512 bits alignment it should work on all supported CPU.
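To make the cache-line point concrete, here is a minimal sketch of a check for whether an access straddles a line boundary. The function name is made up, and the line size is passed in as a parameter because it differs between CPUs (the 32 used in the example below is an assumption for illustration, not a value from this thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if an access of 'size' bytes at 'addr' spans two cache lines. */
static bool crosses_cache_line(uintptr_t addr, size_t size, size_t line_size)
{
    assert(size > 0 && line_size > 0);
    /* Compare the line index of the first and the last byte accessed. */
    return (addr / line_size) != ((addr + size - 1) / line_size);
}
```

For example, with 32-byte lines an 8-byte access at 0x1018 stays inside one line, while the same access at 0x101c (only 4-byte aligned) crosses into the next one. An 8-byte aligned double can never cross a boundary as long as the line size is a multiple of 8.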
Something A1222+ developers might try, if it's still supported by elf.library, is building ELF REL instead of ELF EXEC executables. ELF REL was only used in the very early days of AmigaOS 4.x, about 25 years ago, instead of ELF EXEC, but IIRC with ELF REL "executables" all segments were loaded into separate pages, i.e. you have 4096 bytes alignment for everything.
@MigthyMax I will gladly test on the weekend. Many thanks! But in gcc:ppc-amigaos there are only two subdirs, bin and lib, with different gcc versions. Should it really be copied there? And please, where can I obtain clib4? I am afraid I may need some more detailed instructions...
joerg wrote: CPUs with 64 byte cache line sizes like the ones in the X1000 and X5000 may even need larger alignment for max. performance, especially in case of the X1000 if AltiVec code is used.
For SPE it is similar: it needs 8-byte alignment to work properly and 32-bit alignment for best performance.
Just to clarify: the alignment I can adjust in the binutils ld linker is the alignment of the sections.
The alignment of individually defined variables within a program is probably handled by the C compiler and can be influenced by the developer using __attribute__((aligned(x))).
Of course the latter somehow arranges all variables into the .data/.rodata section and then the linker aligns the section, and if the section isn't aligned correctly, all variables aligned by the compiler get out of alignment.
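One way to see whether the compiler's request survives linking and loading is to check the address at run time. A minimal sketch, with hypothetical names (spe_operand, is_aligned, spe_operand_ok) and 8 bytes chosen because that is the alignment discussed for SPE in this thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ask the compiler for 8-byte alignment of a global. */
static double spe_operand[2] __attribute__((aligned(8)));

/* True if ptr is a multiple of 'align' bytes (align must be a power of two). */
static bool is_aligned(const void *ptr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}

/* True if the global ended up in memory where the compiler asked for it. */
static bool spe_operand_ok(void)
{
    return is_aligned(spe_operand, 8);
}
```

If the section is placed correctly, spe_operand_ok() returns true. If the section base is shifted inside the segment, the same object code can report false even though the compiler did its part.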
Quote:
For the 405 CPUs with "external" FPU (440, 460) 32 bit alignment isn't enough, all FPU accesses have to be cache-line aligned, i.e. at least sizeof(double) * 8 bit = 64 bit alignment is required. If a FPU access is crossing a cache line boundary you get an alignment exception. It's handled by the 4x0 kernel (and my powerpc.library) exception handlers, but that's very slow.
CPUs with 64 byte cache line sizes like the ones in the X1000 and X5000 may even need larger alignment for max. performance, especially in case of the X1000 if AltiVec code is used.
I don't know what the alignment requirements for SPE code on the A1222+ are, but if you use 64 bytes = 512 bits alignment it should work on all supported CPU.
It sounds like the current linker script takes care of this, because a lot of sections etc. are aligned with this configuration:
where SEGMENT_SIZE is/should be the TARGET_PAGE_SIZE.
What is the page size of our beloved target?
@sailor
I think it is the correct installation directory; the gcc version subdirectory "only" contains the compiler. The important thing is that the new binutils get picked up. This can be checked by adding verbose output (-v) when compiling, or by adding it to the link call. Then the version will be printed, which should be 2.40.
AmigaOS 4.x uses the SysV PowerPC ABI, should be the same as AIX. Only exceptions are R2 and R13, which aren't used at all (default) or as relative (small) data pointer: R13 when using -msdata, R2 when using -mbaserel. http://refspecs.linux-foundation.org/elf/elfspec_ppc.pdf pages 30-32