The game compiles fine. However, there is a strange problem: it runs without crashes on the A1222, but on the X5000 and Sam460 it crashes in this section of the source code.
I'm not much of a coder, so I would not use this kind of construction myself without knowing exactly what the resulting behaviour is.
But are you sure that it is wise to fill a pointer to a float (32bits) with a double (64bits) result? I can imagine that at least the compiler would complain about a missing cast.
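For what it's worth, storing a double result through a pointer to float is well-defined C: the value is narrowed implicitly, and as far as I know GCC only warns about it when -Wconversion is enabled. A minimal sketch of the construction, where store_coord and its parameter names are made up for illustration and are not taken from the game source:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the arithmetic is done in double precision and the
   result is narrowed to float when it is stored through dst. The explicit
   cast only documents the narrowing; the compiler would do it implicitly. */
static void store_coord(float *dst, double origin, double scale, double unit)
{
    assert(dst != NULL);
    *dst = (float)(origin + scale * unit);
}
```

So the conversion itself should not be the cause of a crash. What matters is whether dst points to a valid, correctly aligned float.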
The difference in behaviour could come from an unintended compiler decision for this construction on the machines with a real FPU versus the FP emulation code on the A1222.
Furthermore, the issue could be somewhere else as well. Make sure that n is within bounds, for example.
On the positive side: it is good to see that the A1222 actually behaves as the original coders intended, even without a compatible FPU.
Quote:
Game runs fine without crashes on A1222 While it crashes on X5000 and Sam460 with a crash related to this section in the source code ?
What kind of crash? If it's an alignment exception, extern __BYTE__ ___1e6ed0h[]; is probably not 32-bit (float) or 64-bit (double) aligned, which may crash on a real FPU with an alignment exception, but not with an FPU emulator using integer accesses.
Assuming that the asterisk points to the failing instruction and I am reading the disassembly correctly, then your code is trying to store a single-precision float result (f0) at a non-aligned memory location, 0x5B24264E (r9). Since a double is converted to single precision just before the destination is loaded and the store is executed, it is likely *xf or *yf.
Also f0 is NaN. So something is really wrong here.
Can you copy and paste the contents of the disassembly tab of the grim reaper window? It might give a hint where things start to go wrong.
Is this source code reverse engineered? That would at least explain the weird names.
I am pretty sure that the problem is elsewhere in the code.
The disassembly shows that *xf points to a 16-bit aligned address (0x5B24264E) where it must be 32-bit aligned. The pointer itself is stored at a 32-bit aligned address (0x5B01823C).
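To make the alignment argument concrete, here is a small sketch that checks an address against a power-of-two alignment. The helper name is_aligned is my own; the addresses in the usage note are the ones from the crash log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if addr is a multiple of align, 0 otherwise.
   align must be a power of two (2, 4, 8, ...). */
static int is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}
```

With this, is_aligned(0x5B24264E, 2) is true but is_aligned(0x5B24264E, 4) is false, while is_aligned(0x5B01823C, 4) is true.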
lfd f1,40(r31)      Load f1 (double) from address 0x5B018248
bl 0x7F73668C       Branch and link to sin()
fmr f12,f1          Copy f1 (result of sin()) to f12
lis r9,23527        Load immediate shifted (r9 = 0x5BE70000)
lfd f0,-7736(r9)    Load f0 with the double at address 0x5BE6E1C8
fmul f0,f12,f0      f0 = f12 * f0 (with f12 = 0.788011)
fadd f0,f31,f0      f0 = f31 + f0 (with f31 = 413.0)
frsp f0,f0          Round double to single precision
lwz r9,28(r31)      Load r9 with the word at address 0x5B01823C
stfs f0,0(r9)       Store the float in f0 to 0x5B24264E
The double loads are 64-bit aligned, so that is OK. The value loaded from (double)s_35e[n].XLocation seems to be 413.0, which is strange because this is supposed to be a __BYTE__. The NaN is most likely the result of something bogus loaded from 0x5BE6E1C8 by lfd f0,-7736(r9); this should have been 12.0.
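For anyone who wants to double-check the address of that lfd: lis puts its 16-bit immediate into the upper half of the register (23527 is 0x5BE7), and the load then adds a sign-extended 16-bit displacement. A small sketch of that arithmetic, with function names of my own choosing:

```c
#include <assert.h>
#include <stdint.h>

/* Effective address formed by "lis rX,hi" followed by a load or store with a
   signed 16-bit displacement: (hi << 16) + disp, wrapping at 32 bits. */
static uint32_t lis_plus_disp(uint16_t hi, int16_t disp)
{
    uint32_t base = (uint32_t)hi << 16;
    return base + (uint32_t)(int32_t)disp;
}

/* Worked example with the operands from the listing. */
static void check_listing_address(void)
{
    assert(lis_plus_disp(23527, 0) == 0x5BE70000u);      /* r9 after lis  */
    assert(lis_plus_disp(23527, -7736) == 0x5BE6E1C8u);  /* source of lfd */
}
```

0x5BE6E1C8 is a multiple of 8, so this load is correctly aligned for a double.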
The question remains why there is an alignment issue only on the X5k and Sam460, and then specifically on AmigaOS4, because as I understand it the MOS version runs fine. But this is a question for the compiler experts.
Quote:
The question remains why there is only an alignment issue on the X5k and sam460. And then specifically on AmigaOS4. Because as I understand it, the MOS version runs fine. But this is a question for the compiler experts.
I ran into this x5000-only alignment issue while working on the Irrlicht Engine port, where the source code of one of the loaders was written without any regard for alignment. I asked the developers of our kernel and was told that the PowerPC architecture does not allow _ANY_ unaligned access: 16-bit data must be 16-bit aligned, 32-bit data must be 32-bit aligned, etc.
So you would expect alignment exceptions on float accesses at unaligned addresses. However, the OS4 kernel does have an emulator for unaligned floating-point access, although it is pretty slow (and works at a pretty high abstraction level). It is also enabled on all machines (including the x5000), since the emulation is part of exec and not the HAL.
The problem we have on the x5000 is probably due to four missing opcodes (lfs, lfsu, stfs, stfsu) that still need to be implemented for the x5k; this hasn't been done yet.
While unaligned memory access looks like a bad thing caused by bad code, real life says it is better to handle this situation without a crash, even if it is some milliseconds slower.
Mathias created a simple test case which can be checked on all machines:
#include <stdio.h>

int main(int argc, char **argv)
{
    // Declare a 16-byte buffer (the test relies on it starting at a 4-byte aligned address)
    char buffer[16] = {0, 60, 127, 113, 58, 5, 6, 7, 60, 127, 113, 58, 12, 13, 14, 15};
    volatile char *ptr;

    printf("A buffer contains the same 4-byte pattern at index 1 (unaligned) and 8 (aligned)\n");

    // Read the reference pattern at an aligned address (buffer + 8)
    ptr = buffer + 8;
    printf("Read the reference pattern at an aligned address (buffer + 8) (addr = %p)\n", (void *)ptr);
    printf("float = %f\n", *(volatile float *)ptr);

    // Read the same pattern at an unaligned address (buffer + 1)
    ptr = buffer + 1;
    printf("Read the same pattern at an unaligned address (buffer + 1) (addr = %p)\n", (void *)ptr);
    printf("float = %f\n", *(volatile float *)ptr);

    return 0;
}
So while it works on some machines, it crashes on the x5000 for sure. The probable reason it works on machines other than the x5000 is that the PA6T CPU in the x1000 probably allows unaligned floating-point access.
I created a bug report about a year or two ago, so the devs are aware.
@Sinan Are you saying it crashes on the x1000 too? Take care if you use any AltiVec parts, because if so, it will crash on the x1000 as well.
We can all take the code I posted and check it on different machines; I can do so on the x1000, x5000, Sam460 and Pegasos2 if there is interest. But that will not change the kernel for us, of course, and the faster way is probably to deal with the unaligned code in the game itself.
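One common way to deal with it in the game code, assuming the float really has to live at an odd offset inside a byte array, is to go through memcpy instead of dereferencing a float pointer. This is only a sketch; load_float_unaligned and store_float_unaligned are hypothetical helpers, not functions from the game or from the OS4 SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read a float from a possibly unaligned address. memcpy makes no
   alignment assumption about src, unlike *(float *)src. */
static float load_float_unaligned(const void *src)
{
    float value;
    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}

/* Write a float to a possibly unaligned address. */
static void store_float_unaligned(void *dst, float value)
{
    assert(dst != NULL);
    memcpy(dst, &value, sizeof value);
}
```

In the test case above, *(float *)ptr would become load_float_unaligned(buffer + 1). Fixing the layout so the floats are naturally aligned is still the cleaner solution where that is possible.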
A buffer contains the same 4-byte pattern at index 1 (unaligned) and 8 (aligned)
Read the reference pattern at an aligned address (buffer + 8) (addr = 0x3d7f0ce4)
float = 0.015591
Read the same pattern at an unaligned address (buffer + 1) (addr = 0x3d7f0cdd)
float = 0.015591
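As a sanity check on the expected value: the pattern 60, 127, 113, 58 read in big-endian order is 0x3C7F713A, which is about 0.015591 as an IEEE 754 single. The sketch below rebuilds the float from the bytes on any host, assuming float is a 32-bit IEEE 754 type; float_from_be_bytes is a name I made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assemble four bytes in big-endian order (the order PowerPC reads them in)
   into a single-precision value. Assumes float is 32-bit IEEE 754. */
static float float_from_be_bytes(const unsigned char b[4])
{
    uint32_t bits = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                    ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    float value;
    assert(sizeof value == sizeof bits);
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Any machine that prints a different number for the unaligned read has read the wrong bytes rather than crashed.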
Quote:
The problem which we have on x5000, is probably because of missing 4 opcodes (lfs, lfsu, stfs, stfsu) which need to implement for x5k, but this wasn't done yet.
Do you mean that this misalignment handling code has not been implemented for those four instructions?
As you say, non-aligned allocations will result in performance issues anyways and should therefore be avoided.
It was my understanding that all allocations in OS4 are 32-bit aligned by default. So potentially only doubles and uint64 could have alignment issues when you don't force the correct alignment. Or is this only true for IExec->AllocVecTags() calls and not for standard C malloc()-like calls?
I must admit that I only use AllocVecTags() because it gives me as much control as possible from this abstraction level. As a hardware guy I have trust issues with compilers and OSes.
Just looking at the names of the source files already gives me a headache. But forcing the memory allocations to be correctly aligned shouldn't be that hard.
Quote:
It was my understanding that all allocations in OS4 are default 32bits aligned.
Yes, but strange things like in this code (casting some struct with float/double to a __BYTE__ array) may of course still fail with an alignment exception.
Quote:
I then asked the developers of our kernel, and was told that the PowerPC architecture does not allow _ANY_ unaligned access. That is 16 bit must be 16 bit aligned, 32 bit must be 32 bit aligned, etc.
This is only partially true. Any integer access has to be (at least) 16-bit aligned, and only accessing odd addresses will cause an alignment exception on all systems. But all FPU accesses have to be at least 32-bit aligned, with the exception of the 440ep (and probably 460 CPUs with "external FPU" too), and maybe POWER CPUs as well, where 64-bit alignment is required, or more correctly for the 440ep: FPU accesses must never cross a cache line boundary. The kernel has an alignment exception handler only for the 440 (and probably 460) CPUs, where this is a problem for code that works correctly on other CPUs.
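The cache line rule can be illustrated with a small check. This is just a sketch: crosses_line is a hypothetical helper, and the 32-byte line size used in the usage note is an assumption on my part, so check the CPU manual for the real value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if an access of size bytes starting at addr touches more than
   one cache line of line_size bytes, 0 otherwise. */
static int crosses_line(uintptr_t addr, size_t size, size_t line_size)
{
    assert(size > 0 && line_size > 0);
    return (addr / line_size) != ((addr + size - 1) / line_size);
}
```

With an assumed 32-byte line, a double at offset 0x18 stays inside one line, but a double at the merely 32-bit aligned offset 0x1C spans two lines. A 64-bit aligned double can never do that, which is why 64-bit alignment is the safe rule.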
Quote:
While unaligned memory access looks like a bad thing from bad code, the real live says that better to handle this situation without crash, even if it will be some milliseconds slower.
No, it's a bug in the code which has to be fixed, and it's not just a PowerPC/POWER issue; for example, unaligned FPU accesses don't work on x86/x64 CPUs either.
Edit: WarpOS software had a lot of misaligned FPU accesses, partly because of the HUNK executable format used. Therefore my powerpc.library includes an alignment exception handler that emulates all FPU load/store instructions using integer ones, but the AmigaOS 4.x kernel doesn't do that. On the A1222 there is AFAIK no FPU and the FPU instructions are emulated using integer accesses instead, just like in my powerpc.library, which is the reason it doesn't crash on the A1222, even if there are very likely some other bugs in the code, like the NaN in f0.
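To illustrate the idea of emulating an FPU store with integer accesses, here is a rough sketch in C. It is not the actual powerpc.library or kernel code, and emulate_stfs is a name invented for this example. It writes the four bytes of a single-precision value one at a time in big-endian order, so the destination can have any alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a float at dst using only byte-sized integer stores, most
   significant byte first as on big-endian PowerPC.
   Assumes float is 32-bit IEEE 754. */
static void emulate_stfs(unsigned char *dst, float value)
{
    uint32_t bits;
    assert(dst != NULL);
    memcpy(&bits, &value, sizeof bits);  /* reinterpret the float's bit pattern */
    dst[0] = (unsigned char)(bits >> 24);
    dst[1] = (unsigned char)(bits >> 16);
    dst[2] = (unsigned char)(bits >> 8);
    dst[3] = (unsigned char)bits;
}
```

Byte stores never raise an alignment exception, which is why an emulated FPU hides this kind of bug.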
I ran this small test on the Sam460 and it crashes. That means it will also crash on the X5000.
It works in WinUAE and on the A1222 (I guess because the FPU is emulated).
Quote:
Mathias created a simple test case which can be checked on all machines:
#include <stdio.h>
int main(int argc, char **argv)
{
// Declare a 16-byte buffer, it will be aligned on 16 bytes
printf("A buffer contains the same 4-byte pattern at index 1 (unaligned) and 8 (aligned)\n");
char buffer[16] = {0, 60, 127, 113, 58, 5, 6, 7, 60, 127, 113, 58, 12, 13, 14, 15};
volatile char * ptr;
// Read the reference pattern at an aligned address (buffer + 8)
ptr = buffer + 8;
printf("Read the reference pattern at an aligned address (buffer + 8) (addr = %p)\n", ptr);
printf("float = %f\n", *(float *)ptr);
// Read the same pattern at an unaligned address (buffer + 1)
ptr = buffer + 1;
printf("Read the same pattern at an unaligned address (buffer + 1) (addr = %p)\n", ptr);
printf("float = %f\n", *(float *)ptr);