@Corto Thanks! I tried it and got 2 errors: 1. vertices_packed is not used 2. error: invalid conversion from ‘const u8*’ {aka ‘const unsigned char*’} to ‘char*’ on the line "pVPtr = pPtr;", which I worked around with a cast: "pVPtr = (char *)pPtr;".
But since vertices_packed is not used anywhere, I assume it should be?
@Hedeon Quote:
Ok, the WarpOS kernel had auto correct.
At what cost? I mean, does it slow things down much?
Normally, the compiler takes care of this, so the WarpOS autocorrect should only kick in for old and misbehaving apps. You would move the data using general purpose registers and then load it into the correct FP register. Yes, it is slow.
For more info you could look at the GR more carefully and look at the asm output to see if it is
1) integer or floating point 2) which address it is trying to access (and see its alignment)
Are you btw sure OS4 has nothing? I seem to remember asking one of the Frieden brothers to fix an alignment issue of Shogo_WOS while running in WOS emulation on the SAM440. The fix was done inside the alignment exception handler.
Quote:
Are you btw sure OS4 has nothing? I seem to remember asking one of the Frieden brothers to fix an alignment issue of Shogo_WOS while running in WOS emulation on the SAM440. The fix was done inside the alignment exception handler.
Well, if it has something, it certainly didn't handle the unaligned access there.
Quote:
For more info you could look at the GR more carefully and look at the asm output to see if it is
Here is also how the crashlog looks:
[emu_unimplemented] Unimplemented opcode
[HAL_DfltTrapHandler] *** Warning: Fatal exception in task 0x64129C90 (Shell Process, etask = 0xEFD44600) at ip 0x7EAD51E8
Dump of context at 0xEFD17000
Trap type: Alignment exception
Exception Syndrome Register (ESR): 0x01000000
Machine State (raw): 0x0002F030
Machine State (verbose): [Critical Ints on] [ExtInt on] [User] [FPU on] [IAT on] [DAT on]
DSISR: 01000000 DAR: 620166D9
Page: 0xEFCA73F0 (Virtual: 0x62016000, Physical: 0x0FB5E000, Flags: 0x 102)
Temporary stack trace:
#0: 0x7EAD51E8
#1: 0x7EAD8D38
#2: 0x7E943BBC
#3: 0x7E89DBC0
#4: 0x7E8A301C
#5: 0x7E8EEF54
#6: 0x7E8C9AA0
#7: 0x7EA5B094
#8: 0x7E8BB978
#9: 0x7E8B2724
#10: 0x7E8AA3F8
#11: 0x7E8A0EDC
#12: in module newlib.library.kmod+0x00002580 (0x01A86280)
#13: in module newlib.library.kmod+0x00003298 (0x01A86F98)
#14: in module newlib.library.kmod+0x000037CC (0x01A874CC)
#15: 0x7E89C924
#16: in module dos.library.kmod+0x0002A5D8 (0x01983EF8)
#17: in module kernel+0x0006B590 (0x0186B590)
#18: in module kernel+0x0006B5D8 (0x0186B5D8)
#19: 0x00000000
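A side note on reading the crashlog above: assuming DAR has its usual PowerPC meaning (the data address the faulting instruction touched), a quick check shows how that address sits relative to a 4-byte boundary. This is only a minimal sketch; `isAligned` and `kCrashDar` are hypothetical names, not something from the engine or the OS.

```cpp
#include <cstdint>

// Hypothetical helper: true if addr is a multiple of `alignment`
// (alignment is assumed to be a power of two, e.g. 4 for a 32-bit float).
constexpr bool isAligned(std::uint32_t addr, std::uint32_t alignment)
{
    return (addr & (alignment - 1u)) == 0u;
}

// DAR value copied from the crashlog above.
constexpr std::uint32_t kCrashDar = 0x620166D9u;

static_assert(!isAligned(kCrashDar, 4u), "DAR is not on a 4-byte boundary");
static_assert((kCrashDar % 4u) == 1u, "DAR is 1 byte past a 4-byte boundary");
```

So the access was one byte past a 4-byte boundary, which would fit a 32-bit value sitting right after a single leading byte.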
1) Put the floats at the start of the struct 2) Make it 4 (but that is actually the default, so removing it might be better). A pack value of 2 will still give alignment trouble for the FPU.
@Hedeon First I tried moving the floats to the start of the struct: same crash. Then I tried changing pragma pack(1) to pragma pack(4): it says "sorry wrong file format", so I assume all the code there relies on pragma pack(1) (dunno why they chose that at all, maybe precisely so that structs don't end up as 3, 5, 7 or whatever, but are always divisible by 2).
The data presented is a binary block? So everything is fixed already? And the struct is a representation of this binary block?
So salass is closer to the right solution (sorry).
You should try it again and see if the GR is different regarding the asm.
In the quick reads I did I think I read for example that lwbrx also gives an alignment error on byte aligned addresses. So maybe with his suggestions you got a different alignment exception.
Quote:
The data presented is a binary block? So everything is fixed already? And the struct is a representation of this binary block?
Yeah, it's just the load() function reading a .ms3d file and parsing it.
What I want to understand now is whether we face 2 different unaligned memory problems here, or just one that shows up under different conditions. The first was "f32 framesPerSecond = *(float*)pPtr;", i.e. reading a floating point value from an unaligned pointer. And the second is the same kind of unaligned read, when a packed structure has floats in it, right?
I mean, I want to note down some simple rules to follow later. Like: if any structure in the code has floats and is also pragma packed, then it's a no-go for PPC (is that correct?).
For example, there is the .md2 loader, which also has floats inside its structs, and that one doesn't crash on our side:
struct SMD2Frame
{
    f32 scale[3];           // first scale the vertex position
    f32 translate[3];       // then translate the position
    c8 name[16];            // the name of the animation that this key belongs to
    SMD2Vertex vertices[1]; // vertex 1 of SMD2Header.numVertices
} PACK_STRUCT;
And that loader definitely doesn't crash (maybe it loads byte by byte, as Salas00 says? See CMD2MeshFileLoader::loadFile there).
Edited by kas1e on 2019/10/23 15:11:33 Edited by kas1e on 2019/10/23 15:12:36
That loader you show has everything aligned. That is why it does not crash. The floats are at the top so start at 0. The other struct has a u8 at the top so the first float starts at 1 (and therefore is misaligned for the PPC FPU).
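The offsets can be checked with `offsetof`. A minimal sketch, assuming a compiler that supports `#pragma pack(push, 1)` (GCC, Clang and MSVC do) and a 4-byte `float`; `FloatsFirst` and `FlagFirst` are hypothetical stand-ins, not the real structs from the .md2 and .ms3d loaders.

```cpp
#include <cstddef>   // offsetof
#include <cstdint>

#pragma pack(push, 1)
// Hypothetical layout in the style of the .md2 frame: floats come first.
struct FloatsFirst
{
    float scale[3];
    float translate[3];
    char  name[16];
};

// Hypothetical layout with a single byte in front of the floats.
struct FlagFirst
{
    std::uint8_t flags;
    float        position[3];
};
#pragma pack(pop)

// With 1-byte packing the compiler inserts no padding, so each offset is
// simply the sum of the sizes of the preceding members.
static_assert(offsetof(FloatsFirst, scale) == 0, "floats start at offset 0");
static_assert(offsetof(FloatsFirst, translate) == 12, "still a multiple of 4");
static_assert(offsetof(FlagFirst, position) == 1, "float starts at offset 1");
static_assert(sizeof(FlagFirst) == 13, "1 flag byte + 3 floats, no padding");
```

Note that offset 0 only helps if the buffer the struct is overlaid on is itself 4-byte aligned.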
Loading it as an integer and then converting it to float should work (the examples by salass and corto). You said they also crashed, but I'd also like to see the GR output there. I suspect it is a different alignment issue (not FPU).
That was the first alignment issue I crashed on, and the one we have now is the second one.
@Hedeon
I still don't get it... Is the actual problem not that the packed struct has floats, but that those floats are not at the beginning of the structure?
Shouldn't we then, for every structure, just move all the floats to the top, and put a "dummy" structure with an empty u8 before each one? I mean, instead of the original, do it all like this:
I tried it like this with the original code, and it still crashes in the same "for (u16 tmp=0; tmp<numVertices; ++tmp)" loop, with the same alignment crashlog I showed before.
Yes, normally with packed structures you would put all floats at the top. But here the input is already fixed. So the float is actually at offset 1 in that binary data. So in memory the loaded data will always have the float at offset 1. And of course, in this case you might load the data in at offset 3 so the u8 flag will be at offset 3 and then the float at offset 4 (and add 3 dummy bytes at the start of the struct, and allocate a slightly bigger buffer) but then other loaders would give errors probably.
So going the integer way and then convert should do the trick. I am not sure why it is still crashing when you do that. I then need to see the asm output for each case.
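A minimal sketch of the integer way, assuming the file stores floats as little-endian IEEE-754 (an assumption about the .ms3d data, not taken from the loader) and that `float` is 32 bits; `readF32LE` is a hypothetical helper name. It only does byte loads, so the source pointer may have any alignment.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit little-endian float from a byte
// pointer of any alignment.
inline float readF32LE(const unsigned char* p)
{
    assert(p != nullptr);
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");

    // Build the integer one byte at a time: byte loads have no alignment
    // requirement, and the shifts make the result independent of the
    // host's endianness.
    const std::uint32_t bits =
          static_cast<std::uint32_t>(p[0])
        | (static_cast<std::uint32_t>(p[1]) << 8)
        | (static_cast<std::uint32_t>(p[2]) << 16)
        | (static_cast<std::uint32_t>(p[3]) << 24);

    // Reinterpret the bit pattern as a float without casting pointers.
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, with a buffer holding one flag byte followed by the bytes 00 00 80 3F, `readF32LE(buffer + 1)` gives 1.0f even though `buffer + 1` is not 4-byte aligned.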
Quote:
Yes, normally with packed structures you would put all floats at the top. But here the input is already fixed. So the float is actually at offset 1 in that binary data. So in memory the loaded data will always have the float at offset 1. And of course, in this case you might load the data in at offset 3 so the u8 flag will be at offset 3 and then the float at offset 4 (and add 3 dummy bytes at the start of the struct, and allocate a slightly bigger buffer) but then other loaders would give errors probably.
Yeah, of course you can't just move the floats to any arbitrary place in the structure, as it will mess up the data for sure.
Quote:
but then other loaders would give errors probably.
Nah, all the loaders have their own code, so changing anything in one loader has no impact on the others.
Quote:
So going the integer way and then convert should do the trick. I am not sure why it is still crashing when you do that. I then need to see the asm output for each case.
It crashes because I'm not doing it the integer/convert way :) The only conversion I do at the moment is for "f32 framesPerSecond = *(float*)pPtr;" and that one works now. The next crash comes from those structs with floats and reading from them, and Corto's way probably still has some bug, as it just crashes in the middle of the buffer copy.
The PowerPC takes a hybrid approach. Every PowerPC processor to date has hardware support for unaligned 32-bit integer access. While you still pay a performance penalty for unaligned access, it tends to be small.
On the other hand, modern PowerPC processors lack hardware support for unaligned 64-bit floating-point access. When asked to load an unaligned floating-point number from memory, modern PowerPC processors will throw an exception and have the operating system perform the alignment chores in software. Performing alignment in software is much slower than performing it in hardware.
It says that every PPC CPU has hardware support for unaligned 32-bit integer access. What about other sizes? 64-bit, for example?
It also says that modern PowerPC processors have issues with unaligned 64-bit floats, but then what about 32-bit ones? (As we can see, the 32-bit ones fail too.)
And the funny thing is that it says the operating system performs the handling of unaligned memory access in software, while judging by our crashes, it doesn't look like the AmigaOS4 kernel handles those exceptions in software; it just crashes instead.