@All Has anyone, by some luck, implemented for vasmppc the 4 macros from H&P PowerASM: prolog/epilog and pushgpr/popgpr?
I just want to try porting old StormC-based PowerPC assembler code (which uses PowerASM from H&P) to OS4. With vasmppc_mot it seems everything compiles and only those 4 macros are missing (I had to build vasmppc_mot myself to get PPC code in Motorola syntax, as in PowerASM).
I found out that at least pushgpr/popgpr were implemented in ASM-One rev.483 for compatibility with H&P's PowerASM, but I couldn't find its sources.
In a real-life example those macros are used just like this:
Note that pushgpr takes arguments (to say which registers to store), while popgpr and epilog/prolog take none.
All the information I can find about them in the PowerASM documentation is this:
Quote:
Nonvolatile registers have to be saved and restored by the function which modifies them. The 'push' and 'pop' commands can be used for that purpose. Using 'pushgpr', like in our example, you can save as many registers as you want (you can use a register list, known from the 68K command 'movem'). At the command 'popgpr' you don't have to specify the register list, the registers are restored which were saved at the corresponding 'pushgpr'. Please refer to the documentation of the PowerASM to get a complete description of all 'push' and 'pop' commands.
And there is also powerasm.guide, but it's in German, so here is what I could translate:
prolog
Quote:
Implementation: MACRO
Description: A stack frame is created and a local stack is allocated. All pseudo-mnemonics that implicitly work with the local stack (e.g. PUSHGPR, PUSHCTR, POPFPR, etc.) may only be used if a valid stack frame exists.
For a detailed description of this command see chapter Stackframes
epilog
Quote:
Implementation: MACRO
A stack frame created with the PROLOG command is torn down and the return from the function is initiated. For a detailed description of this command see chapter "Stackframe".
pushgpr
Quote:
Implementation: Directly in assembler
Several GPRs are pushed onto the stack. The registers are stored from top to bottom, i.e. the register with the lowest number is written to the stack last.
PUSHGPR expects a register list. Individual registers are separated by slashes (/). Ranges of registers can be specified as follows: Rm-Rn. Examples of valid register lists:
;
r3/r6/r20/r21 ; includes r3,r6,r20 and r21
r10-r13 ; includes r10,r11,r12,and r13
r5/r8-r10/r30 ; includes r5,r8,r9,r10 and r30
r3-r6/r20-r22 ; includes r3,r4,r5,r6,r20,r21 and r22
The counterpart to PUSHGPR is the POPGPR command.
Floating point registers can be saved onto the stack with the PUSHFPR command.
The PUSHGPR command corresponds exactly to the 68K command movem.l RList,-(sp)
Notes: This command may only be used if a valid stack frame exists (see PROLOG and EPILOG).
Example:
pushlr ; Save link register
pushctr ; Save Count register
pushgpr r3-r8 ; save r3,r4,r5,r6,r7 and r8
...
popgpr ; restore all 6 GPRs
popctr ; Restore count register
poplr ; Restore link register
blr ; Exit function
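The register-list syntax described in the quote (slashes between entries, Rm-Rn for ranges) can be modelled with a small C sketch. This is only an illustration of the syntax, not how PowerASM or vasm parse it internally; `parse_reglist` is a hypothetical helper name, and the bit-per-register mask is an assumed representation.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: turn a list such as "r5/r8-r10/r30" into a mask
   where bit n stands for GPR n. Returns 0 on any syntax error. */
uint32_t parse_reglist(const char *s)
{
    uint32_t mask = 0;

    for (;;) {
        char *end;
        long lo, hi;

        /* every entry starts with 'r' followed by a decimal number */
        if ((s[0] != 'r' && s[0] != 'R') || !isdigit((unsigned char)s[1]))
            return 0;
        lo = strtol(s + 1, &end, 10);
        if (lo > 31)
            return 0;
        hi = lo;
        s = end;

        /* optional "-rN" turns the entry into a range */
        if (s[0] == '-') {
            if ((s[1] != 'r' && s[1] != 'R') || !isdigit((unsigned char)s[2]))
                return 0;
            hi = strtol(s + 2, &end, 10);
            if (hi > 31 || hi < lo)
                return 0;
            s = end;
        }

        for (long r = lo; r <= hi; r++)
            mask |= (uint32_t)1 << r;

        if (s[0] == '\0')
            return mask;        /* end of list */
        if (s[0] != '/')
            return 0;           /* only '/' may separate entries */
        s++;
    }
}
```

With this representation, "r3-r8" from the example becomes a mask with bits 3 to 8 set, which is enough to know how many words a matching popgpr has to restore.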
popgpr
Quote:
Implementation: Directly in assembler
Description: Several longwords are loaded from the stack into GPRs. The first longword is loaded into the GPR with the lowest number. A register list can be specified with the POPGPR command. Individual registers are separated by slashes (/). Ranges of registers can be specified as follows: Rm-Rn. Examples of valid register lists:
;
r3/r6/r20/r21 ; includes r3,r6,r20 and r21
r10-r13 ; includes r10,r11,r12,and r13
r5/r8-r10/r30 ; includes r5,r8,r9,r10 and r30
r3-r6/r20-r22 ; includes r3,r4,r5,r6,r20,r21 and r22
If the register list is omitted, the registers saved by the last PUSHGPR are used. This also works when PUSHGPR and POPGPR are nested. Example:
;
pushgpr r6-r9 ; r6,r7,r8 and r9 are saved
pushgpr r20/r21 ; r20 and r21 are saved
popgpr ; r20 and r21 are restored
popgpr ; r6,r7,r8 and r9 will be restored
If no matching PUSHGPR is found for a POPGPR without a register list, an error message is generated.
The counterpart to POPGPR is the PUSHGPR command.
Floating point registers can be restored from the stack with the POPFPR command.
The POPGPR command corresponds to the 68K command movem.l (sp)+,RList
Notes: This command may only be used if a valid stack frame exists (see PROLOG and EPILOG).
Examples:
pushlr ; Save link register
pushctr ; Save Count register
pushgpr r3-r8 ; save r3,r4,r5,r6,r7 and r8
...
popgpr ; restore all 6 GPRs
popctr ; Restore count register
poplr ; Restore link register
blr ; Exit function
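To make the ordering and nesting rules from the two quotes concrete, here is a small C model. It is a sketch under assumptions: the `Model` struct, the word-indexed stack and the list of remembered masks are all hypothetical, chosen only to mimic "highest register stored first, lowest register loaded first" and "popgpr without a list reuses the last pushgpr list". It says nothing about the actual instructions PowerASM emits.

```c
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 256
#define MAX_NEST    16

typedef struct {
    uint32_t gpr[32];           /* simulated register file */
    uint32_t mem[STACK_WORDS];  /* simulated local stack, one word per slot */
    int      sp;                /* index of the top word, grows downwards */
    uint32_t lists[MAX_NEST];   /* register lists remembered by the assembler */
    int      depth;             /* number of open pushgpr lists */
} Model;

void model_init(Model *m)
{
    memset(m, 0, sizeof *m);
    m->sp = STACK_WORDS;        /* empty stack */
}

/* pushgpr: like movem.l RList,-(sp), the highest register is stored first,
   so the lowest-numbered register ends up at the lowest address. */
int push_gpr(Model *m, uint32_t mask)
{
    int count = 0;
    for (int r = 0; r < 32; r++)
        if (mask & ((uint32_t)1 << r))
            count++;
    if (m->depth == MAX_NEST || count > m->sp)
        return -1;
    for (int r = 31; r >= 0; r--)
        if (mask & ((uint32_t)1 << r))
            m->mem[--m->sp] = m->gpr[r];
    m->lists[m->depth++] = mask;
    return 0;
}

/* popgpr without a list: reuse the list of the innermost open pushgpr.
   Returns -1 when there is no matching pushgpr (the documented error case). */
int pop_gpr(Model *m)
{
    if (m->depth == 0)
        return -1;
    uint32_t mask = m->lists[--m->depth];
    for (int r = 0; r < 32; r++)
        if (mask & ((uint32_t)1 << r))
            m->gpr[r] = m->mem[m->sp++];
    return 0;
}
```

The point of the model is that a macro-only replacement needs some assembler-side memory of the last register list; without it, popgpr has to be given the list explicitly.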
So if somebody can help with those 4 macros, that will help a lot.
They don't need to support slashes or all the other stuff, just plain epilog/prolog, and for pushgpr only the "pushgpr rx-rx" form.
For pushgpr I assume it is just something like this, for example for "pushgpr r20-r26":
For "popgpr" I do not know, but I assume it will be something with lwzu.
For epilog/prolog it should be something very simple IMHO, but I'm not sure how those macros should be written to be 100% the same as expected in H&P's PowerASM.
Any help is much appreciated, thanks!
@flash Yes, that is all very easy for sure, and I know PowerPC docs are all over the place.
I'm asking in the hope that developers who work with assembler often can copy+paste their ready-to-use 4 macros in the Motorola format, behaving the same as expected on PowerASM from StormC.
I just don't want to spend a day or two writing and fine-tuning those simple things when I am sure some of us know them very well and can write them in a few minutes. But if there is no other way ...
I can go the easy route: take the prolog/epilog used by GCC via disassembly, but I do not know the sizes that those macros reserve by default on PowerASM. I.e.:
There we reserve 16, but how much was it on PowerASM? Maybe 16 is not enough.
And dunno about pushgprs/popgprs..
It is probably worth just creating a test case on the original PowerASM which calls those 4 macros and nothing else. That would first show how much PowerASM reserves for the stack frame, and how pushgprs/popgprs are done.
@All It turns out that at least epilog and prolog are in the ppcmacros.i file of the WarpOS SDK. Thanks to Frank for pointing me to it. He also says that vasmppc_mot doesn't support the CPU-specific directives like "equr", "setr", etc. They exist in vasm's m68k backend, but not in the PPC backend.
So we have to rework the prolog/epilog macros ourselves by replacing the register names. For example: trash=r0, stack=r1, base=r2.
error 39 in line 16 of "prolog": illegal relocation called from line 13 of "test.pasm" > stwu r1,-((__ARGS)-56)(r1)
And I still don't know what to do with pushgprs/popgprs. They are implemented in the assembler itself, not via macros, so they are hidden somewhere.
I have now tried to build a HelloWorld on PowerASM with epilog/prolog/pushgprs/popgprs and was hoping for some WarpOS disassembler, but I'm having a hard time with that. WOS_IRA from os4depot still produces a mess, and "wosdb" from Aminet just says "can't load binary" (at least on OS4). I will try to use wosdb on OS3; maybe I'll be lucky enough to disassemble it and see at least how pushgprs/popgprs look in assembler code.
@Hedeon I checked your macros from sonnet and rewarp: your prolog macro is different. It's not just "prolog"; you specify the size as an argument, while in PowerASM they just have "prolog" with no arguments. For easy portability I need the same. Basically the one I posted in my last post works (it seems to auto-calculate the necessary size?), but on vasmppc it brings this error:
Quote:
error 39 in line 16 of "prolog": illegal relocation called from line 13 of "test.pasm" > stwu r1,-((__ARGS)-56)(r1)
@Hedeon Thanks for your examples. At the moment I ended up with these ones, which seem to work well too (the same as used in PowerASM, I only replaced the register names):
Reading the vasm tutorial I can see that \1 to \9 are used as arguments too, so even in my case the prolog macro can be used with the size as an argument.
So as far as I understand, if we provide no argument (so "\1" is empty), it will allocate 1108 - 56 = 1052 bytes. If we provide an argument, it will be taken as the number of bytes to allocate, so it will be 24+4+(\1)+56 - 56, meaning 28 as a minimum plus however much we allocate.
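The arithmetic in the paragraph above can be written out as a short C sketch. The constants are taken only from the numbers quoted in this post and are not checked against ppcmacros.i; the function names and the "negative means no argument" convention are hypothetical. The second helper is an aside: the SysV PowerPC ABI used on OS4 expects the stack pointer to stay 16-byte aligned, so it may be worth checking whether a given frame size is a multiple of 16.

```c
/* Numbers as quoted in the post; what each term stands for is not stated there. */
enum {
    TERM_24       = 24,
    TERM_4        = 4,
    TERM_56       = 56,
    DEFAULT_TOTAL = 1108   /* value used when prolog gets no argument */
};

/* Hypothetical model: local_bytes < 0 stands for "prolog" without an argument.
   Returns the amount subtracted from r1, i.e. (__ARGS) - 56. */
long prolog_frame_bytes(long local_bytes)
{
    long total = (local_bytes < 0)
               ? DEFAULT_TOTAL
               : TERM_24 + TERM_4 + local_bytes + TERM_56;
    return total - TERM_56;
}

/* Round a frame size up to the next multiple of 16. */
long round_up16(long n)
{
    return (n + 15) & ~15L;
}
```

Under these assumptions the default frame of 1052 bytes is not a multiple of 16 (the next one is 1056), which may or may not matter for code that only calls other assembler routines.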
Next, I tried to just follow the way you described, so please have a look and tell me whether it is about right:
I didn't get why the first one is "lwz" but the other ones are "lwzu", and why the order is different (maybe because the order is different in the original StormASM too? first one set of registers, then the others). And that "addi" at the end, does it also need to be +4?
Quote:
I see in your example lr is saved to 20(r1) but maybe that is SysV ABI which is quite different from the one used in WarpOS.
My macros are a straight copy of the WarpOS macros with just the register names changed, so they should be the "ones used in WarpOS".
EDIT2: btw, this kind of thing works on PowerASM:
_BlankChunky:
la r26,_BlankChunky
But on VASMPPC it gives this:
Quote:
error: illegal operand types > la r26,_BlankChunky
I think la is indeed used differently on vasm. It is 'la register1, offset(register2)'. You could use that with r2 (TOC) as register2, for example.
But for the la used in PowerASM you could use something like
Though as far as I can see, PowerASM and VASMPPC_MOT have a different syntax compared with yours. Firstly there are no dots at the beginning, and secondly the order is reversed, i.e. not " .macro ldaddr register, label" but something like "ldaddr macro" (and that's all), and then inside the macro \1 and \2 are used for the register and the label.
At least that is how I understand the PowerASM/VASMPPC_MOT macro syntax. And if I understand it right, it will then be:
loadaddr macro
lis \1,\2@ha
addi \1,\1,\2@l
endm
As I understand it, \1 will mean the first argument (the register), and \2 the second argument (the label).
@Flash Quote:
A couple of "lis" and "addi" can be a fast solution to replace "la" macro
Why a couple? It seems there is just one line to replace? I mean, the link you posted says:
Quote:
LA - Load Address
Syntax: la rD,d(rA)
This is equivalent to addi rD,rA,d
So if I got it right, then my "la r26,_BlankChunky" is just "addi r26,r26,_BlankChunky", right?
EDIT: or, what is probably better is this one, as Hedeon did in his macro (and indeed 2 lines):
_BlankChunky:
lis r26,_BlankChunky@ha # load the address of the label
addi r26,r26,_BlankChunky@l # in r26 (2 steps)
Though it's strange that the URL above talks about one single addi.
You cannot address a memory location like that as the operands are 16 bits. That is why you need 2 lines to load an address. la is used with a 16 bit offset. So you load for example r2 with a 32 bit address with the macro (let's say DataStart). Then you use 16 bit offsets to this pointer with la as in la r3,_BlankChunky-DataStart(r2)
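The 16-bit limit is also why the high half uses @ha rather than a plain upper half: addi sign-extends its immediate, so when bit 15 of the low half is set, the upper half has to be adjusted by one. The following C sketch models that arithmetic; the function names are made up for illustration, and the sample address in the usage note is arbitrary.

```c
#include <stdint.h>

/* label@l: low 16 bits of the address */
uint16_t part_l(uint32_t addr)  { return (uint16_t)(addr & 0xFFFFu); }

/* label@h: plain high 16 bits, with no adjustment */
uint16_t part_h(uint32_t addr)  { return (uint16_t)(addr >> 16); }

/* label@ha: high 16 bits, adjusted for the sign extension done by addi */
uint16_t part_ha(uint32_t addr) { return (uint16_t)((addr + 0x8000u) >> 16); }

/* lis rD,imm : rD = imm << 16 */
uint32_t do_lis(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* addi rD,rA,imm : rD = rA + sign_extend(imm) */
uint32_t do_addi(uint32_t ra, uint16_t imm)
{
    uint32_t ext = (imm & 0x8000u) ? ((uint32_t)imm | 0xFFFF0000u) : imm;
    return ra + ext;    /* wraps modulo 2^32, like the real instruction */
}

/* The two-instruction sequence: lis rD,label@ha ; addi rD,rD,label@l */
uint32_t load_address(uint32_t addr)
{
    return do_addi(do_lis(part_ha(addr)), part_l(addr));
}
```

For an address such as 0x7ED98120 the low half is 0x8120, which addi treats as negative, so @ha yields 0x7EDA instead of 0x7ED9 and the sum comes out right; pairing @h with addi would land 0x10000 too low.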
Anyway, a different ABI leads you to change register usage.
It seems the difference between PowerABI and ABI V4 is not that big and is mostly in the stack layout. So things which work with the stack need to be changed (in my case prolog and epilog).
As for register usage, as far as I can see, if functions don't use the stack then they may work as is.
Though, for example, this code crashes for me under some conditions (and not under others) on "lbzx r11,r9,r10":
; cmpi 0,0,a,0
; ble _skip
; (if a<=0 then goto _skip)
xdef DrawIconAlphaAsm
align 4
DrawIconAlphaAsm:
cmpi 0,0,r5,0
ble _end
cmpi 0,0,r6,0
ble _end
andi. r10,r10,0 ; set r10 to 0
_loop1:
mtctr r5
_loop2:
lhz r11,0(r3)
rlwimi r10,r11,0,16,31
lbz r11,0(r4)
addi r4,r4,1 ; maybe get rid of this?
rlwimi r10,r11,0,24,31
lbzx r11,r9,r10 ; CRASH !
stb r11,0(r3)
addi r3,r3,1 ; maybe get rid of this?
bdnz _loop2
add r4,r4,r7
add r3,r3,r8
subi r6,r6,1
cmpi 0,0,r6,0
bgt _loop1
_end:
blr
So maybe this one needs an ABI V4 adaptation, but at least in terms of registers this is what I can see:
r0, r1 and r2 are not used there (so no stack pointer, and no r2, which differs between the ABIs).
Registers r3 to r10 are used for arguments, which is the same in both ABIs.
Only r11 differs: for PowerABI it means "volatile, pass static chain if language needs it" and for ABI V4 "volatile, may be used by function linkage", which IMHO looks OK too, because we port from PowerABI to V4, not the other way around.
Maybe some of the instructions in this code behave differently between the ABIs and change something that matters.
r11 is loaded from the address made from r9+r10. r10 is a byte value (seeing rlwimi) and r9 is also not an address (alpha), which gives some undefined address. I am guessing you are getting a DSI error?
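One way to reason about what the loop touches is to model it in C. This is only a reading of the listing above, under the assumption that the arguments arrive as in the seven-argument prototype quoted later in the thread (r3 and r4 as the two byte pointers, r5 and r6 as the inner and outer counts, r7 and r8 as the per-row increments, r9 as the alpha pointer). The names `width` and `height` are mine, not the author's.

```c
#include <stdint.h>

/* Hypothetical C model of DrawIconAlphaAsm, instruction comments in brackets. */
void draw_icon_alpha_model(uint8_t *opos, const uint8_t *tpos,
                           int width, int height,
                           int tpos_add, int opos_add,
                           const uint8_t *alpha)
{
    if (width <= 0 || height <= 0)          /* [cmpi / ble _end], twice */
        return;

    do {                                     /* _loop1 */
        int count = width;                   /* [mtctr r5] */
        do {                                 /* _loop2 */
            /* [lhz, rlwimi, lbz, rlwimi]: the index keeps the high byte of
               the halfword at opos and takes its low byte from tpos. The
               real lhz also reads opos[1], which this model does not. */
            unsigned index = ((unsigned)opos[0] << 8) | tpos[0];
            tpos++;                          /* [addi r4,r4,1] */
            opos[0] = alpha[index];          /* [lbzx r11,r9,r10 ; stb] */
            opos++;                          /* [addi r3,r3,1] */
        } while (--count != 0);              /* [bdnz _loop2] */
        tpos += tpos_add;                    /* [add r4,r4,r7] */
        opos += opos_add;                    /* [add r3,r3,r8] */
    } while (--height > 0);                  /* [subi / cmpi / bgt _loop1] */
}
```

Read this way, the index can be anything from 0 to 65535, so the table behind r9 would need to be a full 64 KB, and every argument from r3 to r9 has to be valid for the access pattern above. If any of them arrives wrong from the C side, a DSI on one of the loads would be the expected symptom.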
@Hedeon Here is the crash log. It seems I mixed things up a bit (or it is just different now): the crash is actually on "lbz r11,0(r4)", but maybe it is random. At least for now, 3 test runs in a row give this kind of crash:
Crash log for task "Game_os4.exe"
Generated by GrimReaper 53.19
Crash occured in module Game_os4.exe at address 0x7ED98120
Type of crash: DSI (Data Storage Interrupt) exception
Alert number: 0x80000003
Interestingly, the same DrawIconAlpha2() asm function is used in the menu and so on, and works there.
Also, here are my current "abiv4" prolog, epilog, pushgprs/popgprs and loadadr, so you can check that I haven't done something bad there (they may be used by other asm functions which could trash something on the stack later, but IMHO they should be OK):
For the sake of interest, I created a simple test case which just calls this DrawIconAlphaAsm and, via IDA with the PPC decompiler, made some pseudo-C code which may help to understand what is going on:
DrawIconAlphaAsm.pasm is the same as I showed above, and main.c is just:
#include <stdio.h>
extern void DrawIconAlphaAsm( register unsigned char* opos, register unsigned char*, register int oxpos, register int ypos, register int tpos_add, register int opos_add, register unsigned char* alpha );
unsigned __int16 *__fastcall DrawIconAlphaAsm(unsigned __int16 *result, unsigned __int8 *a2, int a3, int a4, int a5, int a6, int a7)
{
unsigned int v7; // r10
int v8; // ctr
unsigned __int8 v9; // r11