I found that the new feature of the 3.x RadeonHD driver, breaking the 256 MB barrier for video memory, behaves buggy, and I want to discuss it in case anyone has something to share. It does indeed use GPU video memory, but once you fill more than 256 MB of that newly available GPU video memory: crash/lockup.
It's as if we didn't have 2 GB of video memory on the card but only 512 MB: the first 256 MB are taken by the system, the second 256 MB are used by RadeonHD 3.x fine (so we "break that barrier" as expected), but anything that goes beyond those "second" 256 MB of GPU video memory crashes.
Tested on X5000/20 with a RadeonHD R7 250 (HD verde), which has 2 GB of video memory. Versions: RadeonHD 3.6, Warp3DNova.library 1.65, graphics.library 54.247
Short story:
When you fill up more than 256 MB of the second chunk of GPU video memory (i.e. the first 256 MB chunk taken by the system is fine, and the second 256 MB chunk is used as the extended one by the GPU), we get a very weird lockup with everything crashing heavily, and sometimes visual distortion can be seen (not the kind you get when something is drawn wrong, but the kind you get when the gfx card misbehaves and everything dies).
I found the issue while working on a port of the latest SuperTuxKart (which needs more than 300 MB of video memory), and later confirmed that it is the same issue we had before with Huno's Return To Castle Wolfenstein Reborn port when details are set to high. Some may remember how it crashes on high details on their RadeonHD SI; there is that old thread: http://www.amigans.net/modules/xforum ... e=&order=ASC&type=&mode=0
The crashes always differ: sometimes it is the SATA task, sometimes USB, sometimes AmiDock, sometimes input, sometimes everything at once.
Long story:
While porting the new SuperTuxKart, I found that on some heavy levels I got strange crashes. At 1024x768x32 with full details the game sometimes wants about 300 MB of video memory or more.
So, after a long investigation, I found that as soon as I reach ~256 MB of used GPU memory I get a lockup. The first 256 MB of video memory are taken by the system, the next 256 MB are also used fine as extended memory, but once you fill that second 256 MB and want more => crash.
Sometimes I can see that in this second "extended" block I reach 250 MB and it locks up, sometimes I see 252, and once I saw 258 before the lockup happened. But I check used GPU memory with SysMon (so it is not real-time, and I have to press the "refresh" button, which means the results may not be that accurate), so the issue is probably exactly at the ~256 MB barrier of the second, extended chunk of video memory.
Visually everything can just stop. Intuition crashes, USB crashes, AmiDock crashes, input crashes, SATA crashes. Most of the time there are no stack traces, or only some weird ones. And almost every time the lockup happens, I can catch on serial that DSISR is 00000000 and the Exception Syndrome Register (ESR) is 0x00000000. Everything is the same as when Huno's RTCW Reborn crashes on high details, so I am 99.9% sure it is the same bug.
I then used the debug versions of everything (Warp3D Nova, RadeonHD) to test SuperTuxKart, and this is what I get before the lockup happens (i.e. when I completely fill that second 256 MB chunk used by the GPU and want more, going over 512 MB in total):
RadeonHD:
RadeonHD.chip (5): rhdRMCtxCopyToVRAM called
RadeonHD.chip (5): rhdRMCtxLockBuffer called (0x617252F8)
RadeonHD.chip (5): rhdRMCtxUnlockBuffer called (0x617252F8, 62492)
RadeonHD.chip (5): rhdRMCtxCopyToVRAM done
RadeonHD.chip (5): rhdRMCtxUnlockBufferInternal called (0x617252F8)
RadeonHD.chip (5): rhdRMCtxUnlockBufferInternal done.
RadeonHD.chip (5): rhdRMCtxCopyToVRAM called
RadeonHD.chip (5): rhdRMCtxLockBuffer called (0x617252F8)
RadeonHD.chip (5): rhdRMCtxUnlockBuffer called (0x617252F8, 62493)
RadeonHD.chip (5): rhdRMCtxCopyToVRAM done
RadeonHD.chip (5): rhdRMCtxUnlockBufferInternal called (0x617252F8)
RadeonHD.chip (5): rhdRMCtxUnlockBufferInternal done.
RadeonHD.chip (5): rhdRMCtxCopyToVRAM called
RadeonHD.chip (5): rhdRMCtxLockBuffer called (0x617252F8)
RadeonHD.chip (5): rhdRMCtxUnlockBuffer called (0x617252F8, 62494)
RadeonHD.chip (5): rhdRMCtxCopyToVRAM done
RadeonHD.chip (5): rhdRMCtxUnlockBufferInternal called (0x617252F8)
RadeonHD.chip (5): rhdRMCtxUnlockBufferInternal done.
RadeonHD.chip (5): rhdRMCtxCopyToVRAM called
RadeonHD.chip (5): rhdRMCtxLockBuffer called (0x617252F8)
RadeonHD.chip (5): rhdRMCtxUnlockBuffer called (0x617252F8, 62495)
RadeonHD.chip (5): rhdRMCtxCopyToVRAM done
RadeonHD.chip (5): rhdRMCtxAllocBuffer called
RadeonHD.chip (5): rhdRMReadExecutionID called
RadeonHD.chip (4): executionID: 62495
RadeonHD.chip (5): rhdRMCtxUnlockBufferInternal called (0x617252F8)
RadeonHD.chip (5): rhdRMCtxUnlockBufferInternal done.
RadeonHD.chip (5): rhdRMMAllocIntBuffer called
RadeonHD.chip (5): rhdRMCtxCopyToVRAM called
RadeonHD.chip (5): rhdRMCtxLockBuffer called (0x61725298)
RadeonHD.chip (5): rhdRMCtxUnlockBuffer
[HAL_DfltTrapHandler] *** Warning: Fatal exception in task 0x65FC93A0 (Shell Process, etask = 0xEFD3D6C0) at ip 0x01E51048
<lockup>
So, as can be seen, we upload, generate and bind lots of textures of different sizes. Then, when everything at that stage reaches 256 MB of used GPU memory: BAH.
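When comparing several such debug logs, it can help to pull out the last buffer address and execution ID that were logged before the lockup. The following is only a minimal sketch in portable C: `parse_unlock_line` is a hypothetical helper (not part of RadeonHD or any SDK), and the line format is assumed to be exactly the one shown in the excerpt above.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of RadeonHD or any AmigaOS SDK.
 * Parses one debug line of the assumed form
 *   RadeonHD.chip (5): rhdRMCtxUnlockBuffer called (0x617252F8, 62495)
 * Returns 1 and fills *buffer and *exec_id on a match, 0 otherwise.
 */
int parse_unlock_line(const char *line, unsigned long *buffer,
                      unsigned long *exec_id)
{
    int level;            /* debug level printed in parentheses */
    unsigned long buf;    /* buffer address after the literal "0x" */
    unsigned long id;     /* execution ID, the second argument */

    if (sscanf(line,
               "RadeonHD.chip (%d): rhdRMCtxUnlockBuffer called (0x%lx, %lu)",
               &level, &buf, &id) != 3)
        return 0;         /* other calls and truncated lines end up here */

    *buffer = buf;
    *exec_id = id;
    return 1;
}
```

Feeding the log to it line by line and keeping the last match gives the last execution ID that was fully logged; for the excerpt above that would be 62495, since the final rhdRMCtxUnlockBuffer line is cut off and does not match.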
Once I do ANYTHING to the game so that it does not come close to ~256 MB of used GPU memory, I never get a lockup. That can be reducing the resolution or reducing the texture quality, i.e. anything that avoids filling the first 256 MB of GPU video memory.
And I mean exactly GPU video memory; the usual one (i.e. the one used by Workbench) is not filled much: at the moment of the crash about 200 MB of that "system video memory" is still free.
And as I said, it is the same issue that happens with RTCW Reborn when it crashes on high details with all those strange crashes in SATA, in AmiDock, etc. It doesn't crash on low settings because GPU memory doesn't fill the first 256 MB, but it crashes on high details because it fills the first 256 MB and wants more.
My very wild guess is that "bank switching" or "page flipping" or whatever technique is used to handle GPU memory fails in RadeonHD. At least on SI. I can't test RX as I don't have one, so it may be an SI-only driver issue.
I will create a bug report on Mantis for RadeonHD, but maybe someone else can add something too, so that we have more info.
For example, it would be interesting to know what the difference is, in terms of code, between the first 256 MB of GPU video memory and the higher ones. Is it somehow split up in the code, with different handling per 256 MB bank?
Edited by kas1e on 2019/9/27 8:04:48
@kas1e Great analysis and explanation. Thank you for all the hard work and tests you do.
I would like to mention that I had the same crashes with Huno's Return To Castle Wolfenstein Reborn port on my X5000/40, but there were people, Huno included, who reported that the game runs just fine on X1000. I haven't tested on mine yet. Maybe the problem has to do with the motherboard as well.
@Walkero Was that X1000 running RadeonHD 3.6, with all settings at maximum in RTCW Reborn?
@all Is there someone with an X1000 who wants to test some things? You need to run Reborn at maximum settings, then on another screen run the latest SysMon from Z-Tools; its last tab, "System", shows info about video memory. Then make a screenshot of SysMon with the video info while RTCW Reborn is running at max details.
I compiled it over gl4es (so ogles2 and Warp3D Nova are in use), and can reproduce the crash easily. As long as I fill only 256 MB and no more (any number of textures, but no more than 256 MB filled), all is fine. Once I add one more texture, so that it has to be placed in the 257th MB of GPU memory: CRASH and burn!
Now I will start to reduce components. First I will try to create a pure ogles2 example (without gl4es involved), then, if the issue is still there, a pure Warp3D Nova example.
Build it just like "gcc -Wall test.c -o test -lauto"
As can be seen in the comments, when we fill 764 textures of our size we completely fill 256 MB but do not cross the line: the system stays operational, all fine. But once we add a little bit more and cross the 256 MB line: BAH!
That is on X5000.
If anyone can try it on X1000, it would be good to know. Do it with SysMon's System tab opened and the auto-refresh mark ticked. The field to watch is "Used GPU".
Some more progress: I created a pure MiniGL example which also fills GPU memory with small textures, and it gives the same happy crash/lockup. Here is the test code for anyone to try; compile it with "gcc test.c -o test -lGL -lauto"
            while ((imsg = (struct IntuiMessage *)IExec->GetMsg(win->UserPort)))
            {
                if (imsg->Class == IDCMP_CLOSEWINDOW)
                    bRun = FALSE;
                IExec->ReplyMsg((struct Message *)imsg);
            }
        }
        mglDeleteContext();
    }
    dropInterface((struct Interface *)IMiniGL);
    return 0;
}
At least on my setup, when I fill GPU memory with 764 textures, I fill the whole 256 MB but don't cross the line. Once I put in 765 textures (so it needs a little more than 256 MB): crash/lockup.
It would be very interesting if anyone could try the same on X1000. Just put in, let's say, 1000 textures and watch in SysMon how much memory it eats. Then try 2000 and so on, to see when it starts to crash on X1000 (if it doesn't crash when crossing the 256 MB line).
I also tried running some MiniGL games up to the moment when I fill the 256 MB chunk of GPU memory, and got the same crash.
So, taking into account that it happens with both MiniGL and ogles2, we can safely assume that it is either Warp3D Nova or RadeonHD. And the issue is probably in RadeonHD, as this is the code which breaks the 256 MB barrier.
I talked to ptitSeb about it. He of course can't know how it is all done, but his guess is that it is some limit in the GPU memory to CPU memory mapping.
Now I need details of X1000 tests of that MiniGL example (everyone can compile it). I.e. does it crash when you fill more than 256 MB, and if not, does it crash when you fill 512 MB, or when?
@All So, another test, which cost me $200: I bought a Radeon RX 570 8 GB, only to find out that there is NO LOCKUP when we pass the 256 MB barrier of extended GPU memory.
I even tried an amount of textures that fills 512 MB of GPU memory: still working. Then I tried filling 1500 MB (1.5 GB): all works, the system still operates as intended. Then 2 GB, all fine; 3 GB, all fine; 3.5 GB, all fine. Then, once I fill more than 3768 MB (so with the first 256 MB system chunk added, about 4 GB), things get visually distorted, but even then there is no lockup/crash.
Also, while SysMon reports that there is 8 GB of GPU memory, it still reports only 4 GB as available. But that is probably expected and not a problem in itself for RadeonRX.
The main point now is that passing 256 MB of extended GPU video memory, or any other amount, does not crash or lock up with RadeonRX; even more than 4 GB only causes visual distortion, but no crashes/lockups. The issue with the 256 MB barrier of GPU memory happens only with RadeonHD.
Which means it is not a problem of the X5000 itself, but of RadeonHD on X5000. Maybe there is some bug which only shows up on X5000, and X1000 users are just lucky (maybe because they have DMA in graphics.library and that somehow shifts the issue, but that needs to be checked on X1000).
@All I created a simple test case. Please, everyone who has RadeonHD 3.x (only 3.x) on any hardware (I especially need to know about X1000, Sams and Tabor), give it a go and report here.
There you will find the binary "test" (together with sources, if you want to check what it does).
The binary does only this: using ogles2.library and Warp3D Nova, it fills GPU memory with some small textures. I.e. if you run it like this:
work:radeonhd_check:> test 300
then it will upload about 300 textures with a total size of ~100 MB, so 100 MB of GPU memory will be filled.
Then if you do "test 600", it will take 200 MB of GPU memory.
So, on X5000, once you fill the first 256 MB and go on to the next chunk, it simply locks up. That is at around 765 textures.
What I need from anyone willing to help is to run it just like this:
test 300 : works or not (should everywhere)
test 600 : works or not (should everywhere)
test 900 : works or not (now we have filled more than 256 MB of GPU memory, and are already at ~300 MB)
then:
test 1600 : works or not (here we test whether 512 MB of GPU memory is used fine)
test 2400 : works or not (here we are at more than 768 MB of GPU memory)
test 3100 : works or not (here we are at more than 1024 MB of GPU memory)
So that will probably be enough to check whether the first 1 GB of GPU video memory is working or not.
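When reading a "Used GPU" value off SysMon, it is handy to know which 256 MB boundary a given run has already passed. A tiny sketch for that, assuming the memory is thought of in 256 MB steps as in the test list above (`boundary_passed_mb` is a hypothetical name, and whether the driver really handles memory in such steps is not known):

```c
/* Size of one step in MB, following the 256 MB steps used in this thread. */
#define CHUNK_MB 256u

/* Highest 256 MB boundary (in MB) that `used_mb` has reached or passed;
 * returns 0 while usage is still below the first boundary. */
unsigned int boundary_passed_mb(unsigned int used_mb)
{
    return (used_mb / CHUNK_MB) * CHUNK_MB;
}
```

For example, a reading of about 300 MB after "test 900" gives 256, meaning that run is already working in the chunk above the first 256 MB boundary.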
By "working" I mean that after a few seconds it creates a simple empty window, which you can then close. There should be no crashes or lockups (on X5000 there are, but let's see).
The easy way to monitor how much GPU memory is taken while the example runs is to use zzd10h's SysMon v6.3 from Z-Tools, with "auto-refresh" enabled in the System tab. This is how it looks:
There I filled about 2 GB of GPU memory on my new Radeon RX 570 (Polaris10) by doing "test 6000" (see the "Used GPU" field).
Remember, those tests are only for version 3.x! Only! And ogles2.library should be installed together with Warp3DNova.library, of course.
So when doing the tests, please write back here with the name of your hardware, your graphics card and all results from those tests.
test 300 - ok - used 100MB (from SysMon)
test 600 - ok - 201MB
test 600 - ok - 301MB
test 900 - ok - 301MB
test 1600 - ok - 535MB
test 2400 - ok - 803MB
test 3100 - ok - 1037MB
btw, Huno's latest port of Return to Castle Wolfenstein flies on my X1000.
@All Thanks. So, many X1000s don't have that problem, and one Sam460 doesn't either. I need a few more results from other Sams (maybe Sam440 too, etc.).
The only pattern I can see there is that for X1000 and the Sams graphics.library has the "DMA CPU hack", which speeds things up in graphics.library, while for X5000 it doesn't. Maybe that somehow works around the bug we have on X5000 by doing the GPU<->CPU memory remapping another way. So I will try to check on the beta list with some other hardware (like Tabor and X5000/40).
@kas1e Tested on my X5000/40: everything from 770 upwards instantly crashes my system, which then needs a hard reset. With 760, around 254 MB were reserved.
I couldn't make graphics.library crash. There was also some trouble getting VRAM actually consumed. Blitting to a window did the trick. http://capehill.kapsi.fi/vmem/
..And now I notice that there is a print bug on line 85: one extra %lu.
@Capehill I tried running your test case: it shows a grey window, but I can't see it filling extended GPU video memory; instead it fills the first 256 MB block where we have Workbench, the system, etc. (i.e. the "system graphics memory" of the first 256 MB chunk).
I removed the extra %lu, added "IDOS->Delay(100);" before the freeing of the bitmaps, then ran it, and GPU memory is definitely not filled. Instead, it uses the first 256 MB block (which is taken by Workbench, the system, etc.).
So with that test case you don't fill extended GPU memory, only the first 256 MB block. Our issue is with the second one, which is the first block of extended GPU memory.
I can even see via SysMon that when I run your example with, say, 100 bitmaps, it eats 25 MB of "system gfx" memory, not 25 MB of the extended GPU memory where we have our issues.
The test case somehow needs to be told to put its data exactly in extended GPU memory (the part after the first 256 MB of "system gfx memory", i.e. the second and later chunks of the whole VRAM).
EDIT: maybe we can't even fill extended GPU video memory via graphics.library, because that is in the hands of Hyperion, while the "breaking of the 256 MB barrier" was added to RadeonHD and Warp3D Nova, so I assume it is doing something which graphics.library can't or doesn't do. So we can probably only fill it via Warp3D Nova functions.
Edited by kas1e on 2019/9/29 12:25:23