For now LodePaint has a strange bug on AOS4 (which doesn't happen on win32 and Linux) that causes random lockups. It's more or less random for me on my Peg2, but as far as I can see, for RaserX the bug happens fairly consistently.
So I just did some editing and so on until the lockup happened, then rebooted with the magic key combo, and "dumpdebugbuffer" said:
Quote:
Dump of context at 0x020991F0
Trap type: DSI exception
Machine State (raw): 0x0200F030
Machine State (verbose): [ExtInt on] [User] [FPU on] [IAT on] [DAT on]
Instruction pointer: in module kernel+0x00019F4C (0x01819F4C)
Crashed process: lodepaint (0x668551A0)
DSI verbose error description: Access not found in hash or BAT (page fault)
Access was a store operation
 0: 66435FDC 64704DC0 39299480 66435FDC 66435FDC 00000000 00000020 00000010
 8: 00000001 BCFF5F90 0181D87C 664351D0 28424122 63151240 00000004 80020021
16: 64704FC0 48424124 021B0000 00000000 00000000 021B69A6 020A31DA 00001000
24: 00000000 021B0000 00000000 EFFFE910 00000000 66435FDC 66435FD0 020A2BC8
CR: 88224148 XER: 00000003 CTR: 01819F58 LR: 018304EC
DSISR: 42000000 DAR: BCFF5F94
Registers pointing to code: r10: module kernel at 0x0181D87C (section 0 @ 0x1D880)
Then I tried once more to trigger the lockup. This time it happened in a completely different part of the program, and after the magic reset, dumpdebugbuffer told me:
Quote:
Dump of context at 0x020991F0
Trap type: DSI exception
Machine State (raw): 0x0200F030
Machine State (verbose): [ExtInt on] [User] [FPU on] [IAT on] [DAT on]
Instruction pointer: in module kernel+0x00019F4C (0x01819F4C)
Crashed process: lodepaint (0x65EB9630)
DSI verbose error description: Access not found in hash or BAT (page fault)
Access was a store operation
 0: 65DBAFDC 6519C910 3B595F3B 65DBAFDC 65DBAFDC 00000000 00000020 00000000
 8: 00000001 DFFFAA20 0181D87C 65DBAB00 24448344 664F2240 651A5FFC 000F4FF8
16: 6519CC08 04000000 021B0000 00000000 00000000 021B69A6 020A31DA 00001000
24: 00000000 021B0000 00000000 EFFFE910 00000000 65DBAFDC 65DBAFD0 020A2BC8
CR: 88228348 XER: 40003E5C CTR: 01819F58 LR: 018304EC
DSISR: 42000000 DAR: DFFFAA24
Registers pointing to code: r10: module kernel at 0x0181D87C (section 0 @ 0x1D880)
The addresses of the crashed process differ, so as I understand it, the crash happens in different places in LodePaint. But the error is always the same:
DSI verbose error description: Access not found in hash or BAT (page fault). Access was a store operation. Also, the dump always ends with r10: module kernel at 0x0181D87C (section 0 @ 0x1D880), and the instruction pointer (in module kernel+0x00019F4C (0x01819F4C)) is always the same too.
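One way to read these dumps a bit more mechanically is to decode DSISR and compare DAR (the faulting address) with the registers. The sketch below assumes the classic PowerPC (603/604/G3/G4-style) DSISR layout, where 0x40000000 means no hash-table/BAT translation was found (page fault), 0x08000000 a protection violation, and 0x02000000 a store. In all three dumps in this thread DAR is exactly r9 + 4 (0xBCFF5F90 and 0xBCFF5F94, 0xDFFFAA20 and 0xDFFFAA24, 0xBCFF6140 and 0xBCFF6144), which suggests, but does not prove, that the kernel code is storing through a bogus pointer it holds in r9 rather than LodePaint touching that address directly. Also note that the hex value after "Crashed process: lodepaint" is most likely the address of the process structure, not a code address, so different values between runs don't by themselves show different crash locations. The function names below are hypothetical helpers, not any AmigaOS API.

```c
#include <stdint.h>
#include <stdio.h>

/* Classic PowerPC DSISR bits (IBM numbering, bit 0 = MSB). */
#define DSISR_NO_TRANSLATION 0x40000000u /* bit 1: no hash-table/BAT match (page fault) */
#define DSISR_PROTECTION     0x08000000u /* bit 4: protection violation */
#define DSISR_STORE          0x02000000u /* bit 6: 1 = store, 0 = load */

int dsi_is_page_fault(uint32_t dsisr) { return (dsisr & DSISR_NO_TRANSLATION) != 0; }
int dsi_is_protection(uint32_t dsisr) { return (dsisr & DSISR_PROTECTION) != 0; }
int dsi_is_store(uint32_t dsisr)      { return (dsisr & DSISR_STORE) != 0; }

/* Distance from a register's value to the faulting address (DAR).
   Unsigned arithmetic wraps, so a DAR below the register gives a huge value. */
uint32_t dar_minus_reg(uint32_t dar, uint32_t reg) { return dar - reg; }

/* Print a one-line summary of a dump; r9 is the register compared against DAR. */
void describe_dsi(uint32_t dsisr, uint32_t dar, uint32_t r9)
{
    printf("DSISR=%08lX DAR=%08lX:%s%s %s, DAR - r9 = 0x%lX\n",
           (unsigned long)dsisr, (unsigned long)dar,
           dsi_is_page_fault(dsisr) ? " page fault" : "",
           dsi_is_protection(dsisr) ? " protection violation" : "",
           dsi_is_store(dsisr) ? "(store)" : "(load)",
           (unsigned long)dar_minus_reg(dar, r9));
}
```

For example, describe_dsi(0x42000000, 0xBCFF5F94, 0xBCFF5F90) with the values from the first dump reports a page fault on a store at r9 + 4, which matches the verbose text dumpdebugbuffer already prints.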
The question is: how do I fix it, how do I find where the bug happens, and is the bug in LodePaint, in the kernel, or somewhere else?
Do these outputs show any address that I can use with addr2line to find where in the program the bug happens? I only see addresses related to the kernel (like in module kernel+0x00019F4C).
Any ideas, suggestions, etc. are welcome.
@RaserX Could you please trigger the lockup, reboot with the three-key combo, then type "dumpdebugbuffer" in the console and post the output here?
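The "module kernel+OFFSET (ABSOLUTE)" notation gives both a module-relative offset and an absolute address, so the kernel's load base is just the difference: 0x01819F4C - 0x00019F4C = 0x01800000, and the debug kernel's 0x0181FBE8 - 0x0001FBE8 gives the same base. With that base, other register values can be turned into kernel offsets too, e.g. LR (the return address, i.e. roughly who called the crashing code). The sketch below only does that arithmetic; the helper names are hypothetical. Two caveats: it assumes those addresses really lie in the kernel module, and the dump's own "section 0 @ 0x1D880" for r10 = 0x0181D87C differs from the plain subtraction (0x1D87C) by 4, so the tool's section offsets are evidently not simply address minus base. Treat the results as pointers into kernel.debug to investigate, not as ready-made addr2line input. None of these dumps contain an address inside lodepaint itself, so there is nothing here to look up in the program binary.

```c
#include <stdint.h>
#include <stdio.h>

/* "module kernel+OFFSET (ABSOLUTE)" gives both values, so the module's
   load base is ABSOLUTE - OFFSET. */
uint32_t module_base(uint32_t absolute, uint32_t offset) { return absolute - offset; }

/* Offset of another absolute address relative to that base.
   Assumes (unverified) that the address lies inside the same module. */
uint32_t module_offset(uint32_t absolute, uint32_t base) { return absolute - base; }

/* Apply the arithmetic to values from the first dump in this thread. */
void print_kernel_offsets(void)
{
    uint32_t base = module_base(0x01819F4Cu, 0x00019F4Cu); /* from the IP line */
    printf("kernel base: 0x%08lX\n", (unsigned long)base);
    printf("LR  -> kernel+0x%lX\n", (unsigned long)module_offset(0x018304ECu, base));
    printf("CTR -> kernel+0x%lX\n", (unsigned long)module_offset(0x01819F58u, base));
}
```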
I tried using the debug version of the kernel. Same random lockups, the same single address in the kernel every time, and the same error:
Quote:
Dump of context at 0x020A91F0
Trap type: DSI exception
Machine State (raw): 0x0200F030
Machine State (verbose): [ExtInt on] [User] [FPU on] [IAT on] [DAT on]
Instruction pointer: in module kernel+0x0001FBE8 (0x0181FBE8)
Crashed process: lodepaint (0x65E70C40)
DSI verbose error description: Access not found in hash or BAT (page fault)
Access was a store operation
 0: 63976FDC 62FB58C0 00000000 63976FDC 0200B6DC 00000000 00000020 0000006C
 8: 00000001 BCFF6140 7FFFFFFD BCFF6140 24448048 630A5240 62FBEFFC 021C0000
16: 00000000 00000000 021C69A6 00000010 00000010 00001000 02090000 021C0000
24: 01856EB8 00000000 02090000 EFFFE910 02090000 01849678 639761C0 63976FDC
CR: 48428048 XER: 00000000 CTR: 0181FC4C LR: 0181FC9C
DSISR: 42000000 DAR: BCFF6144
... blablablabl. ....
Dump of context at 0xEFDAC3E0
Trap type: DSI exception
Machine State (raw): 0x0200F030
Machine State (verbose): [ExtInt on] [User] [FPU on] [IAT on] [DAT on]
Instruction pointer: in module kernel+0x0001FBE8 (0x0181FBE8)
Crashed process: lodepaint (0x65E70C40)
DSI verbose error description: Access not found in hash or BAT (page fault)
Access was a store operation
 0: 63976FDC 62FB58C0 00000000 63976FDC 0200B6DC 00000000 00000020 0000006C
 8: 00000001 BCFF6140 7FFFFFFD BCFF6140 24448048 630A5240 62FBEFFC 021C0000
16: 00000000 00000000 021C69A6 00000010 00000010 00001000 02090000 021C0000
24: 01856EB8 00000000 02090000 EFFFE910 02090000 01849678 639761C0 63976FDC
CR: 48428048 XER: 00000000 CTR: 0181FC4C LR: 0181FC9C
DSISR: 42000000 DAR: BCFF6144
... blablabl ...
Registers pointing to code: r4 : module kernel at 0x0200B6DC (section 2 @ 0xB6E0)
With kernel.debug, every lockup always shows the same addresses: Instruction pointer: in module kernel+0x0001FBE8 (0x0181FBE8) and r4 : module kernel at 0x0200B6DC (section 2 @ 0xB6E0).
So I thought kernel.debug might give me some info via addr2line, and I tried to find the name of section 2 with readelf:
But I'm not sure that it can cause the lockup. If no GR (Grim Reaper) is spawned, then IMHO it's not such an important bug and can't cause lockups? IMHO that hit only shows that a bad memory allocation happens at some stage? From that output it's hard for me to tell where the problem starts and which address to use with addr2line. Can anybody help a bit?
@kas1e If high-tech analysis like this fails, you could try the low-tech (and laborious) method of recompiling LodePaint with various parts of the program disabled, and seeing whether you can find what is causing the problem... Ideally you end up disabling everything except what is needed to cause the crash.
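The "disable parts and rebuild" approach gets much cheaper if it is done as a binary search: with 64 optional parts, about 7 builds are enough instead of 64. The sketch below is a minimal version under strong assumptions: exactly one part triggers the lockup on its own, parts can be switched independently, and each "does it still crash?" answer is reliable. The last one is shaky for a random lockup that vanishes under MemGuard, so in practice each step needs several test runs. The predicate type and names are hypothetical; in real life one predicate call means one rebuild plus testing.

```c
/* Hypothetical predicate: returns 1 if a build with parts [lo, hi) enabled
   (and all other optional parts disabled) still locks up. */
typedef int (*crashes_fn)(int lo, int hi, void *ctx);

/* Binary search for a single culprit among parts [0, n).
   Assumes exactly one part triggers the crash on its own.
   Returns its index, or -1 if even the full build doesn't crash.
   *builds receives the number of predicate calls (i.e. rebuilds). */
int find_culprit(int n, crashes_fn crashes, void *ctx, int *builds)
{
    int lo = 0, hi = n;

    *builds = 1;
    if (!crashes(lo, hi, ctx))
        return -1;
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        ++*builds;
        if (crashes(lo, mid, ctx))
            hi = mid;      /* culprit is in the lower half */
        else
            lo = mid;      /* otherwise it must be in the upper half */
    }
    return lo;
}

/* Simulated predicate for trying the search without rebuilding anything:
   ctx points to the index of the (pretend) faulty part. */
int simulated_crash(int lo, int hi, void *ctx)
{
    int culprit = *(int *)ctx;
    return culprit >= lo && culprit < hi;
}
```

With 64 parts and the simulated culprit at index 37, find_culprit returns 37 after 7 predicate calls (one full build plus six halvings).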
That is the worst possible way to find the bug :) Does that MemGuard output have any info that points to the address where the crash happens?
IMHO, that MemGuard hit shows the same stack-trace info that GR does. If so, the bug ends up in the w3d (Warp3D) library, called from LodePaint at segment 0004 offset 2dc1fc. No?
By the way, just out of interest: why does GR show nothing while MemGuard shows that bug? IMHO the bug is pretty serious, because if I don't run MemGuard and just do NEW many times, the lockup comes. But GR stays silent.
I wrote a message to the main author, and he said he will try to find where the problem is. I sent him all that debug output and the addr2line strings, so maybe that will help.
Another interesting thing: with MemGuard running in the background, I never get any lockup!
Sorry, it is very hard to help here because the DSI exception is caused by the kernel; it is not a direct access from the user application.
Are you sure it's caused by the kernel? IMHO the MemGuard hit shows that it is in LodePaint?
Did you look at the recent changes in the LodePaint code? Maybe the problem is mentioned and fixed there? Or maybe there is a problem with SDL.
Yeah, I'm in contact with the author, and he'll email me when he fixes it. The latest revision at the moment is 94 (from the last 3 days), but we only talked about bugs with the author for the first time yesterday.
For now he will try to fix all the errors he catches with Valgrind on Linux. Strangely, Valgrind doesn't show the "NEW/OK" hit I get on AOS4, but it does show a hit when he closes an image :) Anyway, let's wait a bit: first he will fix the Valgrind errors, then our NEW/OK hit, and then we'll see whether any bugs remain.
As for SDL, I think errors related to MiniGL/Warp3D are more likely, because none of the DSIs or that hit refer to SDL at all; they refer to LodePaint, MiniGL and W3D.
I suppose 'dumpdebugbuffer' is no good after turning the power off?
Yep, in that case it will show nothing :( But let's wait a bit. I'll compile a new version today, and if the bug isn't fixed, we'll try MemGuard + Sashimi on your config to see when the problems start.