Hello everyone.
I present the
consistently curious case of the constantly crashing computer.
So it all started around two weeks ago. I wanted to load up Odyssey on my X1000 OS4 machine but my Internet dock had gone missing. This has randomly occurred every few years: a dock suddenly disappears with no clue as to why and nothing obvious when checking the settings. So I decided to just leave it. Ten tabs open does slow it down anyway.
I then took my X1000 to a friend's place for the weekend so he could compare it with his Sam system. Soon after, he found that some 68K software that worked on his system crashed on mine. He also found my video players were old and not good at playing common videos. I guess I hadn't noticed for a while. So he set about rectifying this and installed both 68K and OS4 software. Shogo, however, played well. But Spencer crashed on load for some reason, so I couldn't show that. It had been working fine before, so this surprised me.
I took my X1000 back home and set it up again. For two nights I just edited files in Cubic IDE and compiled in the shell. Pretty boring I suppose, given it could play videos again. I had no issues at that point. The next night I tried to open a text file, a crash log as it happens, and ironically doing so instantly crashed! It had crashed MultiViewer. I'd never seen that crash on a plain text file. So I saved the log and examined it. At the top of the stack trace was some nameless 68K program that had crashed, which was strange. How did a 68K program get called from native code?
Here's the rogue code from the crash log I needed to trace:
68k disassembly:
620649ae: 6038 bra.b 0x620649e8
620649b0: 2041 movea.l d1,a0
620649b2: 22680014 movea.l 0x14(a0),a1
*620649b6: 2029007c move.l 0x7c(a1),d0
620649ba: 0c8000001b00 cmpi.l #0x1b00,d0
So I set about checking files, not knowing it would become an extensive search. I first wanted to compare against backup files. So I tried to run CompareDirs and then that crashed! How was the system even usable? Well, it wasn't now! I then loaded DirOpus. I compared libs, classes, and devices and couldn't see any major differences apart from extra binaries that had been added. I ran a program I wrote that lists 68K binaries and gathered a list of the 68K programs on my system. I checked it with DirOpus and again nothing stood out. So I decided to copy a section of the 68K code out of the crash log and do a search. I modified another program I wrote, which searches for hex codes, to search only in 68K binaries, and ran it over the whole Workbench to catch whichever 68K program contained it. I plugged in $2029007c from the crash line. Nothing showed up!
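For anyone wanting to try the same kind of search, it can be sketched roughly like this in Python. This is only an illustration, not the actual program used above. The function names are made up, and it assumes a 68K executable can be recognised by the hunk header longword $000003F3 at the start of the file.

```python
import os

# Assumption: a classic 68K hunk executable starts with HUNK_HEADER ($000003F3).
HUNK_HEADER = bytes.fromhex("000003f3")


def is_68k_hunk(path):
    """Return True if the file starts with the hunk header longword."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == HUNK_HEADER
    except OSError:
        # Unreadable files are simply skipped.
        return False


def find_pattern_in_68k(root, pattern):
    """Yield (path, offset) for each 68K binary under root containing pattern."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            if not is_68k_hunk(path):
                continue
            with open(path, "rb") as f:
                data = f.read()
            offset = data.find(pattern)
            if offset != -1:
                yield path, offset
```

Calling `list(find_pattern_in_68k(some_dir, bytes.fromhex("2029007c")))` gives the first hit in each matching file. Note that a magic-number filter like this skips anything that doesn't start with a hunk header, so code embedded in other file types will never be seen.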
Even though I had already gone beyond what should be needed to trace a crash, modifying and writing code to scan files, I still needed to go deeper. So I set about searching the whole filesystem for this one hex code. But it was taking too long, and too many irrelevant files were clogging up the search, after filtering down to 68K only had failed. So I decided to get out the all-round disk monitor for all occasions: DiskMonTools. I really should have got it out first instead of going down a rabbit hole. I pointed it at the Workbench volume in question and searched the whole volume for $2029007c. Boom! It found it! I checked and it appeared to be embedded in some IFF file, which is a strange place for 68K code. I looked around the code and it was an IFF datatype file. I checked my datatypes and did another hex search. It was a ZX datatype file from 1996!
So I moved it out of the way and rebooted. MultiViewer, CompareDirs and Spencer all started working again! A rogue ZX datatype had been crashing them. Strangely, MultiView, the core system program for viewing datatypes, didn't crash. ZX is a Spectrum image datatype. I've never had a Spectrum and have no interest in viewing ZX files, so I have no idea how a ZX datatype got installed on my system. At first I thought ZX was some kind of compression format.
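Searching a whole volume means reading far more data than you'd want to hold in memory at once. A minimal Python sketch of a chunked search is below. It assumes the volume is available as an image file or some other readable binary stream, which is not how DiskMonTools works (it reads the disk directly). The function name is made up for illustration.

```python
def search_stream(stream, pattern, chunk_size=1 << 20):
    """Yield the absolute offset of every occurrence of pattern in a binary stream.

    The stream is read in chunks. The last len(pattern) - 1 bytes of each
    buffer are carried over so a match straddling two chunks is not missed.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1
    tail = b""   # bytes carried over from the previous buffer
    base = 0     # absolute stream offset of the first byte of tail
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        pos = buf.find(pattern)
        while pos != -1:
            yield base + pos
            pos = buf.find(pattern, pos + 1)
        # The carried tail is shorter than the pattern, so a match can
        # never be reported twice.
        tail = buf[-overlap:] if overlap else b""
        base += len(buf) - len(tail)
```

Used with something like `open(image_path, "rb")` and `bytes.fromhex("2029007c")`, it reports offsets you can then look up in a disk monitor or hex editor.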
This is the one:
http://aminet.net/package/util/dtype/ZX_DataType
Next I set about trying to solve the missing Internet dock issue. I examined the settings and did a diff of the XML setup files. As you can see I was again going too far to find the issue. But OS4 lacks sophisticated tools to scan and find the problem for you. So do modern OSes in a lot of respects. When I clicked on my Internet dock it just showed the grey bar. I checked the settings and compared them with other subdocks. The only difference I noted was a hidden setting in the GUI and XML. But I changed it and it made no difference! In the Misc tab the option "Dock is hidden" was off, while the others had it on. Given I want to see it, it's funny it would be missing when hidden is set to off. So I tried setting it to hidden. But it wouldn't save it! It kept turning itself off.
Then I found the problem! It was that hidden dock setting. I couldn't get it to work correctly from the check box, and saving didn't make any difference. But I accidentally double clicked the tiny bar below the dock, which was still showing. The icons came back! Then I checked the hidden setting and it now turned on and off properly. Suddenly the hidden check mark had decided to work. So for the hidden setting to work the dock needs to be visible; otherwise the check box ticks on and off, looking like it works, but does nothing. And you need to double click this tiny grey bar to hide and unhide the dock, which also brings the hidden check mark to life. Seriously, who does that!? What sort of stupid design is that?
With my Internet dock now visible again I set about loading up a browser. Only to be greeted by Odyssey crashing. FFS what now?!?
The clue this time was some native code calling a Webkit open file routine:
Symbol info:
Instruction pointer 0x7732434C belongs to module "OWB" (PowerPC)
Symbol: _ZN7WebCore7OWBFile4openEc + 0x27C in section 1 offset 0x000CD328
Stack trace:
OWB:_ZN7WebCore7OWBFile4openEc()+0x27c (section 1 @ 0xCD328)
OWB:_ZN7WebCore7OWBFile4openEc()+0x334 (section 1 @ 0xCD3E0)
OWB:_ZN7WebCore5Image20loadPlatformResourceEPKc()+0x104 (section 1 @ 0xDF910)
OWB:T.2747()+0x14c (section 1 @ 0x7873A4)
native kernel module newlib.library.kmod+0x00000138
native kernel module newlib.library.kmod+0x00002088
native kernel module newlib.library.kmod+0x00002d0c
native kernel module newlib.library.kmod+0x00002ee8
OWB:_start()+0x170 (section 1 @ 0x16C)
native kernel module dos.library.kmod+0x000255c8
native kernel module kernel+0x000420ac
native kernel module kernel+0x000420f4
PPC disassembly:
77324344: 38a00000 li r5,0
77324348: 3cc00001 lis r6,1
*7732434c: 8009004c lwz r0,76(r9)
77324350: 7d234b78 mr r3,r9
77324354: 7c0903a6 mtctr r0
Both Odyssey and OWB crashed on the same function. Somehow Timberwolf was able to load. Then it too became unstable. So again I started another investigation. Do things come in threes?
This time the code was fully native. So I couldn't directly trace it to any rogue 68k code. But why did they both crash inside their own routine? Something had to have infected the system like before but had somehow stayed stealthier. Was it hidden in plain sight?
I decided to run Snoopy to assist, which can list library and device open calls as well as the typical DOS calls. It's not always obvious what is failing, as it's common for functions to fail in normal use, such as when looking for ports and environment variables. However, in amongst the haystack, as it were, something looked strange. A library interface fail.
Here's the log:
01294 : OWB : o.k. = [exec] OpenLibrary("pthreads.library",0) [20463uS]
01295 : OWB : o.k. = [exec] OpenDevice("timer.device",3,0x570C55E0,0x00000000) = 0 [7uS]
01296 : OWB : o.k. = [exec] OpenLibrary("asyncio.library",0) [16565uS]
01297 : OWB : FAIL = [exec] FindResident("asyncio.library.main") [10uS]
01298 : ramlib : FAIL = [exec] FindResident("asyncio.l.main") [3uS]
01299 : AmiDock : Delay(1) [22128uS]
01300 : OWBLauncher : Delay(1) [22297uS]
01301 : OWBLauncher : FAIL = [exec] FindPort("OWB") [3uS]
01302 : OWBLauncher : FAIL = [exec] FindPort("OWB.1") [0uS]
01303 : ramlib : FAIL = Lock("LIBS:asyncio.l.main",SHARED) [15114uS]
01304 : ramlib : FAIL = Lock("CLASSES:asyncio.l.main",SHARED) [19uS]
01305 : ramlib : FAIL = Lock("CURRDIR:asyncio.l.main",SHARED) [12uS]
01306 : ramlib : FAIL = Lock("CURRDIR:Libs/asyncio.l.main",SHARED) [18uS]
01307 : ramlib : FAIL = Lock("CURRDIR:Classes/asyncio.l.main",SHARED) [9uS]
01308 : ramlib : FAIL = Lock("PROGDIR:asyncio.l.main",SHARED) [8uS]
01309 : ramlib : FAIL = Lock("PROGDIR:Libs/asyncio.l.main",SHARED) [9uS]
01310 : ramlib : FAIL = Lock("PROGDIR:Classes/asyncio.l.main",SHARED) [9uS]
01311 : OWB : FAIL = [exec] FindResident("asyncio.library.main") [5uS]
The "library.main" and "l.main" suffixes are a vestige from when libraries were not fully OS4 native and OS4 interfaces were split into additional library files. Why was it looking for and failing to find an OS4 interface?
So I again compared against a backup. I was familiar with "asyncio" from years ago but it wasn't in my backup. I checked my local copy and then it all came to light. It's a 68k library dated 1997! What the?! What on earth is that doing there!?
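With a long Snoopy log it helps to strip out everything except the failures for the program you care about. A rough Python sketch follows. It assumes the log lines look exactly like the excerpt above ("number : task : o.k./FAIL = call [time]"), the function name is made up, and task names containing spaces are not handled.

```python
import re

# Assumed line layout, taken from the excerpt above:
#   01297 : OWB : FAIL = [exec] FindResident("asyncio.library.main") [10uS]
LINE_RE = re.compile(r"^(\d+)\s*:\s*(\S+)\s*:\s*(o\.k\.|FAIL)\s*=\s*(.*)$")


def failed_calls(log_text, tasks=None):
    """Return (sequence, task, call) for each FAIL line, optionally limited to tasks."""
    results = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m or m.group(3) != "FAIL":
            continue  # skip o.k. lines and lines without a status field
        seq, task, _status, call = m.groups()
        if tasks is not None and task not in tasks:
            continue
        # Drop the trailing timing field such as "[10uS]".
        call = re.sub(r"\s*\[\d+uS\]$", "", call)
        results.append((int(seq), task, call))
    return results
```

Calling `failed_calls(log_text, tasks={"OWB", "ramlib"})` on the log above would leave just the FindResident and Lock failures, without the FindPort noise from OWBLauncher.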
This one here:
http://aminet.net/package/dev/c/AsyncIO
So I found this one there:
http://os4depot.net/?function=showfil ... =utility/misc/asyncio.lha
I copied that in and rebooted. Suddenly everything is working again! Unbelievable!
As a side note, part of the issue is that the above OpenLibrary() calls are using 0 as the version. This is unacceptable on OS4, and has been since the '90s even, as programs are required to specify a minimum library version. A 0 will accept any version. To be coded correctly for OS4 and above it should request version 50, which would have caught the above case.
I've never seen a system come unstuck so suddenly from a few old rogue files that took ages to track down. It happened over Easter and it was like three days and three nights all over again. Freak file accident. I've looked through my recent downloads and just cannot track down where they came from. Not that I was doing much lately, which is one reason I didn't get tripped up by it sooner. Glad it's over now.
Well, I hope you enjoyed reading the
consistently curious case of the constantly crashing computer, as much as I enjoyed writing it, and didn't enjoy experiencing it!