
The consistently curious case of the constantly crashing computer
Not too shy to talk
Hello everyone.

I present the consistently curious case of the constantly crashing computer.

So it all started around two weeks ago. I wanted to load up Odyssey on my X1000 OS4 machine but my Internet dock had gone missing. This is something that has randomly occurred every few years: a dock suddenly disappears with no clue as to why and nothing out of place when checking the settings. So I decided to just leave it. Ten tabs open does slow it down.

I then took my X1000 to a friend's place for the weekend so he could compare it with his Sam system. Soon after, he found that some 68K software which worked on his system crashed on mine. He also found my video players were old and not good at playing common videos. Guess I hadn't noticed for a while. So he set about rectifying this and installed both 68K and OS4 software. Shogo however played well. But Spencer crashed on load for some reason so I couldn't show that. It had been working fine, so this surprised me.

I took my X1000 back home and set it up again. For two nights I just edited files in Cubic IDE and compiled in shell. Pretty boring I suppose given it could play videos again. So I had no issues at that point. The next night I tried to open a text file, a crash log as it happens, and ironically doing so instantly crashed! It had crashed MultiViewer. I'd never seen that just crash on a text file. So I saved the log and examined it. Some nameless 68K program at the top of the stack trace had crashed, which was strange. How did a 68K program get called from native code?

Here's the rogue code from the crash log I needed to trace:
68k disassembly:
 620649ae: 6038                 bra.b             0x620649e8
 620649b0: 2041                 movea.l           d1,a0
 620649b2: 22680014             movea.l           0x14(a0),a1
*620649b6: 2029007c             move.l            0x7c(a1),d0
 620649ba: 0c8000001b00         cmpi.l            #0x1b00,d0



So I set about checking files, not knowing it would become an extensive search. I first wanted to compare against backup files. So I tried to run CompareDirs and then that crashed! How was the system even usable? Well it wasn't now! I then loaded DirOpus. I compared libs, classes, and devices and couldn't see any major differences apart from extra binaries added. I ran a program I wrote that lists 68K binaries and gathered a list of 68K programs on my system. Checked with DirOpus and again nothing stood out. So I decided to copy a section of the 68K code out of the crash log and do a search. I modified another program I wrote that searches for hex codes so it would search only in 68K binaries, then ran it over the whole Workbench to catch which 68K program contained it. I plugged in $2029007c from the crash line. Nothing showed up!

Even though I had gone beyond what should be needed to trace a crash, modifying and writing code to scan files, I still needed to go deeper. Since filtering to 68K binaries only had failed, I set about searching the whole filesystem for this one hex code. But it was just taking too long, and too many irrelevant files were clogging up the search. So I decided to get out the all round disk monitor for all occasions: DiskMonTools. I really should have got it out first instead of going down a rabbit hole. So I pointed it at the Workbench in question to search the whole volume for $2029007c. Boom! It found it! I checked and it appeared to be embedded in some IFF file, which is a strange place for 68K code. I looked around the code and it was an IFF datatype file. I checked out my datatypes and did another hex search. It was a ZX datatype file from 1996!

So I moved it out of the way and rebooted. MultiViewer, CompareDirs and Spencer now started working! A rogue ZX datatype had crashed them. Strangely, MultiView, the core system program for viewing datatypes, didn't crash. ZX is a Spectrum image datatype. I've never had a Spectrum and have no interest in viewing ZX files, so I have no idea how a ZX datatype was installed on my system. At first I thought ZX was some kind of compression format.

This is the one:
http://aminet.net/package/util/dtype/ZX_DataType
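
For anyone wanting to repeat this kind of hunt on a copy of a volume, here is a minimal sketch of the hex search in Python. It is not the program used above; the names find_pattern and only_hunk are made up for illustration. The only hard facts it leans on are that a 68K hunk executable starts with the longword $000003F3 and that the bytes being hunted are $2029007c from the crash line. It reads each file whole, which is fine for a sketch but slow on a big volume.

```python
import os

HUNK_HEADER = bytes.fromhex("000003f3")  # first longword of a 68K hunk executable
PATTERN = bytes.fromhex("2029007c")      # move.l 0x7c(a1),d0 from the crash log


def find_pattern(root, pattern, only_hunk=False):
    """Yield (path, offset) for every occurrence of pattern in files under root.

    With only_hunk=True, files that do not start with the hunk header are
    skipped, mimicking a search restricted to 68K executables.
    """
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    data = fh.read()
            except OSError:
                continue  # unreadable file, move on
            if only_hunk and not data.startswith(HUNK_HEADER):
                continue
            offset = data.find(pattern)
            while offset != -1:
                yield path, offset
                offset = data.find(pattern, offset + 1)
```

Called as find_pattern("Workbench", PATTERN) it scans everything; with only_hunk=True it skips any file that starts with "FORM" instead of the hunk header. That is presumably why the 68K-only search came up empty: the code was sitting inside an IFF datatype descriptor.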

Next I set about trying to solve the missing Internet dock issue. I examined the settings and did a diff of the XML setup files. As you can see I'm again going too far to find the issue. But OS4 lacks sophisticated tools to scan and find the problem for you. So do modern OSes in a lot of respects. When I clicked on my Internet dock it just showed the grey bar. I checked the settings and compared them with other subdocks. There was only one difference: a hidden setting in the GUI and XML. In the Misc tab the option "Dock is hidden" was off, while others had it on. Given I want to see it, it's funny it would be missing when hidden is set to off. So I tried setting it to hidden, but it made no difference. It wouldn't save it! It kept turning it off.

Then I found the problem! It was that hidden dock setting. I couldn't get it to work correctly from the check box, and saving didn't make any difference. But I accidentally double clicked the tiny bar below the dock, which still showed. The icons came back! Then I checked the hidden setting and it now turned on and off. Suddenly the hidden check mark decided to work. So for the hidden setting to work the dock needs to be visible; otherwise the check box ticks on and off, looking like it works, but does nothing. And you need to double click this tiny grey bar to hide and unhide the dock, as well as to make the hidden check mark actually take effect. Seriously who does that!? What sort of stupid design is that?

With my Internet dock now visible again I set about loading up a browser. Only to be greeted by Odyssey crashing. FFS what now?!?

The clue this time was some native code calling a Webkit open file routine:
Symbol info:
Instruction pointer 0x7732434C belongs to module "OWB" (PowerPC)
Symbol: _ZN7WebCore7OWBFile4openEc + 0x27C in section 1 offset 0x000CD328

Stack trace:
    OWB:_ZN7WebCore7OWBFile4openEc()+0x27c (section 1 0xCD328)
    OWB:_ZN7WebCore7OWBFile4openEc()+0x334 (section 1 0xCD3E0)
    OWB:_ZN7WebCore5Image20loadPlatformResourceEPKc()+0x104 (section 1 0xDF910)
    OWB:T.2747()+0x14c (section 1 0x7873A4)
    native kernel module newlib.library.kmod+0x00000138
    native kernel module newlib.library.kmod+0x00002088
    native kernel module newlib.library.kmod+0x00002d0c
    native kernel module newlib.library.kmod+0x00002ee8
    OWB:_start()+0x170 (section 1 0x16C)
    native kernel module dos.library.kmod+0x000255c8
    native kernel module kernel+0x000420ac
    native kernel module kernel+0x000420f4

PPC disassembly:
 77324344: 38a00000   li                r5,0
 77324348: 3cc00001   lis               r6,1
*7732434c: 8009004c   lwz               r0,76(r9)
 77324350: 7d234b78   mr                r3,r9
 77324354: 7c0903a6   mtctr             r0

Both Odyssey and OWB crashed on the same function. Somehow Timberwolf was able to load. Then it too became unstable. So again I start another investigation. Do things appear in threes?

This time the code was fully native. So I couldn't directly trace it to any rogue 68k code. But why did they both crash inside their own routine? Something had to have infected the system like before but had somehow remained stealthier. Was it hidden in plain sight?

I decided to run Snoopy to assist, which can list library and device open calls as well as typical DOS calls. It's not always obvious what is failing, as it's also common for functions to fail in normal use, such as when looking for ports and environment variables. However, amongst the haystack, as it were, one needle looked strange: a library interface failure.

Here's the log:
01294 : OWB             o.k. = [execOpenLibrary("pthreads.library",0) [20463uS]
01295 : OWB             o.k. = [execOpenDevice("timer.device",3,0x570C55E0,0x00000000) = [7uS]
01296 : OWB             o.k. = [execOpenLibrary("asyncio.library",0) [16565uS]
01297 : OWB             FAIL = [execFindResident("asyncio.library.main") [10uS]
01298 : ramlib          FAIL = [execFindResident("asyncio.l.main") [3uS]
01299 : AmiDock         :        Delay(1) [22128uS]
01300 OWBLauncher     :        Delay(1) [22297uS]
01301 OWBLauncher     FAIL = [execFindPort("OWB") [3uS]
01302 OWBLauncher     FAIL = [execFindPort("OWB.1") [0uS]
01303 ramlib          FAIL Lock("LIBS:asyncio.l.main",SHARED) [15114uS]
01304 ramlib          FAIL Lock("CLASSES:asyncio.l.main",SHARED) [19uS]
01305 ramlib          FAIL Lock("CURRDIR:asyncio.l.main",SHARED) [12uS]
01306 ramlib          FAIL Lock("CURRDIR:Libs/asyncio.l.main",SHARED) [18uS]
01307 ramlib          FAIL Lock("CURRDIR:Classes/asyncio.l.main",SHARED) [9uS]
01308 : ramlib          FAIL Lock("PROGDIR:asyncio.l.main",SHARED) [8uS]
01309 : ramlib          FAIL Lock("PROGDIR:Libs/asyncio.l.main",SHARED) [9uS]
01310 ramlib          FAIL Lock("PROGDIR:Classes/asyncio.l.main",SHARED) [9uS]
01311 OWB             FAIL = [execFindResident("asyncio.library.main") [5uS]
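
Picking the suspicious failures out of a long Snoopy capture by eye is tedious, so here is a rough Python sketch of a filter for a saved log. It is an illustration only: the regular expression is fitted to the line layout in the excerpt above, and the NOISY_CALLS set (calls treated as routine failures) is my own assumption, not anything Snoopy provides.

```python
import re

# Task name, status, optional "= [", call name and first quoted argument.
ENTRY = re.compile(r'(\S+)\s+(o\.k\.|FAIL)\s+(?:=\s*\[)?(\w+)\("([^"]*)"')

# Calls that commonly fail in normal use (an assumption; adjust to taste).
NOISY_CALLS = {"execFindPort"}


def failed_calls(lines):
    """Return (task, call, argument) for each FAIL entry worth a closer look."""
    hits = []
    for line in lines:
        match = ENTRY.search(line)
        if not match:
            continue  # Delay() entries and anything else without a quoted argument
        task, status, call, arg = match.groups()
        if status == "FAIL" and call not in NOISY_CALLS:
            hits.append((task, call, arg))
    return hits


# Four lines taken from the log above.
SAMPLE = '''\
01296 : OWB             o.k. = [execOpenLibrary("asyncio.library",0) [16565uS]
01297 : OWB             FAIL = [execFindResident("asyncio.library.main") [10uS]
01301 OWBLauncher     FAIL = [execFindPort("OWB") [3uS]
01303 ramlib          FAIL Lock("LIBS:asyncio.l.main",SHARED) [15114uS]
'''

for task, call, arg in failed_calls(SAMPLE.splitlines()):
    print(f"{task}: {call} {arg}")
```

On the sample this leaves just the two asyncio lookups, which is the trail that mattered here.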


The "library.main" and "l.main" suffixes are a vestige from when libraries were not fully OS4 native and OS4 interfaces were split into additional library files. Why was it looking for and failing to find an OS4 interface?

So I again compared against a backup. I was familiar with "asyncio" from years ago but it wasn't in my backup. I checked my local copy and then it all came to light. It's a 68k library dated to 1997! What the?! What on earth is that doing there!?

This one here:
http://aminet.net/package/dev/c/AsyncIO

So I found the OS4 version here:
http://os4depot.net/?function=showfil ... =utility/misc/asyncio.lha

I copied that in and rebooted. Suddenly everything is working again! Unbelievable!

As a side note, part of the issue is that the above OpenLibrary() calls are using 0 as the version. This is unacceptable on OS4, and has been since the 90s even, as programs are required to specify a minimum library version. A 0 will accept any version. To be coded correctly for OS4 and above it should use 50 as the version, which would have caught the above case.
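
To spell out the version point, here is a tiny Python model of the check. It only mimics the documented rule that OpenLibrary() succeeds when the installed library's version is at least the version asked for; the function name is made up and both version numbers are placeholders, since I haven't looked up the real ones for either build.

```python
def open_library(installed_version, min_version):
    """Model of the OpenLibrary() version test: the open succeeds only if
    the installed library is at least the requested minimum version."""
    return installed_version >= min_version


OLD_68K_VERSION = 39  # hypothetical version of the 1997 68K library
OS4_VERSION = 53      # hypothetical version of the OS4 port

print(open_library(OLD_68K_VERSION, 0))   # asking for 0 accepts anything
print(open_library(OLD_68K_VERSION, 50))  # asking for 50 rejects the old file
print(open_library(OS4_VERSION, 50))      # the OS4 build still opens
```

With a minimum of 50 the open would simply have failed up front instead of handing back a library with no OS4 interface.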

I've never seen a system come unstuck so suddenly from a few old rogue files that took ages to track down. It happened over Easter and it was like three days and three nights all over again. Freak file accident. I've looked through my recent downloads and just cannot track down where they came from. Not that I was doing much lately, which is one reason I didn't trip over it sooner. Glad it's over now.

Well, I hope you enjoyed reading the consistently curious case of the constantly crashing computer, as much as I enjoyed writing it, and didn't enjoy experiencing it!

Re: The consistently curious case of the constantly crashing computer
Home away from home
@Hypex

Thanks for letting us know

_______________________________
c64-dual sids, A1000, A1200-060@50, A4000-CSMKIII
Catweasel MK4+= Amazing
! My Master Miggies-Amiga1000 & AmigaONE X1000 !
mancave-ramblings

Re: The consistently curious case of the constantly crashing computer
Just can't stay away
@Hypex
Quote:
As a side note, part of the issue is that the above OpenLibrary() calls are using 0 as the version. This is unacceptable on OS4, and has been since the 90s even, as programs are required to specify a minimum library version. A 0 will accept any version. To be coded correctly for OS4 and above it should use 50 as the version, which would have caught the above case.
Sorry for the bad excuse, but usually you don't open any libraries yourself but use something like -lauto, or similar solutions for individual libraries not included in libauto.a, instead.
What makes it even worse is that I'm not only the main developer of the AmigaOS 4.x OWB port, but one of the two developers who did the AmigaOS 4.x port/bug fix of asyncio.library as well 🙇

Re: The consistently curious case of the constantly crashing computer
Not too shy to talk
@328gts

Cheers.

Re: The consistently curious case of the constantly crashing computer
Not too shy to talk
@joerg

Quote:
Sorry for the bad excuse, but usually you don't open any libraries yourself but use something like -lauto, or similar solutions for individual libraries not included in libauto.a, instead. What makes it even worse is that I'm not only the main developer of the AmigaOS 4.x OWB port, but one of the two developers who did the AmigaOS 4.x port/bug fix of asyncio.library as well 🙇


Personally I open them all myself, which wasn't a nightmare until I decided to write a Reaction GUI, and by then it was too late.

But to be fair, aside from libauto not being recommended for pro use, that fault there would lie with libauto itself, which should internally specify version 50 or above. Not only this, but possibly the OS4 Exec should have thrown a yellow alert if any native code used 0 for the version. That has been functionally equivalent to calling OldOpenLibrary() for 35 years or so now.

Regarding asyncio.library, the OS4 version I found on the Depot worked fine. I don't know how an old version found its way onto my system. I even renamed it out of the way and Odyssey loaded fine. Does OWB actually rely on it? I haven't checked what dependencies are included with it. But both OWB and Odyssey can run without any installing, which is useful. But I do like that installer for upgrading.


Edited by Hypex on 2024/5/4 10:23:42