POWER CPUs (X1000) and newer PPC ones (X5000, X1220, sam460) would require updated CPU specific parts. Ideally it should be done in the kernel performance monitor, but if nobody does it there it could be added to the C libraries instead.
Yeah, Mathias already added a performance monitor to the X5000/Tabor just a few weeks ago, hoping to add the remaining CPUs too.
Quote:
If something doesn't work at all with GCC and newlib/clib2, don't forget there is still VBCC with its vclib as an alternative. Not really usable for porting software, but for AmigaOS native software there's no big difference, and if something is missing or doesn't work, it's much easier to add or fix.
Yep, once it's done for one C library, it will be easy to more or less copy+paste it into any other. Now it's only a matter of time before it's done; Mathias has already dealt with almost all of it. He has also made good progress on his own Amiga-only profiling tool (which also uses a performance monitor), but gprof is still good to have, as it is widely tested and developed by many people outside the Amiga world, and it has tons of nice 3rd-party scripts/extensions that make working with it easier. And having two hardware profilers is better than having none :)
Most users, of course, don't understand the extreme consequences of unusable or missing development tools, but expect something like Firefox or Chromium to be ported to AmigaOS, which is nearly impossible without development and bug-fixing tools.
Agree. Missing tools make me more nervous than missing applications.
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 49.91     38.66    38.66        1    38.66    38.66  new_func1
 49.90     77.31    38.65        1    38.65    77.31  func1
  0.19     77.46     0.15        1     0.15    77.46  main
  0.00     77.46     0.00        1     0.00     0.00  func2
Call graph
granularity: each sample hit covers 4 byte(s) for 0.01% of 77.46 seconds
index % time    self  children    called     name
                0.15     77.31    1/1            call_main [2]
[1]    100.0    0.15     77.31    1          main [1]
               38.65     38.66    1/1            func1 [3]
                0.00      0.00    1/1            func2 [5]
-----------------------------------------------
                                                 <spontaneous>
[2]    100.0    0.00     77.46               call_main [2]
                0.15     77.31    1/1            main [1]
-----------------------------------------------
               38.65     38.66    1/1            main [1]
[3]     99.8   38.65     38.66    1          func1 [3]
               38.66      0.00    1/1            new_func1 [4]
-----------------------------------------------
               38.66      0.00    1/1            func1 [3]
[4]     49.9   38.66      0.00    1          new_func1 [4]
-----------------------------------------------
                0.00      0.00    1/1            main [1]
[5]      0.0    0.00      0.00    1          func2 [5]
-----------------------------------------------
Index by function name
   [3] func1                   [1] main
   [5] func2                   [4] new_func1
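For anyone reading the flat profile above: the "% time" column is each function's self time divided by the total sampled time (77.46 s here). Each self time is the number of PC samples multiplied by 0.01 s. The minimal C sketch below works through that arithmetic and reproduces the numbers. The helper names are hypothetical and not part of gprof. It also shows why a profile with 0.00 everywhere means that no samples were attributed to any function at all.

```c
#include <assert.h>

/* Hypothetical helper: the "% time" column of a gprof flat profile is a
 * function's self time as a percentage of the total sampled time. */
double percent_time(double self_seconds, double total_seconds)
{
    assert(total_seconds > 0.0);
    return 100.0 * self_seconds / total_seconds;
}

/* Hypothetical helper: how many PC samples a self time represents, given
 * gprof's "each sample counts as 0.01 seconds". Rounded to the nearest
 * whole sample; zero samples is what shows up as 0.00 in the listing. */
long samples_for(double self_seconds, double seconds_per_sample)
{
    assert(seconds_per_sample > 0.0);
    return (long)(self_seconds / seconds_per_sample + 0.5);
}
```

For example, percent_time(38.66, 77.46) is about 49.91, matching the new_func1 row, and samples_for(38.66, 0.01) gives 3866 samples.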
So it seems to be something X5000-related, something still not implemented in the performance monitor of the X5000 kernel... I know that the latest Hieronymus uses the X5000's performance monitor and it works, but with gprof it behaves like this (and the same binary works fine on the Pegasos 2, with the same GCC/binutils/gprof).
Or maybe some changes are needed in clib2's part of the profiling code?
Profiling needs CPU specific code, back then only 603, 604, 750 and 64xx CPUs (classic Amigas, A1XE/Pegasos, sam440) were supported. POWER CPUs (X1000) and newer PPC ones (X5000, X1220, sam460) would require updated CPU specific parts. Ideally it should be done in the kernel performance monitor, but if nobody does it there it could be added to the C libraries instead.
Can you explain this a bit? I used to think that once Performance Monitor support was added to the kernel for a specific platform, the API would be the same, and the C libraries would then use it the same way regardless of the platform.
But it turns out that, while we now do have the Performance Monitor in the kernel for X5000 and Tabor (and it works, as Hieronymus uses it on those platforms), clib2's libprofile fails to handle it properly for gprof on the X5000.
I don't see any Pegasos 2-specific code there, but it definitely works with the Pegasos 2's performance monitor and doesn't with the X5000's (giving me 0.00 everywhere, as I showed in previous snippets).
Does Hieronymus use performance.monitor? I remember it used sampling in the past.
Yes, of course. Mathias added performance.monitor for X5000/Tabor in kernel 54.46, and this is what the release notes of the latest Hieronymus say:
Supported platforms
-------------------
Without entering all the details, there are now 3 modes to acquire data,
based on:
- interrupt server, the initial implementation, on G3/G4 machines and Sam4X0
boards
- performance monitor, the right implementation, that needs its resource built
into the kernel ; this is the default mode
- software timer, quite experimental and created to circumvent some technical
constraints ; this mode aims to disappear, at the moment only there to
provide a kind of working on X1000 (waiting for perf monitor implementation)
So, in terms of supported platforms, we have:
- AmigaOne, Pegasos 2
- Sam 440/460
- A1222/X5000 (new): requires kernel 54.46
- X1000: first implementation, with software timer, one day with the
performance monitor resource
History
--------
Version 0.50 (2022-03-10 "Rebirth release")
- Reworked completely the project, for a cleaner code and a better design
- Reworked options, getting inspiration again from perf on Linux
- Added support of A1222/X5000 (requires kernel 54.46)
- Added support of X1000, that basically works but will be consolidated
- Added a new command 'stat' to collect stats for various profiles (CPU, TLB,
cache, ...) over a given duration
- Improved reporting to easily see in which programs and which functions
CPU time is spent (now sorted list with decreasing percentage)
- Increased default sampling duration to 30 s, possible to stop with CTRL-C
So it definitely uses the performance monitor on the X5000 as well, and it works.
The difference is probably in how the performance monitor is used. clib2's libprofile has a "counter mechanism", but as Mathias told me by mail, it should be changed for X5000/Tabor. Though I don't understand why. Is clib2's libprofile code purely PowerPC-based (so Pegasos 2, old AmigaOnes), with some code (maybe that mcount.s?) being platform-specific and needing adaptation to PA6T/Freescale?
Are the registers different on 64-bit CPUs (X1000/X50x0)?
There should certainly be differences between some of them, but I don't know if this mcount.s is used at all; maybe it's kept for historical reasons and mcount.c is used instead.
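Here is some background on what mcount does in gprof's general design. With -pg, GCC inserts a call to an mcount-style routine at the entry of every compiled function. The exact symbol, such as _mcount, depends on the target. That routine records the caller/callee arc, and this is what fills the "called" column of the call graph. The time columns come from separate PC sampling. The sketch below is a simplified, hypothetical model of the arc bookkeeping only. It is not clib2's, newlib's or clib4's code, and record_arc/arc_count are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model of what an mcount-style hook records:
 * one counter per (caller address, callee address) arc. BSD-derived
 * mcount.c implementations use hashed "froms"/"tos" tables instead;
 * a small linear table keeps the idea visible here. */
#define MAX_ARCS 64

struct arc {
    uintptr_t from_pc;   /* return address inside the caller */
    uintptr_t self_pc;   /* entry address of the callee */
    unsigned long count; /* how many times this arc was taken */
};

static struct arc arcs[MAX_ARCS];
static size_t n_arcs;

/* Conceptually called at every instrumented function entry. */
void record_arc(uintptr_t from_pc, uintptr_t self_pc)
{
    for (size_t i = 0; i < n_arcs; i++) {
        if (arcs[i].from_pc == from_pc && arcs[i].self_pc == self_pc) {
            arcs[i].count++;
            return;
        }
    }
    if (n_arcs < MAX_ARCS) {
        arcs[n_arcs].from_pc = from_pc;
        arcs[n_arcs].self_pc = self_pc;
        arcs[n_arcs].count = 1;
        n_arcs++;
    }
    /* otherwise the table is full and the new arc is silently dropped */
    assert(n_arcs <= MAX_ARCS);
}

/* Number of times the arc from_pc -> self_pc was recorded (0 if never). */
unsigned long arc_count(uintptr_t from_pc, uintptr_t self_pc)
{
    for (size_t i = 0; i < n_arcs; i++)
        if (arcs[i].from_pc == from_pc && arcs[i].self_pc == self_pc)
            return arcs[i].count;
    return 0;
}
```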
@joerg Quote:
Do you get the same problem using newlib instead of clib2 as well?
With the newlib version I get the same: correct values with the Pegasos 2 kernel, and 0.00 everywhere with the X5000 one.
Is this your change? Can it be as simple as changing it to 0x01000000?
No, it's the original code, and we don't know who wrote it. Thomas says it wasn't him, so he doesn't know either, but at least he explained why it was 0x000000 some time ago and was then switched to 0x100000 (because of the need to add .so support and so on).
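To see why a hardcoded text base matters in gprof terms, consider how the time columns are built. They come from a PC histogram whose buckets are laid out relative to an assumed start of the program's code (lowpc). Each sample taken on a timer or performance-monitor interrupt is mapped to a bucket, and samples outside [lowpc, highpc) are simply dropped. The minimal C sketch below illustrates this with a 4-byte granularity, as in the call graph header earlier in the thread. The struct and function names are hypothetical. It is not a diagnosis of the X5000 problem, only a picture of why the assumed address range has to match where the code is actually loaded.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a gprof-style PC histogram. With 4-byte
 * granularity, bucket i counts samples whose PC lies in
 * [lowpc + 4*i, lowpc + 4*i + 4). */
#define GRANULARITY 4u

struct pc_histogram {
    uintptr_t lowpc;         /* assumed start of the profiled code */
    uintptr_t highpc;        /* assumed end of the profiled code (exclusive) */
    unsigned short *buckets; /* one counter per GRANULARITY bytes */
    size_t nbuckets;
    unsigned long dropped;   /* samples that fell outside [lowpc, highpc) */
};

/* Record one PC sample, e.g. from a timer or performance-monitor interrupt. */
void hist_sample(struct pc_histogram *h, uintptr_t pc)
{
    assert(h->highpc > h->lowpc);
    if (pc < h->lowpc || pc >= h->highpc) {
        h->dropped++;        /* outside the assumed range: the sample is lost */
        return;
    }
    size_t i = (pc - h->lowpc) / GRANULARITY;
    /* saturate instead of wrapping around (a simplification) */
    if (i < h->nbuckets && h->buckets[i] < USHRT_MAX)
        h->buckets[i]++;
}
```

If lowpc/highpc described the wrong region, every sample would be counted in dropped, and all functions would show 0.00 s of self time.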
Then we found out that this is not a problem currently; all you need is to compile the tools you want to be "gprof'ed" like this for clib2:
So there is no problem in general; it can be used as is.
The difficulty we have now is that a year or so ago Mathias added Performance Monitor support to the kernel for X5000/Tabor as well, and used it in his Hieronymus tool, which proves that the Performance Monitor on those platforms seems to work.
But when we tried to use the libprofile code, be it clib2's or newlib's, everything worked on the Pegasos 2 as expected, but not on the X5000, which produced all zeroes instead of actual time values.
In the mails, Mathias told me that the way clib2/newlib use the performance monitor is based on "mcount", and that it should be changed to work on X5000/Tabor as well.
I don't know why it should be changed or why it didn't work as-is on the X5000. Probably the Performance Monitor implementation on the X5000 differs a bit, and while the API is the same, it still behaves differently. I don't know. I expected things with the same API to behave the same everywhere, but maybe I'm wrong there.
I asked Mathias to share the parts of the Hieronymus code where he works with the X5000 performance monitor, so we can see what's going on and implement it in clib2 without bothering Mathias or waiting for him. For now I'm waiting for his answer.
@All We did it!
What you see in this small image is an X5000 running a speed test case compiled with -pg, on the latest beta kernel, with the latest beta of clib4 by Andrea, via the latest binutils' gprof.
So it involved a lot of work from different people. This is what was done to make it happen:
1). Thanks to Mathias (Corto), the OS4 kernel now has performance monitor support for X5000, Tabor and X1000 (yes, that's fresh), as it already had for G* CPUs, so Pegasos 2, A1, mA1 and probably classic machines too (at least nothing changed in that regard). It was a lot of work for him to provide the same API everywhere, even on CPUs that don't have "the same real performance monitor", as some of them aren't real PPC CPUs but POWER and co.
2). Thanks to the big work by Andrea (AfxGroup) on the new C library, which we now call CLIB4 (4 for OS4), and with the help of Mathias's test code, CLIB4 now has full profiling support for the new performance monitor. That means a new gmon.out format instead of the old crap, no more hardcoded 0x000000 areas, and everything works everywhere as expected.
3). Thanks to the big work of MightyMax on the latest binutils, we now have the latest binutils :)
So, while CLIB4 and binutils are available and can be used right now, as they are open source, the OS4 kernel update will be out when it's out (I don't know when).
But I felt the need to share it. A lot of work really went into such a "simple" thing as profiling. Of course, the binutils work wasn't done for that, but it also helps, as you can see (no more "dwarf" errors, etc.).
@Raziel :) It's not like you can run any binary with it: you have to build the binary with profiling support by passing the -pg option. Then, when you run a program built with profiling support and exit from it, it creates a gmon.out file, which you then use with the gprof utility to see what runs and how.
gprof is of course not the only profiling tool one can use; it's just very widely tested and has many add-ons. We also have Hieronymus (whose latest version works well too, but also needs the updated kernel), or "profyler" by Mike Steed. gprof is there to help us more, as it is a more tested and widely used tool, developed by the whole world and integrated into binutils.
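The following illustrates the -pg workflow with a hypothetical test program shaped like the one profiled earlier in the thread: main calls func1, which calls new_func1, plus a cheap func2. The function bodies are assumptions that only burn CPU time. run_workload is a made-up name, and it is written without main() so the caller can choose the workload size.

```c
#include <assert.h>

/* volatile keeps the compiler from deleting the busy loops */
static volatile unsigned long sink;

/* Busy loop; gives the PC sampler something to land on. */
unsigned long new_func1(unsigned long n)
{
    for (unsigned long i = 0; i < n; i++)
        sink += i;
    return n;
}

/* Busy loop, then a call that produces the func1 -> new_func1 arc. */
unsigned long func1(unsigned long n)
{
    for (unsigned long i = 0; i < n; i++)
        sink += i;
    return n + new_func1(n);
}

/* Deliberately cheap, like func2 in the listing. */
unsigned long func2(unsigned long n)
{
    for (unsigned long i = 0; i < n; i++)
        sink += i;
    return n;
}

/* What main() would do in the real program; returns total iterations. */
unsigned long run_workload(unsigned long n)
{
    assert(n > 0);
    return func1(n) + func2(n / 1000);
}
```

A real main() would call run_workload() with a large n so that enough samples land in each function. Build the program with -pg, run it once and let it exit normally so gmon.out gets written, then run "gprof program gmon.out", where "program" stands for your binary's name.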
Quote:
So, while CLIB4 and binutils are available and can be used right now, as they are open source, the OS4 kernel update will be out when it's out (I don't know when).
You can update newlib without access to the sources. At least in my versions (I don't know anything about newer ones), IIRC the gprof support code wasn't in newlib.library but in libc.a, and the code didn't include anything C-library-specific. Unless the clib4 code depends on other clib4-specific parts, which would be strange, you can simply use "ar d libc.a gprof_object" and "ar r libc.a clib4_gprof_object" to update it. Updating newlib's libc.so without access to the sources should be possible as well.