That leaves GCC as the unknown quantity. I haven't installed GCC 11 yet, so I can't try it to see what difference I get. Maybe you could try running the test programs as provided with Profyler, which are compiled with GCC 8.3.0, and comparing those results to the ones you get when they're compiled with GCC 11.1.0.
There my output from GCC 11.1, library and test cases build as newlib ones, and all running from Workbench by double-clicking:
Profile Data for MultiThread
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 1.072 s 100.0 % 1.072 s 1.001 s 93.4 % 1.001 s
MyPrintf 2 71.237 ms 6.6 % 35.618 ms 71.237 ms 6.6 % 35.618 ms
Profile Data for SOLibTest
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 74.741 ms 100.0 % 74.741 ms 3.368 us 0.0 % 3.368 us
MyLib_Hello 1 117.052 us 0.2 % 117.052 us 1.123 us 0.0 % 1.123 us
MyLib_MyPrintf 2 74.735 ms 100.0 % 37.368 ms 74.735 ms 100.0 % 37.368 ms
MyLib_World 1 74.621 ms 99.8 % 74.621 ms 1.043 us 0.0 % 1.043 us
Profile Data for LinkLibTest
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 65.936 ms 100.0 % 65.936 ms 2.085 us 0.0 % 2.085 us
MyLib_Hello 1 117.613 us 0.2 % 117.613 us 1.083 us 0.0 % 1.083 us
MyLib_MyPrintf 2 65.931 ms 100.0 % 32.966 ms 65.931 ms 100.0 % 32.966 ms
MyLib_World 1 65.816 ms 99.8 % 65.816 ms 1.203 us 0.0 % 1.203 us
Profile Data for Recursion
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
CR 2 61.704 ms 99.8 % 30.852 ms 2.286 us 0.0 % 1.143 us
main 1 61.856 ms 100.0 % 61.856 ms 1.644 us 0.0 % 1.644 us
MyPrintf 13 61.836 ms 100.0 % 4.757 ms 61.836 ms 100.0 % 4.757 ms
Recurse 11 8.402 ms 13.6 % 763.774 us 13.875 us 0.0 % 1.261 us
Wazzup 1 61.073 ms 98.7 % 61.073 ms 1.604 us 0.0 % 1.604 us
Profile Data for StructorPlus
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
__static_initialization_and_des 2 62.658 ms 99.0 % 31.329 ms 2.566 us 0.0 % 1.283 us
_GLOBAL__sub_D_TheGlobalClass 1 181.092 us 0.3 % 181.092 us 1.043 us 0.0 % 1.043 us
_GLOBAL__sub_I_TheGlobalClass 1 62.480 ms 98.7 % 62.480 ms 2.045 us 0.0 % 2.045 us
Hello() 1 177.362 us 0.3 % 177.362 us 1.043 us 0.0 % 1.043 us
main 1 637.791 us 1.0 % 637.791 us 6.216 us 0.0 % 6.216 us
MyPrintf(char*) 5 63.281 ms 100.0 % 12.656 ms 63.281 ms 100.0 % 12.656 ms
TestClass::TestClass() 2 62.731 ms 99.1 % 31.366 ms 3.128 us 0.0 % 1.564 us
TestClass::~TestClass() 2 378.504 us 0.6 % 189.252 us 2.045 us 0.0 % 1.023 us
And there is results if i run things from shell:
Profile Data for MultiThread
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 1.005 s 100.0 % 1.005 s 1.003 s 99.7 % 1.003 s
MyPrintf 2 2.757 ms 0.3 % 1.378 ms 2.757 ms 0.3 % 1.378 ms
Profile Data for SOLibTest
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 2.432 ms 100.0 % 2.432 ms 2.767 us 0.1 % 2.767 us
MyLib_Hello 1 135.498 us 5.6 % 135.498 us 1.083 us 0.0 % 1.083 us
MyLib_MyPrintf 2 2.427 ms 99.8 % 1.213 ms 2.427 ms 99.8 % 1.213 ms
MyLib_World 1 2.293 ms 94.3 % 2.293 ms 962 ns 0.0 % 962 ns
Profile Data for LinkLibTest
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 2.384 ms 100.0 % 2.384 ms 1.644 us 0.1 % 1.644 us
MyLib_Hello 1 137.583 us 5.8 % 137.583 us 1.524 us 0.1 % 1.524 us
MyLib_MyPrintf 2 2.380 ms 99.8 % 1.190 ms 2.380 ms 99.8 % 1.190 ms
MyLib_World 1 2.245 ms 94.2 % 2.245 ms 962 ns 0.0 % 962 ns
Profile Data for Recursion
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
CR 2 5.196 ms 96.8 % 2.598 ms 1.845 us 0.0 % 923 ns
main 1 5.366 ms 100.0 % 5.366 ms 1.604 us 0.0 % 1.604 us
MyPrintf 13 5.347 ms 99.6 % 411.333 us 5.347 ms 99.6 % 411.333 us
Recurse 11 32.991 ms 614.8 % 2.999 ms 13.915 us 0.3 % 1.265 us
Wazzup 1 2.351 ms 43.8 % 2.351 ms 1.564 us 0.0 % 1.564 us
Profile Data for StructorPlus
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
__static_initialization_and_des 2 4.131 ms 40.6 % 2.066 ms 2.206 us 0.0 % 1.103 us
_GLOBAL__sub_D_TheGlobalClass 1 1.932 ms 19.0 % 1.932 ms 1.003 us 0.0 % 1.003 us
_GLOBAL__sub_I_TheGlobalClass 1 2.202 ms 21.7 % 2.202 ms 1.644 us 0.0 % 1.644 us
Hello() 1 2.093 ms 20.6 % 2.093 ms 1.083 us 0.0 % 1.083 us
main 1 6.036 ms 59.4 % 6.036 ms 5.454 us 0.1 % 5.454 us
MyPrintf(char*) 5 10.154 ms 99.8 % 2.031 ms 10.154 ms 99.8 % 2.031 ms
TestClass::TestClass() 2 4.176 ms 41.1 % 2.088 ms 2.446 us 0.0 % 1.223 us
TestClass::~TestClass() 2 3.891 ms 38.3 % 1.946 ms 2.165 us 0.0 % 1.083 us
As i say all builds with newlib, but when i build everything for clib2 , then things didn't changes much.
Tested also newlib builds on GCC 10.3.0. In general results are the same.
Edited by kas1e on 2022/1/23 12:50:45 Edited by kas1e on 2022/1/23 14:28:28 Edited by kas1e on 2022/1/23 14:32:00
That leaves GCC as the unknown quantity. I haven't installed GCC 11 yet, so I can't try it to see what difference I get. Maybe you could try running the test programs as provided with Profyler, which are compiled with GCC 8.3.0, and comparing those results to the ones you get when they're compiled with GCC 11.1.0.
Ok here we go:
Note: In order to execute "SOLibTest" from your original Tests (GCC 8.3.0 built) i had to use my GCC 11.1.0 built "MyLib.so" because it is not included.
Original Tests launched from Workbench:
Profile Data for MultiThread
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 1.175 s 100.0 % 1.175 s 1.002 s 85.3 % 1.002 s
MyPrintf 2 172.963 ms 14.7 % 86.481 ms 172.963 ms 14.7 % 86.481 ms
Profile Data for SOLibTest
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 98.380 ms 100.0 % 98.380 ms 2.286 us 0.0 % 2.286 us
MyLib_Hello 1 85.012 us 0.1 % 85.012 us 1.083 us 0.0 % 1.083 us
MyLib_MyPrintf 2 98.376 ms 100.0 % 49.188 ms 98.376 ms 100.0 % 49.188 ms
MyLib_World 1 98.293 ms 99.9 % 98.293 ms 1.163 us 0.0 % 1.163 us
Profile Data for LinkLibTest
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 116.627 ms 100.0 % 116.627 ms 1.644 us 0.0 % 1.644 us
MyLib_Hello 1 58.305 us 0.0 % 58.305 us 1.123 us 0.0 % 1.123 us
MyLib_MyPrintf 2 116.623 ms 100.0 % 58.311 ms 116.623 ms 100.0 % 58.311 ms
MyLib_World 1 116.567 ms 99.9 % 116.567 ms 1.083 us 0.0 % 1.083 us
Profile Data for Recursion
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
CR 2 57.273 ms 99.8 % 28.636 ms 2.045 us 0.0 % 1.023 us
main 1 57.363 ms 100.0 % 57.363 ms 1.805 us 0.0 % 1.805 us
MyPrintf 13 57.343 ms 100.0 % 4.411 ms 57.343 ms 100.0 % 4.411 ms
Recurse 11 13.898 ms 24.2 % 1.263 ms 13.875 us 0.0 % 1.261 us
Wazzup 1 56.082 ms 97.8 % 56.082 ms 1.805 us 0.0 % 1.805 us
Profile Data for StructorPlus
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
__static_initialization_and_des 2 76.850 ms 96.6 % 38.425 ms 2.366 us 0.0 % 1.183 us
_GLOBAL__sub_D_TheGlobalClass 1 561.761 us 0.7 % 561.761 us 1.163 us 0.0 % 1.163 us
_GLOBAL__sub_I_TheGlobalClass 1 76.292 ms 95.9 % 76.292 ms 1.965 us 0.0 % 1.965 us
Hello() 1 1.520 ms 1.9 % 1.520 ms 1.203 us 0.0 % 1.203 us
main 1 2.675 ms 3.4 % 2.675 ms 5.293 us 0.0 % 5.293 us
MyPrintf(char*) 5 79.511 ms 100.0 % 15.902 ms 79.511 ms 100.0 % 15.902 ms
TestClass::TestClass() 2 76.889 ms 96.7 % 38.444 ms 2.847 us 0.0 % 1.424 us
TestClass::~TestClass() 2 1.109 ms 1.4 % 554.303 us 2.366 us 0.0 % 1.183 us
Original Tests launched from Shell:
Profile Data for MultiThread
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 1.001 s 100.0 % 1.001 s 1.001 s 99.9 % 1.001 s
MyPrintf 2 554.623 us 0.1 % 277.312 us 554.623 us 0.1 % 277.312 us
Profile Data for SOLibTest
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 250.665 us 100.0 % 250.665 us 2.406 us 1.0 % 2.406 us
MyLib_Hello 1 74.466 us 29.7 % 74.466 us 1.083 us 0.4 % 1.083 us
MyLib_MyPrintf 2 246.134 us 98.2 % 123.067 us 246.134 us 98.2 % 123.067 us
MyLib_World 1 173.793 us 69.3 % 173.793 us 1.043 us 0.4 % 1.043 us
Profile Data for LinkLibTest
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 250.384 us 100.0 % 250.384 us 1.604 us 0.6 % 1.604 us
MyLib_Hello 1 74.185 us 29.6 % 74.185 us 1.323 us 0.5 % 1.323 us
MyLib_MyPrintf 2 246.415 us 98.4 % 123.208 us 246.415 us 98.4 % 123.208 us
MyLib_World 1 174.595 us 69.7 % 174.595 us 1.043 us 0.4 % 1.043 us
Profile Data for Recursion
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
CR 2 841.098 us 88.8 % 420.549 us 2.125 us 0.2 % 1.063 us
main 1 947.162 us 100.0 % 947.162 us 1.764 us 0.2 % 1.764 us
MyPrintf 13 927.954 us 98.0 % 71.381 us 927.954 us 98.0 % 71.381 us
Recurse 11 7.383 ms 779.4 % 671.146 us 13.794 us 1.5 % 1.254 us
Wazzup 1 259.768 us 27.4 % 259.768 us 1.524 us 0.2 % 1.524 us
Profile Data for StructorPlus
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
__static_initialization_and_des 2 1.851 ms 59.8 % 925.669 us 2.366 us 0.1 % 1.183 us
_GLOBAL__sub_D_TheGlobalClass 1 1.545 ms 49.9 % 1.545 ms 1.203 us 0.0 % 1.203 us
_GLOBAL__sub_I_TheGlobalClass 1 309.331 us 10.0 % 309.331 us 2.005 us 0.1 % 2.005 us
Hello() 1 157.112 us 5.1 % 157.112 us 1.163 us 0.0 % 1.163 us
main 1 1.241 ms 40.1 % 1.241 ms 4.892 us 0.2 % 4.892 us
MyPrintf(char*) 5 3.079 ms 99.5 % 615.760 us 3.079 ms 99.5 % 615.760 us
TestClass::TestClass() 2 492.067 us 15.9 % 246.034 us 2.526 us 0.1 % 1.263 us
TestClass::~TestClass() 2 2.436 ms 78.7 % 1.218 ms 2.406 us 0.1 % 1.203 us
Original Tests launched from Shell via run:
Profile Data for MultiThread
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 1.006 s 100.0 % 1.006 s 1.002 s 99.6 % 1.002 s
MyPrintf 2 3.944 ms 0.4 % 1.972 ms 3.944 ms 0.4 % 1.972 ms
Profile Data for SOLibTest
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 1.138 ms 100.0 % 1.138 ms 2.366 us 0.2 % 2.366 us
MyLib_Hello 1 70.416 us 6.2 % 70.416 us 1.083 us 0.1 % 1.083 us
MyLib_MyPrintf 2 1.134 ms 99.6 % 566.794 us 1.134 ms 99.6 % 566.794 us
MyLib_World 1 1.065 ms 93.6 % 1.065 ms 1.083 us 0.1 % 1.083 us
Profile Data for LinkLibTest
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 1.075 ms 100.0 % 1.075 ms 1.604 us 0.1 % 1.604 us
MyLib_Hello 1 69.453 us 6.5 % 69.453 us 1.163 us 0.1 % 1.163 us
MyLib_MyPrintf 2 1.071 ms 99.6 % 535.696 us 1.071 ms 99.6 % 535.696 us
MyLib_World 1 1.004 ms 93.4 % 1.004 ms 1.043 us 0.1 % 1.043 us
Profile Data for Recursion
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
CR 2 3.131 ms 96.9 % 1.566 ms 2.165 us 0.1 % 1.083 us
main 1 3.232 ms 100.0 % 3.232 ms 1.885 us 0.1 % 1.885 us
MyPrintf 13 3.212 ms 99.4 % 247.105 us 3.212 ms 99.4 % 247.105 us
Recurse 11 24.133 ms 746.7 % 2.194 ms 13.995 us 0.4 % 1.272 us
Wazzup 1 1.022 ms 31.6 % 1.022 ms 1.724 us 0.1 % 1.724 us
Profile Data for StructorPlus
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
__static_initialization_and_des 2 2.674 ms 36.6 % 1.337 ms 2.326 us 0.0 % 1.163 us
_GLOBAL__sub_D_TheGlobalClass 1 1.607 ms 22.0 % 1.607 ms 1.203 us 0.0 % 1.203 us
_GLOBAL__sub_I_TheGlobalClass 1 1.069 ms 14.6 % 1.069 ms 1.564 us 0.0 % 1.564 us
Hello() 1 1.524 ms 20.9 % 1.524 ms 1.243 us 0.0 % 1.243 us
main 1 4.632 ms 63.4 % 4.632 ms 5.173 us 0.1 % 5.173 us
MyPrintf(char*) 5 7.292 ms 99.8 % 1.458 ms 7.292 ms 99.8 % 1.458 ms
TestClass::TestClass() 2 2.636 ms 36.1 % 1.318 ms 2.967 us 0.0 % 1.484 us
TestClass::~TestClass() 2 3.139 ms 42.9 % 1.569 ms 2.486 us 0.0 % 1.243 us
Tests built using GCC 11.1.0 launched from Shell:
Profile Data for MultiThread
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 1.005 s 100.0 % 1.005 s 1.002 s 99.6 % 1.002 s
MyPrintf 2 3.773 ms 0.4 % 1.886 ms 3.773 ms 0.4 % 1.886 ms
Profile Data for SOLibTest
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 1.123 ms 100.0 % 1.123 ms 2.165 us 0.2 % 2.165 us
MyLib_Hello 1 74.746 us 6.7 % 74.746 us 1.083 us 0.1 % 1.083 us
MyLib_MyPrintf 2 1.118 ms 99.6 % 559.095 us 1.118 ms 99.6 % 559.095 us
MyLib_World 1 1.046 ms 93.1 % 1.046 ms 1.083 us 0.1 % 1.083 us
Profile Data for LinkLibTest
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 989.708 us 100.0 % 989.708 us 1.724 us 0.2 % 1.724 us
MyLib_Hello 1 72.902 us 7.4 % 72.902 us 1.243 us 0.1 % 1.243 us
MyLib_MyPrintf 2 985.658 us 99.6 % 492.829 us 985.658 us 99.6 % 492.829 us
MyLib_World 1 915.082 us 92.5 % 915.082 us 1.083 us 0.1 % 1.083 us
Profile Data for Recursion
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
CR 2 3.078 ms 96.7 % 1.539 ms 2.085 us 0.1 % 1.043 us
main 1 3.184 ms 100.0 % 3.184 ms 1.604 us 0.1 % 1.604 us
MyPrintf 13 3.165 ms 99.4 % 243.438 us 3.165 ms 99.4 % 243.438 us
Recurse 11 24.184 ms 759.6 % 2.199 ms 13.835 us 0.4 % 1.258 us
Wazzup 1 969.177 us 30.4 % 969.177 us 1.644 us 0.1 % 1.644 us
Profile Data for StructorPlus
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
__static_initialization_and_des 2 2.570 ms 35.6 % 1.285 ms 2.286 us 0.0 % 1.143 us
_GLOBAL__sub_D_TheGlobalClass 1 1.541 ms 21.3 % 1.541 ms 1.163 us 0.0 % 1.163 us
_GLOBAL__sub_I_TheGlobalClass 1 1.032 ms 14.3 % 1.032 ms 1.724 us 0.0 % 1.724 us
Hello() 1 1.531 ms 21.2 % 1.531 ms 1.404 us 0.0 % 1.404 us
main 1 4.650 ms 64.4 % 4.650 ms 5.213 us 0.1 % 5.213 us
MyPrintf(char*) 5 7.206 ms 99.8 % 1.441 ms 7.206 ms 99.8 % 1.441 ms
TestClass::TestClass() 2 2.597 ms 36.0 % 1.299 ms 2.687 us 0.0 % 1.344 us
TestClass::~TestClass() 2 3.085 ms 42.7 % 1.542 ms 2.566 us 0.0 % 1.283 us
Tests built using GCC 11.1.0 launched from Shell via run:
Profile Data for MultiThread
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 1.006 s 100.0 % 1.006 s 1.002 s 99.6 % 1.002 s
MyPrintf 2 3.971 ms 0.4 % 1.986 ms 3.971 ms 0.4 % 1.986 ms
Profile Data for SOLibTest
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 1.123 ms 100.0 % 1.123 ms 2.446 us 0.2 % 2.446 us
MyLib_Hello 1 70.255 us 6.3 % 70.255 us 1.123 us 0.1 % 1.123 us
MyLib_MyPrintf 2 1.119 ms 99.6 % 559.295 us 1.119 ms 99.6 % 559.295 us
MyLib_World 1 1.051 ms 93.5 % 1.051 ms 1.083 us 0.1 % 1.083 us
Profile Data for LinkLibTest
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
main 1 979.242 us 100.0 % 979.242 us 1.724 us 0.2 % 1.724 us
MyLib_Hello 1 69.694 us 7.1 % 69.694 us 1.203 us 0.1 % 1.203 us
MyLib_MyPrintf 2 975.232 us 99.6 % 487.616 us 975.232 us 99.6 % 487.616 us
MyLib_World 1 907.824 us 92.7 % 907.824 us 1.083 us 0.1 % 1.083 us
Profile Data for Recursion
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
CR 2 3.078 ms 96.3 % 1.539 ms 2.005 us 0.1 % 1.003 us
main 1 3.196 ms 100.0 % 3.196 ms 1.764 us 0.1 % 1.764 us
MyPrintf 13 3.177 ms 99.4 % 244.348 us 3.177 ms 99.4 % 244.348 us
Recurse 11 23.465 ms 734.2 % 2.133 ms 13.794 us 0.4 % 1.254 us
Wazzup 1 1.046 ms 32.7 % 1.046 ms 1.724 us 0.1 % 1.724 us
Profile Data for StructorPlus
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
__static_initialization_and_des 2 2.646 ms 35.8 % 1.323 ms 2.246 us 0.0 % 1.123 us
_GLOBAL__sub_D_TheGlobalClass 1 1.557 ms 21.1 % 1.557 ms 1.163 us 0.0 % 1.163 us
_GLOBAL__sub_I_TheGlobalClass 1 1.092 ms 14.8 % 1.092 ms 1.925 us 0.0 % 1.925 us
Hello() 1 1.582 ms 21.4 % 1.582 ms 1.243 us 0.0 % 1.243 us
main 1 4.746 ms 64.2 % 4.746 ms 4.972 us 0.1 % 4.972 us
MyPrintf(char*) 5 7.379 ms 99.8 % 1.476 ms 7.379 ms 99.8 % 1.476 ms
TestClass::TestClass() 2 2.677 ms 36.2 % 1.339 ms 2.286 us 0.0 % 1.143 us
TestClass::~TestClass() 2 3.126 ms 42.3 % 1.563 ms 2.406 us 0.0 % 1.203 us
The results i've posted earlier are "Tests built using GCC 11.1.0 launched from Workbench".
One of my genetic defects is that I believe in well documented software, and in fact enjoy writing documentation. The down side is that it takes me longer to produce something, since all that writing takes additional time.
Documentation is kind of an iterative process; as I'm writing about how to use the program I sometimes come across some awkwardness in the operation that I might not notice when using the program myself, but that becomes obvious when I have to explain it to someone else. Then I have to go back and fix it in the code. In the case of a technical tool like Profyler I also have to make sure I know what I'm talking about when I explain the technical details, which involves additional research and testing.
You've got the same large differences in printf() speed as I see on my system, so the X5000 doesn't seem to make a difference there.
From your tests- Average time for MyPrintf() in original MultiThread example:
Launched from shell = 0.277 ms Launched from shell via Run = 1.972 ms Launched from Workbench = 86.481 ms
I can see why launching from the shell would be the fastest, as the program can use the shell's I/O streams. 'Run'ing the program also uses the shell's I/O streams, but since they're shared with the shell and potentially with other programs I imagine there's some resource locking involved which would slow things down some. But why Workbench takes so long to write to the console window it opens is a puzzler.
Not counting printf(), there doesn't seem to be any significant difference in speed between the GCC 8.3.0 and 11.1.0 versions of the test programs. The GCC 11 code may be a tiny bit faster, but nothing notable. So the difference in speed relative to my tests seems to be entirely due to the X5000.
Would it be possible to run the 'Simple' test program? I'd like to see how the two null functions perform for both versions of the compiler. No need to run the program in all the different ways, since the null functions don't print anything and so won't vary depending on the launch method.
Quote:
Note: In order to execute "SOLibTest" from your original Tests (GCC 8.3.0 built) i had to use my GCC 11.1.0 built "MyLib.so" because it is not included.
Oops! You're right- I totally missed putting that into the archive. Something to fix in the next release.
// Display the source file line number (this always fails).
if(Ourself && Ourself->SourceLineNumber)
printf("Source line number: %ld\\n", Ourself->SourceLineNumber);
else
printf("No source line number!\\n");
// We're done with the symbol info.
IDebug->ReleaseDebugSymbol(Ourself);
@msteed So got an answer saying that the test case is wrong, that is why: You've passed the address of the function in your program, not a "Tracked Memory Address" as specified in the autodoc. You should more likely be using it in the context of a Hook passed to StackTrace().
Probably autodoc wasn't clear enough and needs a bit of updating? Do you use autodoc for that stuff from wiki.amigaos.net?
Would it be possible to run the 'Simple' test program? I'd like to see how the two null functions perform for both versions of the compiler. No need to run the program in all the different ways, since the null functions don't print anything and so won't vary depending on the launch method.
Yes:
Original 'Simple' launched from Shell:
Profile Data for Simple
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
AlsoNull 1 441 ns 0.2 % 441 ns 441 ns 0.2 % 441 ns
CR 1 174.836 us 67.6 % 174.836 us 962 ns 0.4 % 962 ns
Hello 1 77.674 us 30.0 % 77.674 us 1.604 us 0.6 % 1.604 us
main 1 258.725 us 100.0 % 258.725 us 2.967 us 1.1 % 2.967 us
MyPrintf 4 250.024 us 96.6 % 62.506 us 250.024 us 96.6 % 62.506 us
Null 1 441 ns 0.2 % 441 ns 441 ns 0.2 % 441 ns
Space 1 2.045 us 0.8 % 2.045 us 1.003 us 0.4 % 1.003 us
World 1 177.202 us 68.5 % 177.202 us 1.283 us 0.5 % 1.283 us
'Simple' built using GCC 11.1.0 launched from Shelll:
Profile Data for Simple
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
AlsoNull 1 441 ns 0.2 % 441 ns 441 ns 0.2 % 441 ns
CR 1 174.074 us 66.2 % 174.074 us 1.003 us 0.4 % 1.003 us
Hello 1 82.686 us 31.4 % 82.686 us 1.764 us 0.7 % 1.764 us
main 1 263.136 us 100.0 % 263.136 us 3.008 us 1.1 % 3.008 us
MyPrintf 4 254.114 us 96.6 % 63.529 us 254.114 us 96.6 % 63.529 us
Null 1 401 ns 0.2 % 401 ns 401 ns 0.2 % 401 ns
Space 1 2.085 us 0.8 % 2.085 us 1.003 us 0.4 % 1.003 us
World 1 176.600 us 67.1 % 176.600 us 1.404 us 0.5 % 1.404 us
I use the autodocs from the SDK, not from the web site. But I assume they're both the same.
I did see the notice that tracked memory addresses must be used with ObtainDebugSymbol(). But the fact that the function name and some other fields work correctly led me to conclude that DOS automatically adds any program it loads to the tracked address list.
Indeed, if I run the 'Simple' test program that comes with Profyler (convenient because it doesn't immediately exit) and then use Ranger to examine the tracked memory list, I see that Simple is on the list.
By examination of other tracked programs, it looks like the first entry in the tracked address list is the program's code segment. There is such an entry for Simple, which makes me think that the code is in fact in tracked memory. Which means it should work with ObtainDebugSymbol(). The fact that other fields in the DebugSymbol struct are valid would seem to confirm this.
Quote:
You should more likely be using it in the context of a Hook passed to StackTrace().
I'm not sure I see how that would make a difference. Wouldn't a stack trace point to the same function address?
Can you check with the OS4 team and see if there's a flaw in my logic?
That confirms that there's essentially no difference between the two compiler versions, at least with a very simple program like this one. So the difference in speed between your results and mine are due to the difference in hardware.
FYI, here's what I get on my X1000 running the GCC 8.3.0 version from the shell. I've added a column at the end that reports on the difference in the 'average exclusive' time of my results versus yours, which indicates how much faster the X5000 is.
Profile Data for Simple
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg. Speed Diff
------------------------------- ----------- ----------- ------- ----------- ----------- ------- ----------- ----------
AlsoNull 1 1.080 us 0.4 % 1.080 us 1.080 us 0.4 % 1.080 us + 145%
CR 1 168.240 us 56.5 % 168.240 us 2.190 us 0.7 % 2.190 us + 128%
Hello 1 115.680 us 38.9 % 115.680 us 3.330 us 1.1 % 3.330 us + 108%
main 1 297.630 us 100.0 % 297.630 us 5.970 us 2.0 % 5.970 us + 101%
MyPrintf 4 278.490 us 93.6 % 69.623 us 278.490 us 93.6 % 69.623 us + 11%
Null 1 1.110 us 0.4 % 1.110 us 1.110 us 0.4 % 1.110 us + 152%
Space 1 4.320 us 1.5 % 4.320 us 2.280 us 0.8 % 2.280 us + 121%
World 1 173.790 us 58.4 % 173.790 us 3.180 us 1.1 % 3.180 us + 127%
So, for everything except MyPrintf() the X5000 is more than twice as fast, a difference greater than just the difference in processor speed. Perhaps the X5000 has faster memory access, or more efficient caching. Or the P50x0 is just faster at function calls, which is mainly what the Simple program does.
The difference is much less for MyPrintf(), which just calls the C library printf() to do all the work. printf() is a lot more complex than just function calls, and also has to actually render text to the screen, which brings graphics cards and drivers into play. The X5000 seems to have less of an advantage here. The time to run the entire program is only 15% faster than the X1000, also much closer.
Anyway, an interesting example of the kind of things you can do with a deterministic profiler.
Can you check with the OS4 team and see if there's a flaw in my logic?
Yeah, forwarded.
Quote:
So, for everything except MyPrintf() the X5000 is more than twice as fast, a difference greater than just the difference in processor speed. Perhaps the X5000 has faster memory access, or more efficient caching. Or the P50x0 is just faster at function calls, which is mainly what the Simple program does.
And there are my results running "simple" original test from workbench:
Profile Data for Simple
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
AlsoNull 1 441 ns 0.0 % 441 ns 441 ns 0.0 % 441 ns
CR 1 83.142 ms 99.8 % 83.142 ms 1.003 us 0.0 % 1.003 us
Hello 1 120.942 us 0.1 % 120.942 us 1.524 us 0.0 % 1.524 us
main 1 83.269 ms 100.0 % 83.269 ms 2.967 us 0.0 % 2.967 us
MyPrintf 4 83.260 ms 100.0 % 20.815 ms 83.260 ms 100.0 % 20.815 ms
Null 1 401 ns 0.0 % 401 ns 401 ns 0.0 % 401 ns
Space 1 1.925 us 0.0 % 1.925 us 962 ns 0.0 % 962 ns
World 1 83.144 ms 99.9 % 83.144 ms 1.484 us 0.0 % 1.484 us
And from shell:
Profile Data for simple
Function Name # of Calls Incl. Time Incl. % Incl. Avg. Excl. Time Excl. % Excl. Avg.
------------------------------- ----------- ----------- ------- ----------- ----------- ------- -----------
AlsoNull 1 441 ns 0.1 % 441 ns 441 ns 0.1 % 441 ns
CR 1 199.417 us 54.5 % 199.417 us 922 ns 0.3 % 922 ns
Hello 1 160.079 us 43.8 % 160.079 us 1.444 us 0.4 % 1.444 us
main 1 365.792 us 100.0 % 365.792 us 2.927 us 0.8 % 2.927 us
MyPrintf 4 357.291 us 97.7 % 89.323 us 357.291 us 97.7 % 89.323 us
Null 1 441 ns 0.1 % 441 ns 441 ns 0.1 % 441 ns
Space 1 1.925 us 0.5 % 1.925 us 962 ns 0.3 % 962 ns
World 1 201.904 us 55.2 % 201.904 us 1.363 us 0.4 % 1.363 us
Outside of MyPrintf() your results don't look very different from IamSONIC's. And when you get to MyPrintf() you're dealing with rendering text to the screen, which adds all sorts of variables.
Besides the graphics card and driver, there's the screen mode, the size of the font used by the shell, whether or not the shell needs to scroll to display the text (and if it does, how big the shell window is and whether it's partially obscured by another window), as well as what other programs might be doing graphics operations at the same time. If you're using the debug kernel that will likely affect the speed of graphics operations, too. In short, all the usual kinds of things that add complication when doing benchmarking.
I don't have all the answers, but at least we now have a tool with which to ask the questions.
@msteed Yeah, you are right. I found what was making an impact on myprinf in my case: compositing effects enabled in GUI. Once disabled it start to be on the same values as for IAmSonic. Switching resolutions, changing of shell-colors to basic ones, etc, etc didn't make a big impact, but compositing effects are.
If I understand it correctly, when compositing effects are in use, text rendering by CPU means that bitmaps must be moved back and forth between RAM/VRAM.
I once made a bug that updated window title too often. On a slow Sam440 system with compositing effects thing started to crawl.
compositing effects enabled in GUI. Once disabled it start to be on the same values
Compositing is always enabled on my machine. Maybe it's related to "Power Prefs". I switched to "High Power" before executing the tests. I noticed that the setting in "Power Prefs" affects all drawing operations. Including simple 2D ones.
@IamSONIC Wow, how interesting. I just run the ragemem benchmark, and while for cache and MIPS i have the same values, for Write32/64 i have 40% slower results and for Read32/64 10% slower results. But also VideoBus for Write is the same, but for reading 50% slower. I checked and i indeed have just one memory module installed. I read before somewhere that having 2 modules instead slows the system down (or maybe it slows Linux down, do not remember, but something was about), but now will definitely try it out. Another question is why i had the Video bus 2 times slower (that alone can explain differences in MyPrintf() as well).
EDIT: ordered "Corsair Vengeance CMZ8GX3M2A1600C9 DDR3 - 2x 4ГБ 1600" will see how different it will be. Through that of course didn't explain my 25 value for the video bus, while should be 53-54.
@msteed See, your profiler already start to change things for us :)