OK, I did some heavy testing with everything we have. Here are the results for a simple test case that just uses a .so with printf("hello"):
1). old adtools binutils:
Works on "public" elf.library 53.30
Works on "prior_st_value fix" elf.library 53.48
Works on "with st_value fix" elf.library 53.49
As expected, it works in all cases because of the "st_value = 0" patch in the old binutils. It also shows that all elf.library versions are OK with the old binutils.
2). new binutils, without st_value = 0 patch added:
Crash on public elf.library 53.30
Crash on elf.library 53.48
Works fine on elf.library 53.49
As expected, in this case it crashes on the public and pre-st_value-fix versions, but works fine on elf.library 53.49, where the proper fix was added.
3). new binutils, with st_value = 0 patch added:
Works on "public" elf.library 53.30
Works on "prior_st_value fix" elf.library 53.48
Works on "with st_value fix" elf.library 53.49
As expected, it also works in all cases. That is the version we will keep going with until the new elf.library is made public. Though it is debatable whether we should remove this patch from binutils later, because that would mean any newly created binary will fail to work on any elf.library version prior to 53.49.
So, what else is left? At least for the constructors and init/fini ctor/dtor stuff, we know that for pure executables changes need to be made in Clib/Newlib, and for sobjs probably something else too. This one can be left for now until someone fixes it in newlib and clib.
The last thing that worries me now is that for some time with the new binutils we have had some increase in binary size. I need to check what causes this, but so far I see lots of 0x00 bytes inside, which looks like auto-alignment or something.
Thanks for your tests - glad to see the results are all as expected. It's not related to shared objects, but there is also the issue with executables where stdio in destructors does not work in newlib (it works in clib2). I have reported this bug, but as nobody has fixed it yet, I tried to do it myself and haven't figured out how to compile it so far.
I'm fairly sure a public elf.library update will happen without too much trouble - it was just that it wasn't possible to do so with the bugs added in v53.35-53.42 (and yes, one bug was my doing). Ideally, I "just" need to clean up the reloc routine by rewriting it - Alfkil's pointer equality fix, for example, results in unnecessary duplicate dynamic symbol lookups when processing solibs. And then there is the bolted-on symbol cache that was added at some point, which is now not used everywhere it could/should be.
Performance should already be better with elf.library since v53.45 (for example, Odyssey loads here in 5 seconds, instead of 7).
Btw, with the new binutils, when we build with newlib, we get an independent section with "__newlib_version". Before, in older binutils, it was put together with .rodata. The question is: is it worth mimicking the old behaviour, so that __newlib_version sits in the same place as .rodata, or should we keep it separate as it is now? At least keeping it separate adds some more unwanted bytes to the header and makes the binary a little bigger.
@All I found out that with the new binutils, any gl4es test case fails because it is unable to create a context. Not sure what the reason is, but the initialization process is there:
I see constructors/destructors involved there, and this:
__attribute__((constructor(101))) // line 82
__attribute__((destructor)) // line 722
are surely used in OS4 builds. And is that probably the cause of the failure? It wants to use an additional constructor (101), and fails because init_array constructors are not working (meaning, again, we need support for them in both NewLib and Clib2).
And that is a pure static test case, not the shared object one, of course.
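To illustrate what those attributes are expected to do on a toolchain where they work, here is a minimal sketch. It is not gl4es code: the function names and the log array are made up for the example, and it assumes a GCC-compatible compiler on a host where prioritised constructors already run correctly. A constructor with priority 101 should run before one with default priority, and both should run before main().

```c
#include <assert.h>

/* Hypothetical log of the order in which constructors ran. */
static int init_log[4];
static int init_count = 0;

/* Priority 101 is the lowest number available to user code
   (0..100 are reserved), so this should run first. */
__attribute__((constructor(101)))
static void early_init(void)
{
    init_log[init_count++] = 101;
}

/* No priority given: runs after all prioritised constructors. */
__attribute__((constructor))
static void default_init(void)
{
    init_log[init_count++] = 0;
}
```

If init_count is still 0 when main() starts, the startup code of the C library never walked the constructor table, which would match the kind of failure described above.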
@Joerg
Is the official newlib source code, from which you made the AOS4 one in the early days, this one: https://sourceware.org/newlib/ ?
The reason I ask: I tried to find out whether it has any reference to __do_global_ctors_aux, but it seems to have none. And if that is the libc used for Cygwin as well, then at least on my Cygwin only .ctors/.dtors are supported, no .init_array/.fini_array.
@All Just to refresh the topic on where we are now: Max fixed all the necessary bits to put .rodata in an independent section, reworked PLT handling a bit, made a few tweaks here and there, and so we have everything working.
Currently, for the first public release of just the new binutils, we are switching to the old constructors way: .ctors and .dtors, so everything works just like before, only with no more DWARF issues and that kind of stuff.
After that, while everyone is able to use the latest version in the old way, we will enable the .init_array/.fini_array way for constructors, and the necessary bits will be added to the C libraries to make use of them, so that the next version is more modern. But at least with the first release, everyone will already be able to use the latest version.
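For anyone wondering what the practical difference between the two schemes is for the C library startup code, here is a rough sketch. It is a simulation with ordinary arrays, not the real newlib/clib2 startup code, and all names in it are invented for the example. The one thing it relies on is the usual GCC convention: a .ctors table is walked from the last entry to the first, while an .init_array table is walked from the first entry to the last.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*func_ptr)(void);

/* Hypothetical call log so the order can be inspected. */
static int call_log[8];
static size_t call_count = 0;

static void ctor_a(void) { call_log[call_count++] = 1; }
static void ctor_b(void) { call_log[call_count++] = 2; }

static void reset_log(void) { call_count = 0; }

/* .ctors style: walk the table backwards, last entry first. */
static void run_ctors_style(const func_ptr *table, size_t n)
{
    for (size_t i = n; i > 0; i--)
        table[i - 1]();
}

/* .init_array style: walk the table forwards, first entry first. */
static void run_init_array_style(const func_ptr *table, size_t n)
{
    for (size_t i = 0; i < n; i++)
        table[i]();
}
```

So a C library that only knows how to walk .ctors will simply never call anything the linker placed in .init_array, which is why both libraries need updating before the switch.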
And today I was able to build a native version of the latest binutils too.
@All And some more news: as the CLIB that Andrea is working on has been heavily changed, modified and updated, it now has a different name: CLIB4 (4 for OS4), so as not to confuse it with CLIB2. Then Rayn (rjd324) updated the adtools repo, so it now automatically has support not only for newlib and clib2, but also for clib4. I.e., you now have 3 C libraries to choose from via -mcrt:
-mcrt=newlib (same as default without providing -mcrt)
-mcrt=clib2
-mcrt=clib4
Yes, .rodata must not be in the same segment as the .text/.plt segment.
Can you please double-check that? I mean, what is the reason for it, and why should it be moved out?
I can understand why .plt should probably be the final one (same as .bss), but why should .rodata not be in the same segment? Can't it be just before .plt, etc., in the same write-protected segment?
Or is there something special about it in elf.library?
Did I not already explain it? I hope I did, but perhaps I didn't. I've checked the sources again, as I forgot why...
It doesn't matter which order .text and .plt are in at all, just as long as they are both in the same segment - elf.library will load this segment as-is. If you put .rodata in the same segment as .text/.plt, then this triggers elf.library to load .text/.plt/.rodata individually into separate memory blocks (this will break the PLT) rather than loading the complete segment into a single memory block.
The logic is that any segment containing .rodata is treated differently by elf.library - each section is loaded individually, rather than the entire segment. The reason: something to do with not breaking 68k cross calls, which is the only clue given in the comments (I haven't figured out exactly why this is, only that the OS crashes if I remove this behaviour). If at all possible, I'd like to change this, but until I understand it, I can't touch it.
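Put as code, the rule described above looks roughly like this. It is only a sketch of the decision, not the actual elf.library source: the enum, the function name and the idea of passing section names as strings are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of the loader's decision for one segment. */
typedef enum {
    LOAD_WHOLE_SEGMENT,  /* one memory block, layout kept as-is */
    LOAD_PER_SECTION     /* each section gets its own memory block */
} LoadMode;

/* Any segment that contains .rodata is split up section by section;
   every other segment is loaded as a single block. */
static LoadMode choose_load_mode(const char *const *section_names,
                                 size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(section_names[i], ".rodata") == 0)
            return LOAD_PER_SECTION;
    }
    return LOAD_WHOLE_SEGMENT;
}
```

With { ".text", ".plt" } the segment stays in one block, so .plt keeps its distance to .text. With { ".text", ".rodata", ".plt" } the sections are split up and nothing guarantees that .plt ends up anywhere near .text.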
The m68k cross calls probably depend on IExec->IsNative() returning FALSE for the m68k code, just as is required for utility.library hook functions, for example, when checking whether the hook function is PPC native or m68k code. If .rodata is in the same segment as .text, the .rodata with the m68k code may be put into executable memory (segment registers on G2/G3/G4 CPUs), making it fail because IExec->IsNative() returns TRUE for the m68k cross call code.
I guess we are talking purely in the m68k->PPC direction, in which case the only situation an ELF binary contains any m68k code is from the EmuTrap stubs as generated by idltool/fdtrans, etc (fdtrans used to put these in .data, but now puts these in .rodata).
I'm not sure where Petunia fits into things, as I couldn't see any code related to this at first glance, but the emulation in the kernel performs a check similar to IExec->IsNative(), so I think you're right. Any OS3 code will fail to work on OS4, unless the EmuTraps in OS4 libraries/devices have been loaded into non-executable memory.
Not much to be done about this, I guess. In theory, you could load the entire .text/.rodata/.plt segment in one block of memory and then change the memory attributes for the .rodata section, but I don't think it can be guaranteed that there is suitable padding/alignment either side of .rodata, particularly with the older binutils. That said, this could be checked and handled differently if alignment is correct, but perhaps not worth the effort.
I don't know anything about how the ExecNG kernel works on the newer CPUs (X1000, X5000, A1222), but on the older ones changing memory attributes may work at most on 440/460 CPUs which are using the TLB cache for it, but not on G2/G3/G4 CPUs. On G2/G3/G4 CPUs the MMU memory attributes, incl. the executable bit, aren't used by the kernel IExec->IsNative() function, and similar code in the m68k emulator, at all. At least not in the kernel versions for which I still had access to the sources. It just checked if the address of the code is in the 256 MB executable segment address space, IIRC 0x70000000-0x7FFFFFFF.
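The kind of check described above would be something like the following sketch. This is not the kernel's code: the function and macro names are invented, and the address range is the one recalled above ("IIRC"), so treat it as an assumption rather than a documented constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bounds of the 256 MB executable address window (from memory). */
#define EXEC_WINDOW_START 0x70000000u
#define EXEC_WINDOW_END   0x7FFFFFFFu

/* Hypothetical stand-in for an IsNative()-like test: the decision is
   based purely on the address, not on MMU page attributes. */
static bool address_looks_native(uint32_t code_addr)
{
    return code_addr >= EXEC_WINDOW_START && code_addr <= EXEC_WINDOW_END;
}
```

With a test like this, m68k stub bytes that happen to be loaded inside the window would be treated as PPC code regardless of which section they came from, which is the failure mode being discussed.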
@Oliver But we are talking only about shared objects when we say .rodata needs to be out of the segment with .plt, right? When it's a pure executable, it can still be together with the .text segment with no problems (like it was with old binutils).
But this behavior (having .rodata together with .text in one segment) seems to vary between binutils versions: sometimes it was with .text, and in some versions it was out (we found some Google links where people were curious about it).
On G2/G3/G4 CPUs the MMU memory attributes, incl. the executable bit, aren't used by the kernel IExec->IsNative() function, and similar code in the m68k emulator, at all. It just checked if the address of the code is in the 256 MB executable segment address space, IIRC 0x70000000-0x7FFFFFFF
As I noted a while back while investigating elf file loading on my X1000, the 'executable' attribute does not appear on memory where the .text segments are loaded. However, the addresses where the .text segments are loaded seem to all be in the 0x7xxxxxxx range. So it would appear that the X1000 kernel handles memory the same as the G2/G3/G4 kernels do.
But we are talking only about shared objects when we say .rodata needs to be out of the segment with .plt, right? When it's a pure executable, it can still be together with the .text segment with no problems (like it was with old binutils).
That's correct, yes. For the reasons that @joerg has outlined, the special handling of .rodata by elf.library cannot be changed.
Quote:
But this behavior (having .rodata together with .text in one segment) seems to vary between binutils versions: sometimes it was with .text, and in some versions it was out (we found some Google links where people were curious about it).
For normal executables, I have only ever known .rodata to be put together in the same segment with .text, at least with all the native official SDK compilers (and even all the unofficial adtools ones I have used), at least from 4.0.3 and up. Famous last words, but it probably won't do any harm if .rodata is separated from .text in this case too.
For shared objects, it may be different - it's perhaps at least partly the reason why there were two versions of shared objects (I don't know the details, but I don't think PLT worked in the old versions - the old version has been obsolete for quite some years).
That's correct, yes. For the reasons that @joerg has outlined, the special handling of .rodata by elf.library cannot be changed.
Hmm... but if .rodata has something to do with 68k cross calls, then shouldn't it matter in all cases, not only for dynamically linked executables? Why only for dynamic, then?
I know that VBCC puts 68k code in the "CODE" section, but GCC seems to do it differently and puts it in .rodata (which could easily happen, as 68k code is usually constant and read-only; then .rodata must be in a different segment to make automatic cross calls work).
But then it should be the same issue with both dynamic and pure binaries, right?
And maybe the whole .rodata problem is GCC-specific? Because, as Frank says, VBCC would not move 68k code into .rodata when linking with it.
I also took one of Frank's games compiled by VBCC, which is dynamic and uses .so files, and this is how it looks:
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x01000034 0x01000034 0x000a0 0x000a0 R E 0x4
INTERP 0x0000d4 0x010000d4 0x010000d4 0x00009 0x00009 R 0x1
[Requesting program interpreter: vbcc 0.9]
LOAD 0x000000 0x01000000 0x01000000 0x6c141 0x6c141 R E 0x10
DYNAMIC 0x06c148 0x0107c150 0x0107c150 0x000d0 0x000d0 RW 0x4
LOAD 0x06c148 0x0107c150 0x0107c150 0x00864 0x8307c RWE 0x8
So there, .rodata is in the same segment as .text, and those binaries work fine, for sure. And .rodata is not out. I tested the binary just now (it is a game) and everything works just fine, no problems.
I also asked Frank for his opinion about .plt and .text needing to be kept together in one segment. He *thinks* that the reason may be that BL is used to call .so functions, which has a PC-relative range of +/- 32 MB. This in turn means that when .plt is in a different segment, there is a chance that this segment is loaded at the other end of the memory space, farther than 32 MB away. So maybe this needs to be fixed in VBCC. But that is just Frank's IMHO on the matter.
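To put a number on that range: the PowerPC bl instruction carries a 24-bit word offset, which is shifted left by two bits and sign-extended, so the reachable byte displacement is -0x2000000 to +0x1FFFFFC. A small sketch of the reachability test follows; the function name is made up, and it only models the arithmetic, not anything elf.library or vlink actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a relative "bl" located at branch_addr can reach
   target_addr: the displacement must be word-aligned and fit into a
   signed 26-bit value (24-bit field shifted left by 2). */
static bool bl_can_reach(uint32_t branch_addr, uint32_t target_addr)
{
    int64_t delta = (int64_t)target_addr - (int64_t)branch_addr;

    return delta >= -0x2000000 &&
           delta <= 0x1FFFFFC &&
           (delta & 3) == 0;
}
```

If .text is loaded at, say, 0x01000000 and .plt ends up in a block at 0x70000000, the check fails, and a call through the PLT would jump somewhere it was never meant to go.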
So, to summarize and clear everything up a little more:
1). Is the .rodata issue GCC-specific?
2). Why does the .rodata issue happen only with dynamic and not with ordinary binaries, if it's about 68k cross calls?
3). Is the need to keep .plt and .text together only about the case where a segment is loaded at the other end of the memory space?
Hmm... but if .rodata has something to do with 68k cross calls, then shouldn't it matter in all cases, not only for dynamically linked executables? Why only for dynamic, then?
Because statically linked executables don't have a .plt.
The problem is a combination of the 3 sections .text, .plt and .rodata:
- .text and .plt have to be loaded into contiguous memory to make 24-bit (+/- 32 MB) relative calls work, as Frank wrote. That's done by elf.library if .text and .plt are in the same segment, but .rodata is in a different one. That's a bug in vlink (putting .text and .plt into different segments), or more likely just in its linker scripts, which needs to be fixed. It may work by accident even without doing it correctly, like in your example, but it's not guaranteed to work.
- elf.library can't load .text, .plt and .rodata into the same memory space at once because of the m68k cross calls in .rodata, which have to be loaded into non-executable memory, whereas .text (and .plt?) has to be loaded into executable memory instead. If a segment contains .rodata, all of the sections in it (.text, .rodata and, for dynamically linked executables, additionally .plt) are loaded separately instead, to make .text executable but .rodata non-executable, which may result in .text and .plt being more than +/- 32 MB apart from each other.
Is it GCC specific? Maybe. GCC, or rather fdtrans, puts the m68k cross calls (EmuTraps) into .rodata. If the VBCC equivalent of fdtrans uses .data instead of .rodata it doesn't have the m68k code in PPC executable .rodata problem. But that doesn't change anything, elf.library can't know if an executable was linked with ld (binutils) and .rodata has to be loaded separately, or if it was linked with vlink instead and separate .rodata loading may not be required.
(I may have mixed "segment" and "section" above, but you should still get what I meant.)
- elf.library can't load .text, .plt and .rodata into the same memory space at once because of the m68k cross calls in .rodata
I don't know much about this stuff and didn't read all the posts, but: could it if it knew there are no m68k cross calls in .rodata? So what about having some ~flag, ~symbol in the ELF file or the section or whatever which tells elf.library "hey, I know you normally can't do that, but in this case for this .rodata section you are allowed to put it in same memory space as .text and .plt"
@Georg The cross calls are generated by the "fdtrans" tool. The result is IIRC assembler, but in any case just .(ro)data with the bytes of the to-be-emulated m68k code, without any symbols or anything else which could be used to identify it in an ELF executable.
The m68k cross calls/EmuTraps are used in the, for OS4 native code obsolete and unused, m68k jump table of OS4 libraries for example. Only old, emulated OS 3.x/m68k code calling library function via the jump table uses it, and the "EmuTrap" code in it just copies the emulated m68k registers to the corresponding PPC ones, calls the PPC native version of the OS4 library function and on return to the emulated m68k code copies the PPC register results of the function to the emulated m68k ones (D0 (+D1 for 64 bit) for integer, A0 for pointer results, etc.).
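As a very rough model of that register shuffling, here is a sketch in plain C. Real EmuTraps are a few bytes of m68k code handled by the emulator, not a C function, so everything here (the struct, the function names, the choice of D0/D1 as arguments) is a made-up simplification that only shows the data flow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the emulated m68k register file. */
typedef struct {
    uint32_t d[8];  /* data registers D0..D7 */
    uint32_t a[8];  /* address registers A0..A7 */
} M68kRegs;

/* Hypothetical native library function taking two integer arguments. */
typedef uint32_t (*NativeFunc)(uint32_t arg0, uint32_t arg1);

static uint32_t native_add(uint32_t x, uint32_t y)
{
    return x + y;
}

/* Copy the emulated argument registers into a native call, then copy
   the integer result back into the emulated D0. */
static void emutrap_bridge(M68kRegs *regs, NativeFunc native)
{
    uint32_t result = native(regs->d[0], regs->d[1]);

    regs->d[0] = result;
}
```

Which m68k registers hold the arguments depends on the library function in question; D0 and D1 are used here purely as an example.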
Emulated m68k code like those cross calls/EmuTraps can only work if it's in (for the PPC CPU) non-executable memory. If it were allocated in PPC executable memory instead, it would be executed as PPC native code, resulting in ISI crashes, instead of being emulated with one of the 2 m68k emulators of OS4.
The M68k->PPC cross calls that fdtrans generates used to be asm only, and old versions put them in .data, but this was changed to .rodata at some point. Later an option was added in fdtrans to generate them in pure C (idltool can also be used to create the same) with the cross calls declared static const, so those end up in .rodata too.
@kas1e
1) No
2) It doesn't - the .rodata issue affects all binaries. It's just that the solution used for non-dynamic objects cannot be used for dynamic objects (otherwise 24-bit PC-relative jumps in .plt could be out of range, resulting in ISI crashes, which was the case with some of alfkil's elf.library changes, which I have since fixed - I experienced this with the example code that you sent me, which can manifest itself as seemingly random crashes depending on whether .plt ended up in memory in range of .text or not).
3) Yes
What Joerg says is correct. It does indeed matter that .rodata must be allocated into non-executable memory in all cases, whereas .text (and .plt) must go into executable memory. This is not GCC-specific and this is what elf.library ensures. For static objects, it doesn't matter much where .rodata is, as elf.library will not load any segment containing .rodata as a single block (each section is allocated to a separate block of memory, to ensure .rodata is not placed in executable memory). For dynamic objects, that solution will break .text/.plt interaction, hence .rodata must be kept separate from .text/.plt in this case.
VBCC can generate mixed PPC/68K binaries (e.g. for WarpOS), but if you're using the m68k VBCC to compile the m68k code, obviously VBCC knows which code is which, and the object formats are different of course. Vlink is nice because it knows about m68k, PPC, HUNK and ELF, unlike the GCC linker, which is compiled for a specific target. Don't confuse VBCC generating m68k code with GCC/VBCC generating PPC code which may contain m68k code (assembly) as data (which is what EmuTraps are). VBCC will put EmuTraps in .rodata, just as GCC does.