It's always good to read such clear technical posts, and Rachy's blog is a good example. No padding, no empty words, just details, an explanation of the pluses and minuses, future problems and past problems.
Even if there is no ready-to-use user version yet, a big thumbs-up for showing some professionalism.
Hope I don't miss the news when it's done. I plan on merging it with the code I have (screenmode changing and window to fullscreen hotkey, driveclick etc.) and uploading that one to OS4Depot on the same day.
Does it mean that a single 68k instruction is emulated by several PPC instructions?
I am not Rachy, but as he answered the same question in the comments: yes. Here is a copy-paste of his answer: Quote:
Usually it is longer, at least a couple instructions needed to emulate the exact same behavior of the 68k instruction. But don't forget how much faster the PPC is compared to the fastest 68k. Even if we assume that the 68k instruction can be executed in 1 CPU clock cycle, a 800 MHz PPC would be 16x faster than a 50 MHz 68k. This is not true of course, but there are other factors, like the speed of the memory access which is much better on a newer hardware than the old Amigas obviously, or the number and size of the caches in the processor.
First, I would say that as PowerPC is RISC, it often requires several instructions for one 68k instruction. Second, the emulator can't be that clever when converting instructions in the first pass.
I am also not Rachy, but most commonly yes. Each instruction is divided into micro code blocks (what Rachy calls macroblocks), so parts that cancel each other out can be removed before the final code is generated.
What you must understand is that what is at address 0 in the emulator is not what is at address 0 in real life. Everything has to be virtual: the 680x0 registers have to be stored in RAM, and the 680x0 flags also have to be stored in RAM. They can't be mapped to PowerPC registers, because everything has to be virtual within the emulator.
PowerPC has 32 general-purpose registers and more, while the 680x0 has 8 data registers and 8 address registers. I think it should be possible to map these registers if there is no conflict with the host OS, but that's not what E-UAE expects. Even if you did rewrite E-UAE so that you don't need to put the 680x0 registers in memory, there is still the problem of virtualizing the memory address space.
Emulators put the emulated machine into a reserved memory array that is somewhere else.
Host machine address space starts at address 0x00000000.
Virtual machine address space starts at address 0x00000000 + position_of_memory_reserved.
Every time the emulated CPU loads or stores something to memory, the real address to read from or write to has to be calculated first. (But the real address does not necessarily need to be recalculated if the 680x0 register did not change.)
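The address calculation described above can be sketched roughly as follows. This is only an illustration, not E-UAE code: `HOST_MEMORY`, `GUEST_BASE`, `GUEST_SIZE` and the helper names are all made up for the example, and a flat byte array stands in for the host address space.

```python
# Hypothetical sketch of guest-to-host address translation (not E-UAE code).
HOST_MEMORY = bytearray(0x20000)  # stand-in for the host address space
GUEST_BASE = 0x8000               # assumed position of the reserved memory
GUEST_SIZE = 0x10000              # assumed size of the emulated RAM


def host_address(guest_addr, length=1):
    """Translate an emulated address into an index in host memory."""
    if guest_addr < 0 or guest_addr + length > GUEST_SIZE:
        raise ValueError(f"guest address {guest_addr:#x} is out of range")
    return GUEST_BASE + guest_addr


def write32(guest_addr, value):
    """Store a 32-bit value; the 68k is big-endian."""
    start = host_address(guest_addr, 4)
    HOST_MEMORY[start:start + 4] = (value & 0xFFFFFFFF).to_bytes(4, "big")


def read32(guest_addr):
    """Load a 32-bit big-endian value from emulated memory."""
    start = host_address(guest_addr, 4)
    return int.from_bytes(HOST_MEMORY[start:start + 4], "big")
```

Guest address 0 ends up at `GUEST_BASE` in the host array, so every emulated load or store pays for one extra addition (plus the range check in this sketch).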
(NutsAboutAmiga)
Basilisk II for AmigaOS4 AmigaInputAnywhere Excalibur and other tools and apps.
Rachy asked if we have ideas for optimizations. If mapping registers is not possible (a design choice made because of x86 architecture ?), maybe the flags and d0 could be loaded at the beginning of the final block of code and stored at the end.
Another option would be to apply an optimization pass on the generated code to reduce the execution time but that would increase the time spent (that is, after all, emulation time).
I did not say it was possible or impossible, for now at least. Quote:
a design choice made because of x86 architecture ?
Maybe. I think the 680x0 has many more registers than a 486 (I think it was 4 or 5?). I believe modern x64 chips have a lot more registers today, but I am no expert on assembler code. I think the primary reason is that the first emulators were designed around interpreted emulation, and the rest of the emulator kind of expects the registers to be found in an unsigned int32 regs68k[8] table of some sort. Quote:
Another option would be to apply an optimization pass on the generated code to reduce the execution time but that would increase the time spent (that is, after all, emulation time).
Because the generated code is reused so many times over, optimizing it might be worth it, provided the JIT cache (aka the emulated CPU cache) is not flushed all the time. I believe that's what Rachy is trying to do. That's kind of what JIT does anyway.
This is what interpreted looks like:
Interpret the code.
Execute some routine that does what machine code does.
Interpret the code.
Execute some routine that does what machine code does.
Interpret the code.
Execute some routine that does what machine code does.
Interpret the code.
Execute some routine that does what machine code does.
This is what JIT looks like:
Interpret the code, and generate native machine code
Interpret the code, and generate native machine code
Interpret the code, and generate native machine code
Interpret the code, and generate native machine code
While cache is not flushed
Begin
execute the generated code
End
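The difference between the two loops above can be shown with a toy example. This is a sketch under heavy assumptions: the two-opcode instruction set, `decode`, `interpret` and `TranslationCache` are all invented here, and Python closures stand in for generated native code.

```python
# Toy comparison of interpreting versus translate-once-and-reuse.
# The instruction format and all names are hypothetical.

def decode(instr):
    """Turn one toy guest instruction into a host callable."""
    op, reg, value = instr
    if op == "set":
        def run(regs):
            regs[reg] = value & 0xFFFFFFFF
    elif op == "add":
        def run(regs):
            regs[reg] = (regs[reg] + value) & 0xFFFFFFFF
    else:
        raise ValueError(f"unknown opcode {op!r}")
    return run


def interpret(program, regs):
    """Interpreter: decode and execute every instruction, every time."""
    for instr in program:
        decode(instr)(regs)


class TranslationCache:
    """JIT-like: translate a block once, reuse it until the cache is flushed."""

    def __init__(self):
        self.blocks = {}
        self.translations = 0  # counts how often a block had to be translated

    def run(self, start_addr, program, regs):
        block = self.blocks.get(start_addr)
        if block is None:
            block = [decode(instr) for instr in program]
            self.blocks[start_addr] = block
            self.translations += 1
        for host_op in block:
            host_op(regs)

    def flush(self):
        self.blocks.clear()
```

Running the same block three times through `TranslationCache.run` decodes it once, while `interpret` decodes it on every pass. After `flush()` the next run has to translate again, which is why a cache that is flushed all the time loses most of the benefit.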
As I understand it, Rachy is trying to make a more advanced JIT by using macro blocks: instead of generating native code at once, unnecessary code blocks are stripped before the native code is generated.
Edited by LiveForIt on 2012/5/16 11:36:15, 15:39:58 and 15:40:30
I thought it was time to say a few words on this topic.
You all have nice ideas; actually, what you described here is mostly how Petunia works:
1. It runs directly on the compiled code and leaves it only when an indirect jump comes up, to let the system find out the code type at the target address.
2. It maps all 68k registers to PPC registers in a static layout, the remaining registers are used for any other operations and base pointers.
Why these do not work for E-UAE:
1. The emulation needs some time for emulating the environment; it is not just about emulating the processor. Execution often leaves the compiled code to let the other routines work. (It happens at every jump instruction for now.)
2. The static register layout seems to be a good idea, but there are big trade-offs when it comes to context switching: all registers must be saved and restored, and that is a lot of (often useless) memory operations. In E-UAE the compiler uses dynamic register allocation only. A register is loaded from memory only if it is used in the code chunk actually being executed, and it is saved back when execution leaves the compiled code. This is much more flexible and avoids unnecessary memory operations whenever a C function is called (like lots of the memory operations for custom chip access).
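A rough model of the load-on-first-use, save-on-exit idea from point 2 might look like this. It is a sketch only: the class and its counters are invented for illustration, a dictionary stands in for host registers, and writing back only modified registers is an assumption of this sketch rather than something stated above.

```python
# Hypothetical model of dynamic register allocation (not E-UAE code).

class LazyRegisterCache:
    def __init__(self, guest_regs):
        self.guest_regs = guest_regs  # emulated register file kept in memory
        self.live = {}                # guest register index -> cached value
        self.dirty = set()            # cached registers that were modified
        self.loads = 0                # memory reads performed
        self.stores = 0               # memory writes performed

    def read(self, index):
        """Load a guest register from memory only on first use."""
        if index not in self.live:
            self.live[index] = self.guest_regs[index]
            self.loads += 1
        return self.live[index]

    def write(self, index, value):
        """Update the cached copy; memory is untouched until we leave."""
        self.live[index] = value & 0xFFFFFFFF
        self.dirty.add(index)

    def leave_compiled_code(self):
        """Write back modified registers (assumption: clean ones are skipped)."""
        for index in sorted(self.dirty):
            self.guest_regs[index] = self.live[index]
            self.stores += 1
        self.live.clear()
        self.dirty.clear()
```

For a chunk that computes d0 = d1 + 1, this model does one load and one store. A static layout would have to save and restore every mapped register around each call into C code.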
Since the macroblock optimization is not implemented yet, all instructions are emulated in every aspect (like updating flags, loading modified registers, etc.). As soon as enough instructions are implemented I will implement the optimization routine, which will eliminate all of the unnecessary macroblocks, meaning those whose results are overwritten within the same compiled code chunk. With this optimization, useless register loading, flag emulation and (sometimes) even whole instructions will be eliminated.
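One common way to do this kind of elimination is a backward liveness pass over the chunk. The sketch below shows that general technique; it is not Rachy's actual routine. `MacroBlock`, its fields and the example chunk are hypothetical, and each block is assumed to overwrite its outputs completely.

```python
# Backward liveness pass over a chunk of macroblocks (illustrative only).
from dataclasses import dataclass


@dataclass(frozen=True)
class MacroBlock:
    name: str
    reads: frozenset           # values this block needs
    writes: frozenset          # values this block fully overwrites
    side_effect: bool = False  # e.g. a memory store; never removed


def eliminate_dead_blocks(blocks, live_at_exit):
    """Drop blocks whose results are overwritten before anything reads them."""
    live = set(live_at_exit)
    kept = []
    for block in reversed(blocks):
        if block.side_effect or (block.writes & live):
            kept.append(block)
            live -= block.writes  # older values of these are now dead
            live |= block.reads   # but the inputs must be available
    kept.reverse()
    return kept


# Two arithmetic instructions in a row: the flags computed by the first
# one are overwritten by the second before anything reads them.
chunk = [
    MacroBlock("add d0,d1", frozenset({"d0", "d1"}), frozenset({"d1"})),
    MacroBlock("flags of add", frozenset({"d1"}), frozenset({"flags"})),
    MacroBlock("sub d2,d1", frozenset({"d1", "d2"}), frozenset({"d1"})),
    MacroBlock("flags of sub", frozenset({"d1"}), frozenset({"flags"})),
]
optimized = eliminate_dead_blocks(chunk, {"d0", "d1", "d2", "flags"})
```

In this example only the "flags of add" block is removed. Everything that is live when the chunk exits (here all registers and the flags) has to be passed in as `live_at_exit`, otherwise the pass would throw away results the rest of the emulator still needs.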