In my opinion, and based on what I have analysed so far, the only area where the SAM460 is faster/better is FPU speed for standard FPU code.
Of course it's hard to tell when this will bother you, since all other parts of the A1222 are better/faster than the SAM460. This question is still open, since improvements to the A1222 are still coming. For example, L2 cache speed increased dramatically in the last update.
Below is a good benchmark (the newest stream-5.10-AOS from Aminet/OS4depot) that illustrates FPU emulation vs native SPE performance on the A1222:
Emulated FPU (standard FPU code)
Function Best Rate MB/s Avg time Min time Max time
Copy: 842.5 0.194572 0.189902 0.201057
Scale: 150.8 1.063566 1.061034 1.066340
Add: 160.0 1.510033 1.499692 1.524443
Triad: 153.6 1.569161 1.563005 1.577607
A1222 SPE FPU (SPE code)
Function Best Rate MB/s Avg time Min time Max time
Copy: 831.8 0.197500 0.192355 0.211105
Scale: 542.5 0.301120 0.294940 0.309383
Add: 617.1 0.400034 0.388917 0.413630
Triad: 592.5 0.413958 0.405089 0.420792
I don't have SAM460 benchmark results for the newest stream-5.10-AOS to compare against, but maybe SAM460 owners can provide them?
But it is clearly visible that emulated FPU code on the A1222 is only about "4x slower" than native SPE code (roughly 3.6x to 3.9x in Scale, Add and Triad; Copy does no floating-point arithmetic and is practically unaffected). This shows that the SPE FPU emulator is a very effective and fast implementation. Usually, when you emulate an FPU on a CPU that has no real FPU, the resulting speed is too low to be usable at all. On the A1222, FPU emulation is usable because the A1222 has the SPE FPU with its own math functions, so the emulator can map standard FPU operations onto their SPE equivalents. The equivalents exist, but they come with a speed penalty.
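To put a number on the "4x" figure, the ratios can be worked out directly from the two A1222 result sets quoted above. This is only a quick sketch in Python; the dictionary and function names are made up for illustration, and the values are the Best Rate MB/s columns from this post.

```python
# Best Rate (MB/s) values copied from the two A1222 runs quoted above.
emulated = {"Copy": 842.5, "Scale": 150.8, "Add": 160.0, "Triad": 153.6}
spe = {"Copy": 831.8, "Scale": 542.5, "Add": 617.1, "Triad": 592.5}


def speedup(native, emu):
    """Return how many times faster the native run is, per STREAM kernel."""
    return {name: native[name] / emu[name] for name in native}


ratios = speedup(spe, emulated)
for name, ratio in ratios.items():
    print(f"{name}: SPE rate is {ratio:.2f}x the emulated rate")
```

Scale, Add and Triad come out between roughly 3.6x and 3.9x, while Copy is practically identical in both runs.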
But in general you don't really want to run very FPU-intensive code on the A1222 through FPU emulation. It is not usable. Only an SPE port can solve that!
However, the A1222's SPE FPU is at the same time a 64-bit vector unit (like a "small" AltiVec). I can't wait for the moment when developers start to explore that advantage. The SPE unit has vector instructions. Right now, ports to the A1222 are just recompiles that use the SPE math functions instead of the standard ones.
Another important point when comparing the A1222 with the SAM460 is that the P1022 in the A1222 is dual core. If multicore support arrives, even the most primitive kind, there will be no basis for comparing it to the SAM460. It will be much faster.
The only problem I see now with the SAM460 is effective support for modern graphics cards (RX cards, GART). If I understand correctly, GART enables fast memory transfers between main memory and graphics card memory. But that may be solved in the future.
Function Best Rate MB/s Avg time Min time Max time
Copy: 779.3 0.206296 0.205306 0.210359
Scale: 372.6 0.431840 0.429410 0.436890
Add: 447.7 0.540051 0.536067 0.542271
Triad: 443.4 0.544179 0.541314 0.547519
#stream
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
...
-------------------------------------------------------------
Function Best Rate MB/s Avg time Min time Max time
Copy: 771.2 0.208998 0.207482 0.212700
Scale: 358.7 0.453711 0.446047 0.470284
Add: 441.8 0.549591 0.543264 0.559943
Triad: 432.8 0.561002 0.554506 0.567038
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
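For anyone wondering how the Best Rate column relates to the timings: STREAM divides the bytes moved by each kernel by its minimum time. Below is a small Python sketch that checks this against the run above. The array size is cut from the listing, so the 10,000,000 elements used here is an assumption (it is the STREAM 5.10 default and it fits the numbers). Copy and Scale touch two arrays per element, Add and Triad three.

```python
BYTES_PER_ELEMENT = 8      # "This system uses 8 bytes per array element."
ARRAY_SIZE = 10_000_000    # ASSUMPTION: not shown in the listing above

# Number of arrays each kernel reads or writes per element.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Min time in seconds, copied from the run above.
min_time = {
    "Copy": 0.207482,
    "Scale": 0.446047,
    "Add": 0.543264,
    "Triad": 0.554506,
}


def best_rate_mb_s(kernel):
    """Best rate in MB/s (1 MB = 10**6 bytes) derived from the minimum time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_SIZE
    return 1.0e-6 * bytes_moved / min_time[kernel]


rates = {kernel: best_rate_mb_s(kernel) for kernel in min_time}
for kernel, rate in rates.items():
    print(f"{kernel}: {rate:.1f} MB/s")
```

Under that assumption the computed rates agree with the Best Rate column of the run to one decimal place.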
Thank you! I would say it is comparable to the A1222 (as a single core, obviously), and considering the price difference between the two I will be happy with the Sam460le for a while.