gcc autovectorization feature - Altivec
Not too shy to talk
I just played with the auto-vectorization feature of gcc:

Here is the result of the standard PowerPC stream binary:
Work:Benchmark/stream-5.10-AOS/stream
...
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2666.8     0.060642     0.059998     0.064029
Scale:           4103.0     0.039175     0.038996     0.039543
Add:             3870.2     0.065242     0.062012     0.083547
Triad:           3901.1     0.061864     0.061521     0.062810
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays


Then I compiled my stream.c code with:
gcc -DSTREAM_TYPE=float -DTUNED -mcpu=G4 -maltivec -mabi=altivec -O3 -ftree-vectorize -fopt-info-vec-optimized stream.c -o stream-float-tuned-altivec-g4

Result of stream-float-tuned-altivec-g4:
Work:Benchmark/stream-5.10-AOS/stream-float-tuned-altivec-g4
...
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2642.3     0.031379     0.030277     0.039397
Scale:           4866.5     0.017653     0.016439     0.024704
Add:             5440.7     0.022771     0.022056     0.025303
Triad:           5414.9     0.022900     0.022161     0.028060
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-06 on all three arrays
-------------------------------------------------------------


The result (also confirmed with -fopt-info-vec-optimized) is that the Scale, Add and Triad functions are vectorized and use AltiVec, while Copy remains unvectorized.

Here is the source code of the functions (the #pragma omp simd lines are irrelevant in this example):
#ifdef TUNED
/* stubs for "tuned" versions of the kernels */
/* --- Modified by sailor -------------------*/
void tuned_STREAM_Copy()
{
    ssize_t j;
#pragma omp simd
    for (j=0; j<STREAM_ARRAY_SIZE; j++)
        c[j] = a[j];
}

void tuned_STREAM_Scale(STREAM_TYPE scalar)
{
    ssize_t j;
#pragma omp simd
    for (j=0; j<STREAM_ARRAY_SIZE; j++)
        b[j] = scalar*c[j];
}

void tuned_STREAM_Add()
{
    ssize_t j;
#pragma omp simd
    for (j=0; j<STREAM_ARRAY_SIZE; j++)
        c[j] = a[j]+b[j];
}

void tuned_STREAM_Triad(STREAM_TYPE scalar)
{
    ssize_t j;
#pragma omp simd
    for (j=0; j<STREAM_ARRAY_SIZE; j++)
        a[j] = b[j]+scalar*c[j];
}
/* end of stubs for the "tuned" versions of the kernels */
#endif


Does anybody know why tuned_STREAM_Copy() was not vectorized?
Of course, I could modify the function to something like c[j] = one*a[j];
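A possible explanation, offered as a guess rather than something verified on this setup: at -O3 GCC's loop distribution pass (-ftree-loop-distribute-patterns) can recognise a plain element-by-element copy between non-overlapping arrays and replace the whole loop with a call to memcpy() before the vectorizer runs. In that case -fopt-info-vec-optimized has nothing to report for Copy, and its speed depends on the C library's memcpy(). The same caveat applies to c[j] = one*a[j]: if one is a constant 1.0, GCC normally folds the multiplication away and is back to a plain copy. The standalone file below is a hypothetical test case for checking this; the names src, dst, N and check_copy() are illustrative and not part of STREAM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test size; STREAM itself uses much larger arrays. */
#define N 1024

/* Global, non-overlapping arrays, as in stream.c. */
float src[N], dst[N];

/* Same loop shape as tuned_STREAM_Copy().
 * Assumed experiment (not verified on AmigaOS):
 *   gcc -mcpu=G4 -maltivec -mabi=altivec -O3 -fopt-info-vec-optimized -S copy.c -o copy1.s
 *   gcc -mcpu=G4 -maltivec -mabi=altivec -O3 -fno-tree-loop-distribute-patterns -fopt-info-vec-optimized -S copy.c -o copy2.s
 * then check whether copy1.s calls memcpy while copy2.s contains a vector loop. */
void copy_kernel(void)
{
    for (size_t j = 0; j < N; j++)
        dst[j] = src[j];
}

/* Verify the copy, in the spirit of STREAM's own result check. */
void check_copy(void)
{
    for (size_t j = 0; j < N; j++)
        assert(dst[j] == src[j]);
}
```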

P.S. Everything was built with gcc 11.2.0, compiled natively on an A1222+ and tested on an X1000.

AmigaOS3: Amiga 1200
AmigaOS4: Micro A1-C, AmigaOne XE, Pegasos II, Sam440ep, Sam440ep-flex, AmigaOne X1000
MorphOS: Efika 5200b, Pegasos I, Pegasos II, Powerbook, Mac Mini, iMac, Powermac Quad
Re: gcc autovectorization feature - Altivec
Quite a regular
I don't know the answer, but I noticed in your result tables that the Avg/Min/Max times for Copy are about half with AltiVec, yet the Best Rate is the same or slightly lower. This seems inconsistent, but I don't know what these numbers really mean.
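The figures are in fact consistent. STREAM prints Best Rate MB/s as 1.0e-6 × bytes / Min time, where bytes is 2 × sizeof(STREAM_TYPE) × STREAM_ARRAY_SIZE for Copy and Scale and 3 × sizeof(STREAM_TYPE) × STREAM_ARRAY_SIZE for Add and Triad. The first run uses 8-byte elements (its 1e-13 validation threshold is the one STREAM applies to 8-byte types), while the second was built with -DSTREAM_TYPE=float, so every kernel moves half as many bytes and half the time gives the same rate. The tables therefore compare a double build with a float build, not only scalar code with AltiVec. A small sketch of the arithmetic, assuming STREAM_ARRAY_SIZE is 10,000,000 (the STREAM 5.10 default, which reproduces the posted figures); the helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes STREAM counts for one kernel: `arrays` is 2 for Copy/Scale
 * (one read, one write) and 3 for Add/Triad (two reads, one write). */
double stream_bytes(int arrays, size_t elem_size, size_t n)
{
    return (double)arrays * (double)elem_size * (double)n;
}

/* "Best Rate MB/s" as STREAM prints it: megabytes per second at the minimum time. */
double best_rate_mbs(double bytes, double min_time_s)
{
    assert(min_time_s > 0.0);
    return 1.0e-6 * bytes / min_time_s;
}
```

With the Copy min times from the two tables, best_rate_mbs(stream_bytes(2, sizeof(double), 10000000), 0.059998) comes out at about 2666.8 and best_rate_mbs(stream_bytes(2, sizeof(float), 10000000), 0.030277) at about 2642.3, matching the reported rates: half the bytes in half the time.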

Re: gcc autovectorization feature - Altivec
Not too shy to talk
@balaton
Yes, I have to check the assembler output to see whether Copy really uses FPU registers and Scale, Add and Triad use AltiVec.

I want to learn how to write code that gcc can easily auto-vectorize; I am too lazy to write vector code by hand.
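As a starting point, here is a hedged sketch of a style that usually helps GCC's auto-vectorizer: a simple counted loop, unit-stride accesses, no calls or early exits in the loop body, and restrict-qualified pointer parameters so the compiler may assume the arrays do not overlap (without restrict GCC has to add runtime overlap checks or give up). The function names are hypothetical. Building with -fopt-info-vec-missed in addition to -fopt-info-vec-optimized reports why a loop was not vectorized, and gcc -S lets you look for AltiVec instructions such as lvx and stvx in the output.

```c
#include <stddef.h>

/* Triad-style kernel written to be easy for GCC's auto-vectorizer:
 * - restrict promises that a, b and c do not overlap,
 * - the loop has a simple counter and unit stride,
 * - there are no calls or early exits inside the loop. */
void triad_restrict(float *restrict a, const float *restrict b,
                    const float *restrict c, float scalar, size_t n)
{
    for (size_t j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}

/* Scale-style kernel in the same style. */
void scale_restrict(float *restrict b, const float *restrict c,
                    float scalar, size_t n)
{
    for (size_t j = 0; j < n; j++)
        b[j] = scalar * c[j];
}
```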
