gcc autovectorization feature - Altivec


sailor

Posted on: 1/11 19:27 #1

Quite a regular

I have just been playing with the auto-vectorization feature of gcc.

Here is the result of the standard PowerPC build of stream:


Work:Benchmark/stream-5.10-AOS/stream 

...

-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2666.8     0.060642     0.059998     0.064029
Scale:           4103.0     0.039175     0.038996     0.039543
Add:             3870.2     0.065242     0.062012     0.083547
Triad:           3901.1     0.061864     0.061521     0.062810
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays

Then I compiled my stream.c code with:
gcc -DSTREAM_TYPE=float -DTUNED -mcpu=G4 -maltivec -mabi=altivec -O3 -ftree-vectorize -fopt-info-vec-optimized stream.c -o stream-float-tuned-altivec-g4

Result of stream-float-tuned-altivec-g4:


Work:Benchmark/stream-5.10-AOS/stream-float-tuned-altivec-g4 

...

-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2642.3     0.031379     0.030277     0.039397
Scale:           4866.5     0.017653     0.016439     0.024704
Add:             5440.7     0.022771     0.022056     0.025303
Triad:           5414.9     0.022900     0.022161     0.028060
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-06 on all three arrays
-------------------------------------------------------------

The result (also confirmed by -fopt-info-vec-optimized) is that the Scale, Add and Triad functions are optimized and use AltiVec, while Copy remains unoptimized.

The source code of the functions is here (#pragma omp simd is irrelevant in this example):


#ifdef TUNED
/* stubs for "tuned" versions of the kernels */
/* --- Modified by sailor -------------------*/
void tuned_STREAM_Copy()
{
    ssize_t j;
#pragma omp simd
    for (j=0; j<STREAM_ARRAY_SIZE; j++)
        c[j] = a[j];
}

void tuned_STREAM_Scale(STREAM_TYPE scalar)
{
    ssize_t j;
#pragma omp simd
    for (j=0; j<STREAM_ARRAY_SIZE; j++)
        b[j] = scalar*c[j];
}

void tuned_STREAM_Add()
{
    ssize_t j;
#pragma omp simd
    for (j=0; j<STREAM_ARRAY_SIZE; j++)
        c[j] = a[j]+b[j];
}

void tuned_STREAM_Triad(STREAM_TYPE scalar)
{
    ssize_t j;
#pragma omp simd
    for (j=0; j<STREAM_ARRAY_SIZE; j++)
        a[j] = b[j]+scalar*c[j];
}
/* end of stubs for the "tuned" versions of the kernels */
#endif

Does anybody know why tuned_STREAM_Copy() was not optimized?
Of course, I can modify the function to something like c[j] = one*a[j];
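
One possible explanation, not verified on this setup: at -O3 gcc can recognize a plain element-by-element copy loop and replace the whole loop with a call to memcpy() (the -ftree-loop-distribute-patterns pass). If that happens, the loop never reaches the vectorizer, so -fopt-info-vec-optimized has nothing to report for Copy. Rebuilding with -fno-tree-loop-distribute-patterns, or looking at the assembler output, should show whether this is the case. The sketch below puts the two variants side by side. The names N, copy_plain and copy_scaled are made up for illustration and are not part of stream.c. Note that `one` has to be a runtime value: if it were the literal 1.0f, the compiler would be free to fold the multiplication away and the loop would be a plain copy again.

```c
#include <stddef.h>

#define N 1024  /* hypothetical small size, for illustration only */

static float a[N], c[N];

/* Plain copy: a pure copy pattern that gcc may turn into one memcpy() call. */
void copy_plain(void)
{
    for (size_t j = 0; j < N; j++)
        c[j] = a[j];
}

/* Copy with a multiply by a runtime value: no longer a pure copy pattern,
   so the loop is left for the vectorizer to consider. */
void copy_scaled(float one)
{
    for (size_t j = 0; j < N; j++)
        c[j] = one * a[j];
}
```

Both functions produce the same array contents when called with one = 1.0f. Whether the second form is actually faster than a library memcpy() is a separate question that only a measurement can answer.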

P.S. All with gcc 11.2.0, compiled natively on A1222+, tested on X1000.

AmigaOS3: Amiga 1200
AmigaOS4: Micro A1-C, AmigaOne XE, Pegasos II, Sam440ep, Sam440ep-flex, AmigaOne X1000
MorphOS: Efika 5200b, Pegasos I, Pegasos II, Powerbook, Mac Mini, iMac, Powermac Quad

balaton

Re: gcc autovectorization feature - Altivec

Posted on: 1/13 0:20 #2

Quite a regular

I don't know the answer but I noticed in your result tables that the Avg/Min/Max times are about half for Copy with AltiVec, yet the Best Rate seems to be the same or slightly less. This seems inconsistent but I don't know what these numbers really mean.
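
A possible reading of these numbers: STREAM reports the rate as bytes moved divided by the best (minimum) time, and the second build uses -DSTREAM_TYPE=float. If the first build used the default element type double (an assumption, but the 1e-13 validation limit suggests it), each element is 8 bytes instead of 4, so the float run moves half as many bytes per kernel. Half the bytes in half the time gives about the same MB/s. The small sketch below redoes that arithmetic. It assumes the default STREAM_ARRAY_SIZE of 10000000, which is not stated in the thread, and the function name stream_rate_mbs is made up for illustration.

```c
#include <stddef.h>

/* STREAM-style rate in MB/s: bytes moved by one kernel call divided by the
   best time. arrays_touched is 2 for Copy and Scale, 3 for Add and Triad. */
double stream_rate_mbs(size_t elem_size, int arrays_touched,
                       size_t array_size, double min_time_s)
{
    double bytes = (double)arrays_touched * (double)elem_size
                 * (double)array_size;
    return 1.0e-6 * bytes / min_time_s;
}
```

With these assumptions, stream_rate_mbs(8, 2, 10000000, 0.059998) comes out at about 2666.8 and stream_rate_mbs(4, 2, 10000000, 0.030277) at about 2642.3, which matches the two Copy rows. So the tables look consistent: Copy runs at the same memory bandwidth in both builds, while Scale, Add and Triad really do gain.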

https://qmiga.codeberg.page/

sailor

Re: gcc autovectorization feature - Altivec

Posted on: 1/13 13:38 #3

Quite a regular

@balaton
Yes, I have to check the assembler output to see whether Copy really uses FPU registers while Scale, Add and Triad use AltiVec.

I want to learn how to write code that gcc can easily auto-vectorize. I am too lazy to write vector code by hand.
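
As a starting point, here is a minimal sketch of the loop shape that auto-vectorizers generally handle well: a simple counted loop, contiguous array accesses, no function calls in the body, and pointers marked restrict so the compiler may assume the arrays do not overlap. The function name triad and its signature are made up for illustration and are not from stream.c. Two related gcc options may help while experimenting: -fopt-info-vec-missed prints the loops that were not vectorized and why, and -fopenmp-simd makes gcc honour #pragma omp simd without pulling in the rest of OpenMP (without it, or -fopenmp, the pragma is ignored).

```c
#include <stddef.h>

/* Triad as a standalone function. 'restrict' promises that a, b and c do
   not overlap, so the compiler does not need a runtime alias check before
   it processes several elements per iteration. */
void triad(float *restrict a, const float *restrict b,
           const float *restrict c, float scalar, size_t n)
{
    for (size_t j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}
```

Whether gcc 11.2.0 actually emits AltiVec code for this with -mcpu=G4 -maltivec -O3 still has to be confirmed with -fopt-info-vec-optimized or the assembler output.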
