So, report 1:
First, I added -fno-inline to the compile flags so that inlined functions also appear in the perf reports. From the list, I picked one function that at first looked trivial to optimize, and wanted to do it just to see the impact, if any. It proved to be non-trivial, mostly because of alignment (if the data were aligned it would be 3x faster, but it wasn't).
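To illustrate the alignment issue, here is a rough sketch of the kind of check that decides whether an aligned vector store can be used. This is not the actual FFmpeg code: the helper names is_aligned16 and rows_all_aligned16 are made up for illustration, and the 16-byte boundary is assumed because AltiVec registers are 128 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/*
 * Hypothetical helper: returns 1 if every one of `rows` rows starting at
 * dst, spaced `stride` bytes apart, begins on a 16-byte boundary.
 * That holds when dst itself is aligned and the stride is a multiple of 16
 * (the stride does not matter when there is only one row).
 */
static int rows_all_aligned16(const uint8_t *dst, ptrdiff_t stride, int rows)
{
    if (!is_aligned16(dst))
        return 0;
    return rows <= 1 || (stride % 16) == 0;
}
```

When a check like this fails, the code has to fall back to a slower path (unaligned or element-wise stores), which is where the potential speedup gets lost.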
So, with the perf run:
$ sudo perf record -a ./ffmpeg_g -cpuflags altivec -benchmark -i Prometheus\ -\ Trailer.mp4 -f null /dev/null
...
Running time and fps didn't change: both runs took 121-125 secs. To be honest, I didn't expect a big change, as the function doesn't get called that often, so nothing to get excited about just yet. Still, testing with perf I was able to measure the function's share going from
0.65% ffmpeg_g ffmpeg_g [.] write16x4
to
0.50% ffmpeg_g ffmpeg_g [.] write16x4
This took a total of 4 hours so far (I'm excluding the initial setup/code traversal). I have some better candidates to work on, so a 2nd update will come soon.
Code can be found here:
https://github.com/markos/FFmpeg/commi ... 2a11bd1e2fbb9e9c0af697d8a
Edit: any idea why the [code] tag takes so much space?