As a side note to the recent post: the kernel presented there is already superfast, but guess what :-)
There are ways to make it even faster through memory access optimizations. Consider the memory accesses for the U and V planes: both read the same source pixels, so their computation can be consolidated into a single work item that reads global memory only once (global memory access is the slowest memory access type).
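To illustrate the consolidation idea, here is a plain-C sketch that emulates what one work item would do. It assumes ARGB packed as 0xAARRGGBB, planar I420 output (4:2:0 chroma subsampling, even width and height) and the common integer BT.601 studio-range coefficients; none of these details come from the original post, and `convert_block`/`convert_image` are hypothetical names. Each source pixel is read once, and the same averaged R, G, B values feed both the U and the V write.

```c
#include <stdint.h>

/* Integer BT.601 (studio range) luma approximation; assumed, not from the post. */
static uint8_t luma(int r, int g, int b) {
    return (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
}

/* Emulates ONE work item: converts the 2x2 ARGB block at block coords (bx, by).
 * U and V are computed by the same work item, so the source is read only once. */
static void convert_block(const uint32_t *argb, int width, int bx, int by,
                          uint8_t *y_plane, uint8_t *u_plane, uint8_t *v_plane) {
    int sum_r = 0, sum_g = 0, sum_b = 0;
    for (int dy = 0; dy < 2; dy++) {
        for (int dx = 0; dx < 2; dx++) {
            int x = 2 * bx + dx, y = 2 * by + dy;
            uint32_t p = argb[y * width + x];      /* single global read per pixel */
            int r = (p >> 16) & 0xFF, g = (p >> 8) & 0xFF, b = p & 0xFF;
            y_plane[y * width + x] = luma(r, g, b);
            sum_r += r; sum_g += g; sum_b += b;
        }
    }
    /* Rounded 2x2 average, shared by both chroma planes. */
    int r = (sum_r + 2) / 4, g = (sum_g + 2) / 4, b = (sum_b + 2) / 4;
    int chroma_width = width / 2;
    /* 32896 = 128 (rounding) + 128 * 256 (chroma offset); keeps the sum
     * non-negative so the right shift is well defined. */
    u_plane[by * chroma_width + bx] = (uint8_t)((-38 * r - 74 * g + 112 * b + 32896) >> 8);
    v_plane[by * chroma_width + bx] = (uint8_t)((112 * r - 94 * g - 18 * b + 32896) >> 8);
}

/* The two loops stand in for a 2D NDRange of (width/2) x (height/2) work items. */
static void convert_image(const uint32_t *argb, int width, int height,
                          uint8_t *y_plane, uint8_t *u_plane, uint8_t *v_plane) {
    for (int by = 0; by < height / 2; by++)
        for (int bx = 0; bx < width / 2; bx++)
            convert_block(argb, width, bx, by, y_plane, u_plane, v_plane);
}
```

For example, a 2x2 image of pure red (0xFFFF0000) yields Y = 82 for every pixel and a single chroma sample with U = 90, V = 240.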
Furthermore, “flattening” the work-group from 2D to 1D enables faster sequential memory access instead of the 2D access presented, so the kernel benefits much more from prefetching and probably also avoids bank conflicts…
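The flattening can be sketched as index arithmetic alone. This assumes one work item per 2x2 pixel block and a 1D global size of (width/2) * (height/2); `block_coords` and `source_offset` are hypothetical helpers, not code from the post. Because the global id enumerates blocks in row-major order, work items with neighbouring ids read neighbouring source addresses.

```c
#include <stddef.h>

/* Maps a 1D global id to 2x2-block coordinates in row-major order. */
static void block_coords(size_t gid, size_t blocks_per_row, size_t *bx, size_t *by) {
    *bx = gid % blocks_per_row;
    *by = gid / blocks_per_row;
}

/* Pixel offset of the top-left source pixel read by work item gid,
 * for an image that is `width` pixels wide (width assumed even). */
static size_t source_offset(size_t gid, size_t width) {
    size_t bx, by;
    block_coords(gid, width / 2, &bx, &by);
    return 2 * by * width + 2 * bx;
}
```

For an 8-pixel-wide image, ids 0, 1, 2, 3 start at offsets 0, 2, 4, 6, and id 4 wraps to offset 16 (the start of pixel row 2). In a real OpenCL kernel the id would come from `get_global_id(0)`.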
So much for optimizations. If there is demand for such a kernel, drop me an email.
Hi,
We may need OpenCL code to do exactly what you did: ARGB to YUV. Can you specify the license for the code you published? (It would need to be GPLv2+ compatible for us to be able to use it.)
FYI: newer versions of x264 can encode ARGB directly, but we have other plans ;)
Thanks
Antoine
xpra.org
Hi Antoine,
I’m sorry for the very late response. Of course, you can use it under the GPL. It’s published under the GPL as part of the Environs framework: http://hcm-lab.de/environs
Chi-Tai