08 Aug 2013

OpenCL: kernel optimization of ARGB to YUV

by Chi-Tai

As a side note to the recent post: the kernel presented there is already super fast, but guess what :-)

There are ways to make it even faster by optimizing memory access. Consider the memory access for the U and V planes. Both read the same source pixels at the same coordinates, so they can be consolidated into a single work item that reads the memory only once (global memory is the slowest memory type to access).
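To make the idea concrete, here is a minimal sketch in plain C that models a single work item writing both U and V from one source read. It is not the kernel from the earlier post. The function name `argb_to_uv_item`, the `0xAARRGGBB` pixel packing, the 4:2:0 subsampling that takes the top-left pixel of each 2x2 block, and the integer BT.601 coefficients are all assumptions made for illustration. In an actual OpenCL kernel, `cx` and `cy` would come from `get_global_id()`.

```c
#include <stdint.h>
#include <stddef.h>

/* Models ONE work item (hypothetical name and layout, not the original kernel).
 * cx, cy: chroma coordinates, i.e. the index of a 2x2 pixel block.
 * Assumes pixels packed as 0xAARRGGBB and planar 4:2:0 output. */
static void argb_to_uv_item(const uint32_t *argb, int width,
                            uint8_t *u_plane, uint8_t *v_plane,
                            int cx, int cy)
{
    /* Single read of the source pixel; U and V both reuse it. */
    uint32_t p = argb[(size_t)(2 * cy) * (size_t)width + (size_t)(2 * cx)];
    int r = (int)((p >> 16) & 0xFF);
    int g = (int)((p >> 8) & 0xFF);
    int b = (int)(p & 0xFF);

    /* Chroma planes are width/2 samples wide. */
    size_t out = (size_t)cy * (size_t)(width / 2) + (size_t)cx;

    /* Integer BT.601 approximation (assumed). The +128 chroma offset is
     * folded in as 128*256 before the shift, together with +128 for
     * rounding, so the shifted value is never negative. */
    u_plane[out] = (uint8_t)((-38 * r - 74 * g + 112 * b + 32896) >> 8);
    v_plane[out] = (uint8_t)((112 * r - 94 * g - 18 * b + 32896) >> 8);
}
```

With two separate work items (one for U, one for V), the same `argb` element would be fetched from global memory twice. Here it is fetched once and kept in registers.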

Furthermore, “flattening” the work-group from 2D to 1D enables sequential memory access instead of the 2D access presented earlier, which benefits much more from prefetching and probably helps avoid bank conflicts…
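As a rough illustration of the flattened layout, the following plain C sketch models a 1D range in which each work item handles the pixel at its own linear index, so neighbouring items touch neighbouring addresses. The names `argb_to_y_item` and `argb_to_y_flat`, the `0xAARRGGBB` packing, and the integer BT.601 luma coefficients are assumptions for illustration. The loop only stands in for enqueueing a 1D range of `width * height` items, where `gid` would be `get_global_id(0)` in a real kernel.

```c
#include <stdint.h>
#include <stddef.h>

/* One work item of a flattened (1D) range; gid stands in for
 * get_global_id(0). Hypothetical name, assumed 0xAARRGGBB packing. */
static void argb_to_y_item(const uint32_t *argb, uint8_t *y_plane, size_t gid)
{
    uint32_t p = argb[gid];            /* consecutive gids read consecutive pixels */
    int r = (int)((p >> 16) & 0xFF);
    int g = (int)((p >> 8) & 0xFF);
    int b = (int)(p & 0xFF);

    /* Integer BT.601 luma approximation (assumed), limited range 16..235. */
    y_plane[gid] = (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
}

/* Host-side stand-in for launching width*height work items in one dimension. */
static void argb_to_y_flat(const uint32_t *argb, uint8_t *y_plane,
                           int width, int height)
{
    size_t n = (size_t)width * (size_t)height;
    for (size_t gid = 0; gid < n; ++gid)
        argb_to_y_item(argb, y_plane, gid);
}
```

No row or column arithmetic is needed per item, since the Y plane has the same linear layout as the source image.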

So much for the optimizations. If there is demand for a corresponding kernel, drop me an email.

Posted in: Development, Software, Study

2 Thoughts on “OpenCL: kernel optimization of ARGB to YUV”

Antoine Martin on August 26, 2013 at 3:34 pm said:

Hi,

We may need OpenCL code to do exactly what you did: ARGB to YUV – can you specify the license for the code you published? (It would need to be GPLv2+ compatible for us to be able to use it..)
FYI: newer versions of x264 can encode ARGB directly, but we have other plans ;)

Thanks
Antoine
xpra.org
- Chi-Tai on August 24, 2014 at 5:26 am said:
  
  Hi Antoine,
  
  I’m sorry for the really late response. Of course, you can use it under the GPL. It’s published under the GPL as part of the Environs-framework http://hcm-lab.de/environs
  
  Chi-Tai
