So I fired up the mobile JS demo of NanoCL to compare the speed bump from my old Note 3 GPU to the new and shiny S7 GPU. It didn’t work. And this time, it’s not my fault!
Less GPU for more money
Yes. It turns out Samsung ships its new flagship phone with a GPU that doesn’t support the full-precision float textures NanoCL uses to store data. The 3-year-old Note 3’s GPU is more capable for actual data processing than the S7’s.
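For context, here is roughly how such a capability gap shows up in code. This is my sketch, not NanoCL’s actual probe (the function name and mock contexts are mine, and the exact extension a library needs may differ): in WebGL, full-precision float-texture support is gated behind the `OES_texture_float` extension, which a context either exposes or doesn’t.

```javascript
// Hypothetical helper (not from NanoCL): given a WebGL context,
// report whether full-precision float textures are available at all.
function supportsFloatTextures(gl) {
  // getExtension returns null when the extension is unsupported.
  return gl.getExtension("OES_texture_float") !== null;
}

// Mock contexts standing in for the two situations described above;
// real code would pass the context from canvas.getContext("webgl").
const extensionPresent = { getExtension: (name) => (name === "OES_texture_float" ? {} : null) };
const extensionMissing = { getExtension: () => null };

console.log(supportsFloatTextures(extensionPresent)); // true
console.log(supportsFloatTextures(extensionMissing)); // false
```

Note that even when the extension is present, rendering *to* a float texture is a separate capability, so a thorough probe would also create a float-backed framebuffer and check its completeness.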
Not all is lost though. I have a workaround that I need to test and implement. It basically uses an IEEE float encoder on the JS side and a decoder on the GPU side. That works, but it’s a) slower and b) cuts the amount of data that can be processed by a single NanoCL call (i.e. in parallel) from ~8192²×4 floats to just ~8192². That might not sound like such a huge hit, but it limits I/O. Instead of multiple inputs per vector, it’s now just one input and one output value. There are hardly any data processing tasks that can benefit from this memory layout.
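To illustrate the CPU half of that workaround (a sketch under my own assumptions, not NanoCL’s actual code): instead of uploading raw floats, you reinterpret each IEEE-754 single-precision value as 4 bytes, one per RGBA channel of a plain 8-bit texture, and a shader on the GPU side reassembles the float from those bytes.

```javascript
// Shared 4-byte buffer viewed both as a float and as raw bytes.
const buf = new ArrayBuffer(4);
const f32 = new Float32Array(buf);
const u8  = new Uint8Array(buf);

// Pack a JS number into its 4 IEEE-754 bytes (one per RGBA channel).
function encodeFloat(x) {
  f32[0] = x;
  return Array.from(u8);
}

// CPU-side mirror of what the GPU decoder would do with the 4 bytes.
function decodeFloat(bytes) {
  u8.set(bytes);
  return f32[0];
}

console.log(decodeFloat(encodeFloat(3.5))); // round-trips exactly
```

This is also why the capacity drops by 4×: each texel that previously held four independent floats (one per channel) now holds the four bytes of a single float.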