Short Paper “Efficient SIMD Vectorization for Hashing in OpenCL” accepted at EDBT 2018
Tobias Behrens (German Research Center for Artificial Intelligence), Viktor Rosenfeld (German Research Center for Artificial Intelligence), Jonas Traub (TU Berlin), Sebastian Breß (German Research Center for Artificial Intelligence, TU Berlin), Volker Markl (TU Berlin, German Research Center for Artificial Intelligence)
Hashing is at the core of many efficient database operators, such as hash-based joins and aggregations. Vectorization is a technique that uses Single Instruction Multiple Data (SIMD) instructions to process multiple data elements at once. Applying vectorization to hash tables yields promising speedups for build and probe operations. However, vectorization typically requires intrinsics: low-level APIs whose functions map to processor-specific SIMD instructions. Intrinsics are specific to a processor architecture and result in complex, hard-to-maintain code.
OpenCL is a parallel programming framework that provides a higher abstraction level than intrinsics and is portable across different processors. OpenCL thus avoids processor dependencies, which improves code maintainability. In this paper, we add efficient, vectorized hashing primitives to OpenCL. Our experimental study shows that OpenCL-based vectorization is competitive with intrinsics on CPUs and Xeon Phi coprocessors.