Paper “Generating Custom Code for Efficient Query Execution on Heterogeneous Processors” accepted in The VLDB Journal
Sebastian Breß (DFKI GmbH, TU Berlin), Bastian Köcher (TU Berlin), Henning Funke (TU Dortmund University), Steffen Zeuch (German Research Center for Artificial Intelligence), Tilmann Rabl (TU Berlin, DFKI GmbH), and Volker Markl (TU Berlin, DFKI GmbH)
Processor manufacturers build increasingly specialized processors to mitigate the effects of the power wall and to deliver improved performance. Currently, database engines have to be optimized manually for each processor, which is a costly and error-prone process.
In this paper, we propose concepts to adapt to and to exploit the performance enhancements of modern processors automatically. Our core idea is to create processor-specific code variants and to learn a well-performing code variant for each processor. These code variants leverage various parallelization strategies and apply both generic and processor-specific code transformations.
Our experimental results show that the performance of code variants may diverge by up to two orders of magnitude. To achieve peak performance, we generate custom code for each processor. We show that our approach finds an efficient custom code variant for multi-core CPUs, GPUs, and MICs.
Source Code: https://github.com/TU-Berlin-DIMA/Hawk-VLDBJ
Short Paper “Efficient SIMD Vectorization for Hashing in OpenCL” accepted at EDBT 2018
Tobias Behrens (German Research Center for Artificial Intelligence), Viktor Rosenfeld (German Research Center for Artificial Intelligence), Jonas Traub (TU Berlin), Sebastian Breß (German Research Center for Artificial Intelligence, TU Berlin), Volker Markl (TU Berlin, German Research Center for Artificial Intelligence)
Hashing is at the core of many efficient database operators such as hash-based joins and aggregations. Vectorization is a technique that uses Single Instruction Multiple Data (SIMD) instructions to process multiple data elements at once. Applying vectorization to hash tables results in promising speedups for build and probe operations. However, vectorization typically requires intrinsics – low-level APIs in which functions map to processor-specific SIMD instructions. Intrinsics are specific to a processor architecture and result in complex, difficult-to-maintain code.
OpenCL is a parallel programming framework that provides a higher abstraction level than intrinsics and is portable across different processors. Thus, OpenCL avoids processor dependencies, which improves code maintainability. In this paper, we add efficient, vectorized hashing primitives to OpenCL. Our experimental study shows that OpenCL-based vectorization is competitive with intrinsics on CPUs and Xeon Phi coprocessors.
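To make the idea of lane-wise hashing concrete, here is a minimal sketch of a multiply-shift hash applied to a small batch of keys; the constant and table size are illustrative choices, not taken from the paper. In OpenCL, the per-lane loop below collapses into a single multiply and shift on a vector type such as uint4, which the compiler maps to the processor's SIMD instructions.

```python
TABLE_BITS = 10                      # hash table with 2**10 buckets (illustrative)
MULT = 2654435769                    # 32-bit multiplicative hashing constant

def hash_scalar(key: int) -> int:
    """Multiply-shift hash of one 32-bit key into a bucket index."""
    return ((key * MULT) & 0xFFFFFFFF) >> (32 - TABLE_BITS)

def hash_lanes(keys):
    """Hash a batch of keys lane-wise; with OpenCL vector types this
    whole loop becomes one vector multiply and one vector shift."""
    return [hash_scalar(k) for k in keys]

buckets = hash_lanes([1, 2, 3, 4])
```

The same lane-wise structure carries over to vectorized build and probe: each lane computes its bucket independently, which is exactly what SIMD hardware exploits.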
Paper “Pipelined Query Processing in Coprocessor Environments” accepted at SIGMOD 2018
Henning Funke (TU Dortmund University), Sebastian Breß (German Research Center for Artificial Intelligence, TU Berlin), Stefan Noll (TU Dortmund University), Volker Markl (TU Berlin, German Research Center for Artificial Intelligence), and Jens Teubner (TU Dortmund University)
Query processing on GPU-style coprocessors is severely limited by the movement of data. With teraflops of compute throughput in one device, even high-bandwidth memory cannot provision enough data for a reasonable utilization. Query compilation is a proven technique to improve memory efficiency. However, its inherent tuple-at-a-time processing style does not suit the massively parallel execution model of GPU-style coprocessors. This compromises the improvements in efficiency offered by query compilation. In this paper, we show how query compilation and GPU-style parallelism can be made to play in unison nevertheless. We describe a compiler strategy that merges multiple operations into a single GPU kernel, thereby significantly reducing bandwidth demand. Compared to operator-at-a-time, we show reductions of memory access volumes by factors of up to 7.5x resulting in shorter kernel execution times by factors of up to 9.5x.
Paper “Efficient Storage and Analysis of Genome Data in Databases” accepted at BTW 2017
Sebastian Dorok (Bayer Business Services GmbH, University of Magdeburg), Sebastian Breß (DFKI GmbH), Jens Teubner (TU Dortmund University), Horstfried Läpple (Bayer HealthCare AG), Gunter Saake (University of Magdeburg), Volker Markl (TU Berlin, DFKI GmbH)
Genome analysis enables researchers to detect mutations within genomes and deduce their consequences. Researchers need reliable analysis platforms to ensure reproducible and comprehensive analysis results. Database systems provide vital support to implement the required sustainable procedures. Nevertheless, they are not used throughout the complete genome-analysis process, because (1) database systems suffer from high storage overhead for genome data and (2) they introduce overhead during domain-specific analysis. To overcome these limitations, we integrate genome-specific compression into database systems using a specialized database schema. Thus, we can reduce the storage overhead to 30%. Moreover, we can exploit genome-data characteristics during query processing, allowing us to analyze real-world data sets up to five times faster than specialized analysis tools and eight times faster than a straightforward database approach.
Source Code: cogadb-0.4.2_btw_2017_source_code.zip (mirror)
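One genome-specific compression idea is reference-based delta encoding: since samples differ from a reference genome in only a few positions, it suffices to store those differences. The sketch below is an illustrative encoding, not the paper's database schema.

```python
def compress(reference: str, sample: str):
    """Encode a sample as a sparse list of (position, base) deltas
    relative to the reference sequence."""
    assert len(reference) == len(sample)
    return [(i, b) for i, (r, b) in enumerate(zip(reference, sample)) if r != b]

def decompress(reference: str, deltas):
    """Reconstruct the sample by applying the deltas to the reference."""
    seq = list(reference)
    for pos, base in deltas:
        seq[pos] = base
    return "".join(seq)
```

Because the deltas are small and positional, a query engine can evaluate many analyses directly on the compressed representation instead of decompressing first.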
Paper “Robust Query Processing in Co-Processor-accelerated Databases” accepted at SIGMOD 2016
Sebastian Breß (German Research Center for Artificial Intelligence), Henning Funke (TU Dortmund University), and Jens Teubner (TU Dortmund University)
Technology limitations are making the use of heterogeneous computing devices much more than an academic curiosity. In fact, the use of such devices is widely acknowledged to be the only promising way to achieve application-speedups that users urgently need and expect. However, building a robust and efficient query engine for heterogeneous co-processor environments is still a significant challenge.
In this paper, we identify two effects that limit performance when co-processor resources become scarce. Cache thrashing occurs when the working set of queries does not fit into the co-processor’s data cache, resulting in performance degradations of up to a factor of 24. Heap contention occurs when multiple operators run in parallel on a co-processor and their accumulated memory footprint exceeds the main memory capacity of the co-processor, slowing down query execution by up to a factor of six.
We propose solutions for both effects. Data-driven operator placement avoids data movements when they might be harmful; query chopping limits co-processor memory usage and thus avoids contention. The combined approach—data-driven query chopping—achieves robust and scalable performance on co-processors. We validate our proposal with our open-source GPU-accelerated database engine CoGaDB and the popular star schema and TPC-H benchmarks.
Source Code: cogadb-0.4.1_sigmod_2016_source_code.zip (mirror)
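The effect of query chopping can be sketched with a toy aggregation: instead of allocating the whole input on the co-processor at once, the operator processes budget-sized chunks, bounding its peak memory footprint. The budget and the aggregation below are illustrative, not CoGaDB's actual implementation.

```python
def chopped_sum(values, budget: int):
    """Aggregate in budget-sized chunks instead of one large allocation,
    bounding the operator's peak memory footprint on the device."""
    total, peak = 0, 0
    for start in range(0, len(values), budget):
        chunk = values[start:start + budget]  # ship one chunk at a time
        peak = max(peak, len(chunk))          # track the memory footprint
        total += sum(chunk)                   # process the chunk on the device
    return total, peak
```

With every concurrent operator bounded this way, the accumulated footprint stays below the co-processor's memory capacity, which is what avoids heap contention.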
Thesis Proposal “GPU-accelerated join-order optimization” accepted for publication at the VLDB PhD Workshop
Andreas Meister (University of Magdeburg)
Join-order optimization is an important task during query processing in DBMSs. The execution time of different join orders can vary by several orders of magnitude. Hence, efficient join orders are essential to ensure the efficiency of query processing. Established techniques for join-order optimization pose a challenge for current hardware architectures, because they are mainly sequential algorithms. Current architectures are becoming increasingly heterogeneous through specialized co-processors such as GPUs. GPUs offer a highly parallel architecture with higher computational power than CPUs. Because join-order optimization benefits from parallel execution, we expect further improvements by using GPUs. Therefore, in this thesis, we adapt join-order optimization approaches to GPUs.
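Why join-order optimization lends itself to parallel hardware can be seen from a minimal sketch: enumerating left-deep join orders and scoring each with a cost model yields many independent cost evaluations, which could run in parallel on a GPU. The cost model and cardinalities below are toy values, not from the proposal.

```python
from itertools import permutations

def plan_cost(order, card, selectivity=0.1):
    """Toy cost model: sum of intermediate cardinalities of a
    left-deep join plan over relations with the given cardinalities."""
    inter = card[order[0]]
    cost = 0
    for rel in order[1:]:
        inter = inter * card[rel] * selectivity
        cost += inter
    return cost

def best_join_order(card):
    """Exhaustively enumerate join orders; each plan_cost call is
    independent of the others and thus amenable to GPU parallelism."""
    return min(permutations(card), key=lambda o: plan_cost(o, card))
```

Exhaustive enumeration grows factorially, which is exactly why sequential optimizers fall back on heuristics; massive parallelism shifts the point at which that fallback becomes necessary.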
Thesis Proposal “The relational way to dam the flood of genome data” accepted for publication at SIGMOD/PODS PhD Workshop
Sebastian Dorok (Bayer Pharma AG, University of Magdeburg)
Mutations in genomes can indicate a predisposition for diseases such as cancer or cardiovascular disorders. Genome analysis is an established procedure to determine mutations and deduce their impact on living organisms. The first step in genome analysis is DNA sequencing, which makes the biochemically stored hereditary information in DNA digitally readable. The cost and time to sequence a whole genome are decreasing rapidly, leading to an increase in available raw genome data that must be stored and integrated before it can be analyzed. Damming this flood of genome data requires efficient and effective analysis as well as data management solutions. The state of the art in genome analysis consists of flat-file-based storage and analysis solutions. Consequently, every analysis application is responsible for managing its own data, which leads to implementation and process overhead.
Database systems have already shown their ability to reduce data management overhead for analysis applications in various domains. However, current approaches using relational database systems for genome-data management lack scalable performance on increasing amounts of genome data. In this thesis, we investigate the capabilities of relational main-memory database systems to store and query genome data efficiently, while enabling flexible data access.
Paper “Adaptive Reprogramming for Databases on Heterogeneous Processors” accepted for publication at SIGMOD/PODS PhD Workshop 2015
David Broneske (University of Magdeburg)
Modern processing hardware is becoming increasingly heterogeneous, which forces data-processing algorithms to account for the underlying hardware. However, current approaches for implementing data-intensive operators (e.g., in database systems) either cause enormous programming effort to tune one algorithm for several processors (the hardware-sensitive way) or fail to fully exploit the available performance because of an abstract operator description (the hardware-oblivious way). In this thesis, we propose an algorithm optimizer that automatically tunes a hardware-oblivious operator description to the underlying hardware. This way, the DBMS can rewrite its operator code until it runs optimally on the given hardware.
Demonstration “Flexible Analysis of Plant Genomes in a Database Management System” accepted for publication at EDBT
Sebastian Dorok (Bayer Pharma AG, University of Magdeburg), Sebastian Breß (TU Dortmund University), Jens Teubner (TU Dortmund University), Gunter Saake (University of Magdeburg)
Analysis of genomes has a wide range of applications, from disease susceptibility studies to plant breeding research. For example, different types of barley have differing characteristics regarding drought or salt tolerance. Thus, a typical use case is comparing two plant genomes and trying to deduce which genes are responsible for a certain resistance. For this, we need to find differences in large volumes of aligned genome data, which is already available in large genome databases.
The challenge is to efficiently retrieve the genotypes of a certain range of the genome, and then to determine variants and their impact on the plant organism. State-of-the-art tools are fixed pipelines with a fixed parametrization. However, in practice, users want to interactively analyze genome data and need to customize the parametrization. In this demonstration, we show how we can support flexible ad-hoc analyses of arbitrary plant genomes using SQL with a small set of user-defined aggregation functions and dynamic parametrization. Furthermore, we demonstrate how genome analysis workflows for variant calling can be applied to our system and provide insights into the performance of our system.