Category Archives: Publications

Publications refer to all manuscripts accepted for publication at workshops, conferences, and journals.

Paper “Generating Custom Code for Efficient Query Execution on Heterogeneous Processors” accepted in The VLDB Journal

June 25, 2018


Authors

Sebastian Breß (DFKI GmbH, TU Berlin), Bastian Köcher (TU Berlin), Henning Funke (TU Dortmund University), Steffen Zeuch (German Research Center for Artificial Intelligence), Tilmann Rabl (TU Berlin, DFKI GmbH), and Volker Markl (TU Berlin, DFKI GmbH)

Abstract

Processor manufacturers build increasingly specialized processors to mitigate the effects of the power wall and deliver improved performance. Currently, database engines have to be manually optimized for each processor, which is a costly and error-prone process.

In this paper, we propose concepts to adapt to and to exploit the performance enhancements of modern processors automatically. Our core idea is to create processor-specific code variants and to learn a well-performing code variant for each processor. These code variants leverage various parallelization strategies and apply both generic and processor-specific code transformations.

Our experimental results show that the performance of code variants may diverge by up to two orders of magnitude. In order to achieve peak performance, we generate custom code for each processor. We show that our approach finds an efficient custom code variant for multi-core CPUs, GPUs, and MICs.
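The core idea of learning a well-performing code variant per processor can be sketched as a measure-and-select loop (a minimal illustration in plain Python, not the paper's actual variant generator; all names and the workload are invented):

```python
import time

def learn_best_variant(variants, workload, trials=3):
    """Run each code variant on a sample workload and keep the fastest.

    `variants` maps a (hypothetical) variant name to a callable that
    executes the workload with one parallelization strategy or code
    transformation; repeated trials smooth out timing noise.
    """
    timings = {}
    for name, run in variants.items():
        best = float("inf")
        for _ in range(trials):
            start = time.perf_counter()
            run(workload)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    # Return the variant with the lowest observed runtime
    return min(timings, key=timings.get)
```

On a real system the candidates would be compiled kernels tuned for a CPU, GPU, or MIC; here any callables stand in for them.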

Supplementary Material

Source Code: https://github.com/TU-Berlin-DIMA/Hawk-VLDBJ

Short Paper “Efficient SIMD Vectorization for Hashing in OpenCL” accepted at EDBT 2018

January 21, 2018


Authors

Tobias Behrens (German Research Center for Artificial Intelligence), Viktor Rosenfeld (German Research Center for Artificial Intelligence), Jonas Traub (TU Berlin), Sebastian Breß (German Research Center for Artificial Intelligence, TU Berlin), Volker Markl (TU Berlin, German Research Center for Artificial Intelligence)

Abstract

Hashing is at the core of many efficient database operators such as hash-based joins and aggregations. Vectorization is a technique that uses Single Instruction Multiple Data (SIMD) instructions to process multiple data elements at once. Applying vectorization to hash tables results in promising speedups for build and probe operations. However, vectorization typically requires intrinsics – low-level APIs in which functions map to processor-specific SIMD instructions. Intrinsics are specific to a processor architecture and result in complex, difficult-to-maintain code.

OpenCL is a parallel programming framework which provides a higher abstraction level than intrinsics and is portable to different processors. Thus, OpenCL avoids processor dependencies, which results in improved code maintainability. In this paper, we add efficient, vectorized hashing primitives to OpenCL. Our experimental study shows that OpenCL-based vectorization is competitive with intrinsics on CPUs and Xeon Phi coprocessors.
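As a rough illustration of lane-wise hashing (plain Python standing in for OpenCL, with list batches mimicking vector types such as `uint8`; the multiplicative hash function is a generic textbook choice, not necessarily the one used in the paper):

```python
HASH_CONST = 2654435761  # Knuth's multiplicative hashing constant
MASK32 = 0xFFFFFFFF      # emulate 32-bit unsigned overflow

def hash_scalar(key, table_size):
    """Hash one key at a time (the scalar baseline)."""
    return ((key * HASH_CONST) & MASK32) % table_size

def hash_vectorized(keys, table_size, lanes=8):
    """Hash `lanes` keys per step, mimicking an OpenCL vector type.

    Each batch corresponds to one SIMD instruction applied to all
    lanes at once; the bucket indices match the scalar version.
    """
    buckets = []
    for i in range(0, len(keys), lanes):
        batch = keys[i:i + lanes]                               # load the lanes
        products = [(k * HASH_CONST) & MASK32 for k in batch]   # one vector multiply
        buckets.extend(p % table_size for p in products)        # one vector modulo
    return buckets
```

The point of the vectorized form is that each batch maps to a handful of SIMD instructions instead of one instruction sequence per key.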

Paper “Pipelined Query Processing in Coprocessor Environments” accepted at SIGMOD 2018

November 15, 2017


Authors

Henning Funke (TU Dortmund University), Sebastian Breß (German Research Center for Artificial Intelligence, TU Berlin), Stefan Noll (TU Dortmund University), Volker Markl (TU Berlin, German Research Center for Artificial Intelligence), and Jens Teubner (TU Dortmund University)

Abstract

Query processing on GPU-style coprocessors is severely limited by the movement of data. With teraflops of compute throughput in one device, even high-bandwidth memory cannot provision enough data for a reasonable utilization. Query compilation is a proven technique to improve memory efficiency. However, its inherent tuple-at-a-time processing style does not suit the massively parallel execution model of GPU-style coprocessors. This compromises the improvements in efficiency offered by query compilation. In this paper, we show how query compilation and GPU-style parallelism can be made to play in unison nevertheless. We describe a compiler strategy that merges multiple operations into a single GPU kernel, thereby significantly reducing bandwidth demand. Compared to operator-at-a-time, we show reductions of memory access volumes by factors of up to 7.5x resulting in shorter kernel execution times by factors of up to 9.5x.
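The bandwidth saving from merging operators into a single kernel can be illustrated with a toy filter-map-aggregate pipeline (a sketch in plain Python, not the paper's compiler; the query itself is invented):

```python
def operator_at_a_time(data):
    """Each operator reads and writes a full intermediate result."""
    filtered = [x for x in data if x % 2 == 0]   # materialized
    scaled = [x * 3 for x in filtered]           # materialized again
    total = sum(scaled)
    return total, len(filtered) + len(scaled)    # (result, tuples written)

def fused_kernel(data):
    """All three operators merged into one pass: intermediate tuples
    stay in registers, so no intermediate results touch memory."""
    total = 0
    for x in data:
        if x % 2 == 0:
            total += x * 3
    return total, 0                              # nothing materialized
```

Both versions compute the same result, but the fused pass writes no intermediate tuples, which is exactly the reduction in memory access volume the abstract describes.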

Paper “Efficient Storage and Analysis of Genome Data in Databases” accepted at BTW 2017

November 28, 2016


Authors

Sebastian Dorok (Bayer Business Services GmbH, University of Magdeburg), Sebastian Breß (DFKI GmbH), Jens Teubner (TU Dortmund University), Horstfried Läpple (Bayer HealthCare AG), Gunter Saake (University of Magdeburg), Volker Markl (TU Berlin, DFKI GmbH)

Abstract

Genome analysis enables researchers to detect mutations within genomes and deduce their consequences. Researchers need reliable analysis platforms to ensure reproducible and comprehensive analysis results. Database systems provide vital support to implement the required sustainable procedures. Nevertheless, they are not used throughout the complete genome-analysis process, because (1) database systems suffer from high storage overhead for genome data and (2) they introduce overhead during domain-specific analysis. To overcome these limitations, we integrate genome-specific compression into database systems using a specialized database schema. Thus, we can reduce the storage overhead to 30%. Moreover, we can exploit genome-data characteristics during query processing, allowing us to analyze real-world data sets up to five times faster than specialized analysis tools and eight times faster than a straightforward database approach.
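One simple form of genome-specific compression is reference-based delta encoding: store only the positions where a genome deviates from a reference sequence (an illustrative sketch, not necessarily the scheme integrated in the paper):

```python
def compress(genome, reference):
    """Keep only (position, base) pairs that differ from the reference."""
    return [(i, base)
            for i, (base, ref) in enumerate(zip(genome, reference))
            if base != ref]

def decompress(diffs, reference):
    """Rebuild the genome by patching the reference at stored positions."""
    seq = list(reference)
    for i, base in diffs:
        seq[i] = base
    return "".join(seq)
```

Because genomes of the same species differ from a reference in only a small fraction of positions, the diff list is far smaller than the raw sequence.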

Supplementary Material

Source Code: cogadb-0.4.2_btw_2017_source_code.zip (mirror)

Paper “Robust Query Processing in Co-Processor-accelerated Databases” accepted at SIGMOD 2016

December 1, 2015


Authors

Sebastian Breß (German Research Center for Artificial Intelligence), Henning Funke (TU Dortmund University), and Jens Teubner (TU Dortmund University)

Abstract

Technology limitations are making the use of heterogeneous computing devices much more than an academic curiosity. In fact, the use of such devices is widely acknowledged to be the only promising way to achieve application-speedups that users urgently need and expect. However, building a robust and efficient query engine for heterogeneous co-processor environments is still a significant challenge.

In this paper, we identify two effects that limit performance when co-processor resources become scarce. Cache thrashing occurs when the working set of queries does not fit into the co-processor’s data cache, resulting in performance degradations of up to a factor of 24. Heap contention occurs when multiple operators run in parallel on a co-processor and their accumulated memory footprint exceeds the main memory capacity of the co-processor, slowing down query execution by up to a factor of six.

We propose solutions for both effects. Data-driven operator placement avoids data movements when they might be harmful; query chopping limits co-processor memory usage and thus avoids contention. The combined approach—data-driven query chopping—achieves robust and scalable performance on co-processors. We validate our proposal with our open-source GPU-accelerated database engine CoGaDB and the popular star schema and TPC-H benchmarks.
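The idea behind query chopping (admitting operators only while their accumulated memory footprint fits on the co-processor) can be sketched as follows; this is a simplification for illustration, not CoGaDB's actual scheduler, and the operator names and footprints are invented:

```python
def chop(operators, memory_budget):
    """Greedily group operators into batches whose accumulated
    footprint stays within the co-processor memory budget, so
    concurrent operators never exceed device memory."""
    batches, current, used = [], [], 0
    for name, footprint in operators:
        if footprint > memory_budget:
            raise ValueError(f"{name} alone exceeds device memory")
        if used + footprint > memory_budget:
            batches.append(current)      # close the full batch
            current, used = [], 0
        current.append(name)
        used += footprint
    if current:
        batches.append(current)
    return batches
```

Each batch runs to completion before the next is admitted, which caps the co-processor's heap usage and avoids the contention effect described above.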

Supplementary Material

Source Code: cogadb-0.4.1_sigmod_2016_source_code.zip (mirror)

Thesis Proposal “GPU-accelerated join-order optimization” accepted for publication at the VLDB PhD Workshop

May 26, 2015


Authors

Andreas Meister (University of Magdeburg)

Abstract

Join-order optimization is an important task during query processing in DBMSs. The execution time of different join orders can vary by several orders of magnitude. Hence, efficient join orders are essential to ensure the efficiency of query processing. Established techniques for join-order optimization pose a challenge for current hardware architectures, because they are mainly sequential algorithms. Current architectures are becoming increasingly heterogeneous through specialized co-processors such as GPUs, which offer a highly parallel architecture with higher computational power than CPUs. Because join-order optimization benefits from parallel execution, we expect further improvements by using GPUs. Therefore, in this thesis, we adapt join-order optimization approaches to GPUs.
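The property that makes join-order optimization amenable to GPUs is that candidate orders can be costed independently of one another. A brute-force sketch with an invented toy cost model (sum of estimated intermediate result sizes; real optimizers use dynamic programming or heuristics rather than full enumeration):

```python
from itertools import permutations
from math import prod

def intermediate_cost(order, cardinalities):
    """Toy cost model: sum the estimated sizes of all intermediate
    join results, assuming cross-product-like growth."""
    return sum(prod(cardinalities[r] for r in order[:i])
               for i in range(2, len(order) + 1))

def best_join_order(relations, cardinalities):
    """Exhaustively cost every permutation. Each permutation is
    independent of the others, so on a GPU one thread could
    evaluate each candidate order in parallel."""
    return min(permutations(relations),
               key=lambda order: intermediate_cost(order, cardinalities))
```

Even this naive search shows why small relations should be joined first: they keep intermediate results small.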

Thesis Proposal “The Relational Way to Dam the Flood of Genome Data” accepted for publication at the SIGMOD/PODS PhD Workshop

March 29, 2015


Authors

Sebastian Dorok (Bayer Pharma AG, University of Magdeburg)

Abstract

Mutations in genomes can indicate a predisposition for diseases such as cancer or cardiovascular disorders. Genome analysis is an established procedure to determine mutations and deduce their impact on living organisms. The first step in genome analysis is DNA sequencing, which makes the biochemically stored hereditary information in DNA digitally readable. The cost and time to sequence a whole genome are decreasing rapidly, leading to an increase in available raw genome data that must be stored and integrated to be analyzed. Damming this flood of genome data requires efficient and effective analysis as well as data management solutions. The state of the art in genome analysis is flat-file-based storage and analysis solutions. Consequently, every analysis application is responsible for managing data on its own, which leads to implementation and process overhead.

Database systems have already shown their ability to reduce data management overhead for analysis applications in various domains. However, current approaches using relational database systems for genome-data management lack scalable performance on increasing amounts of genome data. In this thesis, we investigate the capabilities of relational main-memory database systems to store and query genome data efficiently, while enabling flexible data access.

Paper “Adaptive Reprogramming for Databases on Heterogeneous Processors” accepted for publication at SIGMOD/PODS PhD Workshop 2015

March 10, 2015


Author

David Broneske (University of Magdeburg)

Abstract

It has become clear by now that modern processing hardware is becoming increasingly heterogeneous, which forces data-processing algorithms to take the underlying hardware into account. However, current approaches for implementing data-intensive operators (e.g., in database systems) either cause enormous programming effort for tuning one algorithm to several processors (the hardware-sensitive way) or fail to exploit the full performance potential because of an abstract operator description (the hardware-oblivious way). In this thesis, we propose an algorithm optimizer that automatically tunes a hardware-oblivious operator description to the underlying hardware. This way, the DBMS can rewrite its operator code until it runs optimally on the given hardware.


Paper “Toward GPU-accelerated Database Optimization” accepted for publication at Datenbank-Spektrum

March 5, 2015


Authors

Andreas Meister (University of Magdeburg), Sebastian Breß (TU Dortmund University), Gunter Saake (University of Magdeburg)

Abstract

For over three decades, research has investigated optimization options in databases. Typically, we use problem-specific heuristics due to the large size of the optimization space. Nowadays, processors are bound by a fixed energy budget, leading to increased parallelism and heterogeneity. Existing optimization heuristics in databases do not exploit parallelism for a single optimization task and, hence, can only benefit from the parallelism offered by new (co-)processors by batch-processing several optimization tasks.

Since a large optimization space often allows us to process sub-spaces in parallel, we expect large gains in result quality and, hence, in query processing performance on modern (co-)processors. However, parallel optimization on CPUs is likely to slow down query processing, because DBMSs can already fully exploit the CPU's computing resources due to high data parallelism. In contrast, the communication overhead of co-processors such as GPUs typically leaves plenty of their compute resources unused.

In this paper, we motivate the use of parallel co-processors for optimization in databases, identify optimization problems that can benefit from parallelization, and show how we can design parallel optimization heuristics on the example of the operator placement problem.

Demonstration “Flexible Analysis of Plant Genomes in a Database Management System” accepted for publication at EDBT

January 27, 2015


Authors

Sebastian Dorok (Bayer Pharma AG, University of Magdeburg), Sebastian Breß (TU Dortmund University), Jens Teubner (TU Dortmund University), Gunter Saake (University of Magdeburg)

Abstract

Analysis of genomes has a wide range of applications from disease susceptibility studies to plant breeding research. For example, different types of barley have differing characteristics regarding drought or salt tolerance. Thus, a typical use case is comparing two plant genomes and trying to deduce which genes are responsible for a certain resistance. For this, we need to find differences in large volumes of aligned genome data, which is already available in large genome databases.

The challenge is to efficiently retrieve the genotypes of a certain range of the genome, and then, to determine variants and their impact on the plant organism. State-of-the-art tools are fixed pipelines with a fixed parametrization. However, in practice, users want to interactively analyse genome data and need to customize the parametrization. In this demonstration, we show how we can support flexible ad-hoc analyses of arbitrary plant genomes using SQL with a small set of user-defined aggregation functions and dynamic parametrization. Furthermore, we demonstrate how genome analysis workflows for variant calling can be applied to our system and provide insights about the performance of our system.