Paper “Generating Custom Code for Efficient Query Execution on Heterogeneous Processors” accepted in The VLDB Journal
Sebastian Breß (DFKI GmbH, TU Berlin), Bastian Köcher (TU Berlin), Henning Funke (TU Dortmund University), Steffen Zeuch (German Research Center for Artificial Intelligence), Tilmann Rabl (TU Berlin, DFKI GmbH), and Volker Markl (TU Berlin, DFKI GmbH)
Processor manufacturers build increasingly specialized processors to mitigate the effects of the power wall and deliver improved performance. Currently, database engines have to be manually optimized for each processor, which is a costly and error-prone process.
In this paper, we propose concepts to adapt to and to exploit the performance enhancements of modern processors automatically. Our core idea is to create processor-specific code variants and to learn a well-performing code variant for each processor. These code variants leverage various parallelization strategies and apply both generic and processor-specific code transformations.
Our experimental results show that the performance of code variants may diverge by up to two orders of magnitude. In order to achieve peak performance, we generate custom code for each processor. We show that our approach finds an efficient custom code variant for multi-core CPUs, GPUs, and MICs.
Source Code: https://github.com/TU-Berlin-DIMA/Hawk-VLDBJ
Short Paper “Efficient SIMD Vectorization for Hashing in OpenCL” accepted at EDBT 2018
Tobias Behrens (German Research Center for Artificial Intelligence), Viktor Rosenfeld (German Research Center for Artificial Intelligence), Jonas Traub (TU Berlin), Sebastian Breß (German Research Center for Artificial Intelligence, TU Berlin), Volker Markl (TU Berlin, German Research Center for Artificial Intelligence)
Hashing is at the core of many efficient database operators such as hash-based joins and aggregations. Vectorization is a technique that uses Single Instruction Multiple Data (SIMD) instructions to process multiple data elements at once. Applying vectorization to hash tables results in promising speedups for build and probe operations. However, vectorization typically requires intrinsics – low-level APIs in which functions map to processor-specific SIMD instructions. Intrinsics are specific to a processor architecture and result in complex and difficult-to-maintain code.
OpenCL is a parallel programming framework that provides a higher abstraction level than intrinsics and is portable across different processors. Thus, OpenCL avoids processor dependencies, which improves code maintainability. In this paper, we add efficient, vectorized hashing primitives to OpenCL. Our experimental study shows that OpenCL-based vectorization is competitive with intrinsics on CPUs and Xeon Phi coprocessors.
Paper “Pipelined Query Processing in Coprocessor Environments” accepted at SIGMOD 2018
Henning Funke (TU Dortmund University), Sebastian Breß (German Research Center for Artificial Intelligence, TU Berlin), Stefan Noll (TU Dortmund University), Volker Markl (TU Berlin, German Research Center for Artificial Intelligence), and Jens Teubner (TU Dortmund University)
Query processing on GPU-style coprocessors is severely limited by the movement of data. With teraflops of compute throughput in one device, even high-bandwidth memory cannot provision enough data for a reasonable utilization. Query compilation is a proven technique to improve memory efficiency. However, its inherent tuple-at-a-time processing style does not suit the massively parallel execution model of GPU-style coprocessors. This compromises the improvements in efficiency offered by query compilation. In this paper, we show how query compilation and GPU-style parallelism can be made to play in unison nevertheless. We describe a compiler strategy that merges multiple operations into a single GPU kernel, thereby significantly reducing bandwidth demand. Compared to operator-at-a-time, we show reductions of memory access volumes by factors of up to 7.5x resulting in shorter kernel execution times by factors of up to 9.5x.
Paper “Efficient Storage and Analysis of Genome Data in Databases” accepted at BTW 2017
Sebastian Dorok (Bayer Business Services GmbH, University of Magdeburg), Sebastian Breß (DFKI GmbH), Jens Teubner (TU Dortmund University), Horstfried Läpple (Bayer HealthCare AG), Gunter Saake (University of Magdeburg), Volker Markl (TU Berlin, DFKI GmbH)
Genome analysis enables researchers to detect mutations within genomes and deduce their consequences. Researchers need reliable analysis platforms to ensure reproducible and comprehensive analysis results. Database systems provide vital support to implement the required sustainable procedures. Nevertheless, they are not used throughout the complete genome-analysis process, because (1) database systems suffer from high storage overhead for genome data and (2) they introduce overhead during domain-specific analysis. To overcome these limitations, we integrate genome-specific compression into database systems using a specialized database schema. Thus, we can reduce the storage overhead to 30%. Moreover, we can exploit genome-data characteristics during query processing, allowing us to analyze real-world data sets up to five times faster than specialized analysis tools and eight times faster than a straightforward database approach.
Source Code: cogadb-0.4.2_btw_2017_source_code.zip (mirror)
Release of CoGaDB 0.4.2-beta1
We released version 0.4.2-beta1 of CoGaDB today. From this release on, we will provide the source code, an installer, and a Debian package for Ubuntu 14.04 LTS. Additionally, we prepared a virtual machine (VirtualBox) which runs a demo of CoGaDB and provides all libraries and tools required to compile and develop CoGaDB.
The major new feature in this release is an SQL to C compiler, which allows us to compile a query to an optimized program that executes the query. A detailed change log will follow soon.
Paper “Robust Query Processing in Co-Processor-accelerated Databases” accepted at SIGMOD 2016
Sebastian Breß (German Research Center for Artificial Intelligence), Henning Funke (TU Dortmund University), and Jens Teubner (TU Dortmund University)
Technology limitations are making the use of heterogeneous computing devices much more than an academic curiosity. In fact, the use of such devices is widely acknowledged to be the only promising way to achieve application-speedups that users urgently need and expect. However, building a robust and efficient query engine for heterogeneous co-processor environments is still a significant challenge.
In this paper, we identify two effects that limit performance when co-processor resources become scarce. Cache thrashing occurs when the working set of queries does not fit into the co-processor’s data cache, resulting in performance degradations of up to a factor of 24. Heap contention occurs when multiple operators run in parallel on a co-processor and their accumulated memory footprint exceeds the main memory capacity of the co-processor, slowing down query execution by up to a factor of six.
We propose solutions for both effects. Data-driven operator placement avoids data movements when they might be harmful; query chopping limits co-processor memory usage and thus avoids contention. The combined approach—data-driven query chopping—achieves robust and scalable performance on co-processors. We validate our proposal with our open-source GPU-accelerated database engine CoGaDB and the popular star schema and TPC-H benchmarks.
Source Code: cogadb-0.4.1_sigmod_2016_source_code.zip (mirror)
Release of CoGaDB 0.4.1
We released version 0.4.1 of CoGaDB today. You can download it here. The release contains many fixes for bugs that came up over the last month, as well as improved error checking in the SQL interface.
Release of CoGaDB 0.4
We released version 0.4 of CoGaDB today. You can download it here. The release contains the following changes:
- SQL queries can now be entered without the exec command
- Added support for the HAVING clause (not fully SQL compliant, but usable)
- Support for nested queries
- We can now import and analyze aligned genome data from SAM, BAM, and FASTA files
- Implemented specialized aggregation functions to support variant calling via SQL
- Additional compression method: dictionary-compressed columns with bit-packed keys
- Explicit use of compression in database schema for genome data
- CoGaDB can now accept connections via network by calling the command “listen <port number>”
- Users can now connect to CoGaDB via any plain text network utility, such as netcat or telnet
- Added extensive benchmarking suite for the star schema benchmark
- Added capability to abort and later resume experiments without redoing finished experiments
- Complete support for GPU acceleration of most queries, including group-by and aggregation, which now run entirely on the GPU
- Several improvements to the GPU cache and to the detection of runtime errors
- Reverse join indexes, which significantly speed up the tuple reconstruction phase after an invisible join
- Hash-based aggregation for common aggregation functions on CPUs
- Major improvements on optimization heuristics
- Added new learning method: weighted KNN regression, which is now used by default by HyPE and supports up to five features per feature vector
- Primary key/foreign key integrity constraints
- Measurement of energy consumption of queries on CPUs