Author Archives: Sebastian Breß

Paper “Generating Custom Code for Efficient Query Execution on Heterogeneous Processors” accepted in The VLDB Journal

June 25, 2018

Authors

Sebastian Breß (DFKI GmbH, TU Berlin), Bastian Köcher (TU Berlin), Henning Funke (TU Dortmund University), Steffen Zeuch (German Research Center for Artificial Intelligence), Tilmann Rabl (TU Berlin, DFKI GmbH), and Volker Markl (TU Berlin, DFKI GmbH)

Abstract

Processor manufacturers build increasingly specialized processors to mitigate the effects of the power wall and deliver improved performance. Currently, database engines have to be optimized manually for each processor, which is a costly and error-prone process.

In this paper, we propose concepts to adapt to and to exploit the performance enhancements of modern processors automatically. Our core idea is to create processor-specific code variants and to learn a well-performing code variant for each processor. These code variants leverage various parallelization strategies and apply both generic and processor-specific code transformations.

Our experimental results show that the performance of code variants may diverge up to two orders of magnitude. In order to achieve peak performance, we generate custom code for each processor. We show that our approach finds an efficient custom code variant for multi-core CPUs, GPUs, and MICs.
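The core idea of learning a well-performing code variant per processor can be illustrated with a minimal Python sketch (the variant functions and the naive measure-and-pick loop below are hypothetical stand-ins; the paper's variant generation and search strategies are far more sophisticated):

```python
import time

def variant_loop(values, threshold):
    # Variant 1: explicit loop (stands in for one generated code variant).
    out = []
    for v in values:
        if v < threshold:
            out.append(v)
    return out

def variant_comprehension(values, threshold):
    # Variant 2: list comprehension (stands in for a transformed code variant).
    return [v for v in values if v < threshold]

def pick_best_variant(variants, values, threshold, repetitions=3):
    """Run each semantically equivalent variant a few times, keep the fastest."""
    best, best_time = None, float("inf")
    for variant in variants:
        start = time.perf_counter()
        for _ in range(repetitions):
            variant(values, threshold)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = variant, elapsed
    return best
```

On a given machine, `pick_best_variant` returns whichever variant measured fastest there; the engine would then reuse that variant for subsequent queries on this processor.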

Supplementary Material

Source Code: https://github.com/TU-Berlin-DIMA/Hawk-VLDBJ

Short Paper “Efficient SIMD Vectorization for Hashing in OpenCL” accepted at EDBT 2018

January 21, 2018

Authors

Tobias Behrens (German Research Center for Artificial Intelligence), Viktor Rosenfeld (German Research Center for Artificial Intelligence), Jonas Traub (TU Berlin), Sebastian Breß (German Research Center for Artificial Intelligence, TU Berlin), Volker Markl (TU Berlin, German Research Center for Artificial Intelligence)

Abstract

Hashing is at the core of many efficient database operators such as hash-based joins and aggregations. Vectorization is a technique that uses Single Instruction Multiple Data (SIMD) instructions to process multiple data elements at once. Applying vectorization to hash tables results in promising speedups for build and probe operations. However, vectorization typically requires intrinsics – low-level APIs in which functions map to processor-specific SIMD instructions. Intrinsics are specific to a processor architecture and result in complex code that is difficult to maintain.

OpenCL is a parallel programming framework which provides a higher abstraction level than intrinsics and is portable to different processors. Thus, OpenCL avoids processor dependencies, which results in improved code maintainability. In this paper, we add efficient, vectorized hashing primitives to OpenCL. Our experimental study shows that OpenCL-based vectorization is competitive with intrinsics on CPUs and Xeon Phi coprocessors.
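The idea behind vectorized hash probing can be sketched in plain Python, simulating the lanes of a SIMD register with a small list (an illustrative simulation only; the paper's primitives operate on real vector registers in OpenCL):

```python
VECTOR_WIDTH = 8  # lanes processed per step, like an 8-wide SIMD gather

def build(keys, table_size):
    """Scalar build of a linear-probing hash table."""
    table = [None] * table_size
    for k in keys:
        slot = hash(k) % table_size
        while table[slot] is not None:
            slot = (slot + 1) % table_size   # linear probing on collision
        table[slot] = k
    return table

def probe_vectorized(table, probe_keys):
    """Probe VECTOR_WIDTH keys at a time: one 'gather' of table slots per
    step, lane-wise comparison, and per-lane advance on collisions."""
    hits = []
    n = len(table)
    for i in range(0, len(probe_keys), VECTOR_WIDTH):
        lanes = probe_keys[i:i + VECTOR_WIDTH]
        slots = [hash(k) % n for k in lanes]           # vectorized hash
        active = list(range(len(lanes)))
        while active:
            loaded = [table[slots[j]] for j in active]  # gather
            still_active = []
            for j, value in zip(active, loaded):
                if value == lanes[j]:
                    hits.append(lanes[j])               # lane found its key
                elif value is None:
                    pass                                # key is not in table
                else:
                    slots[j] = (slots[j] + 1) % n       # collision: lane advances
                    still_active.append(j)
            active = still_active
    return hits
```

The point of the lane-wise formulation is that the hash computation, the gather, and the comparison each map to a single SIMD instruction on vector hardware.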

Paper “Pipelined Query Processing in Coprocessor Environments” accepted at SIGMOD 2018

November 15, 2017

Authors

Henning Funke (TU Dortmund University), Sebastian Breß (German Research Center for Artificial Intelligence, TU Berlin), Stefan Noll (TU Dortmund University), Volker Markl (TU Berlin, German Research Center for Artificial Intelligence), and Jens Teubner (TU Dortmund University)

Abstract

Query processing on GPU-style coprocessors is severely limited by the movement of data. With teraflops of compute throughput in one device, even high-bandwidth memory cannot provision enough data for a reasonable utilization. Query compilation is a proven technique to improve memory efficiency. However, its inherent tuple-at-a-time processing style does not suit the massively parallel execution model of GPU-style coprocessors. This compromises the improvements in efficiency offered by query compilation. In this paper, we show how query compilation and GPU-style parallelism can nevertheless be made to play in unison. We describe a compiler strategy that merges multiple operations into a single GPU kernel, thereby significantly reducing bandwidth demand. Compared to operator-at-a-time processing, we show reductions of memory access volumes by factors of up to 7.5x, resulting in shorter kernel execution times by factors of up to 9.5x.
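The bandwidth effect of merging operations into one kernel can be sketched in Python by counting element accesses (a toy model with a made-up query, not the paper's compiler):

```python
def operator_at_a_time(column):
    """Each operator reads its input and writes a full intermediate result."""
    filtered = [x for x in column if x > 10]     # pass 1: filter, materialize
    projected = [x * 2 for x in filtered]        # pass 2: project, materialize
    total = sum(projected)                       # pass 3: aggregate
    # input read + intermediate write/read + projected write/read
    memory_traffic = len(column) + 2 * len(filtered) + 2 * len(projected)
    return total, memory_traffic

def fused_kernel(column):
    """Filter, projection, and aggregation merged into one kernel:
    intermediates stay in registers, only the input is read from memory."""
    total = 0
    for x in column:                             # single pass over the data
        if x > 10:
            total += x * 2
    return total, len(column)
```

Both versions compute the same result, but the fused version touches each input element exactly once instead of materializing two intermediate columns.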

Paper “Efficient Storage and Analysis of Genome Data in Databases” accepted at BTW 2017

November 28, 2016

Authors

Sebastian Dorok (Bayer Business Services GmbH, University of Magdeburg), Sebastian Breß (DFKI GmbH), Jens Teubner (TU Dortmund University), Horstfried Läpple (Bayer HealthCare AG), Gunter Saake (University of Magdeburg), Volker Markl (TU Berlin, DFKI GmbH)

Abstract

Genome analysis enables researchers to detect mutations within genomes and deduce their consequences. Researchers need reliable analysis platforms to ensure reproducible and comprehensive analysis results. Database systems provide vital support to implement the required sustainable procedures. Nevertheless, they are not used throughout the complete genome-analysis process, because (1) database systems suffer from high storage overhead for genome data and (2) they introduce overhead during domain-specific analysis. To overcome these limitations, we integrate genome-specific compression into database systems using a specialized database schema. Thus, we can reduce the storage overhead to 30%. Moreover, we can exploit genome-data characteristics during query processing, allowing us to analyze real-world data sets up to five times faster than specialized analysis tools and eight times faster than a straightforward database approach.
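A classic example of the kind of genome-specific compression this line of work builds on is packing DNA bases into 2 bits each (an illustrative sketch, not the paper's actual database schema):

```python
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {v: k for k, v in BASE_TO_BITS.items()}

def pack(sequence):
    """Pack a DNA sequence into 2 bits per base (4 bases per byte)."""
    packed = bytearray()
    byte = 0
    for i, base in enumerate(sequence):
        byte = (byte << 2) | BASE_TO_BITS[base]
        if i % 4 == 3:
            packed.append(byte)
            byte = 0
    if len(sequence) % 4:                        # flush the partial last byte
        byte <<= 2 * (4 - len(sequence) % 4)
        packed.append(byte)
    return bytes(packed), len(sequence)

def unpack(packed, length):
    """Reverse of pack(): recover the original base string."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])
```

Compared to one character (8 bits) per base, this alone yields a 4x reduction for the sequence column; the paper combines such genome-specific encodings with a specialized schema.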

Supplementary Material

Source Code: cogadb-0.4.2_btw_2017_source_code.zip (mirror)

Release of CoGaDB 0.4.2-beta1

December 6, 2015

We released version 0.4.2-beta1 of CoGaDB today. From this release on, we will provide the source code, an installer, and a Debian package for Ubuntu 14.04 LTS. Additionally, we prepared a virtual machine (VirtualBox) that runs a demo of CoGaDB and provides all libraries and tools required to compile and develop CoGaDB.

The major new feature in this release is an SQL-to-C compiler, which compiles a query into an optimized program that executes it. A detailed change log will follow soon.
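The principle of a query compiler can be shown with a toy sketch that generates and "compiles" a specialized program per query (CoGaDB's compiler emits C; for illustration this toy emits Python source, and the column names are hypothetical):

```python
def compile_query(predicate_column, threshold, agg_column):
    """Generate a specialized scan-filter-aggregate program for one query,
    then compile and return it as a callable."""
    source = f"""
def run(table):
    total = 0
    for row in table:
        if row[{predicate_column!r}] > {threshold}:
            total += row[{agg_column!r}]
    return total
"""
    namespace = {}
    exec(source, namespace)          # 'compile' the generated program
    return namespace["run"]
```

The generated program is specialized to one query: predicate, constant, and aggregate are baked in, so there is no interpretation overhead at execution time.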

Paper “Robust Query Processing in Co-Processor-accelerated Databases” accepted at SIGMOD 2016

December 1, 2015

Authors

Sebastian Breß (German Research Center for Artificial Intelligence), Henning Funke (TU Dortmund University), and Jens Teubner (TU Dortmund University)

Abstract

Technology limitations are making the use of heterogeneous computing devices much more than an academic curiosity. In fact, the use of such devices is widely acknowledged to be the only promising way to achieve application speedups that users urgently need and expect. However, building a robust and efficient query engine for heterogeneous co-processor environments is still a significant challenge.

In this paper, we identify two effects that limit performance when co-processor resources become scarce. Cache thrashing occurs when the working set of queries does not fit into the co-processor’s data cache, resulting in performance degradations of up to a factor of 24. Heap contention occurs when multiple operators run in parallel on a co-processor and their accumulated memory footprint exceeds the main memory capacity of the co-processor, slowing down query execution by up to a factor of six.

We propose solutions for both effects. Data-driven operator placement avoids data movements when they might be harmful; query chopping limits co-processor memory usage and thus avoids contention. The combined approach—data-driven query chopping—achieves robust and scalable performance on co-processors. We validate our proposal with our open-source GPU-accelerated database engine CoGaDB and the popular star schema and TPC-H benchmarks.
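The essence of query chopping can be sketched in a few lines of Python (an illustrative chunked operator with made-up sizes, not CoGaDB's implementation): the operator processes its input in chunks small enough to fit the co-processor's memory and merges the partial results, so its footprint never exceeds the budget:

```python
def chopped_execution(column, device_memory, element_size=8):
    """Run a query operator in chunks that fit the co-processor's memory,
    instead of allocating the whole input at once (which causes contention)."""
    chunk_elements = max(1, device_memory // element_size)
    partial_results = []
    for start in range(0, len(column), chunk_elements):
        chunk = column[start:start + chunk_elements]   # fits in device memory
        partial_results.append(sum(x for x in chunk if x % 2 == 0))
    return sum(partial_results)                        # merge partial results
```

The chunk size bounds the memory footprint per operator; with several operators running concurrently, their combined footprint stays below the co-processor's capacity.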

Supplementary Material

Source Code: cogadb-0.4.1_sigmod_2016_source_code.zip (mirror)

Release of CoGaDB 0.4.1

March 29, 2015

We released version 0.4.1 of CoGaDB today. You can download it here. The release contains many fixes for bugs that came up over the last month, as well as improved error checking for the SQL interface.

Paper “Toward GPU-accelerated Database Optimization” accepted for publication at Datenbank-Spektrum

March 5, 2015

Authors

Andreas Meister (University of Magdeburg), Sebastian Breß (TU Dortmund University), Gunter Saake (University of Magdeburg)

Abstract

For over three decades, research has investigated optimization options in databases. Typically, we use problem-specific heuristics due to the large size of the optimization space. Nowadays, processors are bound by a fixed energy budget, leading to increased parallelism and heterogeneity. Existing optimization heuristics in databases do not exploit parallelism for a single optimization task and, hence, can only benefit from the parallelism offered by new (co-)processors by batch-processing several optimization tasks.

Since a large optimization space often allows us to process sub-spaces in parallel, we expect large gains in result quality and, hence, in query processing performance on modern (co-)processors. However, parallel optimization on CPUs is likely to slow down query processing, because DBMSs can already fully exploit the CPU's computing resources due to high data parallelism. In contrast, the communication overhead of co-processors such as GPUs typically leaves plenty of compute resources unused.

In this paper, we motivate the use of parallel co-processors for optimization in databases, identify optimization problems that can benefit from parallelization, and show how to design parallel optimization heuristics, using the operator placement problem as an example.
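Splitting the operator placement space into sub-spaces that are searched in parallel can be sketched as follows (the cost numbers and the two-way split are made up for illustration; the paper targets co-processors, not CPU threads):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical per-operator costs on each processor (arbitrary numbers).
COSTS = {
    "scan":      {"CPU": 4.0, "GPU": 1.0},
    "join":      {"CPU": 9.0, "GPU": 3.0},
    "aggregate": {"CPU": 2.0, "GPU": 5.0},
}
TRANSFER_COST = 2.0   # penalty whenever consecutive operators switch devices

def plan_cost(placement, operators):
    cost = sum(COSTS[op][dev] for op, dev in zip(operators, placement))
    cost += TRANSFER_COST * sum(a != b for a, b in zip(placement, placement[1:]))
    return cost

def best_placement(operators):
    """Evaluate disjoint sub-spaces of the placement space in parallel."""
    space = list(product(("CPU", "GPU"), repeat=len(operators)))
    halves = [space[: len(space) // 2], space[len(space) // 2:]]
    with ThreadPoolExecutor(max_workers=2) as pool:
        bests = pool.map(
            lambda sub: min(sub, key=lambda p: plan_cost(p, operators)), halves
        )
    return min(bests, key=lambda p: plan_cost(p, operators))
```

Each worker searches its own sub-space independently; only the per-sub-space winners are compared at the end, which is why the problem parallelizes so naturally.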

Release of CoGaDB 0.4

February 28, 2015

We released version 0.4 of CoGaDB today. You can download it here. The release contains the following changes:

SQL Extensions

  • SQL queries can now be entered without the exec command
  • Added support for the HAVING clause (not fully SQL compliant, but usable)
  • Support for nested queries

Genomics Extension

  • We can now import and analyze aligned genome data from SAM, BAM, and FASTA files
  • Implemented specialized aggregation functions to support variant calling via SQL
  • Additional compression method: dictionary-compressed columns with bit-packed keys
  • Explicit use of compression in the database schema for genome data
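Dictionary compression with bit-packed keys, as used for the genome columns above, works roughly like this (a minimal sketch, not CoGaDB's column implementation):

```python
import math

def dictionary_encode(column):
    """Dictionary-encode a column and bit-pack the integer keys."""
    dictionary = sorted(set(column))
    key_of = {value: key for key, value in enumerate(dictionary)}
    bits = max(1, math.ceil(math.log2(len(dictionary))))   # bits per key
    packed = 0
    for value in column:
        packed = (packed << bits) | key_of[value]
    return dictionary, bits, packed, len(column)

def dictionary_decode(dictionary, bits, packed, count):
    """Recover the original column from dictionary, bit width, and packed keys."""
    mask = (1 << bits) - 1
    keys = [(packed >> (bits * i)) & mask for i in reversed(range(count))]
    return [dictionary[k] for k in keys]
```

With few distinct values per column, each entry shrinks from a full string to a handful of bits, and many operations (e.g. equality predicates) can run directly on the keys.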

Network Support

  • CoGaDB can now accept connections over the network via the command “listen <port number>”
  • Users can connect to CoGaDB with any plain-text network utility, such as netcat or telnet
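The plain-text protocol makes the network interface very simple; a minimal sketch of such a listener and a netcat-style client looks like this (illustrative only, with a made-up reply format, not CoGaDB's actual dispatch):

```python
import socket
import threading

def handle(command):
    # Stand-in for the server's command dispatch (hypothetical reply format).
    return "ok: " + command

def serve_one(server):
    """Accept one plain-text connection, read one line, send one reply."""
    conn, _ = server.accept()
    with conn:
        line = conn.makefile("r").readline().strip()
        conn.sendall((handle(line) + "\n").encode())

# Listen on an ephemeral port (the "listen <port number>" idea).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
thread = threading.Thread(target=serve_one, args=(server,))
thread.start()

# Any plain-text client works -- this is what netcat or telnet would do.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"select * from lineorder\n")
reply = client.makefile("r").readline().strip()
client.close()
thread.join()
server.close()
```

Because commands and replies are plain newline-terminated text, no special client library is needed.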

Benchmarking

  • Added an extensive benchmarking suite for the star schema benchmark
  • Added the capability to abort and later resume experiments without redoing finished ones

Query Processor

  • Complete GPU acceleration of most queries, including group-by and aggregation, which now run entirely on the GPU
  • Several improvements to the GPU cache and to the detection of runtime errors
  • Reverse join indexes, which significantly speed up the tuple reconstruction phase after an invisible join
  • Hash-based aggregation for common aggregation functions on CPUs
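The hash-based aggregation mentioned above follows the textbook pattern of grouping via a hash table in a single pass (a minimal sketch with a SUM aggregate, not CoGaDB's operator):

```python
def hash_aggregate(keys, values):
    """Group-by with a hash table: one pass over the input and an O(1)
    lookup per tuple, instead of sorting the input first."""
    groups = {}
    for key, value in zip(keys, values):
        groups[key] = groups.get(key, 0) + value    # SUM per group
    return groups
```

For aggregates like SUM, COUNT, MIN, and MAX, the running state per group is a single value, which is what makes the one-pass hash table approach so effective on CPUs.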

Query Optimizer

  • Major improvements to the optimization heuristics
  • Added a new learning method: weighted KNN regression, which is now used by default by HyPE and supports up to 5 features in a feature vector

Other

  • Primary key/foreign key integrity constraints
  • Measurement of the energy consumption of queries on CPUs
  • Bugfixes