Control Flow Duplication for Columnar Arrays in a Dynamic Compiler
Sprache des Titels:
The Art, Science, and Engineering of Programming, 2023, Vol. 7, Issue 3, Article 9
Columnar databases are an established way to speed up online analytical processing (OLAP) queries. Nowadays, data processing (e.g., storage, visualization, and analytics) is often performed at the programming language level, hence it is desirable to also adopt columnar data structures for common language runtimes.
Our work shows that automatic transformation of arrays to columnar storage is feasible even for small workloads and that more complex arrays of objects could benefit from a multi-level transformation. Furthermore, we show how we can optimize methods that handle arrays in different states by the use of duplication. We evaluated our work on microbenchmarks and established data analytics workloads (TPC-H) to demonstrate that it significantly outperforms previous efforts, with speedups of up to 10x for particular queries. Queries additionally benefit from multi-level transformation, reaching speedups of around 2x. Additionally, we show that we do not cause significant overhead on workloads not suitable for storage transformation.
We argue that automatically created columnar arrays could aid developers in data-centric applications as an alternative approach to using dedicated APIs on manually created columnar arrays. Via automatic detection and optimization of queries on potentially columnar arrays, we can improve performance of data processing and further enable its use in common?particularly dynamic?programming languages.