Introducing CedarDB

published on 2024/05/30

Despite configuring PostgreSQL to use as many CPU cores as it wants, it only makes use of 10 cores at a time. In contrast, CedarDB is able to utilize all CPU cores from the very first to the very last second of the query (even in cases where it is not the same second). Although important, parallel algorithms and data structures alone are not enough. The available CPUs also need to be distributed fairly between tasks. Our scheduler automatically adapts to new tasks coming in and scales the degree of parallelism of older tasks up or down as needed. This ensures that short-running and interactive tasks can complete quickly without starving complex long-running tasks.

CedarDB

Another day, another new DB.

This section The Techonology behind CedarDB is interesting

Modern servers usually come with a few dozen to a hundred CPU cores. Database systems have traditionally used inter-query-parallelism to make use of all cores (i.e., each query is executed by its own core). All modern database systems also use intra-query-parallelism (i.e., each query is processed by multiple cores in parallel), which is especially important when issuing few but very compute-intensive analytical queries. However, even this approach quickly runs into Amdahl’s Law: The more cores one has available, the harder it becomes to keep all of them busy.

CedarDB innovates by implementing morsel-driven parallelism. It divides each query into many morsels of a few thousand tuples each. Whenever a CPU core is done with its current job, it grabs a new morsel, i.e., a small chunk of data waiting to be processed. Since many more morsels are waiting to be processed than CPU cores, CedarDB can ensure that all cores stay busy until the end, as shown in the above picture.