pg_duckdb, an open-source Postgres extension that embeds DuckDB's analytics engine into Postgres

published on 2024/08/17

Postgres is great at a lot of things, but if you try to use it for analytics, you hit a wall pretty quickly. That is, it is great at creating, finding and locating individual rows, but if you want to understand what is going on in a data set, it can be painfully slow. For example, you might want to know how revenue is growing in the Netherlands, or how many of your customers have names that rhyme with “Duck.” These are analytical queries and often require separate ways of storing and processing the data to operate efficiently.

People have tried to add Band-Aids to improve Postgres analytical performance. but they haven’t been particularly successful because being good at analytics requires different techniques for running your queries, like being able to operate over batches of rows at once, and avoiding decompressing data until it is absolutely needed. And typically, that takes a purpose-built analytical engine, which takes a ton of effort.

This is where DuckDB comes in. DuckDB is an in-process OLAP database and uses a vectorized query engine to process chunks (vectors) of data at a time. This makes it valuable for answering analytical questions about what is going on in the data. DuckDB’s popularity has been soaring due to its speed, ease of use, and versatility.

motherduck

It's still early but this announcement is really exciting.

We’re announcing early, with the intention of building in the open with a public roadmap. The pg_duckdb extension is fully usable to query over data in a data lake, to run analytical queries over Postgres, and to store data in a local DuckDB database.