Hydra is powered by pg_duckdb, an open source program (MIT license) which embeds DuckDB’s analytics engine and features into Postgres. pg_duckdb was co-developed and maintained by Hydra, DuckDB Labs, and MotherDuck.

If you think pg_duckdb is neat, please consider adding your friendly Github star ⭐ to the pg_duckdb repository!

Run with Docker

Docker images are available on Dockerhub and are based on the official Postgres image. Use of this image is the same as the Postgres image. For example, you can run the image directly:

docker run -d -e POSTGRES_PASSWORD=duckdb pgduckdb/pgduckdb:16-main

Or you can use the docker compose in this repo:

git clone https://github.com/duckdb/pg_duckdb && cd pg_duckdb && docker compose up -d

Once started, connect to the database using psql:

psql postgres://postgres:duckdb@127.0.0.1:5432/postgres
# Or if using docker compose
docker compose exec db psql

For other usages see our Docker specific README.

Compile from Source

To build pg_duckdb, you need:

To build and install, run:

make install

Add pg_duckdb to the shared_preload_libraries in your postgresql.conf file:

shared_preload_libraries = 'pg_duckdb'

Next, create the pg_duckdb extension:

CREATE EXTENSION pg_duckdb;

Sample Queries

-- Remote Parquet scans:
SELECT count(*) FROM read_parquet('https://shell.duckdb.org/data/tpch/0_01/parquet/orders.parquet') AS (o_orderkey int);
SELECT avg(c_acctbal) FROM read_parquet('https://shell.duckdb.org/data/tpch/0_01/parquet/customer.parquet') AS (c_acctbal float);
SELECT count(*)::int as aws_service_cnt FROM read_parquet('https://raw.githubusercontent.com/tobilg/aws-iam-data/main/data/parquet/aws_services.parquet') AS (service_id int);
SELECT code, city, state FROM read_parquet('https://raw.githubusercontent.com/tobilg/aws-edge-locations/main/data/aws-edge-locations.parquet') AS (code text, city text, state text);
SELECT cloud_provider, sum(ip_address_cnt)::int as cnt FROM read_parquet('https://raw.githubusercontent.com/tobilg/public-cloud-provider-ip-ranges/main/data/providers/all.parquet') AS (cloud_provider text, ip_address_cnt int) GROUP BY cloud_provider;

-- Remote CSV scan
SELECT * FROM read_csv('https://raw.githubusercontent.com/tobilg/public-cloud-provider-ip-ranges/main/data/providers/all.csv') AS (cloud_provider text, cidr_block text, ip_address text) limit 30;

pg_duckdb Roadmap

Please view our milestones on the pg_duckdb repo for what’s planned for future releases.