Do I have to know how to use DuckDB?

No. Using Hydra is identical to using standard Postgres. Once DuckDB execution is “on”, Hydra abstracts away the DuckDB execution details. However, your friends will be impressed when you can skillfully explain Hydra internals during game night! 🎲

DuckDB execution is usually enabled automatically when needed. It’s enabled whenever you use DuckDB functions (such as read_parquet), when you query DuckDB tables, and when you run COPY table TO 's3://...'. However, if you want queries that touch only Postgres tables to use DuckDB execution, you need to run SET duckdb.force_execution TO true. This feature is opt-in to avoid breaking existing queries. To avoid running that statement in every session, you can configure it for a specific user with ALTER USER my_analytics_user SET duckdb.force_execution TO true.
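As a minimal sketch of the two opt-in paths described above, the statements below show a per-session toggle and a per-user default. The table name foo and the role my_analytics_user are placeholders. SHOW and RESET are standard Postgres commands for any setting, and we assume they behave the same way for duckdb.force_execution.

```sql
-- Opt in for the current session only.
SET duckdb.force_execution TO true;

-- Check the current value (SHOW is standard Postgres; assumed to work for this setting).
SHOW duckdb.force_execution;

-- A query that touches only Postgres tables; `foo` is a placeholder table name.
-- With the setting on, the plan should include a DuckDB execution plan.
EXPLAIN SELECT * FROM foo;

-- Return to the default behavior for this session.
RESET duckdb.force_execution;

-- Alternatively, persist the setting for one role (placeholder name).
-- Per-role settings take effect in that role's new sessions, not the current one.
ALTER USER my_analytics_user SET duckdb.force_execution TO true;
```

The per-user form is usually the better fit for a dedicated analytics role, because application roles that rely on existing Postgres behavior stay untouched.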

If you’d like our help or have questions, post a quick question in Discord! That’s the easiest place to find our engineers, sales team, and founders.

If you’d like to see how DuckDB executed a query, run a normal EXPLAIN query such as EXPLAIN SELECT * FROM foo. The DuckDB execution plan will be present anytime DuckDB is in use. Here’s an example Hydra explain plan:

Why is embedded DuckDB so ducking fast?

DuckDB stands on the shoulders of giants and draws components and inspiration from open source projects and scientific publications. For a deep dive on DuckDB, we’d recommend reading DuckDB’s peer-reviewed papers and thesis works:

To efficiently support analytical (OLAP) workloads, it is critical to reduce the number of CPU cycles expended per individual value. State-of-the-art data management systems achieve this with either vectorized or just-in-time query execution engines. DuckDB contains a columnar-vectorized query execution engine, where queries are still interpreted, but a large batch of values (a “vector”) is processed in one operation. This greatly reduces the overhead present in traditional systems such as standard PostgreSQL, MySQL, or SQLite, which process each row sequentially. Vectorized query execution leads to far better performance in OLAP queries.

Here is an overview of components and scientific publications which have inspired DuckDB’s design: