Serverless Analytics
To unlock real-time, serverless analytics, simply run CREATE SCHEMA analytics; on Hydra. Behind the scenes, Hydra uses DuckDB to run isolated, serverless processing on Postgres’ analytics schema. DuckDB is a fast, in-process SQL database for analytics. Hydra is built on pg_duckdb - an open source project we built to integrate DuckDB’s execution and features with Postgres automatically.
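For illustration, a minimal workflow might look like the following; the analytics.page_views table and its columns are hypothetical and only serve as an example:

```sql
-- Create the analytics schema; Hydra routes queries against it through DuckDB.
CREATE SCHEMA analytics;

-- A hypothetical events table used for illustration.
CREATE TABLE analytics.page_views (
    viewed_at  timestamptz NOT NULL,
    user_id    bigint,
    path       text
);

-- Standard Postgres SQL; the analytical execution is handled for you.
SELECT path, count(*) AS views
FROM analytics.page_views
WHERE viewed_at >= now() - interval '7 days'
GROUP BY path
ORDER BY views DESC;
```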
Do I have to know how to use DuckDB?
No. Using Hydra is identical to using standard Postgres — Hydra abstracts away the DuckDB execution details. However, your friends will be impressed when you can skillfully explain Hydra internals during game night! 🎲
Serverless analytical processing is enabled automatically when needed: it runs whenever you use your analytics schema.
If you’d like our help or have questions, post a quick question in Discord! That’s the easiest place to reach our engineering, sales, and founding teams.
If you’d like to see how DuckDB executed a query, run a normal EXPLAIN query, like EXPLAIN SELECT * FROM foo, on a table in the analytics schema. The DuckDB execution plan will be present in the output anytime DuckDB is in use.
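For example, reusing the hypothetical analytics.page_views table from above, you could inspect the plan like this; the exact plan output depends on your query and the pg_duckdb version:

```sql
-- Prepend EXPLAIN to any query against the analytics schema.
-- When DuckDB handles the query, the plan includes a DuckDB execution section.
EXPLAIN
SELECT path, count(*) AS views
FROM analytics.page_views
GROUP BY path;
```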
Read Scaling Replicas
Hydra employs lightweight compute orchestration so read-only users can avoid resource contention. Read scaling tokens transparently access up to 4x the compute resources to better handle concurrent query workloads. The result is an improved end-user experience on your analytics schema.
Hydra accounts now support scaling up to 4 replicas of your database. When connecting with a read scaling token, each concurrent end user connects to a read scaling replica of the database that is served by its own process. Beyond this limit, applications degrade gracefully, with multiple end users served by the same process.
Create read scaling replicas
To add read scaling replicas to your project, please email support@hydra.so.
Permissions and limitations
A read scaling token grants permission for read operations, such as querying tables, but restricts write operations (see the sketch after this list), including:
- Updating tables
- Creating new databases
- Attaching or detaching databases
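As a rough sketch of what this means in practice, again using the hypothetical analytics.page_views table: read-only statements succeed over a read scaling connection, while write statements such as the UPDATE below are restricted.

```sql
-- Allowed over a read scaling connection: read-only queries.
SELECT count(*) FROM analytics.page_views;

-- Restricted over a read scaling connection: write operations such as this UPDATE.
UPDATE analytics.page_views
SET path = '/home'
WHERE path = '/index';
```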
How a client uses read scaling replicas
When a client connects to Hydra with a read scaling token:
- It is assigned to one of the read scaling replicas for the user account.
- This is in addition to the standard read-write processing that is used normally.
These replicas are eventually consistent, meaning data from read operations may briefly lag behind the latest writes.
Why is Hydra so ducking fast?
Hydra runs pg_duckdb, an extension that enables DuckDB’s high-performance analytics engine in Postgres. DuckDB stands on the shoulders of giants and draws components and inspiration from open source projects and scientific publications.
- These Rows Are Made for Sorting and That’s Just What We’ll Do (ICDE 2023)
- Join Order Optimization with (Almost) No Statistics (Master thesis, 2022)
- DuckDB-Wasm: Fast Analytical Processing for the Web (VLDB 2022 Demo)
- Data Management for Data Science - Towards Embedded Analytics (CIDR 2020)
- DuckDB: an Embeddable Analytical Database (SIGMOD 2019 Demo)
To efficiently support this workload, it is critical to reduce the number of CPU cycles expended per individual value. The state of the art in data management for achieving this is vectorized or just-in-time (JIT) query execution engines. DuckDB contains a columnar-vectorized query execution engine, where queries are still interpreted, but a large batch of values (a “vector”) is processed in one operation. This greatly reduces the overhead present in traditional systems such as standard PostgreSQL, MySQL, or SQLite, which process each row sequentially. Vectorized query execution leads to far better performance in OLAP queries.
Here is an overview of components and scientific publications which have inspired DuckDB’s design:
- Execution engine: The vectorized execution engine is inspired by the paper MonetDB/X100: Hyper-Pipelining Query Execution by Peter Boncz, Marcin Zukowski and Niels Nes.
- Optimizer: DuckDB’s optimizer draws inspiration from the papers Dynamic programming strikes back by Guido Moerkotte and Thomas Neumann as well as Unnesting Arbitrary Queries by Thomas Neumann and Alfons Kemper.
- Concurrency control: Our MVCC implementation is inspired by the paper Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems by Thomas Neumann, Tobias Mühlbauer and Alfons Kemper.
- Secondary indexes: DuckDB has support for secondary indexes based on the paper The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases by Viktor Leis, Alfons Kemper and Thomas Neumann.
- SQL window functions: DuckDB’s window functions implementation uses Segment Tree Aggregation as described in the paper Efficient Processing of Window Functions in Analytical SQL Queries by Viktor Leis, Kan Kundhikanjana, Alfons Kemper and Thomas Neumann.
- SQL inequality joins: DuckDB’s inequality join implementation uses the IEJoin algorithm as described in the paper Lightning Fast and Space Efficient Inequality Joins by Zuhair Khayyat, William Lucia, Meghna Singh, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, Nan Tang and Panos Kalnis.
- Compression of floating-point values: DuckDB supports multiple algorithms for compressing floating-point values:
  - Chimp by Panagiotis Liakos, Katia Papakonstantinopoulou and Yannis Kotidis,
  - Patas, an in-house development, and
  - ALP (adaptive lossless floating-point compression) by Azim Afroozeh, Leonard Kuffo and Peter Boncz, who also contributed their implementation.
- SQL Parser: We use the PostgreSQL parser that was repackaged as a stand-alone library. The translation to our own parse tree is inspired by Peloton.
- Shell: We use the SQLite shell to work with DuckDB.
- Regular expressions: DuckDB uses Google’s RE2 regular expression engine.
- String formatting: DuckDB uses the fmt string formatting library.
- UTF wrangling: DuckDB uses the utf8proc library to check and normalize UTF8.
- Collation and time: DuckDB uses the ICU library for collation, time zone, and calendar support.
- Test framework: DuckDB uses the Catch2 unit test framework.
- Test cases: We use the SQL Logic Tests from SQLite to test DuckDB.
- Result validation: Manuel Rigger used his excellent SQLancer tool to verify DuckDB result correctness.
- Query fuzzing: We use SQLsmith to generate random queries for additional testing.
- JSON parser: We use yyjson, a high performance JSON library written in ANSI C, to parse JSON in DuckDB’s JSON Extension.
Limitations
Navigate to the known limitations section of the pg_duckdb repo.