Changelog
Overall: tinyurl.com/benchHydra
Hydra transforms Postgres into one of the fastest analytics databases on earth.
| Database Name | Relative time (lower is better) |
|---|---|
| Hydra | 1.51 |
| ClickHouse (c6a.4XL) | 1.93 |
| Snowflake (128x4XL) | 2.71 |
| Redshift (serverless) | 2.80 |
| TimescaleDB | 16.76 |
| AlloyDB | 34.52 |
| PostgreSQL (indexes) | 35.20 |
| AWS Aurora | 277.78 |
| Postgres (no indexes) | 2283 |
Why use the Hydra Package for Postgres? It’s free, local, and takes ~10 seconds to set up. Create an analytics table in Postgres: perfect for events, time series, logs, and traces.
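A minimal sketch of such a table, assuming Hydra exposes its columnar table access method as `columnar` (table and column names are illustrative):

```sql
-- Hypothetical events table stored in columnar format for analytics.
CREATE TABLE events (
  event_time timestamptz,
  user_id    bigint,
  event_type text,
  payload    jsonb
) USING columnar;
```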
Python package (PyPI)
Installing hydra-cli enables serverless analytics, Hydra Deep Storage, and more.
By separating compute from storage, Hydra enables bottomless storage for Postgres. It provides a fully integrated columnstore with auto-grow, advanced compression, automatic caching, and multi-node reads.
Features
- Storage Auto Grow - Never worry about running out of bottomless analytics storage.
- 10X Data Compression - Data stored in analytics tables benefits from efficient compression of 5-15X, ideal for large data volumes. For example, 150GB becomes 15GB at 10X compression.
- Automatic Caching - Managed within Hydra to enable sub-second analytics, predictably.
- Multi-node Reads - Connect any number of Postgres databases through Hydra to Deep Storage. Ideal for organizations that read events, clicks, traces, and time series data from a global source of truth.
By separating compute from storage, Hydra enables compute-isolated analytics. It is designed for low-latency applications built on time series and event data.
Features
- Parallel, vectorized execution (benchmark)
Example Performance Results (Hydra vs. PostgreSQL)
- Compute Autoscale - Hydra scales compute automatically per query to ensure predictably quick, efficient processing of analytical queries.
- Isolated Compute Tenancy - Processes run on dedicated compute resources, improving performance by preventing resource contention. This enhances performance reliability and predictability at scale.
Added
- Support using Postgres indexes and reading from partitioned tables.
- The `AS (id bigint, name text)` syntax is no longer supported when using `read_parquet`, `iceberg_scan`, etc. The new syntax is as follows:
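  A minimal sketch of the replacement row-variable syntax (file path and column names are illustrative; the exact form may differ):

  ```sql
  -- Columns are referenced through the row variable instead of an AS (...) list.
  SELECT r['id'], r['name']
  FROM read_parquet('s3://bucket/file.parquet') r
  WHERE r['id'] > 100;
  ```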
- Add a `duckdb.query` function which allows using DuckDB query syntax in Postgres (see the sketch after this list).
- Support the `approx_count_distinct` DuckDB aggregate.
- Support the `bytea` (aka blob), `uhugeint`, `jsonb`, `timestamp_ns`, `timestamp_ms`, `timestamp_s` & `interval` types.
- Support DuckDB json functions and aggregates.
- Add support for the `duckdb.allow_community_extensions` setting.
- We have an official logo! 🎉
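A minimal sketch of the `duckdb.query` function and the `approx_count_distinct` aggregate (the `events` table and `user_id` column are illustrative):

```sql
-- Run a query written in DuckDB's SQL dialect from Postgres.
SELECT * FROM duckdb.query('SELECT 42 AS answer');

-- Approximate distinct count, executed by the DuckDB engine.
SELECT approx_count_distinct(user_id) FROM events;
```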
Changed
- Allow executing `duckdb.raw_query`, `duckdb.cache_info`, `duckdb.cache_delete` and `duckdb.recycle_db` as non-superusers.
Fixed
- Correctly parse parameter lists in `COPY` commands. This allows using `PARTITION_BY` as one of the `COPY` options (see the sketch after this list).
- Correctly read cache metadata for files larger than 4GB.
- Fix bug in parameter handling for prepared statements and PL/pgSQL functions.
- Fix comparisons and operators on the `timestamp with time zone` field by enabling DuckDB's `icu` extension by default.
- Allow using `read_parquet` functions when not using superuser privileges.
- Fix some case insensitivity issues when reading from Postgres tables.
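A minimal sketch of `PARTITION_BY` as a `COPY` option (bucket path, table, and column names are illustrative; the option spelling follows DuckDB's COPY options):

```sql
-- Write partitioned Parquet files, one directory per event_type value.
COPY (SELECT event_type, user_id, event_time FROM events)
TO 's3://bucket/events'
(FORMAT parquet, PARTITION_BY (event_type));
```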
Added
- Support for reading Delta Lake storage using the `duckdb.delta_scan(...)` function (see the sketch after this list).
- Support for reading JSON using the `duckdb.read_json(...)` function.
- Support for multi-statement transactions.
- Support reading from Azure Blob storage.
- Support many more array types, such as `float`, `numeric` and `uuid` arrays.
- Support for PostgreSQL 14.
- Manage cached files using the `duckdb.cache_info()` and `duckdb.cache_delete()` functions.
- Add `scope` column to `duckdb.secrets` table.
- Automatically install and load known DuckDB extensions when queries use them. So, `duckdb.install_extension()` is usually not necessary anymore.
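A minimal sketch of reading Delta Lake data and inspecting the local file cache (paths and column names are illustrative; at this release remote reads still use the `AS (...)` column list):

```sql
-- Read a Delta Lake table from object storage.
SELECT *
FROM duckdb.delta_scan('s3://bucket/delta_table') AS (id bigint, name text);

-- List locally cached remote files; entries can be removed with duckdb.cache_delete(...).
SELECT * FROM duckdb.cache_info();
```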
Changed
- Improve performance of heap reading.
- Bump DuckDB version to 1.1.3.
Fixed
- Throw a clear error when reading partitioned tables (reading from partitioned tables is not supported yet).
- Fixed crash when using `CREATE SCHEMA AUTHORIZATION`.
- Fix queries inserting into DuckDB tables with `DEFAULT` values.
- Fixed assertion failure involving recursive CTEs.
- Much better separation between C and C++ code, to avoid memory leaks and crashes (many PRs).
- `SELECT` queries executed by the DuckDB engine can directly read Postgres tables. (If you only query Postgres tables you need to run `SET duckdb.force_execution TO true`, see the IMPORTANT section above for details.)
  - Able to read data types that exist in both Postgres and DuckDB.
  - If DuckDB cannot support the query for any reason, execution falls back to Postgres.
- Read and Write support for object storage (AWS S3, Azure, Cloudflare R2, or Google GCS):
  - Read parquet, CSV and JSON files:
    - `SELECT n FROM read_parquet('s3://bucket/file.parquet') AS (n int)`
    - `SELECT n FROM read_csv('s3://bucket/file.csv') AS (n int)`
    - You can pass globs and arrays to these functions, just like in DuckDB
  - Enable the DuckDB Iceberg extension using `SELECT duckdb.install_extension('iceberg')` and read Iceberg files with `iceberg_scan`.
  - Write a query, or an entire table, to parquet in object storage:
    - `COPY (SELECT foo, bar FROM baz) TO 's3://...'`
    - `COPY table TO 's3://...'`
    - Read and write to Parquet format in a single query:
      `COPY (SELECT count(*), name FROM read_parquet('s3://bucket/file.parquet') AS (name text) GROUP BY name ORDER BY count DESC) TO 's3://bucket/results.parquet';`
- Query and `JOIN` data in object storage/MotherDuck with Postgres tables, views, and materialized views (see the sketch after this list).
- Create temporary tables in DuckDB's columnar storage format using `CREATE TEMP TABLE ... USING duckdb`.
- Install DuckDB extensions using `SELECT duckdb.install_extension('extension_name');`
- Toggle DuckDB execution on/off with a setting: `SET duckdb.force_execution = true|false`
- Cache remote objects locally for faster execution using `SELECT duckdb.cache('path', 'type');` where:
  - 'path' is an HTTPFS/S3/GCS/R2 remote object
  - 'type' specifies the remote object type: 'parquet' or 'csv'
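A minimal sketch of joining a remote Parquet file with a regular Postgres table after caching the file locally (bucket path, table, and column names are illustrative):

```sql
-- Cache the remote file so repeated queries read from local disk.
SELECT duckdb.cache('s3://bucket/events.parquet', 'parquet');

-- Join the Parquet data with an ordinary Postgres table.
SELECT u.name, count(*) AS event_count
FROM read_parquet('s3://bucket/events.parquet') AS e(user_id bigint)
JOIN users u ON u.id = e.user_id
GROUP BY u.name;
```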