Data Lake Example
Connect Your Data Lake
Hydra supports AWS S3, Cloudflare R2, or Google GCS buckets.
- Add a credential to enable DuckDB’s httpfs support (see the sketch just after this list).
- Copy data directly from Postgres to your bucket - no ETL pipeline!
- Perform analytics on your data (examples below).
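A minimal credential sketch, following the `duckdb.secrets` table approach described in the pg_duckdb README; the key ID, secret, and region are placeholders, and the secrets interface may differ between pg_duckdb versions:

```sql
-- Register S3 credentials so DuckDB's httpfs can reach your bucket.
-- Placeholder values; replace with your own access key, secret, and region.
INSERT INTO duckdb.secrets (type, key_id, secret, region)
VALUES ('S3', 'my_access_key_id', 'my_secret_access_key', 'us-east-1');
```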
Reading a Parquet file
The following query uses `pg_duckdb` to query Parquet files stored in S3 and find the top TV shows in the US during 2020-2022.
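A sketch of such a query, assuming a hypothetical `netflix_daily_top_10.parquet` file with `title`, `country`, `type`, and `year` columns; the bucket path and column names are placeholders:

```sql
-- Rank TV shows by how many days they spent in the US daily top 10 during 2020-2022.
SELECT r['title'] AS title, count(*) AS days_in_top_10
FROM read_parquet('s3://my-bucket/netflix_daily_top_10.parquet') r
WHERE r['country'] = 'United States'
  AND r['type'] = 'TV Show'
  AND r['year'] BETWEEN 2020 AND 2022
GROUP BY 1
ORDER BY days_in_top_10 DESC
LIMIT 10;
```

The `r['column']` subscript is how pg_duckdb exposes columns returned by `read_parquet`; if your release rejects it, check that version's docs for the expected column syntax.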
Reading an Iceberg table
To query data in Iceberg, you first need to install the DuckDB Iceberg extension. In `pg_duckdb`, DuckDB extensions are installed with the `duckdb.install_extension(extension_name)` function.
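A minimal sketch, assuming a hypothetical Iceberg table stored at `s3://my-bucket/warehouse/events_iceberg` (the path is a placeholder):

```sql
-- One-time step: install the DuckDB Iceberg extension.
SELECT duckdb.install_extension('iceberg');

-- Query the Iceberg table through DuckDB's iceberg_scan table function.
SELECT count(*)
FROM iceberg_scan('s3://my-bucket/warehouse/events_iceberg');
```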
Writing back to your Data Lake
Access to Data Lakes in `pg_duckdb` is not read-only; you can also write back using the `COPY` command. Because you can mix and match native PostgreSQL data, this lets you export your PostgreSQL tables to external Data Lake storage.
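A sketch of such an export, assuming a hypothetical `purchases` table in PostgreSQL and a placeholder bucket path:

```sql
-- Export the result of a query over a native PostgreSQL table
-- to a Parquet file in your bucket.
COPY (
  SELECT user_id, item_id, price, purchased_at
  FROM purchases
  WHERE purchased_at >= now() - interval '30 days'
) TO 's3://my-bucket/exports/recent_purchases.parquet';
```

Here `pg_duckdb` routes the `COPY` through DuckDB, which writes the result to your bucket using the credentials registered earlier.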
This opens up many possibilities for performing the following operations directly in PostgreSQL:
- Query existing data from a Data Lake
- Back up specific PostgreSQL tables to an object store
- Import data from the Data Lake to support operational applications