Storage
Connect Hydra to a new or existing object storage bucket to begin querying Parquet, CSV, and Iceberg files.
Hydra Globally-Distributed Storage (beta)
Hydra offers a globally distributed, S3-compatible object storage service that provides low latency anywhere in the world. You can store and access any amount of data for a wide range of use cases. Hydra automatically distributes your data close to your users and removes the complexity of data replication and caching. Because Hydra supports the S3 API, you can use the wide range of existing S3 tools, libraries, and extensions. See our complete documentation on Hydra Globally-Distributed Storage to learn more.
External Data Lake Providers
Hydra allows you to query data in S3, R2, and GCS. To enable a particular data lake, you must provide a credential that has access to the data in question. This is done by adding a secret to the duckdb.secrets table:
- The cloud_type must be one of R2, S3, or GCS.
- For R2, the r2_account_id column is also used. For details, see the R2 documentation.
Please note that these secrets are unencrypted. You may wish to restrict access to this table to trusted users.
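As a minimal sketch of adding such a secret: only the duckdb.secrets table and the cloud_type and r2_account_id columns are named above. The other column names (cloud_id, cloud_secret, cloud_region), the role name, and every value below are assumptions for illustration and may differ in your Hydra version.

```sql
-- Sketch only: cloud_id, cloud_secret, and cloud_region are assumed column
-- names; the key, secret, and region values are placeholders.
INSERT INTO duckdb.secrets (cloud_type, cloud_id, cloud_secret, cloud_region)
VALUES ('S3', 'access_key_id', 'secret_access_key', 'us-east-1');

-- For R2, the account ID goes in r2_account_id (placeholder value shown).
-- Whether other columns are also required for R2 is an assumption to verify.
INSERT INTO duckdb.secrets (cloud_type, cloud_id, cloud_secret, r2_account_id)
VALUES ('R2', 'access_key_id', 'secret_access_key', 'your_r2_account_id');

-- Standard Postgres privilege management to limit who can read the
-- unencrypted secrets; reporting_user is a hypothetical role.
REVOKE ALL ON duckdb.secrets FROM reporting_user;
```

Running \d duckdb.secrets in psql shows the actual column names, which is worth checking before relying on this sketch.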
How to Query Iceberg, Parquet, and CSV files in S3, R2, or GCS
DuckDB allows you to efficiently query the data in your data lake. With Hydra, you can query that data directly from Postgres. For example, you can:
- Write queries with JOINs that combine data from your data lake with data in your Postgres tables, views, and materialized views
- Copy data to your data lake from your Postgres tables
- Write views that encapsulate the logic for querying your data lake
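The three patterns in the list above might look like the following sketch. It assumes a read_parquet function that takes a column definition list, as in early pg_duckdb releases; newer releases use a different column-access syntax, so treat the exact form as an assumption. The bucket paths, the customers table, and all column names are hypothetical.

```sql
-- 1. Join data lake files with a Postgres table.
--    The column definition list after AS tells Postgres which columns to expect.
SELECT c.name, SUM(o.amount) AS total
FROM read_parquet('s3://example-bucket/orders/*.parquet')
  AS o (customer_id int, amount float)
JOIN customers c ON c.id = o.customer_id
GROUP BY c.name;

-- 2. Copy data from a Postgres table to the data lake.
COPY (SELECT id, name FROM customers)
TO 's3://example-bucket/exports/customers.parquet';

-- 3. Wrap the data lake query in a view so callers do not repeat the path
--    or the column list.
CREATE VIEW lake_orders AS
SELECT customer_id, amount
FROM read_parquet('s3://example-bucket/orders/*.parquet')
  AS (customer_id int, amount float);
```

Putting the path and column list in a view keeps them in one place, so a change to the bucket layout only needs to be made once.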
Formats
The following formats are supported:
- Parquet - read and write
- CSV - read and write
- Iceberg - read-only
For more information on using these formats, see our documentation.
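To illustrate the read and write support listed above, here is a hedged sketch for CSV and Iceberg. The read_csv and iceberg_scan function names and the column definition list form are assumptions based on early pg_duckdb releases. The paths, table names, and columns are placeholders.

```sql
-- CSV, read: column names and types are placeholders.
SELECT city, population
FROM read_csv('s3://example-bucket/cities.csv')
  AS (city text, population int);

-- CSV, write: export query results from Postgres to the data lake.
COPY (SELECT id, name FROM customers)
TO 's3://example-bucket/exports/customers.csv';

-- Iceberg is read-only: it can be queried but is not a COPY target.
SELECT event_id, occurred_at
FROM iceberg_scan('s3://example-bucket/warehouse/events')
  AS (event_id bigint, occurred_at timestamp);
```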