Storage
Hydra uses both on-disk storage and elastic file storage. Postgres’ rowstore “heap” tables live on disk, while the analytics columnstore sits on elastic file storage.
Features
Auto-grow
Analytics storage is bottomless and grows automatically, so you never have to worry about running out of space.
Data compression
Data stored in analytics tables benefits from efficient compression of 5-15X, which makes these tables ideal for large data volumes. For example, 150 GB becomes 15 GB at 10X compression.
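The compression arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the function name `compressed_size_gb` is hypothetical and not part of Hydra, and the ratio actually achieved depends on the data.

```python
def compressed_size_gb(raw_gb: float, ratio: float) -> float:
    """Estimate the stored size for a raw size and a compression ratio.

    `ratio` is the "NX" figure from the text, e.g. 10 for 10X.
    """
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_gb / ratio


raw_gb = 150
# Walk through the low end, the example, and the high end of the 5-15X range.
for ratio in (5, 10, 15):
    print(f"{ratio}X: {raw_gb} GB -> {compressed_size_gb(raw_gb, ratio):.0f} GB")
```

Across the quoted 5-15X range, the same 150 GB of raw data would occupy roughly 30 GB at the low end and 10 GB at the high end.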
Automatic caching
Caching is fully managed within Hydra to enable sub-second analytics.
Zero-copy snapshots & forks
Zero-copy snapshots enable data sharing with additional teams in your organization. Hydra’s serverless processing guarantees that these different users can access the analytics schema concurrently without sharing compute resources.
Easy to perform joins between the row and columnstore
Build richer apps and analytics by combining application data, user sessions, logs, time-series, and events, all available inside Postgres when using Hydra.
Bottomless storage
Events, time-series, user sessions, clicks, logs, IoT sensor readings, and similar sources generate a lot of data over time. While on-disk storage works well for Postgres’ rowstore, known as “heap” tables, it is a poor choice for fast-growing data that requires analysis. To avoid the scale limit of on-disk storage, Hydra separates compute and storage.
The benefit of separating storage and compute is that each can scale independently. When compute needs peak, only additional CPUs are deployed. When storage needs increase, only the storage footprint grows.
To accomplish this, analytics tables sit on a FUSE-based filesystem that tracks metadata indicating which data blocks are live and where to find them. The storage aggregates writes into a single file: when a block is overwritten, the old copy is marked stale and the metadata is updated to point to the new location of the data. When a read is executed, the metadata points the query to the block's location, and the block is read directly. This performs well because pages are 256 KB in size and full pages are always flushed. Large writes of consecutive data work especially well because they can be tracked as ranges rather than as individual blocks.
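The write path described above can be illustrated with a toy, in-memory model. This is a minimal sketch under stated assumptions, not Hydra's implementation: the class `AppendOnlyStore` and its methods are hypothetical names, the "single file" is a Python `bytearray`, and there is no FUSE layer, durability, or garbage collection of stale blocks.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 256 * 1024  # 256 KB pages, as stated in the text


@dataclass
class AppendOnlyStore:
    """Toy model: every write is appended to one log; metadata tracks live blocks."""
    log: bytearray = field(default_factory=bytearray)   # stands in for the single file
    live: dict[int, int] = field(default_factory=dict)  # block id -> offset of live copy
    stale: set[int] = field(default_factory=set)        # offsets of superseded copies

    def write(self, block_id: int, data: bytes) -> None:
        if len(data) > BLOCK_SIZE:
            raise ValueError("data exceeds one block")
        old_offset = self.live.get(block_id)
        if old_offset is not None:
            self.stale.add(old_offset)        # an overwrite marks the old copy stale
        self.live[block_id] = len(self.log)   # metadata now points at the new location
        self.log += data.ljust(BLOCK_SIZE, b"\0")  # always append a full page

    def read(self, block_id: int) -> bytes:
        offset = self.live[block_id]          # metadata lookup, then a direct read
        return bytes(self.log[offset:offset + BLOCK_SIZE])

    def live_ranges(self) -> list[tuple[int, int, int]]:
        """Collapse consecutive blocks stored back to back into (first_id, offset, count)."""
        ranges: list[tuple[int, int, int]] = []
        for block_id in sorted(self.live):
            offset = self.live[block_id]
            if ranges:
                first, start, count = ranges[-1]
                if block_id == first + count and offset == start + count * BLOCK_SIZE:
                    ranges[-1] = (first, start, count + 1)
                    continue
            ranges.append((block_id, offset, 1))
        return ranges


store = AppendOnlyStore()
for i in range(4):                    # four consecutive blocks written in order
    store.write(i, f"v1-{i}".encode())
print(store.live_ranges())            # one range covers all four blocks
store.write(1, b"v2-1")               # overwrite block 1: appended, old copy goes stale
print(store.live_ranges())            # the overwrite splits the single range
print(store.read(1).rstrip(b"\0"))
```

In this model, a run of consecutive writes needs only one metadata entry, while an overwrite appends a new copy at the end of the log and fragments the range. That is the trade-off the paragraph above describes.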
On-disk storage (rowstore)
Standard Postgres heap tables are stored on-disk, up to 1 TB. New instances have an initial on-disk storage “soft limit” of 500 GB, which can be raised by contacting support@hydra.so. To learn about continuous backups and point-in-time recovery, see our backups documentation.