Storage
Hydra uses both on-disk Postgres storage and Amazon EFS. On-disk storage is used for Postgres’ rowstore “heap” tables, and EFS is used for the analytics schema’s columnstore.
Features
Bottomless storage
Events, time series, user sessions, clicks, logs, IoT sensor readings, and similar sources generate a lot of data over time. While on-disk storage works well for Postgres’ rowstore, known as “heap” tables, it is a poor choice for fast-growing data that requires analysis. To avoid the scale limits of on-disk storage, Hydra separates compute and storage.
Separating storage and compute lets each resource scale independently. When compute needs peak, only additional CPUs are deployed; as storage needs grow, only the storage footprint increases.
To accomplish this, the analytics schema sits on a FUSE-based filesystem that keeps metadata indicating which data blocks are live and where to find them. The storage aggregates writes into a single file: when a block is overwritten, the old copy is marked stale and the metadata is updated to point to the data’s new location. When a read executes, the metadata points the query to the block’s location and a direct read is performed. Because full pages are 256KB and are always flushed, this performs well. Large writes of consecutive data work especially well, because they can be tracked as ranges rather than as individual blocks.
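The write and read paths described above follow a log-structured pattern. The following is a minimal, in-memory sketch of that bookkeeping, not Hydra’s implementation. It assumes a `bytearray` stands in for the single backing file, writes are whole 256KB pages, and the names `LogStructuredStore` and `Extent` are hypothetical. It shows appending writes, marking overwritten blocks stale, repointing metadata, and tracking a run of consecutive blocks as one range. Reclaiming stale space is out of scope here.

```python
from dataclasses import dataclass

PAGE_SIZE = 256 * 1024  # full pages are 256KB, per the description above


@dataclass
class Extent:
    """A run of consecutive logical blocks stored contiguously in the log."""
    first_block: int  # first logical block number in the run
    count: int        # number of consecutive blocks in the run
    log_offset: int   # byte offset of first_block inside the append-only file


class LogStructuredStore:
    """Hypothetical sketch: every write is appended to one file, and metadata
    (a list of extents) records where the live copy of each block lives."""

    def __init__(self) -> None:
        self.log = bytearray()            # stands in for the single backing file
        self.extents: list[Extent] = []   # metadata for live blocks only
        self.stale_bytes = 0              # space held by overwritten copies

    def write(self, first_block: int, data: bytes) -> None:
        """Append whole pages and point the metadata at their new location."""
        if not data or len(data) % PAGE_SIZE:
            raise ValueError("writes must be whole 256KB pages")
        count = len(data) // PAGE_SIZE
        self._mark_stale(first_block, count)
        offset = len(self.log)
        self.log += data
        # A single extent covers the whole consecutive run, not one entry per block.
        self.extents.append(Extent(first_block, count, offset))

    def _mark_stale(self, first: int, count: int) -> None:
        """Drop blocks [first, first + count) from the live metadata, splitting
        any extent that only partially overlaps the overwritten range."""
        last = first + count
        kept = []
        for e in self.extents:
            e_last = e.first_block + e.count
            if e_last <= first or e.first_block >= last:
                kept.append(e)  # no overlap: extent stays live as-is
                continue
            if e.first_block < first:  # live part before the overwrite
                kept.append(Extent(e.first_block, first - e.first_block, e.log_offset))
            if e_last > last:  # live part after the overwrite
                skip = last - e.first_block
                kept.append(Extent(last, e_last - last, e.log_offset + skip * PAGE_SIZE))
            overlap = min(e_last, last) - max(e.first_block, first)
            self.stale_bytes += overlap * PAGE_SIZE
        self.extents = kept

    def read(self, block: int) -> bytes:
        """Look up the block in metadata, then read it directly from the log."""
        for e in self.extents:
            if e.first_block <= block < e.first_block + e.count:
                start = e.log_offset + (block - e.first_block) * PAGE_SIZE
                return bytes(self.log[start:start + PAGE_SIZE])
        raise KeyError(f"block {block} has never been written")


if __name__ == "__main__":
    store = LogStructuredStore()
    store.write(0, b"a" * PAGE_SIZE * 4)  # blocks 0-3 tracked as one extent
    store.write(2, b"b" * PAGE_SIZE)      # overwrite block 2; old copy goes stale
    print(store.read(2)[:1], store.read(3)[:1])  # b'b' b'a'
    print(len(store.extents), store.stale_bytes // PAGE_SIZE)  # 3 1
```

In this sketch, overwriting one block in the middle of a four-block run splits the original extent into two live ranges plus one new extent, so the metadata stays small as long as writes are mostly large and consecutive.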
On-disk storage (rowstore)
Standard Postgres heap tables are stored on disk, up to 500GB. To learn how continuous backups and point-in-time recovery work, see our backups documentation.