Updates and Deletes
Hydra Columnar tables support updates and deletes, yet remains an append-only datastore. In order to achieve this, Hydra maintains metadata about which rows in the table have been deleted or modified. Modified data is re-written to the end of the table; you can think of an
DELETEfollowed by an
When querying, Hydra will automatically return only the latest version of your data.
To maximize performance of
SELECTqueries, columnar tables should have infrequent updates or deletes. Ideally, data that is updated more frequently is best kept separate from archived data. This allows you to either store this data in row tables, or rewrite the columnar table using
VACUUM FULLwhich will re-optimize the data for reads.
INSERT, each transaction with an
UPDATEquery will write a new stripe. To maximize the size of these stripes, update as much data in a single transaction as possible.
Each updates or deletes query will lock the table, meaning multiple
DELETEqueries on the same table will be executed serially (one at a time).
UPDATEqueries rewrite any rows that are modified, and thus are relatively slow.
DELETEqueries only modify metadata and thus complete quite quickly.
Deleting data does not reclaim space, and updated data will also leave behind deleted data (the previous version of the data). In order to reclaim space and re-compress your data, you must rewrite the table using
VACUUM FULL. Note that
VACUUM FULLrequires a full table lock and can take considerable time for larger tables.
Future versions of Hydra Columnar will address this shortcoming.
Hydra Columnar updates and deletes will meet the isolation level requested for your current transaction (the default is
READ COMMITTED). Keep in mind that an
UPDATEquery is implemented as a
DELETEfollowed by an
INSERT. As new data that is inserted in one transaction can appear in a second transaction in
READ COMMITTED, this can affect concurrent transactions even if the first transaction was an
UPDATE. While this satisfies
READ COMMITTED, it may result in unexpected behavior. This is also possible in heap (row-based) tables, but heap tables contain additional metadata that limit the impact of this case.
For stronger isolation guarantees,
REPEATABLE READis recommended. In this isolation level, your transaction will be cancelled if it references data that has been modified by another transaction. In this case, your application should be prepared to retry the transaction.