Updates and Deletes
Hydra Columnar tables support updates and deletes, yet remains an append-only datastore. In order to achieve this, Hydra maintains metadata about which rows in the table have been deleted or modified. Modified data is re-written to the end of the table; you can think of an
DELETEfollowed by an
When querying, Hydra will automatically return only the latest version of your data.
To maximize performance of
SELECTqueries, columnar tables should have a maximum of data in every stripe. Like an
INSERT, each transaction with an
UPDATEquery will write a new stripe. To maximize the size of these stripes, update as much data in a single transaction as possible. Alternatively, you can run
VACUUMon the table which will combine the most recent stripes into a combined stripe of maximum size.
Each updates or deletes query will lock the table, meaning multiple
DELETEqueries on the same table will be executed serially (one at a time).
UPDATEqueries rewrite any rows that are modified, and thus are relatively slow.
DELETEqueries only modify metadata and thus complete quite quickly.
The columnar store provides multiple methods to vacuum and clean up tables. Among these are the standard
VACUUM FULLfunctionality, but also provided are UDFs, or User Defined Functions that help for incremental vacuuming of large tables or tables with many holes.
Vacuuming requires an exclusive lock while the data that is part of the table is reorganized and consolidated. This can cause other queries to pause until the vacuum is complete, thus stalling other activity in the database.
Using the vacuum UDF, you can specify the number of
stripesto consolidate in order to lower the amount of time where a table is locked.
SELECT columnar.vacuum('mytable', 25);
Using the optional stripe count argument, a vacuum can be performed incrementally. A value will return of how many stripes were modified. Calling this function repeatedly is fine, and it will continue to vacuum the table until there is nothing more to do, and will return
0as the count.
In addition, you are provided a convenience function that can vacuum a whole schema, pausing between each vacuum and table to unlock and allow for other queries to continue on the database.
By default, this will vacuum the
publicschema, but there are other options to control the vacuuming process.
SELECT columnar.vacuum_full(sleep_time => 0.5);
SELECT columnar.vacuum_full(stripe_count => 1000);
Hydra Columnar updates and deletes will meet the isolation level requested for your current transaction (the default is
READ COMMITTED). Keep in mind that an
UPDATEquery is implemented as a
DELETEfollowed by an
INSERT. As new data that is inserted in one transaction can appear in a second transaction in
READ COMMITTED, this can affect concurrent transactions even if the first transaction was an
UPDATE. While this satisfies
READ COMMITTED, it may result in unexpected behavior. This is also possible in heap (row-based) tables, but heap tables contain additional metadata that limit the impact of this case.
For stronger isolation guarantees,
REPEATABLE READis recommended. In this isolation level, your transaction will be cancelled if it references data that has been modified by another transaction. In this case, your application should be prepared to retry the transaction.