Comment on page
Optimizing Query Performance
The columnar store makes extensive use of compression, allowing for your data to be stored efficiently on disk and in memory as compressed data.
While this is very helpful, in some circumstances a long running analytical query can retrieve and decompress the same columns on a recurring basis. In order to counter that, a column caching mechanism is available to store uncompressed data in-memory, up to a specified amount of memory per worker.
These uncompressed columns are then available for the lifetime of a
SELECTquery, and are managed by the columnar store directly. NOTE that this cache is not used for
INSERT, and are released after the
SELECTquery is complete.
Caching is be enabled or disabled with a GUC:
columnar.enable_column_cache. By default, the cache is enabled on Hydra Cloud, and disabled in Hydra Open Source. For queries where the cache is not helpful, you may find slightly better performance by disabling the cache.
-- enable the cache
set columnar.enable_column_cache = 't';
-- disable the cache
set columnar.enable_column_cache = 'f';
In addition, the cache size can be set, with a default of
set columnar.column_cache_size = '2000MB';
This size can range between
20000MB, and is consumed by each parallel process. This means that if you have
8parallel processes and set the cache size to
2000MB, then a query can consume up to
8 * 2000000bytes, or up to
16GBof RAM, so it is very important to choose a cache size that is appropriate for your workload.
Vacuuming tables will optimize tables that have had many inserts, updates, or deletes. There are three levels of vacuum on columnar tables:
VACUUM tablewill rewrite recently created stripes into optimal stripes of maximum size. If making many small inserts or updates into a columnar table, run
VACUUMfrequently to optimize the table.
SELECT columnar.vacuum_all()will reclaim space from updated and deleted rows, writing new optimal stripes and compact the amount of space the table occupies.
VACUUM FULL tablewill do a full rewrite of the table making fully optimized stripes. This operation is very expensive but produces the best results.
You have the ability to use indexes with columnar storage, but it will not behave in the same way that you have come to expect from a heap (row based) storage solution. Under most cases, the plan that Postgres chooses will be correct, and we've worked hard to ensure that when using the columnar storage backend that the plan chosen will be the best one for accessing columnar data.
This is not always the case, so in addition to the normal per query adjustments that you can make with Columnar and Postgres (
enable_seqscan, etc), you can also use pg_hint_plan to help alter the query plan and make use of any indexes that you have created. Note that some index types are not supported with the columnar storage engine, such as bitmap or BRIN indexes, and some will be slower than the highly optimized query plan that the columnar engine provides you.
pg_hint_planuses a comment syntax to allow you to attempt to alter the created plan wherever possible:
EXPLAIN ANALYZE /*+ IndexScan(hits) */
SELECT url, COUNT(*)
In this case, anything within the comment block
*/is interpreted as guidance for
Some of the common hints include:
SeqScan(tablename)- force a sequential scan on the table listed
IndexScan(tablename[indexname])- force the use of an index scan on a specific table, and optionally an index to use by name
IndexOnlyScan(tablename[indexname])- force the use of an index only scan on a specific table, and optional an index to use by name