Views
Views can be a useful way to encapsulate querying your data lake, or complex queries that combine your local Postgres data with your data lake. Views you define in Postgres are supported by DuckDB and can query your data lake.
Tutorial
I want to run some analytics on my sales data, as well as combine that with new transactions that are coming in. In this
example, I will have active data in public.sales
, then data store in my data lake referred to in the data_lake
schema, and finally an analytics
schema that has data combined from both sources.
First, lets create our schemas:
The next step is to copy my sales
data to S3, and create a view to encapsulate querying it:
Now, lets store the maximum ID was that was copied into a table so we can use it in our next view:
I can now create a view that combines the data to get all sales in real-time:
Using this view, I can query this data, and join it with other Postgres tables as well, like my skus
table:
Using this pattern, you can update your data lake with new data on a daily, weekly, or monthly basis.
Materialized Views
Materialized views can be populated via DuckDB, and will follow normal execution rules. If a materialized view query uses a DuckDB function, it will use DuckDB. Otherwise, a materialized view will use the current execution setting at the time the materialized view is created or refreshed. Once materialized, materialized views are identical to Postgres materialized views.