Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Materialized View? #102

Open
nelsonic opened this issue Apr 25, 2023 · 7 comments
Open

Materialized View? #102

nelsonic opened this issue Apr 25, 2023 · 7 comments
Labels
discuss Share your constructive thoughts on how to make progress with this issue documentation Improvements or additions to documentation enhancement New feature or enhancement of existing functionality question A question needs to be answered before progress can be made on this issue research Research required; be specific T25m Time Estimate 25 Minutes technical A technical issue that requires understanding of the code, infrastructure or dependencies

Comments

@nelsonic
Copy link
Member

What is a Materialized view and how is it useful?
The wikipedia page on this is a decent starting point: https://en.wikipedia.org/wiki/Materialized_view

Question

How does this work in Postgres
Is the data duplicated?

Read: https://www.timescale.com/blog/how-postgresql-views-and-materialized-views-work-and-how-they-influenced-timescaledb-continuous-aggregates/

@nelsonic nelsonic added enhancement New feature or enhancement of existing functionality question A question needs to be answered before progress can be made on this issue technical A technical issue that requires understanding of the code, infrastructure or dependencies discuss Share your constructive thoughts on how to make progress with this issue documentation Improvements or additions to documentation research Research required; be specific T25m Time Estimate 25 Minutes labels Apr 25, 2023
@nelsonic
Copy link
Member Author

@LuchoTurtle my reading of the docs suggests that Materialized Views in Postgres are only updated when requested i.e. the data in the view is not real-time. Can you confirm this with a bit of Gooooogling...? 💭

@LuchoTurtle
Copy link
Member

LuchoTurtle commented Apr 28, 2023

Yes, materialized views (as the name implies) are just the results of a query saved/cached somewhere. Postgres will never automatically refresh the materialized view unless explicitly told.

https://stackoverflow.com/questions/29437650/how-can-i-ensure-that-a-materialized-view-is-always-up-to-date

The whole point of materialized views is to have fast queriable data that stem from expensive operations/queries from the database. So the main thing with materialized views is:

  • refresh scheduling.
  • aggregation and "merging" strategies of incoming data on a materialized view. The REFRESH MATERIALIZED VIEW command truncates/removes all the data and re-inserts with the most up-to-date (I think this is even written on the link you've mentioned and the whole point of why they did data aggregation there).

In fact, the link you sent mentions a few strategies of data aggregation so they don't waste the previously materialized data only to be materialized again. They range from sliding window verification of new inserts/updates to simple UNION ALL where only new data is appended to what was previously materialized.

But yes, materialized views are only as accurate as the last time they ran the query they are caching.

@nelsonic
Copy link
Member Author

Cool, that was my reading of the docs. Thanks for Summarising it succinctly, LuchoGPT. 😜

@ndrean
Copy link

ndrean commented Apr 28, 2023

A side note: you have "VIEW", "MATERIALIZED VIEW" and CTE (the "WITH" statement).

"VIEW" and "MATERIALIZED VIEW" are like snapshots, creating virtual tables. With "CREATE VIEW", you do not store data physically, only cached. You can "UPDATE" it. With "MATERIALIZED VIEW", the data is physically stored. To update it, you "REFRESH".

You also have "CTE", the "WITH" statement. This is used for "inline" subqueries, breaking down the main query.

@LuchoTurtle
Copy link
Member

LuchoTurtle commented Apr 29, 2023

@ndrean for sure! Though VIEW does not cache any data, it just abstracts complexity from the query itself. But yeah, I understood what you meant!
CTEs are really useful to organize complex views instead of having queries inside of queries, I second that 👏

@ndrean
Copy link

ndrean commented May 5, 2023

@LuchoTurtle Thanks for your precision, it was essential in fact.

Indeed (and I was wrong..), VIEW queries and CTE are evaluated each time they are referenced, whilst MATERIALIZED VIEW queries are cached (because stored).

Between CTE and VIEW, I understand the main difference is that you can restrict the users' access to only the data from the views - they may not be able to access data in the underlying tables with "GRANT/DENY"- whilst they can access the underlying table when a CTE is used. You can deny modifier access to the underlying table with the "WTH CHECK OPTION"

A (partial) toy example I played with that I hope makes sense:

CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name VARCHAR(20),
  age INT,
  active BOOLEAN
);

INSERT INTO users (name, age, active) VALUES ('me', 2, true);
...

CTE

A complex "inlined subquery"

WITH active_users AS (
  SELECT name, age
  FROM users
  WHERE active = true
)
SELECT name FROM active_users;

VIEW

A "complex" base data:

CREATE VIEW active_users_v AS
  SELECT name, age
  FROM users
  WHERE active = true;

This first query is executed every time it is referenced, such as:

SELECT name, age FROM active_users_v;

Permissions restriction:

GRANT SELECT ON active_users_v TO lucho;
DENY SELECT ON users TO lucho;

MATERIALIZED VIEW

They are used for performance reasons when some not very frequently changing but complex base data are used frequently in other queries:

Cache "complex" :) base data:

CREATE MATERIALIZED VIEW active_users_mv AS
  SELECT age, name
  FROM users
  WHERE active = true;

If you want to use updated data, use "REFRESH":

REFRESH MATERIALIZED VIEW active_users_mv;

Otherwise, just use the MV:

SELECT name, age FROM active_users_mv;

@ndrean
Copy link

ndrean commented May 8, 2023

  • Also to note that Materialized Views support indexing.

  • About the REFRESH for materialized views coupled with triggers:

To avoid refreshing all these (potentially) costly tables, one may use the Postgres TRIGGERS to update some rows of the "MV" (at the expense of lengthening another query of course).

Screenshot 2023-05-08 at 15 03 30

The previous post references the following post:

Screenshot 2023-05-08 at 15 05 02

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Share your constructive thoughts on how to make progress with this issue documentation Improvements or additions to documentation enhancement New feature or enhancement of existing functionality question A question needs to be answered before progress can be made on this issue research Research required; be specific T25m Time Estimate 25 Minutes technical A technical issue that requires understanding of the code, infrastructure or dependencies
Projects
None yet
Development

No branches or pull requests

3 participants