Table statistics #495
Replies: 4 comments
-
There should be an option for a metastore. This should be a KV store so the metadata for a file can be fetched with O(1) complexity. Both a local and a remote option should be available: local would be something like RocksDB, remote would be MongoDB or Firestore. We should also keep a copy of the metadata in memory in an LRU cache, which can be as easy as adding the decorator to the lookup call.
-
Statistics should be used initially as a BRIN (block range index) and to shortcut basic stats by recording min, max, unique count, sum and null count. Don't spend time on Bloom filters, HyperLogLog or t-digest until there is cost-based query optimization.
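As a rough illustration of the basic stats listed above, here is a sketch in Python. `ColumnStats` and `collect_basic_stats` are hypothetical names, not part of any existing API, and the unique count is exact (a `set`), which is an assumption that only suits small columns.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class ColumnStats:
    minimum: Optional[Any]
    maximum: Optional[Any]
    unique_count: int
    total: Optional[float]  # sum; None for non-numeric or empty columns
    null_count: int


def collect_basic_stats(values: Iterable[Any]) -> ColumnStats:
    """Compute min, max, unique count, sum and null count for one column."""
    values = list(values)
    present = [v for v in values if v is not None]
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    return ColumnStats(
        minimum=min(present) if present else None,
        maximum=max(present) if present else None,
        unique_count=len(set(present)),
        total=sum(present) if present and numeric else None,
        null_count=len(values) - len(present),
    )


stats = collect_basic_stats([3, None, 7, 3, None, 1])
```

The min and max per file are what give the BRIN-like behaviour: they describe the range of values a file covers without indexing individual rows.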
-
A background thread should receive stats extracted from the files and save them to a KV store. It probably won't be efficient to do anything more than basic min/max/nulls in real time when creating stats for files which don't have them. ANALYZE TABLE should create more comprehensive stats for files which don't already have them.
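One way the background writer could look, sketched with Python's standard `queue` and `threading` modules. The names `stats_writer`, `start_stats_writer` and `_STOP` are hypothetical, and a plain dict stands in for the KV store.

```python
import queue
import threading

_STOP = object()  # sentinel telling the writer thread to finish


def stats_writer(pending: queue.Queue, store: dict) -> None:
    """Drain (blob_name, stats) pairs from the queue and save them to the store."""
    while True:
        item = pending.get()
        if item is _STOP:
            break
        blob_name, stats = item
        store[blob_name] = stats


def start_stats_writer(store: dict):
    """Start the writer thread and return the queue that readers push stats onto."""
    pending = queue.Queue()
    worker = threading.Thread(target=stats_writer, args=(pending, store), daemon=True)
    worker.start()
    return pending, worker


store = {}  # stand-in for the KV store
pending, worker = start_stats_writer(store)
# The file reader hands over the cheap stats and carries on with the query.
pending.put(("part-0001.parquet", {"min": 1, "max": 9, "nulls": 0}))
pending.put(_STOP)
worker.join()
```

The reader only pays for a `put` on the queue, so saving stats does not hold up the query that extracted them.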
-
The statistics will initially be used for pruning files. That is, if the query has a sargable condition, we should be able to determine whether the file can possibly have any matching records by checking whether the value(s) being filtered for are within the file's min and max values. For some types of queries this should improve performance by avoiding reading and handling data that cannot match.
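The min/max check can be sketched as below. `may_contain` and `prune_files` are hypothetical names, and the stats are assumed to be dicts with optional `min` and `max` keys for the filtered column. A file is skipped only when the stats prove it cannot match; missing stats or an unrecognised operator mean the file is still read.

```python
def may_contain(stats: dict, op: str, value) -> bool:
    """Return False only when min/max prove that no row can match; True otherwise."""
    lo, hi = stats.get("min"), stats.get("max")
    if lo is None or hi is None:
        return True  # no stats for this file: it must be read
    if op == "=":
        return lo <= value <= hi
    if op == "<":
        return lo < value
    if op == "<=":
        return lo <= value
    if op == ">":
        return hi > value
    if op == ">=":
        return hi >= value
    return True  # unknown operator: do not prune


def prune_files(file_stats: dict, op: str, value) -> list:
    """Keep only the files that might hold rows satisfying `column op value`."""
    return [name for name, stats in file_stats.items() if may_contain(stats, op, value)]


files = {
    "a.parquet": {"min": 1, "max": 10},
    "b.parquet": {"min": 20, "max": 30},
    "c.parquet": {},  # no stats recorded yet
}
to_read = prune_files(files, "=", 25)
```

For `= 25`, `a.parquet` is skipped because 25 is outside 1..10, while `c.parquet` is kept because nothing is known about it.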
-
Mabel should write statistics for all Parquet files. To do this, stats building needs to be faster.
Stats could sit in the Parquet file. Whilst this won't avoid reading the file, it will avoid parsing the data in it. There should be a statistics cache so stats can be read quickly.
Stats should include raw-form HyperLogLog and t-digest sketches so these can be combined at planning time to estimate cardinality and distribution across ad hoc combinations of blobs.
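To show why the raw form matters, here is a small sketch of HyperLogLog registers being combined across blobs. It is a simplified illustration, not an existing implementation: the register count (`P = 8`), the SHA-1 based hash and the names `hll_registers` and `merge_registers` are all assumptions, and the cardinality estimation step is left out. The point is that registers from separate blobs merge with an element-wise maximum, giving the same registers as a sketch built over the combined data.

```python
import hashlib

P = 8        # index bits (assumed): 2**8 = 256 registers
M = 1 << P


def _hash64(value) -> int:
    """Derive a 64-bit hash from the value's string form."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hll_registers(values) -> list:
    """Build the raw HyperLogLog registers for one blob's values."""
    registers = [0] * M
    for value in values:
        h = _hash64(value)
        index = h >> (64 - P)                    # top P bits choose the register
        rest = h & ((1 << (64 - P)) - 1)         # remaining 56 bits
        rank = (64 - P) - rest.bit_length() + 1  # leading zeros in rest, plus one
        registers[index] = max(registers[index], rank)
    return registers


def merge_registers(a: list, b: list) -> list:
    """Combine two sketches; equivalent to sketching the union of their inputs."""
    return [max(x, y) for x, y in zip(a, b)]


blob_a = hll_registers(range(0, 600))
blob_b = hll_registers(range(400, 1000))
combined = merge_registers(blob_a, blob_b)
```

A stored estimate (a single number per blob) could not be combined this way, because the overlap between blobs is unknown; the raw registers can.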