Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add iceberg_tables table function #24469

Merged

Conversation

lukasz-stec
Copy link
Member

Description

Add system.iceberg_tables table function

Add the ability to list only iceberg tables from the iceberg catalog.
Before this change, there was no way to list only iceberg tables.
The SHOW TABLES statement, information_schema.tables, and jdbc.tables will all
return all tables that exist in the underlying metastore, even if the table cannot
be handled in any way by the iceberg connector. This can happen if other connectors
like hive or delta, use the same metastore, catalog, and schema to store its tables.
The function accepts an optional parameter with the schema name.
Sample statements:

SELECT * FROM TABLE(iceberg.system.iceberg_tables());
SELECT * FROM TABLE(iceberg.system.iceberg_tables(SCHEMA_NAME => 'test'));

Additional context and related issues

#11617 removed the non-iceberg table filter from TrinoHiveCatalog
#21114 removed the getTablesWithParameter metastore API that is re-added here

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X ) Release notes are required, with the following suggested text:

## Iceberg connector
* Add `system.iceberg_tables` table function to the iceberg connector to allow listing only iceberg tables

@cla-bot cla-bot bot added the cla-signed label Dec 13, 2024
@github-actions github-actions bot added iceberg Iceberg connector hive Hive connector labels Dec 13, 2024
@lukasz-stec lukasz-stec changed the title Ls/2412/04 iceberg tables function Iceberg tables table function Dec 13, 2024
@lukasz-stec lukasz-stec changed the title Iceberg tables table function Add iceberg_tables table function Dec 13, 2024
@lukasz-stec lukasz-stec force-pushed the ls/2412/04-iceberg-tables-function branch 2 times, most recently from 8d77152 to 10160e7 Compare December 13, 2024 15:10
@lukasz-stec lukasz-stec marked this pull request as ready for review December 13, 2024 19:17
Copy link
Member

@raunaqmorarka raunaqmorarka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments

@lukasz-stec lukasz-stec force-pushed the ls/2412/04-iceberg-tables-function branch from 73ae38d to aca9303 Compare December 17, 2024 09:35
Copy link
Member Author

@lukasz-stec lukasz-stec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CA

{
requireNonNull(parameterKey, "parameterKey is null");
requireNonNull(parameterValues, "parameterValue is null");
return doListAllTables(databaseName, table -> parameterValues.contains(table.getParameters().get(parameterKey))).stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to handle the case where table.getParameters().get(parameterKey) is null separately ? That potentially is nullptr exception

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the type of the parameterValues to ImmutableSet to avoid this problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about how changing the type helps.
I was thinking about the case where given parameter key does not exist in table parameters. I think the safe thing to do would be to return empty list of table names in that case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Set can contain null values so theoretically an API with Set could mean a caller would like to filter for null or non-existent values. ImmutableSet cannot contain null, and additionally, it is safe to check for null existence there. This means that ImmutableSet better describes the API here which only supports non-null values for the filter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems a subtle distinction, it would be easy for users of the API to miss it.
Can we just handle that corner case safely instead ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can they miss it? With ImmutableSet there is no way to pass the wrong argument

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't pass in a key for which there is no entry in table parameters ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can, that is actually the point of this to return only those that have it. But that works ok becasue the parameters values cannot contain null thus potential mistake to try to filter by null value will just not work and it is safe to check for null element becasue ImmutableSet as all immutable collections explicitly allows for null argument to com.google.common.collect.ImmutableCollection#contains.

public abstract boolean contains(@CheckForNull Object object);

return getTablesInternal(
_ -> {},
databaseName,
table -> table.parameters() != null && parameterValues.contains(table.parameters().get(parameterKey))).stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to handle the case where table.parameters().get(parameterKey) is null separately ? That potentially is nullptr exception

@lukasz-stec lukasz-stec force-pushed the ls/2412/04-iceberg-tables-function branch from aca9303 to a31b92c Compare December 18, 2024 18:22
Copy link
Member Author

@lukasz-stec lukasz-stec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CA

{
requireNonNull(parameterKey, "parameterKey is null");
requireNonNull(parameterValues, "parameterValue is null");
return doListAllTables(databaseName, table -> parameterValues.contains(table.getParameters().get(parameterKey))).stream()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the type of the parameterValues to ImmutableSet to avoid this problem.

@lukasz-stec lukasz-stec force-pushed the ls/2412/04-iceberg-tables-function branch from a31b92c to 44a1c84 Compare December 18, 2024 20:52
Copy link
Member

@raunaqmorarka raunaqmorarka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please resolve conflicts

The dropMaterializedView on TrinoHiveCatalog that uses FileHiveMetastore
works only if unique table locations are enabled.
TrinoCatalog.listTables returns not only iceberg tables but also
other relations like views, materialized view or non-iceberg tables.
@lukasz-stec lukasz-stec force-pushed the ls/2412/04-iceberg-tables-function branch from 44a1c84 to bfeae5e Compare December 20, 2024 12:02
@lukasz-stec
Copy link
Member Author

please resolve conflicts

done

Add the ability to list only iceberg tables from the iceberg catalog.
Before this change, there was no way to list only iceberg tables.
The SHOW TABLES statement, information_schema.tables, and jdbc.tables will all
return all tables that exist in the underlying metastore, even if the table cannot
be handled in any way by the iceberg connector. This can happen if other connectors
like hive or delta, use the same metastore, catalog, and schema to store its tables.
The function accepts an optional parameter with the schema name.
Sample statements:
SELECT * FROM TABLE(iceberg.system.iceberg_tables());
SELECT * FROM TABLE(iceberg.system.iceberg_tables(SCHEMA_NAME => 'test'));
@lukasz-stec lukasz-stec force-pushed the ls/2412/04-iceberg-tables-function branch from bfeae5e to 18647ba Compare December 20, 2024 12:44
@raunaqmorarka raunaqmorarka merged commit 5ce80be into trinodb:master Dec 23, 2024
62 checks passed
@raunaqmorarka raunaqmorarka deleted the ls/2412/04-iceberg-tables-function branch December 23, 2024 08:32
@raunaqmorarka
Copy link
Member

Please work on adding documentation as a follow-up

@github-actions github-actions bot added this to the 469 milestone Dec 23, 2024
@ebyhr
Copy link
Member

ebyhr commented Dec 23, 2024

This PR should have gotten syntax reviews from @martint as I talked with @raunaqmorarka offline.

@lukasz-stec Could you check CI failures on master? https://github.com/trinodb/trino/actions/runs/12463782682/job/34786781881

@lukasz-stec
Copy link
Member Author

@lukasz-stec Could you check CI failures on master? https://github.com/trinodb/trino/actions/runs/12463782682/job/34786781881

Those tests were not run as part of this PR, missing label probably. I will need to fix them.

@lukasz-stec
Copy link
Member Author

@lukasz-stec Could you check CI failures on master? https://github.com/trinodb/trino/actions/runs/12463782682/job/34786781881

Those tests were not run as part of this PR, missing label probably. I will need to fix them.

I created #24561. @ebyhr @raunaqmorarka PTAL + I think it needs to run with secrets as from what I understand this was the reason cloud tests were not run in this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

4 participants