-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add example of interacting with a remote catalog #13722
Conversation
f3730ad
to
5552e4b
Compare
FYI @westonpace |
"Error during planning: table 'datafusion.remote_schema.remote_table' not found" | ||
); | ||
|
||
// Instead, to use a remote catalog, we must use lower level APIs on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From here on down is the API dance required. I believe @westonpace offered to help make a nicer API for this (I am certainly interested)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. I'll take a stab tomorrow, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, tomorrow turned into next week but here we go: #13800
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, tomorrow turned into next week but here we go: #13800
LOL this is the story of my life the last 2 weeks
Co-authored-by: Berkay Şahin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still playing with this but wanted to give a 👍
remote_catalog_interface: Arc<RemoteCatalogInterface>, | ||
/// Local cache of tables that have been preloaded from the remote | ||
/// catalog | ||
tables: Mutex<Vec<Arc<dyn TableProvider>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just an example so it doesn't matter but I'm curious why Vec
and not HashMap
which would be my default thought (e.g. is this to keep the example simple or am I missing some interesting knowledge).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was likely being pedantic here and trying to avoid storing the table name twice (once in the Catalog and once on the Table). I think perhaps that is just more confusing -- I'll change this to use HashMap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 69723c4 -- it is a nice simplification. Thank you for the suggestion
Co-authored-by: Jonah Gao <[email protected]> Co-authored-by: Weston Pace <[email protected]>
Thank you @jonahgao |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming this merges before #13800 then I'll update this example to refer to the helpers introduced there.
Thanks @westonpace -- I was procrastinating on merging this example as I wanted to use it as an excuse to consolidate some of the examples down. that wa sunecessary |
I feel bad that adding yet another example makes the "too many examples" problem even worse, so I filed a few tickets to try and make it better: |
Which issue does this PR close?
Rationale for this change
Many of the Catalog APIs in DataFusion are not
async
and thus it is not clear how to interface with remote catalogs (e.g. catalogs that require remote network access)This pattern is possble with the current APIs but it is not obvious. Let's document how to do so
What changes are included in this PR?
Are these changes tested?
By CI
Are there any user-facing changes?
A new example