-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added download_file()
; fixed missing word in write_disk()
docs
#599
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of this convenience function. It's not my call but I wonder if this sort of layer is something we would book against httr2 (https://github.com/r-lib/httr2/issues) or perhaps there is no reason to not just get on with it.
download_file <- function(url, path, overwrite = FALSE, ...) { | ||
|
||
url <- as.character(url) | ||
path <- as.character(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me think of something I do in usethis::use_course()
and recently decided to not present as a PR into curl (jeroen/curl#187): trying to determine local file name from the Content-Disposition header (and falling back to base name of the URL).
@hrbrmstr Do you have any thoughts on that? Maybe the curl issue is the best place to put them btw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a super gd question. I have a similar idiom setup for some light crawls we do but hadn't thought to make it generic somewhere.
I wonder if adding a "helper" to/for write_disk()
might be a way to go (not sure curl
is the right place for it).
Something like (seriously off the top of my head) write_disk <- function(path, infer = FALSE, overwrite = FALSE)
where if infer
were TRUE
it'd have the behaviour you posited:
- look for
Content-Disposition
and use that if found (potentially sanitizing it since it's theoretically untrusted content) - use last part of URL if ^^ does not exist, BUT
- use a generated UUID-ish name if ^^ is a
/
AND store it in the directory pointed to in path
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you describe it basically exactly what I do in usethis (Content-Disposition > last bit of URL > random name + sanitization):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 It'd be pretty straightforward for me to add this to write_disk()
(on holiday in 2 days so wldn't be until after that unless it seriously downpours in Acadia :-) if folks deem that a worthy add.
httr has been superseded in favour of httr2, so is no longer under active development. Thanks for using httr, thanks for contributing, and my apologies that your contribution never made it into the package. |
GET
+write_disk()
does a fine job dealing with downloads but is somewhat verbose to type.Base
download.file()
works fine but isn't cached-file-aware likewrite_disk()
is.GET
+write_disk()
is not multi-URL/file-aware like basedownload.file()
is.Base
download.file()
doesn't truly have all of the nice browser-like HTTP operations likeGET
does.Abusing bandwidth (i.e. needlessly re-downloading) even in these modern times is not cool.
Thus begat
download_file()
.The function:
download.file()
can in some circumstances)399
status code is used to identify whether a URL resource was already in a cached state since304 Not Modified
does not make sense (since the function does not test for existence) and399
is not likely to be used in our lifetime as a "standard"300
-level responseExample(s):
There are tests for single & multi-file downloads.
I also added a missing word to the
path
parameter inwrite_disk()
docs.devtools::check()
passed.