Accept compressed files as input to predict when using a Predictor #5237
Comments
I think this feature is a great idea! The latter design (passing a flag) seems better to me. I am adding @epwalsh here to get his input as well.
Yep, this seems reasonable. I think we should try to automatically detect the compression type, but also have the flag so that users can override it when the automatic detection fails. You may find this helper function useful: allennlp/allennlp/common/file_utils.py, line 1087 in 7a5106d.
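The "detect automatically, but let a flag override" behaviour suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: `open_text`, its `compression` parameter, and the suffix table are hypothetical names made up for this sketch, and it is not the helper in file_utils.py referenced above. Detection here is by file suffix only.

```python
import bz2
import gzip
from pathlib import Path
from typing import IO, Optional

# Hypothetical tables for this sketch; not taken from AllenNLP.
_OPENER_BY_SUFFIX = {".gz": gzip.open, ".bz2": bz2.open}
_OPENER_BY_NAME = {"gz": gzip.open, "bz2": bz2.open, "none": open}


def open_text(path: str, compression: Optional[str] = None) -> IO[str]:
    """Open `path` for reading text, decompressing on the fly if needed.

    If `compression` is given ("gz", "bz2" or "none") it overrides detection;
    otherwise the opener is guessed from the file suffix, falling back to a
    plain `open` for unknown suffixes.
    """
    if compression is not None:
        if compression not in _OPENER_BY_NAME:
            raise ValueError(f"unsupported compression type: {compression!r}")
        opener = _OPENER_BY_NAME[compression]
    else:
        opener = _OPENER_BY_SUFFIX.get(Path(path).suffix.lower(), open)
    # gzip.open, bz2.open and the built-in open all accept text mode "rt".
    return opener(path, mode="rt", encoding="utf-8")
```

A caller that reads one input per line could then iterate over `open_text(path)` exactly as it would over a plain file, and pass `compression="gz"` only when the suffix is missing or misleading.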
Hi, I'd like to try working on this. I'm fairly new to this, so are there any pointers I should keep in mind before raising a pull request?
Hi @epwalsh! Is this issue still unresolved? I'm looking for issues to start contributing to AllenNLP; can I take this up if it hasn't been resolved already?
Hi @spranjal25, we haven't heard from @Dbhasin1 for a while on their PR, so it's probably okay for you to take over at this point.
Hey, sorry, I'd been engaged elsewhere for a while. I'd like to give it one more shot!
Is the issue still open?
Hi @danieldeutsch, @epwalsh,
Hi @Akshat977, feel free to open a PR when you're ready.
Is your feature request related to a problem? Please describe.
I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine with AllenNLP during training because I can write my dataset reader to load the compressed data. However, the predict command opens the file and reads lines for the Predictor. This fails when it tries to load data from my compressed files. See allennlp/allennlp/commands/predict.py, lines 208 to 218 in 39d7e5a.
Describe the solution you'd like
Either automatically detect that the file is compressed or add a flag to predict that indicates that the file is compressed. One method that I have used to detect if a file is gzipped is here, although it isn't 100% accurate. I have an implementation here. Otherwise, a flag like --compression-type to mark how the file is compressed should be sufficient. Passing the type of compression would allow support for gzip, bz2, or any other method.
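One common way to guess whether a file is gzipped is to check its leading "magic" bytes. The sketch below illustrates that idea and is not necessarily the method or implementation linked above; `sniff_compression` is a hypothetical name. It relies on the fact that gzip streams start with the bytes 0x1f 0x8b and bz2 streams start with "BZh". As noted in the request, this kind of check is not fully reliable, since an uncompressed file can happen to begin with the same bytes, which is why an explicit flag remains useful.

```python
from typing import Optional

# Leading bytes of each format, paired with a short name for it.
_MAGIC_NUMBERS = (
    (b"\x1f\x8b", "gz"),
    (b"BZh", "bz2"),
)


def sniff_compression(path: str) -> Optional[str]:
    """Guess the compression type from the first bytes of the file.

    Returns "gz", "bz2", or None when no known signature matches.
    """
    with open(path, "rb") as f:
        head = f.read(3)  # the longest signature above is 3 bytes
    for magic, name in _MAGIC_NUMBERS:
        if head.startswith(magic):
            return name
    return None
```

The returned name could serve as the default when a flag such as --compression-type is not given, with the flag taking precedence whenever the user supplies it.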