A simple library to extract specific entries from a remote HTTP zip archive without downloading the entire file.
When opening a zip archive from a remote URL, a typical zip library needs to download the entire file before it can read its contents. So if you have a 90 MB zip file and want only a 100 KB file from inside it, you end up downloading the entire 90 MB anyway.
The zip format defines a directory that points to all of its inner entries, with properties such as name, starting offset, and size. This directory is usually small compared to the archive itself and sits at the very end of the file. So if we could read just this directory, we would know where in the zip archive the file we want is stored.
And if we could request only that part of the content from the remote URL, we would get a much smaller download containing just what we need.
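To make the directory lookup concrete, here is a minimal sketch of finding it from the last bytes of an archive. It assumes the tail bytes are already in memory and that the archive is not in ZIP64 format. `ZipTail` and `ReadEndOfCentralDirectory` are hypothetical names used for illustration, not part of this library's API. The zip format ends with an End of Central Directory record that stores the directory's offset, size, and entry count.

```csharp
using System;
using System.Buffers.Binary;

public static class ZipTail
{
    // Fixed part of the End of Central Directory (EOCD) record, without the comment.
    private const int EocdMinSize = 22;

    // Scans backwards for the EOCD signature "PK\x05\x06" and reads the
    // central directory location from it. All fields are little-endian.
    public static (uint Offset, uint Size, ushort EntryCount) ReadEndOfCentralDirectory(byte[] tail)
    {
        for (int i = tail.Length - EocdMinSize; i >= 0; i--)
        {
            bool isSignature = tail[i] == 0x50 && tail[i + 1] == 0x4B
                            && tail[i + 2] == 0x05 && tail[i + 3] == 0x06;
            if (!isSignature) continue;

            // Field positions relative to the start of the record:
            // +10 total entry count, +12 directory size, +16 directory offset.
            ushort entryCount = BinaryPrimitives.ReadUInt16LittleEndian(tail.AsSpan(i + 10));
            uint size = BinaryPrimitives.ReadUInt32LittleEndian(tail.AsSpan(i + 12));
            uint offset = BinaryPrimitives.ReadUInt32LittleEndian(tail.AsSpan(i + 16));
            return (offset, size, entryCount);
        }
        throw new InvalidOperationException("End of Central Directory record not found.");
    }
}
```

Because the record may be followed by a comment of up to 65,535 bytes, reading the last 22 + 65,535 bytes of the archive is enough to be sure of catching it. In most archives the comment is empty and a much smaller tail is enough.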
It turns out that HTTP supports a technique called byte serving: a request can carry a Range header that specifies which byte ranges of the resource we want in the response.
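As an illustration of byte serving, the sketch below builds a ranged GET with `HttpClient` and checks that the server actually honored it. `RangeRequests`, `BuildRangeRequest`, and `GetRangeAsync` are hypothetical names for this example and are not taken from the library. The sketch assumes the server supports range requests and answers with status 206 (Partial Content).

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class RangeRequests
{
    // Builds a GET request asking only for bytes [from, to] (both inclusive).
    public static HttpRequestMessage BuildRangeRequest(string url, long from, long to)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Range = new RangeHeaderValue(from, to);
        return request;
    }

    // Sends the ranged request and returns the partial body.
    public static async Task<byte[]> GetRangeAsync(HttpClient client, string url, long from, long to)
    {
        using (var request = BuildRangeRequest(url, from, to))
        using (var response = await client.SendAsync(request))
        {
            // A server that ignores the Range header replies 200 with the full body.
            if (response.StatusCode != HttpStatusCode.PartialContent)
                throw new InvalidOperationException("Server did not honor the Range header.");
            return await response.Content.ReadAsByteArrayAsync();
        }
    }
}
```

A call such as `await RangeRequests.GetRangeAsync(client, url, 0, 99)` would ask for the first 100 bytes only, since both ends of the range are inclusive.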
With that in mind, what we do is pretty simple. We make a first HTTP request asking only for the headers (not the content), and from those we learn the content size. Then we make a small range request at the end of the file to extract the directory info. Then, for the entries we want, we make requests for just those ranges. Decompress the result with the deflate algorithm and it's done.
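The first and last of these steps can be sketched as follows. This is only an outline under assumptions, not the library's actual implementation: `ZipSteps`, `GetContentLengthAsync`, and `Inflate` are hypothetical names, the server is assumed to report a Content-Length on a HEAD request, and the entry is assumed to be stored with the deflate method.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

public static class ZipSteps
{
    // First step: a HEAD request returns the headers only, including the archive size.
    public static async Task<long> GetContentLengthAsync(HttpClient client, string url)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Head, url))
        using (var response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return response.Content.Headers.ContentLength
                ?? throw new InvalidOperationException("Server did not report Content-Length.");
        }
    }

    // Last step: zip entries hold raw deflate data, which DeflateStream reads directly.
    // Entries saved with the "stored" method are not compressed and must skip this step.
    public static byte[] Inflate(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

The content length from the first step is what makes the tail request possible, since the range for the end of the file is computed from the total size.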
This approach makes more HTTP requests, so it only pays off when the desired content is a small part of the entire zip archive.
More on this can be found in my Medium article.
You can add the library to your project using the NuGet package:
dotnet add package HttpZipStream
Extracting just the first entry from a remote zip archive:
using System.Linq; // needed for FirstOrDefault()

var httpUrl = "http://MyRemoteFile.zip";
using (var zipStream = new System.IO.Compression.HttpZipStream(httpUrl))
{
    // Reads only the archive's directory, not the whole file
    var entryList = await zipStream.GetEntriesAsync();
    var entry = entryList.FirstOrDefault();
    // Downloads and decompresses just this entry
    byte[] entryContent = await zipStream.ExtractAsync(entry);
    /* do what you want with the entry content */
}
- Some minor documentation adjustments.
- Proper naming convention for async methods.
- Preparing projects to be built, packed, and deployed by the server.
- Implementing an ExtractAsync overload that returns just the entry content byte array.
- BUG #13: Some entries are not deflated correctly.
- Upgrading dotnet version to 3.1
MIT License - see the LICENSE file for details