Ideas for new API (Reading)

Simple usage samples

No .dbf file (or it's irrelevant)

string shapefilePath = @"C:\path\to\file.shp";
foreach (var geom in Shapefiles.ReadWithoutData(shapefilePath))
{
    DoStuffWith(geom);
}

Use data from .dbf file

string shapefilePath = @"C:\path\to\file.shp";
foreach (var (geom, data) in Shapefiles.ReadWithData(shapefilePath))
{
    if (!data.GetFieldValue<bool>("BLOCKED"))
    {
        DoStuffWith(geom, data.GetFieldValue<int>("ID"));
    }
}
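
For reference, here is a minimal sketch of the shapes these two entry points could take, assuming IEnumerable-based return types; every type and member name below (including IShapefileRecord) is a placeholder for discussion, not a committed design.

using System;
using System.Collections.Generic;
using NetTopologySuite.Geometries;

public static class Shapefiles
{
    // yields one Geometry per .shp record, never touching the .dbf
    public static IEnumerable<Geometry> ReadWithoutData(string shapefilePath) =>
        throw new NotImplementedException();

    // yields each Geometry paired with its attribute record from the .dbf
    public static IEnumerable<(Geometry Geometry, IShapefileRecord Data)> ReadWithData(string shapefilePath) =>
        throw new NotImplementedException();
}

public interface IShapefileRecord
{
    // looks the field up by name and converts its value to T on every call
    T GetFieldValue<T>(string fieldName);
}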

More advanced usage samples

Delayed-read, indexing (no .dbf file)

string shapefilePath = @"C:\path\to\file.shp";

// the following line returns after reading the .shx file in full and only the header of the .shp
using var geoms = Shapefiles.OpenWithoutData(shapefilePath);
Parallel.For(0, geoms.Length, i =>
{
    // the following line uses the data we read from the .shx file before
    // it reads the range of bytes from the .shp file and decodes it to Geometry
    // if we open it as a memory-mapped file, this can be completely thread-safe
    DoStuffWith(geoms[i]);
});
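
To make the comment about memory-mapping concrete, here is one possible shape for the handle returned by OpenWithoutData, assuming it decodes the .shx offsets up front and reads .shp record ranges from a memory-mapped file on demand. The class and member names are illustrative only.

using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using NetTopologySuite.Geometries;

public sealed class ShapefileGeometryReader : IDisposable
{
    private readonly MemoryMappedFile _shpFile;  // random access without a shared stream position
    private readonly long[] _recordOffsets;      // decoded from the .shx index when the file is opened
    private readonly int[] _recordLengths;

    internal ShapefileGeometryReader(MemoryMappedFile shpFile, long[] recordOffsets, int[] recordLengths)
    {
        _shpFile = shpFile;
        _recordOffsets = recordOffsets;
        _recordLengths = recordLengths;
    }

    // one entry per .shx index record
    public int Length => _recordOffsets.Length;

    // decodes the requested record on demand; safe to call from multiple threads
    // because each call opens its own view over the mapped file
    public Geometry this[int index]
    {
        get
        {
            using var view = _shpFile.CreateViewStream(_recordOffsets[index], _recordLengths[index]);
            return DecodeShpRecord(view);
        }
    }

    public void Dispose() => _shpFile.Dispose();

    // placeholder for the actual .shp record decoding
    private static Geometry DecodeShpRecord(Stream recordStream) =>
        throw new NotImplementedException();
}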

Delayed-read, cached field lookups, indexing (.dbf file)

string shapefilePath = @"C:\path\to\file.shp";

using var geoms = Shapefiles.OpenWithData(shapefilePath);

// .GetFieldValue<T>(string fieldName) from above has two problems:
// 1. we have to look up the index of the field on every call
// 2. we have to check if the field is compatible on every call
// this is probably not the end of the world for most use cases, but
// when you process hundreds of millions of records split among scores
// of shapefiles, it would be nice to be able to do that once per
// shapefile, such that the only check we need to do is "did this
// DataField<T> instance come from this actual shapefile?".
DataField<bool> blockedField = geoms.GetDataField<bool>("BLOCKED");
DataField<int> idField = geoms.GetDataField<int>("ID");
Parallel.For(0, geoms.Length, i =>
{
    // combine the basic concepts from before.  .dbf files support
    // seeking just fine on their own: every record is the same size.
    var (geom, data) = geoms[i];
    if (!data.GetFieldValue(blockedField))
    {
        DoStuffWith(geom, data.GetFieldValue(idField));
    }
});
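
One way to get there, sketched below under the assumption that GetDataField<T> resolves the column index and validates the type exactly once: the DataField<T> token remembers which opened file created it and which column it maps to, so each per-record read (on the data half of the (geom, data) pair) is just an ownership check plus an index-based fetch. The member names are placeholders.

using System;

public sealed class DataField<T>
{
    internal DataField(object owner, int fieldIndex)
    {
        Owner = owner;
        FieldIndex = fieldIndex;
    }

    internal object Owner { get; }   // the opened shapefile / .dbf that handed this token out
    internal int FieldIndex { get; } // resolved once, inside GetDataField<T>
}

public sealed class ShapefileDataRecord
{
    private readonly object _owner;
    private readonly object[] _values; // one already-decoded value per .dbf column

    internal ShapefileDataRecord(object owner, object[] values)
    {
        _owner = owner;
        _values = values;
    }

    public T GetFieldValue<T>(DataField<T> field)
    {
        if (!ReferenceEquals(field.Owner, _owner))
        {
            throw new ArgumentException("This DataField came from a different shapefile.", nameof(field));
        }

        // no name lookup and no type-compatibility check here: both were done
        // once, up front, in GetDataField<T>.
        return (T)_values[field.FieldIndex];
    }
}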

Weird usage samples

DBF only, no SHP

I've actually seen this: a data set provides a bunch of full .shp / .shx / .dbf triples, and then a few files that are just .dbf only, to work around the fact that .dbf doesn't support variable-length records. Think, "ticket prices at movie theater #3201 are $X on weekends and weeknights, $Y on weekdays from 6 AM to 4 PM, $Z on holidays, ...".

The most efficient and appropriate way to encode that info is to put one record per movie theater in the shapefile, with appropriate metadata about the location, and then add a separate independent .dbf file for ticket prices, with a separate row for each distinct ticket price indicating the times at which that price should be used.

So with that long-winded explanation in mind, we should be able to support this use case much more nicely.

string dataFilePath = @"C:\path\to\file.dbf";
foreach (var data in Shapefiles.ReadDataOnly(dataFilePath))
{
    if (!data.GetFieldValue<bool>("BLOCKED"))
    {
        DoStuffWith(data.GetFieldValue<int>("ID"), data.GetFieldValue<string>("NAME"));
    }
}

Or, for the performance-conscious:

string dataFilePath = @"C:\path\to\file.dbf";
using var openedData = Shapefiles.OpenDataOnly(dataFilePath);
DataField<bool> blockedField = openedData.GetDataField<bool>("BLOCKED");
DataField<int> idField = openedData.GetDataField<int>("ID");
DataField<string> nameField = openedData.GetDataField<string>("NAME");
foreach (var data in openedData)
{
    if (!data.GetFieldValue(blockedField))
    {
        DoStuffWith(data.GetFieldValue(idField), data.GetFieldValue(nameField));
    }
}
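
Tying this back to the movie-theater example: a consumer of the DBF-only support might join the standalone price table to the theater shapefile on a shared key. The file names and fields below (THEATER_ID, PRICE) are invented purely for illustration.

string theatersPath = @"C:\path\to\theaters.shp";
string pricesPath = @"C:\path\to\theater_prices.dbf";

// group the variable number of price rows by the key they share with the theaters
var pricesByTheater = new Dictionary<int, List<decimal>>();
foreach (var priceRow in Shapefiles.ReadDataOnly(pricesPath))
{
    int theaterId = priceRow.GetFieldValue<int>("THEATER_ID");
    if (!pricesByTheater.TryGetValue(theaterId, out var prices))
    {
        pricesByTheater[theaterId] = prices = new List<decimal>();
    }

    prices.Add(priceRow.GetFieldValue<decimal>("PRICE"));
}

// one record per theater in the shapefile, however many price rows it needs in the .dbf
foreach (var (geom, data) in Shapefiles.ReadWithData(theatersPath))
{
    DoStuffWith(geom, pricesByTheater[data.GetFieldValue<int>("THEATER_ID")]);
}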