Ideas for new API (Reading)
string shapefilePath = @"C:\path\to\file.shp";
foreach (var geom in Shapefiles.ReadWithoutData(shapefilePath))
{
    DoStuffWith(geom);
}
string shapefilePath = @"C:\path\to\file.shp";
foreach (var (geom, data) in Shapefiles.ReadWithData(shapefilePath))
{
    if (!data.GetFieldValue<bool>("BLOCKED"))
    {
        DoStuffWith(geom, data.GetFieldValue<int>("ID"));
    }
}
string shapefilePath = @"C:\path\to\file.shp";
// the following line returns after reading the .shx file in full and only the header of the .shp
using var geoms = Shapefiles.OpenWithoutData(shapefilePath);
Parallel.For(0, geoms.Length, i =>
{
    // the following line uses the data we read from the .shx file before:
    // it reads the range of bytes from the .shp file and decodes it to Geometry.
    // if we open it as a memory-mapped file, this can be completely thread-safe
    DoStuffWith(geoms[i]);
});
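To make the "range of bytes" idea concrete, here is a minimal sketch of how one .shx index entry maps to a byte range in the .shp file. The class and method names are hypothetical and not part of the proposed API; the layout itself (100-byte header, 8-byte big-endian index entries counted in 16-bit words) comes from the shapefile format.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper, not part of any existing library: maps one 8-byte
// .shx index entry to the byte range of the matching record in the .shp file.
public static class ShxIndexSketch
{
    public const int ShxHeaderSize = 100;      // main file header, same size as in the .shp
    public const int ShxRecordSize = 8;        // offset (4 bytes) + content length (4 bytes)
    public const int ShpRecordHeaderSize = 8;  // record number + content length

    // Returns the byte offset and byte length of the record contents in the .shp file.
    public static (long ContentOffset, int ContentLength) Locate(ReadOnlySpan<byte> shx, int recordIndex)
    {
        ReadOnlySpan<byte> entry = shx.Slice(ShxHeaderSize + recordIndex * ShxRecordSize, ShxRecordSize);

        // Both values are big-endian and counted in 16-bit words, hence the "* 2".
        long recordOffset = BinaryPrimitives.ReadInt32BigEndian(entry) * 2L;
        int contentLength = BinaryPrimitives.ReadInt32BigEndian(entry.Slice(4)) * 2;

        // Skip the 8-byte record header that precedes the contents in the .shp file.
        return (recordOffset + ShpRecordHeaderSize, contentLength);
    }
}
```

Because every lookup is a pure function of the already-loaded index, nothing is shared or mutated between calls, which is what would make the indexer safe to call from `Parallel.For`.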
string shapefilePath = @"C:\path\to\file.shp";
using var geoms = Shapefiles.OpenWithData(shapefilePath);
// .GetFieldValue<T>(string fieldName) from above has two problems:
// 1. we have to look up the index of the field on every call
// 2. we have to check if the field is compatible on every call
// this is probably not the end of the world for most use cases, but
// when you process hundreds of millions of records split among scores
// of shapefiles, it would be nice to be able to do that once per
// shapefile, such that the only check we need to do is "did this
// DataField<T> instance come from this actual shapefile?".
DataField<bool> blockedField = geoms.GetDataField<bool>("BLOCKED");
DataField<int> idField = geoms.GetDataField<int>("ID");
Parallel.For(0, geoms.Length, i =>
{
    // combine the basic concepts from before. .dbf files support
    // seeking just fine on their own: every record is the same size.
    var (geom, data) = geoms[i];
    if (!data.GetFieldValue(blockedField))
    {
        DoStuffWith(geom, data.GetFieldValue(idField));
    }
});
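The "every record is the same size" point can be sketched as plain arithmetic on the .dbf header. The helper name below is hypothetical; the header fields it reads (record count at byte 4, header length at byte 8, record length at byte 10, all little-endian) are the standard dBASE layout.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration only: computes where record N starts
// in a .dbf file, using nothing but the first 12 bytes of the header.
public static class DbfSeekSketch
{
    public static long RecordOffset(ReadOnlySpan<byte> header, int recordIndex)
    {
        uint recordCount = BinaryPrimitives.ReadUInt32LittleEndian(header.Slice(4));
        ushort headerLength = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(8));
        ushort recordLength = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(10));

        if (recordIndex < 0 || recordIndex >= recordCount)
        {
            throw new ArgumentOutOfRangeException(nameof(recordIndex));
        }

        // Records are stored back to back right after the header, so no index file is needed.
        return headerLength + (long)recordIndex * recordLength;
    }
}
```

Note that each record starts with a one-byte deletion flag, so the first field value sits one byte past the returned offset.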
I've actually seen data sets that ship standalone .dbf files: they provide a bunch of full .shp / .shx / .dbf triples, plus a few files that are .dbf only, to work around the fact that .dbf doesn't support variable-length records. Think, "ticket prices at movie theater #3201 are $X on weekends and weeknights, $Y on weekdays from 6 AM to 4 PM, $Z on holidays, ...".
The most efficient and appropriate way to encode that info is to put one record per movie theater in the shapefile, with appropriate metadata about the location, and then add a separate independent .dbf file for ticket prices, with a separate row for each distinct ticket price indicating the times at which that price should be used.
So with that long-winded explanation in mind, we should be able to support this use case much more nicely.
string dataFilePath = @"C:\path\to\file.dbf";
foreach (var data in Shapefiles.ReadDataOnly(dataFilePath))
{
    if (!data.GetFieldValue<bool>("BLOCKED"))
    {
        DoStuffWith(data.GetFieldValue<int>("ID"), data.GetFieldValue<string>("NAME"));
    }
}
Or, for the performance-conscious:
string dataFilePath = @"C:\path\to\file.dbf";
using var openedData = Shapefiles.OpenDataOnly(dataFilePath);
DataField<bool> blockedField = openedData.GetDataField<bool>("BLOCKED");
DataField<int> idField = openedData.GetDataField<int>("ID");
DataField<string> nameField = openedData.GetDataField<string>("NAME");
foreach (var data in openedData)
{
    if (!data.GetFieldValue(blockedField))
    {
        DoStuffWith(data.GetFieldValue(idField), data.GetFieldValue(nameField));
    }
}
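For what it's worth, the `DataField<T>` handles used above could work roughly like the following sketch. Everything here is an assumption made for illustration: the `OpenedDataSketch` type, its constructor, and the use of `object[]` as a stand-in for a decoded record are not part of the proposal. The point is only that the name lookup and type check happen once, and each per-record read is left with a single "did this handle come from this file?" check.

```csharp
using System;

// Hypothetical handle: remembers which opened file produced it and which column it refers to.
public sealed class DataField<T>
{
    internal DataField(object owner, int index)
    {
        Owner = owner;
        Index = index;
    }

    internal object Owner { get; }  // the opened file that created this handle
    internal int Index { get; }     // column position, resolved once
}

// Hypothetical stand-in for whatever OpenDataOnly / OpenWithData would return.
public sealed class OpenedDataSketch
{
    private readonly string[] _names;
    private readonly Type[] _types;

    public OpenedDataSketch(string[] names, Type[] types)
    {
        _names = names;
        _types = types;
    }

    // Done once per file: name lookup plus type compatibility check.
    public DataField<T> GetDataField<T>(string fieldName)
    {
        int index = Array.IndexOf(_names, fieldName);
        if (index < 0)
        {
            throw new ArgumentException($"No field named '{fieldName}'.", nameof(fieldName));
        }

        if (_types[index] != typeof(T))
        {
            throw new InvalidCastException($"Field '{fieldName}' is not a {typeof(T).Name}.");
        }

        return new DataField<T>(this, index);
    }

    // Done per record: only an ownership check, then a direct indexed read.
    public T GetFieldValue<T>(object[] record, DataField<T> field)
    {
        if (!ReferenceEquals(field.Owner, this))
        {
            throw new ArgumentException("This DataField<T> came from a different file.", nameof(field));
        }

        return (T)record[field.Index];
    }
}
```

In the snippets above `GetFieldValue` lives on the record itself rather than on the opened file; the sketch moves it only to stay short.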