Use separated Coords #11

Open
JosiahParry opened this issue Nov 27, 2023 · 3 comments
Comments

@JosiahParry
Owner

Per recommendation, separated x and y coordinates should be used when creating a geoarrow array, as this will improve conversion to R types.

Comment: geoarrow/geoarrow-c#78

Example affected lines:

let arr = geoarrow::array::PolygonArray::<i32>::from(res);

@kylebarron

@paleolimbot

FWIW, interleaved coords will still work and the optimizations that I'm talking about aren't implemented quite yet. The optimizations I'm talking about are:

  • Conversion to wk::xy(), which, under the hood, is the same as geoarrow.point with separated coordinates. nanoarrow doesn't currently do ALTREP for double, so this will incur a copy right now, although the conversion today is still much faster than converting interleaved coordinates (because for anything I haven't optimized I go through the "generic" conversion which has lots of overhead).
  • Conversion to sfc. This is probably the same for both interleaved and separated right now, but because matrices store columns together in R, I will probably optimize the separated coordinate -> sfc conversion first. Because sfc is pretty slow anyway after constructing it, this one might not matter.
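For readers unfamiliar with the two layouts discussed above, here is a minimal sketch in plain Rust. It does not use geoarrow-rs; the function names `interleave` and `separate` are made up for illustration. Interleaved storage keeps one buffer of x, y pairs, while separated storage keeps one buffer per dimension, which is closer to how R stores columns.

```rust
/// Interleaved layout: a single buffer holding [x0, y0, x1, y1, ...].
/// (Hypothetical helper for illustration, not a geoarrow-rs function.)
fn interleave(xs: &[f64], ys: &[f64]) -> Vec<f64> {
    assert_eq!(xs.len(), ys.len(), "x and y must have the same length");
    xs.iter().zip(ys).flat_map(|(&x, &y)| [x, y]).collect()
}

/// Separated layout: one buffer per dimension, returned as (xs, ys).
/// (Hypothetical helper for illustration, not a geoarrow-rs function.)
fn separate(interleaved: &[f64]) -> (Vec<f64>, Vec<f64>) {
    assert!(interleaved.len() % 2 == 0, "expected an even number of values");
    let mut xs = Vec::with_capacity(interleaved.len() / 2);
    let mut ys = Vec::with_capacity(interleaved.len() / 2);
    // Walk the buffer two values at a time: (x, y), (x, y), ...
    for pair in interleaved.chunks_exact(2) {
        xs.push(pair[0]);
        ys.push(pair[1]);
    }
    (xs, ys)
}
```

Going from interleaved to separated requires a strided pass over the whole buffer, which is the kind of copy a consumer can skip when the producer emits separated buffers directly.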

If ESRI JSON is similar to GeoJSON I imagine constructing the interleaved version is easier to do from your end. All this to say that I think you can safely defer this one for quite a while if you'd like 🙂

@JosiahParry
Owner Author

Thanks for the heads up! Esri JSON is quite close to GeoJSON: coords are structured like [[x, y, z, m], [x, y, z, m]]. Fortunately, the conversion to geoarrow falls (almost) entirely on geoarrow-rs.

The way it works is that I have a struct, EsriPolygon for example, for which I have to implement the trait PolygonTrait, which defines how to get x & y coords out of the struct. Once that trait is implemented, I get for "free" the ability to create a PolygonArray using geoarrow-rs. However, at present that assumes an interleaved coordinate buffer. It seems like moving forward I'll have the option to choose the coordinate representation type (see geoarrow/geoarrow-rs#279).
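A rough sketch of that pattern, for illustration only. `SimplePolygon`, `CoordLayout`, `CoordBuffer`, and `collect_xy` are hypothetical names and are not the geoarrow-rs API; the real `PolygonTrait` has a different shape. The sketch also assumes every position carries all four of x, y, z, m. The idea is that once a type exposes its coordinates through a trait, generic code can fill either kind of buffer.

```rust
/// Hypothetical, simplified stand-in for a polygon-access trait.
/// This is NOT the signature of geoarrow-rs's `PolygonTrait`.
trait SimplePolygon {
    /// Rings of the polygon; each position is assumed to be [x, y, z, m].
    fn rings(&self) -> &[Vec<[f64; 4]>];
}

/// Illustrative struct mirroring Esri JSON polygon rings.
struct EsriPolygon {
    rings: Vec<Vec<[f64; 4]>>,
}

impl SimplePolygon for EsriPolygon {
    fn rings(&self) -> &[Vec<[f64; 4]>] {
        &self.rings
    }
}

/// Which coordinate representation the caller wants (hypothetical).
enum CoordLayout {
    Interleaved,
    Separated,
}

/// The resulting x/y coordinate storage (hypothetical).
#[derive(Debug, PartialEq)]
enum CoordBuffer {
    Interleaved(Vec<f64>),
    Separated { x: Vec<f64>, y: Vec<f64> },
}

/// Collect the x and y values of every position into the requested layout.
/// Ring and geometry offsets are omitted to keep the sketch short.
fn collect_xy<P: SimplePolygon>(polys: &[P], layout: CoordLayout) -> CoordBuffer {
    // Flatten polygons -> rings -> positions into one stream of positions.
    let positions = polys
        .iter()
        .flat_map(|p| p.rings().iter())
        .flat_map(|ring| ring.iter());
    match layout {
        CoordLayout::Interleaved => {
            CoordBuffer::Interleaved(positions.flat_map(|pos| [pos[0], pos[1]]).collect())
        }
        CoordLayout::Separated => {
            let (x, y) = positions.map(|pos| (pos[0], pos[1])).unzip();
            CoordBuffer::Separated { x, y }
        }
    }
}
```

Only the final step differs between the two layouts, so offering a choice of coordinate type need not change how `EsriPolygon` itself is written.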


Aside: I suspect any use of geoarrow or more low-level geometry representations will lead to a massive speed up regardless of a copy or not :)

@paleolimbot

> Aside: I suspect any use of geoarrow or more low-level geometry representations will lead to a massive speed up regardless of a copy or not :)

It's true. I've even found so far that Arrow WKB is a huge speed boost (compared to anything involving lists of R objects).
