Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add h3Distance, and internal h3ToIjk and ijkDistance #83

Merged
merged 8 commits into from
Jul 25, 2018

Conversation

isaacbrodsky
Copy link
Collaborator

@isaacbrodsky isaacbrodsky commented Jun 22, 2018

Experimental.

Adds an h3ToIjk function which produces users consumable IJK+ coordinates. An "origin" for the coordinate system must be specified, and coordinates are only comparable if they were obtained from the same origin. This is limited to obtaining coordinates for indexes which are on the same base cell or neighboring base cells. Pentagons have more restrictions about crossing the pentagonal distortion. The intention is in the future this could be improved to handle pentagon distortion better, which may make current coordinates invalid. Coordinates should not be persisted or passed between library versions.

Pentagon cases were mostly restricted by experimentally disabling a number of cases.

With IJK+ coordinates, it's possible to determine the grid distance of cells, with the same limitations of h3ToIjk. This is done under the assumption that the distances that are of interest are below the granularity of a base cell (at that resolution, haversine may be a better approach.)

Along with this change, CoordIJK is moved to the public API.

edit: this would be version 3.1.0.

H3Index bc;
setH3Index(&bc, 0, i, 0);
int childrenSz = H3_EXPORT(maxUncompactSize)(&bc, 1, res);
H3Index* children = calloc(childrenSz, sizeof(H3Index));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this should be stack allocated

@isaacbrodsky
Copy link
Collaborator Author

Looks like the build may have timed out due to iterating several resolutions of indexes.

@dfellis
Copy link
Collaborator

dfellis commented Jun 22, 2018

What is the purpose of exposing this when its stated that the values produced should not be compared between different versions of H3? If this is for hexagon distance calculation, why not simply implement a function that does that but will return -1 if any constraints are met, which can at first include the base cell constraint mentioned and potentially improved over time?

@isaacbrodsky
Copy link
Collaborator Author

h3ToIjk has other uses, for example when using H3 bucketed data in systems which expect a matrix of data. I think some good examples would be using H3 bucketed data in a GPU or in a system like Tensorflow. Having non-hierarchical coordinates (which IJK+ fulfills) is needed for that case.

@sahrk
Copy link
Collaborator

sahrk commented Jun 23, 2018

While I do understand the convenience of the IJK coordinates, I fear that most users won’t understand the issues involved and this might lead to confusion and frustration. Maybe better to keep those issues internal where they can be cleanly resolved; @dfellis solution gives a path to achieving that. Frankly, if I hadn’t been so over-cautious (cowardly?) there would be barely any CoordIJK in the H3 core.

I wonder if many of the use cases that the IJK coordinates support might be served by a mapping of the H3 indexes of a single resolution, in their implicit integer order, to a compact 1-dimensional sequence. Function h3ToSeqNum would map an H3 index to its place in the sequence from 1 (or zero) to numHexagons(res), and seqNumToH3 would do the inverse. Then H3 bucketed data would be simple arrays, minimal sub-sequences of the total sequence. Note this sequence does a good job of preserving spatial locality, since cells which share a hierarchical ancestor would be closer together in the sequence (cells that share a parent would be contiguous). This would give you a non-hierarchical address format that can be safely externalized, since it needn’t change when the H3 internals evolve. I'd even volunteer to write the functions.

@dfellis
Copy link
Collaborator

dfellis commented Jun 25, 2018

@sahrk aren't the compact 64-bit integer forms of H3 already a type of sequence number, though?

I do see @isaacbrodsky 's point about wanting a coordinate system and not just a 1-d sequence for matrix-type operations, but just exposing the CoordIJK values without being clear about their limitations seems like a not-great idea.

What about a function that takes an origin hexagon and a "k-ring" size and creates a simple 2D IJ coordinate system with <0, 0> centered on the specified hexagon (that would simply spit out the maximum I and J) and another function that takes the origin hexagon and I, J coordinates and spits out the specified hexagon id?

The first can be used to define boundaries (can return <-1, -1> if the k-ring size is too large for the algorithm) and the second can use CoordIJK internally to quickly get hexagons, but could in the future also fall back to the (I think I'm remembering correctly) hexRing algorithm to get all hexagons at a given radius (using the IJ coordinates to determine the radius and choosing the right hexagon based on the direction).

That one would take longer to figure out and probably wouldn't have to be in the first release, but this allows for an implementation that won't depend on internals that may or may not change in the future. I can implement that if you guys want.

@sahrk
Copy link
Collaborator

sahrk commented Jun 25, 2018

The 64-bit integer H3 indexes form a sequence, just not a compact one. So they can't be used directly for @isaacbrodsky 's data buffer case. For the physical layout of a data buffer it's not clear to me how a 3D buffer (with a redundant axis) is more tractable than a 1D buffer that retains the same structural information (e.g., you could define the neighbor function, metric distance, etc., using sequence numbers, even if it was just done by translating back-and-forth to H3 indexes).

Perhaps there are two use cases here that don't necessarily have the same solution: an H3-organized data buffer, and a planar hex grid coordinate system that allows users to perform arbitrary grid coordinate manipulations that aren't yet supported by H3 directly. For the latter @dfellis I like your idea of generating the planar coordinate system as needed for a particular problem. And if it's going to be used for the physical layout of data buffers then 2D makes more sense than 3D. I'm just afraid that the pentagons are going to make this approach very messy to scale to large geographical regions, whereas most of those complications go away for the 1D approach (the one complication I still see is that, since pentagons have a different sized descendent footprint than the hexagon base cells you can't use a constant stride length to step through the buffer by base cell).

@isaacbrodsky
Copy link
Collaborator Author

I think keeping the function internal and only exposing a h3Distance as part of the public API could be a reasonable solution. In the future, h3ToIjk (or h3ToIj) could be exposed when we have more confidence in the coordinate space it creates for a given "origin". This API could naturally be complimented with an H3Index ijToH3(H3Index origin, const CoordIJ *ij).

This approach is focused on smaller geographical regions, on the assumption that if you're working with data near the base cell level, performing the Haversine distance calculation is reasonable enough, and the distances could even be precomputed. It certainly gets very messy to deal with pentagons. :)

It was suggested to me that the API could provide the hexagons directly in a 2D planar coordinate format (perhaps named CoordIJ), but that would necessarily need to call a function like h3ToIjk because of H3's internal use of IJK+. We could wait on that being available to making the function part of the public API.

I'm not sure that @dfellis' solution will work. It seems like it would require the base cell grid to be an IJK coordinate system, and I'm not sure that works due to pentagon distortion. (I'd also hope for a solution which doesn't have time or memory complexity relative to the distance between the indexes, unless it can be precomputed i.e. if we could precompute something for all base cells).

I think the sequence number idea that @sahrk brought up is interesting, but as you mentioned above I think there are a few different use cases, so I think there may be cases where generating a problem-specific coordinate space would be advantageous.

@coveralls
Copy link

coveralls commented Jul 2, 2018

Coverage Status

Coverage remained the same at 100.0% when pulling 6b7299f on isaacbrodsky:h3-to-ijk into 2bb2f29 on uber:master.

@isaacbrodsky isaacbrodsky changed the title Add h3ToIjk, along with ijkDistance and h3Distance Add h3Distance, and internal h3ToIjk and ijkDistance Jul 4, 2018
Copy link
Collaborator

@nrabinowitz nrabinowitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think this looks good, but I support the suggestion that we keep h3ToIjk internal for the moment.

* Rotate an H3Index 60 degrees clockwise about a pentagonal center.
* @param h The H3Index.
*/
H3Index _h3RotatePent60cw(H3Index h) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Is 60cw right here, or are we really rotating 72 degrees? Is there a more semantic way to express this, like _h3RotatePentSideCW?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parallels the existing _h3RotatePent60ccw, which is a special case of _h3Rotate60ccw/_h3Rotate60cw. I think it could be argued we're rotating 60 degrees, but also skipping over part of the coordinate space.

*/
int h3ToIjk(H3Index origin, H3Index h3, CoordIJK* out) {
if (H3_GET_MODE(origin) != H3_HEXAGON_MODE ||
H3_GET_MODE(h3) != H3_HEXAGON_MODE) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'm not sure if this is worthwhile. We technically ought to be doing this check on every H3 index algo, and then there's a slippery slope about what we ought to be checking - e.g. would h3IsValid be better here? I have mixed feelings about this, since it's basically a service to the developer to guard against stuff we can't type-check, but because it has perf issues in some cases, we can only do it in some functions and not others. It would be nice if we had a consistent approach to this across all algo functions.

The ideal option would be to make this a compile-time check. I don't see any way to do that with externally-provided indexes though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using assertions makes sense if you have people using debug builds versus release builds. Other languages tend to default to release builds so that is still an issue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some of the other algos don't do this because they do not have a way to return error cases to the caller. Other functions like geoToH3 do check for out of bounds inputs and return 0, and geoToH3 is certainly performance critical.

I would rather not use assertions, especially in release builds because instead of propagating an error it would abort the entire process, which would probably be bad.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertions are not compiled into release builds. The definition of assert looks something like this:

#ifdef NDEBUG  /* Release mode */
#define assert(cond) /* Empty definition */
#else  /* Debug mode */
#define assert(cond) ...  /* < Define assert macro */
#endif


int res = H3_GET_RESOLUTION(origin);

if (res != H3_GET_RESOLUTION(h3)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Is there a reason to do this check here and not in other algos?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertion is better as I stated above.

int revDir = 0;
if (originBaseCell != baseCell) {
for (dir = 1; dir < 7; dir++) {
int testBaseCell = _getBaseCellNeighbor(originBaseCell, dir);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest encapsulating this as Direction getBaseCellNeighborDirection(int origin, int dest), in baseCells.c, returning -1 on failure.

int pentagonRotations = 0;
int directionRotations = 0;

if (originOnPent) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a concrete suggestion here, but is it possible to move the pentagon handling blocks to a separate function to hide the complexity they entail?

} else {
return b;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the classic case of where you should use a macro. Unless you can move this to a header (requires use of static inline so not strictly C89 compliant), a macro is the better choice here:

#define _IMAX(a, b) (((a) > (b)) ? (a) : (b))

@@ -42,4 +42,8 @@ void geoBoundaryPrint(const GeoBoundary* b);
void geoBoundaryPrintln(const GeoBoundary* b);
int readBoundary(FILE* f, GeoBoundary* b);

void iterateAllIndexesAtRes(int res, void (*callback)(H3Index));
void iterateAllIndexesAtResPartial(int res, void (*callback)(H3Index),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, C users would probably prefer macros here and you'll get better performance because C compiler can see what is going on in a macro, but function pointers are opaque. Side note: this is why using the C qsort in C++ is slower than std::sort from C++ STL, due to function pointers being worse off than templates.

Anyway, here's how I'd do it based on Jansson:

#define H3_FOR_EACH_INDEX_RES(argName, res) H3_FOR_EACH_INDEX_PARTIAL((res), NUM_BASE_CELLS)
#define H3_FOR_EACH_INDEX_PARTIAL(argName, res, baseCells) \
    assert((baseCells) <= NUM_BASE_CELLS); \
     for (int i = 0; i < baseCells; i++) { \
         H3Index bc; \
         setH3Index(&bc, 0, i, 0); \
         int childrenSz = H3_EXPORT(maxUncompactSize)(&bc, 1, (res)); \
         STACK_ARRAY_CALLOC(H3Index, children, childrenSz); \
         H3_EXPORT(uncompact)(&bc, 1, children, childrenSz, (res)); \
         for (int j = 0; j < childrenSz; j++) { \
             if (children[j] == 0) { \
                 continue; \
             } \
             H3Index argName = children[j];

Then users can do this:

H3_FOR_EACH_INDEX_AT_RES(1, myIndex) {
    printf("%lld\n", myIndex);
}

Which expands to:

for (...) {
    ...
    H3Index myIndex = children[j];
    printf("%lld\n", myIndex);
}

Copy link
Collaborator Author

@isaacbrodsky isaacbrodsky Jul 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change this, but I am not a huge fan of macros. Since this is a test, I don't think performance is that big a concern.

edit: also, this does not permit reusing the code for more than one resolution. That still needs to be a function call or another macro.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya everyone hates macros until they program in C for a while and realize it's the only way to avoid seriously ugly code. Also, underrated technique: X macro.

Meanwhile, rethinking this API too so that the blocks have matching parentheses.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK seeing as this is a test I don't really think it's worth the effort. Thought this might lead to a convenience API in general. Never mind my suggestion.

void h3Distance_kRing_assertions(H3Index h3) {
int maxK = 5;
int r = H3_GET_RESOLUTION(h3);
if (r == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch is more performant and cleaner than else if chain.

int kSz = H3_EXPORT(maxKringSize)(k);

H3Index *neighbors = calloc(kSz, sizeof(H3Index));
int *distances = calloc(kSz, sizeof(int));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like stack allocations make more sense, unless this is an example to avoid difficulties with build tools.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was refactored out of a previous test without updating the allocations.

*/
int h3ToIjk(H3Index origin, H3Index h3, CoordIJK* out) {
if (H3_GET_MODE(origin) != H3_HEXAGON_MODE ||
H3_GET_MODE(h3) != H3_HEXAGON_MODE) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using assertions makes sense if you have people using debug builds versus release builds. Other languages tend to default to release builds so that is still an issue.


int res = H3_GET_RESOLUTION(origin);

if (res != H3_GET_RESOLUTION(h3)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertion is better as I stated above.

Copy link
Contributor

@isaachier isaachier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work.

@isaacbrodsky
Copy link
Collaborator Author

Bumping for review from those who had comments before @dfellis @sahrk , if there are no comments I'd like to merge this and release 3.1.0.

@dfellis
Copy link
Collaborator

dfellis commented Jul 19, 2018

I'm still not convinced exposing the CoordIJK struct is a good idea (rather than some coordinate system intentionally derived from an origin hexagon) but I won't block you on this.

@isaacbrodsky
Copy link
Collaborator Author

@dfellis this pr no longer exposes CoordIJK as part of the public api. (It could be used by code willing to reach into internals)

I think you mean you prefer a 2d coordinate system derived from the origin, rather than the current ijk+ one used internally.

@dfellis
Copy link
Collaborator

dfellis commented Jul 19, 2018

Wait, let me refresh this page. I leave Chrome tabs open for months, sometimes they break in weird ways, maybe I'm still seeing an old version of the code.

CHANGELOG.md Outdated
@@ -7,6 +7,9 @@ The public API of this library consists of the functions declared in file

## [Unreleased]
### Added
- h3ToIjk function for getting IJK+ coordinates from an index (#83)
- Internal `ijkDistance` function for determining the grid distance between IJK+ coordinates (#83)
- Internal `h3Distance` function for determining the grid distance between H3 indexes (#83)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, you're right that it's not doing what I thought, but these CHANGELOG statements look wrong. h3Distance is marked as an export below, while ijkDistance and h3ToIjk are not.

Also the h3ToIjk filter application is not mentioned in these notes and I'm not entirely sure what it's purpose is, now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, those are reversed. h3Distance should be the public one, the others are internal. I rebased and updated.

h3ToIjk is just for experimental use.

Isaac Brodsky added 8 commits July 19, 2018 15:15
Adds an h3ToIjk function which produces users consumable IJK+ coordinates. An "origin" for the coordinate system must be specified, and coordinates are only comparable if they were obtained from the same origin. This is limited to obtaining coordinates for indexes which are on the same base cell or neighboring base cells. Pentagons have more restrictions about crossing the pentagonal distortion. The intention is in the future this could be improved to handle pentagon distortion better, which may make current coordinates invalid. Coordinates should not be persisted or passed between library versions.

Pentagon cases were mostly restricted by experimentally disabling a number of cases.

With IJK+ coordinates, it's possible to determine the grid distance of cells, with the same limitations of h3ToIjk. This is done under the assumption that the distances that are of interest are below the granularity of a base cell (at that resolution, haversine may be a better approach.)

Along with this change, CoordIJK is moved to the public API.
Also change some tests to stack allocation and remove an `else if` chain.
Also updated some copyright headers
@sahrk
Copy link
Collaborator

sahrk commented Jul 19, 2018

Until better solutions are available this sure is handy for lots of use cases. But I concur with @dfellis that a 2D CoordIJ result would help dissociate these addresses from the internal H3 system and thus limit any potential confusion.

@dfellis
Copy link
Collaborator

dfellis commented Jul 19, 2018

I agree with @sahrk and I don't even think it would be that hard? Especially if the proposed IJK -> IJ coord transform can be applied (though it looks like Dr. Sahr removed it?) I just referenced this PR from another PR since it seems an efficient polyfill is needed, and a CoordIJ would make that much simpler to write and reason about, I think.

@sahrk
Copy link
Collaborator

sahrk commented Jul 20, 2018

@dfellis I removed the straight-forward transformation into the CoordIJ system with i- and j-axes aligned with the original CoordIJK system because I realized that the best CoordIJ system would place the non-origin hex in the +i/+j quadrant, since there's no reason to try to preserve an arbitrary spherical concept of direction once we're on the plane. Since you have ijk+ coordinates at least one of the components will be 0, and the other two axes (with components that are positive or zero) will map to the new i- and j-axes (@dfellis I think you suggested this in different words earlier).

I haven't tested this but I believe it will be:

if (ijk.i == 0) {
   ij.i = ijk.j;
   ij.j = ijk.k;
} else if (ijk.j == 0) {
   ij.i = ijk.k;
   ij.j = ijk.i;
} else { // ijk.k == 0
   ij.i = ijk.i;
   ij.j = ijk.j;
}

@sahrk
Copy link
Collaborator

sahrk commented Jul 20, 2018

Probably now moot on this particular PR, but @dfellis it occurred to me that for your polyfill use case (and many others) it probably makes sense to keep the orientation of cells constant around the same origin (within the limits of pentagon distortion), even if the resulting 2D components are negative. For that you could use ij.i = ijk.i - ijk.k; ij.j = ijk.j - ij.k;.

An alternative might be to use two H3 indexes to define a coordinate system, and then pass that coordinate system to h3ToIJ, as well as to ijToH3, which you'll probably also need to get your results back into H3.

@isaacbrodsky
Copy link
Collaborator Author

@dfellis I'd like to point out regarding uber/h3-java#21 that this approach does support crossing icosahedron edges, it doesn't support crossing more than one base cell, and it doesn't support crossing pentagons.

@sahrk like you said, I think we need to preserve orientation.

I don't have any objection to adding CoordIJ and related public API functions, but I am wondering if adding them in a separate PR makes more sense. My main concern is that h3ToIjk has partial handling of pentagons, and it might be better to experiment with that more before adding it to the public API. If h3ToIjk is "good enough" then I can certainly add int h3ToIj(H3Index origin, CoordIJ *ij);

@dfellis
Copy link
Collaborator

dfellis commented Jul 20, 2018

@isaacbrodsky if you preprocess the polygons into an array of polygons intersected by the icosahedron faces, that point is moot.

@sahrk
Copy link
Collaborator

sahrk commented Jul 20, 2018

@isaacbrodsky some of this discussion was based on the code confusion over whether h3ToIjk was being made public. If h3ToIjk remains private for now, which would be my preference, then h3ToIj should also be private. To reduce confusion probably only one of them needs to exist, and for the reasons we've been discussing h3ToIj might be the better choice.

@dfellis if h3ToIjk can only cross one base cell boundary then I think you still have an issue, since the center hex on a face isn't adjacent to the pentagonal base cells at the face vertexes. There is already a well-defined FaceIJK coordinate system, maybe we need h3ToFaceIjk? All the pieces are in place to make that pretty straight-forward.

@isaacbrodsky
Copy link
Collaborator Author

My plan is to merge this early next week. It might be best to open an issue to discuss a public "H3Index to planar coordinates" API, or else that discussion can happen on a separate PR proposing such an API.

@isaacbrodsky isaacbrodsky merged commit e506b81 into uber:master Jul 25, 2018
@isaacbrodsky isaacbrodsky deleted the h3-to-ijk branch July 25, 2018 23:56
This was referenced Jul 26, 2018
mrdvt92 pushed a commit to mrdvt92/h3 that referenced this pull request Jun 19, 2022
Add h3Distance, and internal h3ToIjk and ijkDistance
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants