-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add filter_map
, dedup
and some variants to the stdlib
#2120
Conversation
554ffc9
to
800a949
Compare
It seems we now blow the stack on Windows (1MB), and it's can't be really tweaked from tests - I tried to wrap all main tests as a scoped thread with larger stack but it didn't work so I suspect this is the separate LSP command. This is annoying and also probably means the standalone LSP binary is also overflowing the stack on Windows. |
5b1c106
to
d8e18b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I just realized this has been hanging for weeks.
I think filter_map
and dedup
are totally fine. sort_dedup
is fine too, but I proposed an alternative just for fun.
I'm not totally sold on hash_dedup
, though...
core/stdlib/std.ncl
Outdated
quadratic for `std.array.dedup`). | ||
|
||
The hash function must separate distinct elements with very high | ||
probability. If two elements have the same hash, they will be considered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the mention of probability misleading, especially when talking about a "hash" function: in a hash table, having a low collision probability only matters for performance, but here you absolutely need two distinct elements to have different "hash"es or else you'll get the wrong answer.
I wonder if it's worth baking in a notion of hashes and sets directly into nickel? It seems like almost as "core" a notion as equality, and we've had a feature request for sets for a while now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it's a logic error to not separate unequal keys. In some sense the hash function is the notion of equality at hand. Maybe I shouldn't call this a hash function at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I don't have a better name in mind that would be discoverable as a function name (mathematically what we want is an injection T -> String
), but dedup_inject
doesn't sound so good.
For the motivation, this function is taken from a real use-case: this is how we check for duplicates uid
in our build machine profiles. Also, it's the only way to deduplicate an array efficiently (in this context meaning O(nlog(n))
or less) without re-ordering it
core/stdlib/std.ncl
Outdated
[] | ||
array, | ||
|
||
sort_dedup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just brainstorming here, but what if we had dedup_sorted : forall a. -> Array a -> Array a
instead (which would do what the inner go
function does, basically)? Then you'd use it like xs |> sort cmp |> dedup_sorted
, but it feels a bit more composable than sort_dedup
because if you know that your array is already sorted then you don't need to sort it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can factor out dedup_sorted
. I would still keep sort_dedup
as a handy alias, though.
After discussing with @jneem through other channels, we agreed that |
Closes #1958. Adds
filter_map
,dedup
, as well as two variants ofdedup
that avoid the quadratic behavior of the purely equality-based vanilla version,sort_dedup
andhash_dedup
.