implement Statistics.median #973

Open: wants to merge 3 commits into master

@stev47 (Contributor) commented Nov 29, 2021

This implements `Statistics.median` based on the existing bitonic
sorting, avoiding unnecessary allocation.
While it is generally suboptimal to sort the whole array, the compiler
manages to skip some branches, since only the middle element(s) of the
sorted result are used. Thus `median` is generally faster than `sort`.
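
A minimal sketch of the sort-then-read-the-middle idea for a `StaticVector`, assuming StaticArrays' `sort` (which, per the description above, already provides bitonic sorting) and `middle` from Statistics. The name `static_median` is hypothetical; this is not the code in the PR, only an illustration of the approach, including the empty, `missing` and `NaN` cases.

```julia
using StaticArrays, Statistics

# Hypothetical helper (not the PR's code): median of a StaticVector obtained
# by sorting it and reading only the middle element(s).
function static_median(v::StaticVector{N}) where {N}
    # An empty input has no median; Statistics.median also throws here.
    N == 0 && throw(ArgumentError("median of an empty array is undefined"))
    # Propagate `missing`, as Statistics.median does.
    any(ismissing, v) && return missing
    # A NaN anywhere makes the median NaN; return that NaN element.
    for x in v
        x isa AbstractFloat && isnan(x) && return x
    end
    # StaticArrays' `sort` returns a new static vector instead of
    # sorting a heap-allocated copy.
    s = sort(v)
    h = (N + 1) ÷ 2
    # Odd length: the single middle element; even length: mean of the two.
    return isodd(N) ? middle(s[h]) : middle(s[h], s[h + 1])
end
```

Only `s[h]` (and `s[h + 1]` for even lengths) is read after sorting; that is the property the description relies on when it says the compiler can skip some branches of the sorting network.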

Using a dedicated median selection network could yield better
performance and might be considered for future improvement.
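
As a rough illustration of what a median selection network looks like (not code from the PR or from StaticArrays), here is the classic median-of-3 network: it picks the middle value with four `min`/`max` operations instead of fully sorting the three inputs. The name `median3` is hypothetical.

```julia
using Statistics

# Illustration only: median-of-3 selection network built from min/max
# compare-exchange steps; the three values are never fully sorted.
median3(a, b, c) = max(min(a, b), min(max(a, b), c))

# Check against Statistics.median on every ordering of 1, 2, 3.
for (a, b, c) in ((1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1))
    @assert median3(a, b, c) == median([a, b, c])
end
```

Networks for larger fixed sizes follow the same pattern but need more compare-exchange steps.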

@RoyiAvital commented:
Any chance to push this forward?

@stev47 (Contributor, Author) commented Oct 29, 2024

> Any chance to push this forward?

Updated the PR; waiting for review.

@mkitti commented Nov 12, 2024

This pull request implements `median` for `StaticVector`. It supports the empty-array, `missing`, and `NaN` edge cases. For regular functionality, the tests include a comparison with the `Vector` equivalent of a random `StaticVector`.

While testing random static vectors is useful, defined tests would be welcome. In particular, one could replicate the tests in Statistics.jl for `median`:
https://github.com/JuliaStats/Statistics.jl/blob/d49c2bf4f81e1efb4980a35fe39c815ef8396297/test/runtests.jl#L31-L92
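
A sketch of the two kinds of tests discussed here, using the `Test` standard library: a randomized comparison with the `Vector` equivalent, plus a few defined cases with known answers. This is not the PR's test file and does not reproduce the linked Statistics.jl tests; the specific cases are illustrative.

```julia
using StaticArrays, Statistics, Test, Random

@testset "median on StaticVector (sketch)" begin
    # Randomized: compare with the median of the Vector equivalent.
    Random.seed!(42)
    for N in 1:12, _ in 1:50
        sa = SVector{N}(randn(N))
        @test median(sa) == median(Vector(sa))
    end
    # Defined cases with known answers.
    @test median(SVector(3.0, 1.0, 2.0)) == 2.0
    @test median(SVector(1, 2, 3, 4)) == 2.5
    # Edge cases: NaN propagates, missing propagates, empty input throws.
    @test isnan(median(SVector(1.0, NaN, 3.0)))
    @test ismissing(median(SVector{2,Union{Missing,Int}}(1, missing)))
    @test_throws ArgumentError median(SVector{0,Float64}())
end
```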

@stev47 (Contributor, Author) commented Nov 23, 2024

> [...] defined tests would be welcome. In particular, one could replicate the tests in Statistics.jl for median:

OK, I've added the tests from Statistics.jl.

@RoyiAvital commented:

@mkitti, any chance to ping another reviewer to push this?

@mateuszbaran (Collaborator) commented Dec 7, 2024

I tried benchmarking it and there seems to be an issue:

```julia-repl
julia> a = [1, 10, 2, 1.2, -19, 12];

julia> sa = SVector{length(a)}(a);

julia> @btime median($a)
  24.433 ns (2 allocations: 112 bytes)
1.6

julia> @btime median($sa)
  515.203 ns (5 allocations: 176 bytes)
1.6
```

From a quick look, it seems that `nanix` being a `Union` might be the culprit. Could you improve the performance here?
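
One way to check allocations locally, independent of BenchmarkTools, is to count the bytes allocated by a single warmed-up call. The helper name `count_allocs` is hypothetical and the snippet is a sketch, not part of the PR; the numbers it prints depend on the Julia version and on whether the PR is applied.

```julia
using StaticArrays, Statistics

# Bytes allocated by one call of f(x). Wrapping @allocated in a function
# avoids counting allocations caused by accessing untyped globals.
count_allocs(f, x) = @allocated f(x)

a = [1, 10, 2, 1.2, -19, 12]
sa = SVector{length(a)}(a)

# Warm up first so that compilation is not counted.
count_allocs(median, a)
count_allocs(median, sa)

println("median(a):  ", count_allocs(median, a), " bytes")
println("median(sa): ", count_allocs(median, sa), " bytes")
# To inspect inferred types (e.g. Union-typed locals), run
# `@code_warntype median(sa)` in the REPL.
```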

@stev47 (Contributor, Author) commented Dec 22, 2024

> I tried benchmarking it and there seems to be an issue:

Can you try in a fresh session? I cannot reproduce your timings (Julia 1.11.2):

```julia-repl
julia> using StaticArrays

julia> using Statistics

julia> a = [1, 10, 2, 1.2, -19, 12];

julia> sa = SVector{length(a)}(a);

julia> @btime median($a)
  31.548 ns (2 allocations: 112 bytes)
1.6

julia> @btime median($sa)
  8.529 ns (0 allocations: 0 bytes)
1.6
```

Results are similar on both Julia 1.10.7 and Julia 1.12.0-DEV.1793 (commit 5ee67b8551).
