Best linear unbiased estimate of expected heterozygosity following eq. 8 in Harris and DeGiorgio (2017).
heterozygosity_blue is a variant of heterozygosity_expected that uses a kinship matrix to minimize bias due to related individuals.
This implementation makes the kinship matrix an optional argument.
If kinship is not specified, the "regular" heterozygosity_expected is calculated using the (n / (n-1)) correction described in #145 (resolving that issue).
This is a sensible 'default' because heterozygosity_blue is equivalent to heterozygosity_expected when the kinship matrix indicates non-related, non-inbred individuals (a matrix of zeros with a diagonal of 1/ploidy).
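
The equivalence can be checked numerically. The following is a minimal sketch, not the code in this PR. It assumes the estimator has the form H = (1 - sum(p^2)) * s / (s - 1), with s = 1'K^-1 1 and p = (1'K^-1 X) / s, which is my reading of eq. 8 and should be checked against the paper. It also assumes that n in the (n / (n-1)) correction is the number of called alleles. The function names and toy data are hypothetical.

```python
import numpy as np


def het_expected(allele_counts):
    """Expected heterozygosity with the n / (n - 1) correction.

    Assumes n is the total number of called alleles at the variant.
    """
    n = allele_counts.sum()
    p = allele_counts.sum(axis=0) / n
    return (1 - np.sum(p**2)) * n / (n - 1)


def het_blue(allele_freqs, kinship):
    """Assumed form of the BLUE estimator (see the text above)."""
    k_inv = np.linalg.inv(kinship)
    s = k_inv.sum()  # 1' K^-1 1
    p = k_inv.sum(axis=0) @ allele_freqs / s  # weighted allele frequencies
    return (1 - np.sum(p**2)) * s / (s - 1)


ploidy = 2
# Rows are samples, columns are per-sample allele counts at one variant.
counts = np.array([[2, 0], [1, 1], [1, 1], [0, 2], [2, 0]])
freqs = counts / ploidy

# Unrelated, non-inbred samples: zeros with a diagonal of 1 / ploidy.
kinship = np.eye(len(counts)) / ploidy

h_exp = het_expected(counts)
h_blue = het_blue(freqs, kinship)
print(h_exp, h_blue)
```

With this kinship matrix the inverse is simply ploidy times the identity, so s equals the number of called alleles and the weighted allele frequencies reduce to the ordinary sample frequencies.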
The implementation is a little more complex than ideal because the inverse of the kinship matrix must be calculated for only the called samples at each variant.
I have tried to do this in a way that minimizes recalculation of the inverse matrix.
Additionally, a NaN value is returned if the (sub-)kinship matrix has no inverse.
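
To illustrate the idea (this is a rough sketch and not the code in this PR), one way to avoid recalculating the inverse is to cache it by the pattern of called samples, so that variants sharing a call pattern reuse the same inverse. The helper names, the cache keyed on the boolean mask, and the placeholder statistic (the sum of the inverse) are all assumptions made for the example.

```python
import numpy as np


def sub_kinship_inverse(kinship, called, cache):
    """Inverse of the kinship matrix restricted to the called samples.

    Returns None if no samples are called or the sub-matrix is singular.
    Results are cached by the boolean call pattern.
    """
    key = called.tobytes()
    if key not in cache:
        sub = kinship[np.ix_(called, called)]
        try:
            cache[key] = np.linalg.inv(sub) if sub.size else None
        except np.linalg.LinAlgError:
            # Raised for exactly singular matrices; nearly singular
            # matrices may still invert with large numerical error.
            cache[key] = None
    return cache[key]


def per_variant_statistic(called_matrix, kinship):
    """Placeholder per-variant statistic (1' K^-1 1), NaN if no inverse."""
    cache = {}
    out = np.full(len(called_matrix), np.nan)
    for i, called in enumerate(called_matrix):
        k_inv = sub_kinship_inverse(kinship, called, cache)
        if k_inv is not None:
            out[i] = k_inv.sum()
    return out, cache


# Samples 0 and 1 are clones, so any sub-matrix containing both is singular.
kinship = np.array([[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 0.5]])
called_matrix = np.array(
    [
        [True, False, True],  # invertible sub-matrix
        [True, False, True],  # same pattern, reuses the cached inverse
        [True, True, False],  # singular sub-matrix
        [False, False, False],  # no called samples
    ]
)
out, cache = per_variant_statistic(called_matrix, kinship)
print(out, len(cache))
```

Here four variants need only three cache entries, and the two variants without a usable inverse are left as NaN.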
It's worth noting that heterozygosity_blue can return values outside of the interval [0, 1] with some inputs.
Related to #287, there is an optional argument to specify mixed-ploidy data.