Heterozygosity Blue #335

Open
wants to merge 3 commits into
base: master

timothymillar
Contributor

Best linear unbiased estimate of expected heterozygosity following eq. 8 in Harris and DeGiorgio (2017).

Heterozygosity_blue is a variation of heterozygosity_expected that uses a kinship matrix to minimize bias due to related individuals.
This implementation makes the kinship matrix an optional argument.

If kinship is not specified then "regular" heterozygosity_expected is calculated using the (n / (n-1)) correction
described in #145 (resolving that issue).
This is a sensible 'default' because heterozygosity_blue is equivalent to heterozygosity_expected when the kinship matrix indicates non-related, non-inbred individuals (a matrix of zeros with a diagonal of 1/ploidy).
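
A sketch of why the default is consistent, assuming the estimator takes the form below. The notation is introduced here for illustration and should be checked against eq. 8 of the paper: $K$ is the kinship matrix, $\mathbf{x}_a$ holds the within-individual frequency of allele $a$ for each sample, $s$ is the number of called samples and $m$ is the ploidy.

```latex
\tilde{p}_a = \frac{\mathbf{1}^\top K^{-1} \mathbf{x}_a}{c},
\qquad
\tilde{H} = \frac{c}{c - 1}\Bigl(1 - \sum_a \tilde{p}_a^{2}\Bigr),
\qquad
c = \mathbf{1}^\top K^{-1} \mathbf{1}.

% Unrelated, non-inbred individuals: zeros off the diagonal, 1/m on it.
K = \tfrac{1}{m} I
\;\Rightarrow\;
K^{-1} = m I,
\quad
c = m s,
\quad
\tilde{p}_a = \frac{1}{s} \sum_{i=1}^{s} x_{ia},
\quad
\tilde{H} = \frac{m s}{m s - 1}\Bigl(1 - \sum_a \tilde{p}_a^{2}\Bigr).
```

If n is read as the number of sampled allele copies (n = ms), the last expression is the usual expected heterozygosity with the n / (n-1) correction.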

The implementation is a little more complex than ideal because the inverse of the kinship matrix must be calculated for only the called samples at each variant.
I have tried to do this in a way that minimizes recalculation of the inverse matrix.
Additionally, a NaN value is returned if the (sub-)kinship matrix has no inverse.
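
The idea of reusing inverses can be sketched as follows. This is a minimal illustration, not the code in this PR: the function name `cached_sub_inverses` and the boolean `called` array are hypothetical, and it simply computes one inverse per distinct pattern of called samples, marking singular (or empty) sub-matrices so the caller can report NaN.

```python
import numpy as np


def cached_sub_inverses(kinship, called):
    """Invert the sub-kinship matrix once per distinct pattern of called samples.

    kinship: (samples, samples) array.
    called:  (variants, samples) boolean array, True where a sample is called.
    Returns a dict mapping each pattern (as bytes) to the inverse of the
    corresponding sub-matrix, or to None when no inverse exists.
    """
    kinship = np.asarray(kinship, dtype=float)
    called = np.asarray(called, dtype=bool)
    cache = {}
    for mask in called:
        key = mask.tobytes()
        if key in cache:
            continue  # this pattern was already handled for an earlier variant
        idx = np.flatnonzero(mask)
        sub = kinship[np.ix_(idx, idx)]
        if idx.size == 0 or np.linalg.matrix_rank(sub) < idx.size:
            cache[key] = None  # singular or empty: the estimate should be NaN
        else:
            cache[key] = np.linalg.inv(sub)
    return cache


# Hypothetical example: samples 0 and 1 are duplicates, so the full matrix
# is singular, but dropping sample 1 leaves an invertible sub-matrix.
kinship = np.array([[0.5, 0.5, 0.0],
                    [0.5, 0.5, 0.0],
                    [0.0, 0.0, 0.5]])
called = np.array([[True, True, True],
                   [True, False, True],
                   [True, True, True],
                   [True, False, True]])
cache = cached_sub_inverses(kinship, called)
```

Four variants share two call patterns here, so only two sub-matrices are examined.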

It's worth noting that heterozygosity_blue can return values outside of the interval [0, 1] with some inputs.
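
A small numerical illustration of this, assuming the estimator has the form c / (c - 1) * (1 - sum(p^2)) with c = 1' K^-1 1 and p the kinship-weighted allele frequencies (my reading of eq. 8, to be verified against the paper). The function name and inputs below are hypothetical and are not the API added in this PR.

```python
import numpy as np


def heterozygosity_blue_sketch(allele_freqs, kinship):
    """Assumed form of the estimator for a single variant.

    allele_freqs: (samples, alleles) within-individual allele frequencies.
    kinship:      (samples, samples) invertible kinship matrix.
    """
    x = np.asarray(allele_freqs, dtype=float)
    k = np.asarray(kinship, dtype=float)
    w = np.linalg.solve(k, np.ones(len(k)))  # K^-1 1
    c = w.sum()                              # 1' K^-1 1
    p = (w @ x) / c                          # weighted allele frequencies
    return c / (c - 1) * (1 - np.sum(p ** 2))


# Two diploid samples, each homozygous for a different allele.
freqs = np.array([[1.0, 0.0],
                  [0.0, 1.0]])

# Unrelated, non-inbred: diagonal of 1/ploidy, zeros elsewhere.
h_unrelated = heterozygosity_blue_sketch(freqs, np.eye(2) * 0.5)

# Fully inbred and closely related (made-up values): c is close to 1,
# so the correction factor c / (c - 1) becomes large.
h_inbred = heterozygosity_blue_sketch(freqs, np.array([[1.0, 0.9],
                                                       [0.9, 1.0]]))

print(h_unrelated)  # about 0.667, inside [0, 1]
print(h_inbred)     # about 10, well outside [0, 1]
```

In the second case c = 2 / 1.9, so the factor c / (c - 1) is 20 and the estimate is 20 * 0.5 = 10.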

Related to #287, there is an optional argument to specify mixed-ploidy data.

This function calculates expected heterozygosity with a correction
for relatedness of individuals based on a kinship matrix.
In this implementation the kinship matrix is an optional argument,
and if it is not present then the 'typical' expected heterozygosity
is calculated, which is equivalent to heterozygosity_blue with
unrelated individuals.
This function is also suitable for mixed ploidy data.