Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request - unphased IHS/XPEHH #374

Open
kullrich opened this issue Jan 21, 2022 · 1 comment
Open

Feature Request - unphased IHS/XPEHH #374

kullrich opened this issue Jan 21, 2022 · 1 comment

Comments

@kullrich
Copy link

Dear scikit-allel team,

would it be possible to add an unphased version of the IHS and XPEHH statistic to be directly used with a GenotypeArray?

The developer of selscan has just published an unphased version of both methods:

https://www.biorxiv.org/content/10.1101/2021.10.22.465497v1.full.pdf

Best regards and thank you in anticipation

Kristian

@kullrich
Copy link
Author

kullrich commented Jan 21, 2022

Might be a starting point, since only biallelic sites are used and converted into a so-called multi-allelic representation (ref-hom (0), het (1), alt-hom (2), I guess I need to create a new ihh01_scan_unphased function in allel/opt/stats.pyx, a new ssl2ihh_unphased function in allel/opt/stats.pyx and create a new ihs_unphased function in allel/stats/selection.py

import allel
from allel.util import asarray_ndim
from allel.util import check_integer_dtype
from allel.util import check_dim0_aligned
from allel.compat import memoryview_safe

def genotype_to_multilocus(g, pos):
    """Converts a Genotype array into a Multilocus array,
    only retaining biallelic sites and corresponding positions.

    Parameters
    ----------
    g : array_like, int, shape (n_variants, n_samples, ploidy)
        Genotype array.
    pos : array_like, int, shape (n_variants,)
        Variant positions (physical distance).

    Returns
    -------
    ms : array_like, int, shape (n_variants, n_loci)
        Multilocus array.
    mspos : array_like, int, shape (n_variants,)
        Biallelic positions (physical distance).

    """

    # convert genotype array into multi-locus array
    ac = g.count_alleles()
    bipos = [x for x,y in enumerate(ac.allelism()) if y in [1,2]]
    gs = g.subset(bipos)
    # get multi-locus array
    ms = gs.is_het().astype(int)
    ms[gs.is_hom_alt()] = 2
    ms = allel.HaplotypeArray(ms)
    mspos = [y for x,y in enumerate(pos) if x in bipos]

    # check inputs
    ms = asarray_ndim(ms, 2)
    check_integer_dtype(ms)
    mspos = asarray_ndim(mspos, 1)
    check_dim0_aligned(ms, mspos)
    ms = memoryview_safe(ms)
    mspos = memoryview_safe(mspos)

    return [ms, mspos]


g = allel.GenotypeArray([[[0, 0], [0, 0], [0, 0]],
                         [[0, 0], [0, 1], [0, 1]],
                         [[0, 0], [1, 1], [0, 1]],
                         [[0, 1], [1, 1], [1, 1]],
                         [[1, 1], [1, 1], [-1, -1]],
                         [[0, 0], [1, 2], [0, 0]],
                         [[0, 1], [1, 2], [1, 2]],
                         [[0, 1], [-1, -1], [0, 0]],
                         [[-1, -1], [-1, -1], [-1, -1]]])

pos = [2, 4, 7, 14, 15, 18, 19, 25, 27]

ms, mspos = genotype_to_multilocus(g, pos)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant