Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cigar format using ssw #45

Open
rmcolq opened this issue Oct 28, 2019 · 2 comments
Open

Cigar format using ssw #45

rmcolq opened this issue Oct 28, 2019 · 2 comments

Comments

@rmcolq
Copy link

rmcolq commented Oct 28, 2019

I want to do a SW alignment and get both a score, and a cigar as output. I have tried the following two methods using the latest pip installed version of parasail-python:

query_seq = "VNDDGVERRRCSYCC*C*KH**Q**FYKEPFLHKVEELRF*Q*AHKN*EHPHGP*LN**GSISNSTR*TP*RPEG*TPKECG*DPAGNGMDRPPHGPESTNLGGVKPNRKARDTQKQFKLV*CWWFGTS*HPYWLCEEYRATTP*PPSQVECQ*WG*SWGRQFNNCARSSMESVPHATN*PTQQWSSKGTTYW*HDRIL*GAHLPLCNIHWPRKNCWGSLTSSGLFNSQDNYTPYRRLVEGLLCANT*SEVKL*PTQRT*E*AMAICCSY**LL*VLLQDSQSSGLLVGTSLGHIWPFQPPSTRLVQGHA*LPTR*QVRNSQGCRACCYQWLGIKTF*RP"
ref_seq = "VNDDGVERRRCSYCC*C*KH**Q**FYKEPFLHKVEELRF*Q*AHKN*EHPHGP*LN**GSISNSTR*TP*RPEG*TPKECG*DPAGNGMDRPPHGPESTNLGGVKPNRKARDTQKQFKLV*CWWFGTS*HPYWLCEEYRATTP*PPSQVECQ*WG*SWGRQFNNCARSSMESVPHATN*PTQQWSSKGTTYW*HDRIL*GAHLPLCNIHWPRKNCWGSLTSSGLFNSQDNYTPYRRLVEGLLCANT*SEVKL*PTQRT*E*AMAICCSY**LL*VLLQDSQSSGLLVGTSLGHIWPFQPPSTRLVQGHA*LPTR*QVRNSQGCRACCYQWLGIKTF*RPPKQT*ATQRAQPTL*L*LDIYGGG*DDNTTYGAVWDLLESP*CVQLYSLIIA*YALARTRRSG*RLNTVGIGRCGACNWVHQGQSYKGHEECC*RTQGSHTIRPVWPRDICSFKEIFLWWRSNREDLK"
result = parasail.ssw(query_seq, ref_seq, 10, 1, parasail.blosum62)
print(result.score1)
print(result.cigar)
print(result.ref_begin1)
print(result.ref_end1)
print(result.read_begin1)
print(result.read_end1)
#print(result.cigar.decode)
result = parasail.sw_trace(query_seq, ref_seq, 10, 1, parasail.blosum62)
print(result.cigar)
print(result.cigar.seq)
print(result.cigar.decode)

which yields

1783
[5447]
0
339
0
339
<parasail.bindings_v2.Cigar object at 0x7fa0203b7780>
[5447]
b'340='

It seems that with the first method, using ssw, I can get the score but can't get the cigar in the standard format (instead, printing result.cigar.decode gives error message "AttributeError: 'numpy.ndarray' object has no attribute 'decode'"). Using the sw_trace method I can get the decoded cigar, but not the score? Is it possible to access the decoded cigar (the number 5447 doesn't mean anything to me? How should I interpret it?) in ssw?

@jeffdaily
Copy link
Owner

I had to go back and remind myself why I wrote the ssw python routine. The parasail C library has a parasail_ssw routine that tries to mimic the SSW library using parasail routines. It returns a specific kind of result (parasail_result_ssw). The SSW C library has this comment on their C struct:

	@field	cigar	best alignment cigar; stored the same as that in BAM format, high 28 bits: length, low 4 bits: M/I/D (0/1/2);
					cigar = 0 when the best alignment path is not available
	@field	cigarLen	length of the cigar string; cigarLen = 0 when the best alignment path is not available

I did not provide any user-friendly decode of the raw binary cigar from the python wrapper. It might be slower than a C equivalent, but you can write one in Python like so:

def decode(b):
    __BAM_CIGAR_STR = 'MIDNSHP=X8'
    def _decode(x):
        l = str(x>>4)
        try:
            c = __BAM_CIGAR_STR[x&0xf]
        except:
            c = 'M'
        return l+c
    return ''.join([_decode(i) for i in b])

@rmcolq
Copy link
Author

rmcolq commented Oct 28, 2019

Thanks! I'll try the python decode function you suggest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants