Optimize the seqFetch for sparse CRAM file #79

Open
cmdcolin opened this issue Nov 9, 2020 · 15 comments

Comments

@cmdcolin
Contributor

cmdcolin commented Nov 9, 2020

@jrobinso reported that a CRAM file with very few reads could take longer than expected to load, because it ended up fetching the reference sequence for the entire chromosome.

@cmdcolin
Contributor Author

cmdcolin commented Nov 9, 2020

Cross-reference: issue with sample data at igvteam/igv.js#1212

@rbuels
Contributor

rbuels commented Nov 9, 2020

Could we address this by changing how we call the sequence-fetching callback? Perhaps by fetching only up to a maximum size and tiling from there, so that the whole sequence is not kept in memory?
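A minimal sketch of the tiling idea, assuming a hypothetical callback `SeqFetch(seqId, start, end)` that returns bases for a 0-based, half-open interval. The names `tileRegion`, `makeTiledFetch`, `tileSize`, and `maxTiles` are illustrative and are not part of cram-js. Each request is rounded out to fixed-size tiles, and only a bounded number of tiles is cached (first-in, first-out).

```typescript
// Hypothetical callback: returns reference bases for the 0-based,
// half-open interval [start, end) on sequence `seqId`.
type SeqFetch = (seqId: number, start: number, end: number) => Promise<string>

// Split [start, end) into consecutive tiles of at most `tileSize` bases.
function tileRegion(start: number, end: number, tileSize: number): Array<[number, number]> {
  if (tileSize <= 0) throw new Error('tileSize must be positive')
  const tiles: Array<[number, number]> = []
  for (let s = start; s < end; s += tileSize) {
    tiles.push([s, Math.min(s + tileSize, end)])
  }
  return tiles
}

// Wrap a SeqFetch so every underlying request is one aligned tile of
// `tileSize` bases, keeping at most `maxTiles` tiles in memory.
function makeTiledFetch(fetchSeq: SeqFetch, tileSize: number, maxTiles: number): SeqFetch {
  const cache = new Map<string, Promise<string>>()

  const getTile = (seqId: number, tileStart: number): Promise<string> => {
    const key = `${seqId}:${tileStart}`
    const cached = cache.get(key)
    if (cached !== undefined) return cached
    const tile = fetchSeq(seqId, tileStart, tileStart + tileSize)
    // Drop failed fetches so that a later call can retry them.
    tile.catch(() => {
      if (cache.get(key) === tile) cache.delete(key)
    })
    cache.set(key, tile)
    if (cache.size > maxTiles) {
      // Map iterates in insertion order, so the first key is the oldest entry.
      const oldest = cache.keys().next().value
      if (oldest !== undefined) cache.delete(oldest)
    }
    return tile
  }

  return async (seqId, start, end) => {
    const firstTile = Math.floor(start / tileSize) * tileSize
    const tiles = await Promise.all(
      tileRegion(firstTile, end, tileSize).map(([s]) => getTile(seqId, s)),
    )
    // The joined tiles begin at firstTile, so shift the slice accordingly.
    return tiles.join('').slice(start - firstTile, end - firstTile)
  }
}
```

A small `tileSize` bounds memory but multiplies the number of requests, so a real value would have to balance the two.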

@jrobinso
Contributor

@rbuels You are surely among the handful of people who know the details of CRAM, but I imagine that would be very complex. It might solve the out-of-memory error, but it would be so slow that users would think the browser had frozen.

I did see this option in samtools to store the reference sequence in the CRAM file. It could be recommended to users with sparse CRAM files, since those files are likely to be small anyway, but I don't know whether cram.js supports such files:

embed_ref=0|1
CRAM output only; defaults to 0 (off). If 1, this will store portions of the reference sequence in each slice, permitting decode without requiring an external copy of the reference sequence.

@cmdcolin
Contributor Author

cmdcolin commented Dec 3, 2024

Ran into this during office hours today.

They had sparse CRAM files, and the chain of events was:

  • Zoomed out to the whole genome and opened the CRAM track
  • IndexedFasta breaks up the requests
  • http-range-fetcher recombines the requests
  • Google Cloud then returns an HTTP 500 error because the resulting range request is too large
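The break-up-then-recombine step can be illustrated with a small sketch. It treats requests as inclusive byte ranges (as in an HTTP `Range: bytes=start-end` header) and uses hypothetical names (`ByteRange`, `splitRange`, `mergeRanges`, `maxBytes`); it is not the actual http-range-fetcher code. Adjacent ranges are coalesced only while the merged request stays within `maxBytes`, and any single range larger than that is split first.

```typescript
// Inclusive byte range, as in an HTTP "Range: bytes=start-end" header.
interface ByteRange {
  start: number
  end: number
}

// Split one range into pieces of at most maxBytes bytes each.
function splitRange(r: ByteRange, maxBytes: number): ByteRange[] {
  if (maxBytes < 1) throw new Error('maxBytes must be at least 1')
  const pieces: ByteRange[] = []
  for (let s = r.start; s <= r.end; s += maxBytes) {
    pieces.push({ start: s, end: Math.min(s + maxBytes - 1, r.end) })
  }
  return pieces
}

// Coalesce overlapping or touching ranges, but never let one merged request
// exceed maxBytes. Passing Infinity gives uncapped merging.
function mergeRanges(ranges: ByteRange[], maxBytes: number): ByteRange[] {
  const sorted = ranges
    .flatMap(r => splitRange(r, maxBytes))
    .sort((a, b) => a.start - b.start)
  const merged: ByteRange[] = []
  for (const r of sorted) {
    const last = merged[merged.length - 1]
    if (
      last !== undefined &&
      r.start <= last.end + 1 &&
      Math.max(last.end, r.end) - last.start + 1 <= maxBytes
    ) {
      last.end = Math.max(last.end, r.end)
    } else {
      merged.push({ ...r })
    }
  }
  return merged
}
```

With `maxBytes` set to `Infinity`, all the adjacent pieces of a chromosome collapse back into one request, which is the pattern behind the oversized range requests described above.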

@cmdcolin
Contributor Author

cmdcolin commented Dec 3, 2024

There is a chunkSize setting in http-range-fetcher (https://github.com/rbuels/http-range-fetcher/blob/master/src/httpRangeFetcher.ts) that acts as some kind of maximum, but I wonder whether it still merges requests beyond that; not sure. What they observed was very large, chromosome-wide byte-range requests for the reference sequences in their JBrowse instance.
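To illustrate the open question, here is a sketch of chunk alignment under an assumption: that a fetcher rounds each request out to multiples of a chunk size. The function `chunkAlign` and the 65,536-byte chunk size are hypothetical and not taken from http-range-fetcher's source. Chunking fixes the unit of caching, but if all the chunks covering a chromosome-wide range are then requested together, the HTTP request is still chromosome-sized.

```typescript
// Round an inclusive byte range [start, end] outward to chunk boundaries
// and report how many chunks of `chunkSize` bytes it spans.
function chunkAlign(start: number, end: number, chunkSize: number) {
  const firstChunk = Math.floor(start / chunkSize)
  const lastChunk = Math.floor(end / chunkSize)
  return {
    alignedStart: firstChunk * chunkSize,
    alignedEnd: (lastChunk + 1) * chunkSize - 1,
    chunkCount: lastChunk - firstChunk + 1,
  }
}

// A short read touches only a couple of chunks...
const shortRead = chunkAlign(100, 70000, 65536) // 2 chunks: bytes 0-131071
// ...while a hypothetical 250 MB whole-sequence read touches thousands.
const wholeChrom = chunkAlign(0, 250000000, 65536)
```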

@cedarwarman

@cmdcolin This still seems to happen after switching from CRAM to BAM files:

Error: HTTP 500 fetching https://<path>/TEST_cram_to_bam/dd5888.fasta bytes 938082304-1042808831

../../../packages/core/util/io/RemoteFileWithRangeCache.ts:94:13 (at d.fetchBinaryRange ()
JBrowse 2.17.0

From the server:

2024-12-10 10:24:40.965 PST
Response size was too large. Please consider reducing response size.
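The size of the failing request can be worked out from the error message. HTTP byte ranges are inclusive, so bytes 938082304-1042808831 covers 1042808831 - 938082304 + 1 = 104,726,528 bytes, just under 100 MiB for a single reference fetch. Both endpoints also fall on 65,536-byte (64 KiB) boundaries, which would be consistent with chunk-aligned fetching, though the thread does not say which chunk size was in effect. A small helper (the name `parseByteRange` is illustrative) that extracts and measures the range:

```typescript
// Extract the inclusive "bytes START-END" range from a fetch error message.
function parseByteRange(message: string): { start: number; end: number } | undefined {
  const match = /bytes (\d+)-(\d+)/.exec(message)
  if (!match) return undefined
  return { start: Number(match[1]), end: Number(match[2]) }
}

const msg =
  'Error: HTTP 500 fetching https://<path>/TEST_cram_to_bam/dd5888.fasta bytes 938082304-1042808831'
const range = parseByteRange(msg)
if (range) {
  const length = range.end - range.start + 1 // inclusive range
  console.log(length) // 104726528
  console.log((length / 2 ** 20).toFixed(3)) // 99.875 (MiB)
  console.log(range.start % 65536, (range.end + 1) % 65536) // 0 0
}
```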

I'm not sure why it's still fetching such a large reference interval even with BAM files.

@cmdcolin
Contributor Author

cmdcolin commented Dec 10, 2024

@cedarwarman I can confirm that there is a separate reason, unrelated to this issue, why the entire sequence gets requested.

The full-length fetch was added fairly recently in JBrowse, mainly to show the reference sequence base in the SNPCoverage mouseover.

I will ponder whether there is a way to help with this
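One way to avoid a full-length fetch for the mouseover would be to look up reference bases only within the region already being displayed. This is a sketch under assumptions: `Region` uses 0-based, half-open coordinates, and `FetchSequence` and `makeRefBaseLookup` are hypothetical names, not JBrowse's actual API or the fix that was eventually merged.

```typescript
// 0-based, half-open genomic interval.
interface Region {
  refName: string
  start: number
  end: number
}

// Hypothetical sequence fetcher that returns the bases for exactly `region`.
type FetchSequence = (region: Region) => Promise<string>

// Fetch the reference once for the visible region and return a lookup that
// answers mouseover queries from that slice; positions outside it give undefined.
async function makeRefBaseLookup(
  fetchSequence: FetchSequence,
  visible: Region,
): Promise<(pos: number) => string | undefined> {
  const seq = await fetchSequence(visible)
  return (pos: number) =>
    pos >= visible.start && pos < visible.end
      ? seq.charAt(pos - visible.start) || undefined
      : undefined
}
```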

@cedarwarman

Thanks for the update!

@cmdcolin
Contributor Author

@cedarwarman If you want to try a beta branch of JBrowse with a proposed fix (I'll probably just get it merged and released), you can create an instance from that branch, e.g.:

jbrowse create --branch revert_full_length_sequence_fetch_snpcov newinstance

I'm going to get this out in the next release shortly as well.

@cmdcolin
Contributor Author

(linked PR GMOD/jbrowse-components#4708)

@cedarwarman

It looks like there are no longer errors when using the beta branch! The second loading bar in each track seems pretty stuck when zoomed way out; is this expected?


@cmdcolin
Contributor Author

I found another code path in the pileup (second loading bar) doing the same thing and applied a patch that hopefully can help there!

@cedarwarman

Amazing, thanks for your help on this!

@cmdcolin
Contributor Author

Both fixes are now released in v2.18.0 if you want to try them out. Thanks for reporting this!

@cedarwarman

Looks like it's working!


4 participants