I used FastaInputFormat to read a FASTA file in Spark.
After reading the code, I see that the getSplits method is overridden. OK, but the number of returned partitions equals the number of FASTA records.
E.g., the FASTA file has 661 bytes (13 FASTA records) and my mapred.max.split.size is set to 500 bytes. I expect to get 2 partitions, but I get 13. What is the purpose of that extra splitting? Why couldn't the getSplits method from FileInputFormat be used?
Moreover, FastqInputFormat uses only the getSplits method from FileInputFormat, and its behaviour is what I expect.
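To show where my expectation of 2 partitions comes from, here is a minimal sketch of the size-based split arithmetic as I understand it from Hadoop's FileInputFormat (split size = max(minSize, min(maxSize, blockSize)), with a 1.1 slack factor on the last split). The function names, the 128 MB block size, and the minimum size of 1 are my own assumptions for illustration. It is not the Hadoop code itself and it ignores unsplittable files and block locations.

```python
# Sketch of size-based splitting, modelled on my reading of Hadoop's
# FileInputFormat.getSplits. Names and defaults here are illustrative.

SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(block_size, min_size, max_size):
    """Clamp the block size between the configured min and max split size."""
    return max(min_size, min(max_size, block_size))


def expected_splits(file_len, block_size, min_size, max_size):
    """Return a list of (offset, length) pairs for a file of file_len bytes."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = []
    remaining = file_len
    # Cut full-size splits while the remainder is clearly larger than one split.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_len - remaining, split_size))
        remaining -= split_size
    # Whatever is left (if anything) becomes the final split.
    if remaining != 0:
        splits.append((file_len - remaining, remaining))
    return splits


if __name__ == "__main__":
    # 661-byte file, max split size 500 bytes; block size and min size assumed.
    for offset, length in expected_splits(661, 128 * 1024 * 1024, 1, 500):
        print(offset, length)
```

With these inputs the sketch yields two splits, independent of how many FASTA records the file contains, which is the behaviour I expected from FastaInputFormat as well.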
PatrycjaKarbownik changed the title from "FastaInputReader splits file to too many partitions" to "FastaInputReader splits file into too many partitions" on Jan 18, 2020
PatrycjaKarbownik changed the title from "FastaInputReader splits file into too many partitions" to "FastaInputFormat splits file into too many partitions" on Jan 18, 2020