Diamond+CAZY analysis for mRNA reads #103

sumitra20 · 2022-09-06T14:25:13Z

Dear Developers,

I am trying to do CAZymes analysis using my mRNA reads. I downloaded a copy of the database (CAZyDB.08062022.fa), indexed it to Diamond format, and then performed a diamond blast. I've also downloaded the mapping file (CAZyDB.08062022.fam.subfam.ec.txt) but now I'm really confused about how to proceed with the annotation and getting the annotation summary out from the generated diamond output. Can I perform it using the MEGAN6 tool? I'd really appreciate it if you could give me some ideas on how to proceed. I'm very new to bioinformatic analysis and I'm struggling to understand the analysis pipelines.

#download CaZyme databse
wget http://bcb.unl.edu/dbCAN2/download/Databases/V11/CAZyDB.08062022.fa

#index it to diamond format
/home/combio7/diamond-2.0.12/bin/diamond makedb --in ./CAZyDB.08062022.fa -d CAZy

#diamond_blast

/home/combio7/diamond-2.0.12/bin/diamond blastx -d CAZy.dmnd -q /media/combio7/8TB_Backup/sumitra_130122/cycle_5_30822/5_sortme_cycle5/R33P5_R_sortme_non_rRNA.fq -o ./R33P5_R_cazyme.out -b 20.0 -f 6 -e 1e-3

#donwloading mapping file (not sure which to use)
wget https://bcb.unl.edu/dbCAN2/download/Databases/dbCAN-old@UGA/CAZyDB-ec-info.txt.07-20-2017

wget https://bcb.unl.edu/dbCAN2/download/Databases/V11/CAZyDB.08062022.fam.subfam.ec.txt

Thank you

The text was updated successfully, but these errors were encountered:

yinlabniu · 2022-09-07T18:26:52Z

You can use the 2022 ec file. As to your question on what's next, that depends on your research aims. Very often people will take the diamond output to calculate an abundance (e.g., FPKM) value for each cazyme family. The cazyme family can be connected to EC and substrates, so that they can answer questions, e.g., what families are more highly expressed than others in what conditions/samples, or what substrate degradation is more active, etc. I've never used MEGAN, but I know it can compute the abundance info. Other choices include HUMANN3, but I don't know if they have CAZyme database integrated in their pipeline. You can look into https://github.com/AnantharamanLab/METABOLIC, which has dbCAN integrated.

Yanbin

sumitra20 · 2022-09-10T07:24:08Z

Hi Yanbin,

Thank you so much for the suggestions given. I'll look into METABOLIC as well.

sumitra20 · 2022-09-27T06:14:11Z

Hi @yinlabniu

hi @yinlabniu ,

Sorry, i might have another question. As i mentioned before i ran a blastx for my nucleotide sequence against the CAZY database and below is the snapshot of how the output looks like

However, when i try analyzing my .fasta (nucleotide) file using run_dbcan tool i dont seem to get any put for diamond, i only get output for eCAMI. I have tried running with both -meta and -prok. Any idea why?

run_dbcan /media/combio7/8TB_Backup/sumitra_130122/CAZyme/megan6_output/bacteria_archea_R11P4_BS_phylum.fasta meta --out_dir output2_megan -t diamond eCAMI --dia_cpu 8

run_dbcan /media/combio7/8TB_Backup/sumitra_130122/CAZyme/megan6_output/bacteria_archea_R11P4_BS_phylum.fasta prok --out_dir output23_megan -t diamond eCAMI --dia_cpu 8

output:

Thank you

linnabrown · 2022-09-27T20:05:38Z

Hi @sumitra20 . If there is no output diamond column, it means that your input sequences do not have matched annotations from databases searched against by DIAMOND software.

This command is quite right. You can try this command in the example and it prompts that our program is only running diamond and eCAMI software currently.

run_dbcan EscheriaColiK12MG1655.fna prok --out_dir output_EscheriaColiK12MG1655 -t diamond eCAMI


***************************1. DIAMOND start*************************************************


diamond v2.0.15.153 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 4
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: output_EscheriaColiK12MG1655
#Target sequences to report alignments for: 1
Opening the database...  [0.078s]
Database: db/CAZy.dmnd (type: Diamond database, sequences: 2161786, letters: 1024499689)
Block size = 2000000000
Opening the input file...  [0s]
Opening the output file...  [0s]
Loading query sequences...  [0.004s]
Masking queries...  [0.012s]
Building query seed set...  [0.067s]
Algorithm: Query-indexed
Building query histograms...  [0.001s]
Allocating buffers...  [0s]
Loading reference sequences...  [1.649s]
Initializing temporary storage...  [0s]
Building reference histograms...  [3.259s]
Allocating buffers...  [0s]
Processing query block 1, reference block 1/1, shape 1/2.
Building reference seed array...  [1.902s]
Building query seed array...  [0.002s]
Computing hash join...  [0.363s]
Searching alignments...  [2.014s]
Processing query block 1, reference block 1/1, shape 2/2.
Building reference seed array...  [1.811s]
Building query seed array...  [0.002s]
Computing hash join...  [0.349s]
Searching alignments...  [2.235s]
Deallocating buffers...  [0.055s]
Clearing query masking...  [0s]
Computing alignments...  [4.846s]
Deallocating reference...  [0.025s]
Loading reference sequences...  [0s]
Deallocating buffers...  [0s]
Deallocating queries...  [0s]
Loading query sequences...  [0s]
Closing the input file...  [0s]
Closing the output file...  [0s]
Cleaning up...  [0s]
Total time = 18.736s
Reported 197 pairwise alignments, 197 HSPs.
197 queries aligned.


***************************1. DIAMOND end***************************************************




***************************3. eCAMI start***************************************************


Using CAZyme db in eCAMI
total time:346.363926s


***************************3. eCAMI end***************************************************


Preparing overview table from hmmer, eCAMI and diamond output...
overview table complete. Saved as output_EscheriaColiK12MG1655/overview.txt

sumitra20 · 2022-09-28T00:34:40Z

Hey @yinlabniu ,

Yes, running the test file EscheriaColiK12MG1655.fna works. But i just find it strange as to why i can get diamond annotation output when i run

diamond blastx -q /R33P5_R_sortme_non_rRNA.fq -o ./R33P5_R_cazyme.out -b 20.0 -f 6 -e 1e-3
But no any matched annotation when i use the exact same input file with run_dbcan. Is it bcs of the difference in databse used?

yinlabniu · 2022-09-28T02:02:29Z

It could be the evalue. You used 1e-3, but within run_dbcan we have set the default evalue 1e-102. Since you are working with short reads, the default is certainly too stringent. Yanbin

…

________________________________ From: sumitra20 ***@***.***> Sent: Tuesday, September 27, 2022 7:34 PM To: linnabrown/run_dbcan ***@***.***> Cc: Yanbin Yin ***@***.***>; Mention ***@***.***> Subject: Re: [linnabrown/run_dbcan] Diamond+CAZY analysis for mRNA reads (Issue #103) Non-NU Email

________________________________ Hey @yinlabniu<https://urldefense.com/v3/__https://github.com/yinlabniu__;!!PvXuogZ4sRB2p-tU!EYm1OEj6mvztKleFSVl-Xy_o2d3Lui7lmksd9OZXvKKp9RuBY-1GoviJegc6a0CRG3NKyTvEdkPRGGIkYkHKnA$> , Yes, running the test file EscheriaColiK12MG1655.fna works. But i just find it strange as to why i can get diamond annotation output when i run diamond blastx -q /R33P5_R_sortme_non_rRNA.fq -o ./R33P5_R_cazyme.out -b 20.0 -f 6 -e 1e-3 But no any matched annotation when i use the exact same input file with run_dbcan. Is it bcs of the difference in databse used? — Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/linnabrown/run_dbcan/issues/103*issuecomment-1260240789__;Iw!!PvXuogZ4sRB2p-tU!EYm1OEj6mvztKleFSVl-Xy_o2d3Lui7lmksd9OZXvKKp9RuBY-1GoviJegc6a0CRG3NKyTvEdkPRGGK2KxJI0A$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AEXNKZTK4OTPI2DGJ7Z22C3WAOHCVANCNFSM6AAAAAAQF4HWM4__;!!PvXuogZ4sRB2p-tU!EYm1OEj6mvztKleFSVl-Xy_o2d3Lui7lmksd9OZXvKKp9RuBY-1GoviJegc6a0CRG3NKyTvEdkPRGGIHeHhcqw$>. You are receiving this because you were mentioned.Message ID: ***@***.***>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Diamond+CAZY analysis for mRNA reads #103

Diamond+CAZY analysis for mRNA reads #103

sumitra20 commented Sep 6, 2022

yinlabniu commented Sep 7, 2022

sumitra20 commented Sep 10, 2022

sumitra20 commented Sep 27, 2022

linnabrown commented Sep 27, 2022

sumitra20 commented Sep 28, 2022

yinlabniu commented Sep 28, 2022 via email

Diamond+CAZY analysis for mRNA reads #103

Diamond+CAZY analysis for mRNA reads #103

Comments

sumitra20 commented Sep 6, 2022

yinlabniu commented Sep 7, 2022

sumitra20 commented Sep 10, 2022

sumitra20 commented Sep 27, 2022

linnabrown commented Sep 27, 2022

sumitra20 commented Sep 28, 2022

yinlabniu commented Sep 28, 2022 via email