Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diamond+CAZY analysis for mRNA reads #103

Open
sumitra20 opened this issue Sep 6, 2022 · 6 comments
Open

Diamond+CAZY analysis for mRNA reads #103

sumitra20 opened this issue Sep 6, 2022 · 6 comments

Comments

@sumitra20
Copy link

Dear Developers,

I am trying to do CAZymes analysis using my mRNA reads. I downloaded a copy of the database (CAZyDB.08062022.fa), indexed it to Diamond format, and then performed a diamond blast. I've also downloaded the mapping file (CAZyDB.08062022.fam.subfam.ec.txt) but now I'm really confused about how to proceed with the annotation and getting the annotation summary out from the generated diamond output. Can I perform it using the MEGAN6 tool? I'd really appreciate it if you could give me some ideas on how to proceed. I'm very new to bioinformatic analysis and I'm struggling to understand the analysis pipelines.

#download CaZyme databse
wget http://bcb.unl.edu/dbCAN2/download/Databases/V11/CAZyDB.08062022.fa

#index it to diamond format
/home/combio7/diamond-2.0.12/bin/diamond makedb --in ./CAZyDB.08062022.fa -d CAZy

#diamond_blast

/home/combio7/diamond-2.0.12/bin/diamond blastx -d CAZy.dmnd -q /media/combio7/8TB_Backup/sumitra_130122/cycle_5_30822/5_sortme_cycle5/R33P5_R_sortme_non_rRNA.fq -o ./R33P5_R_cazyme.out -b 20.0 -f 6 -e 1e-3

#donwloading mapping file (not sure which to use)
wget https://bcb.unl.edu/dbCAN2/download/Databases/dbCAN-old@UGA/CAZyDB-ec-info.txt.07-20-2017

wget https://bcb.unl.edu/dbCAN2/download/Databases/V11/CAZyDB.08062022.fam.subfam.ec.txt

Thank you

@yinlabniu
Copy link
Collaborator

You can use the 2022 ec file. As to your question on what's next, that depends on your research aims. Very often people will take the diamond output to calculate an abundance (e.g., FPKM) value for each cazyme family. The cazyme family can be connected to EC and substrates, so that they can answer questions, e.g., what families are more highly expressed than others in what conditions/samples, or what substrate degradation is more active, etc. I've never used MEGAN, but I know it can compute the abundance info. Other choices include HUMANN3, but I don't know if they have CAZyme database integrated in their pipeline. You can look into https://github.com/AnantharamanLab/METABOLIC, which has dbCAN integrated.

Yanbin

@sumitra20
Copy link
Author

Hi Yanbin,

Thank you so much for the suggestions given. I'll look into METABOLIC as well.

@sumitra20
Copy link
Author

Hi @yinlabniu

hi @yinlabniu ,

Sorry, i might have another question. As i mentioned before i ran a blastx for my nucleotide sequence against the CAZY database and below is the snapshot of how the output looks like
image

However, when i try analyzing my .fasta (nucleotide) file using run_dbcan tool i dont seem to get any put for diamond, i only get output for eCAMI. I have tried running with both -meta and -prok. Any idea why?

run_dbcan /media/combio7/8TB_Backup/sumitra_130122/CAZyme/megan6_output/bacteria_archea_R11P4_BS_phylum.fasta meta --out_dir output2_megan -t diamond eCAMI --dia_cpu 8

run_dbcan /media/combio7/8TB_Backup/sumitra_130122/CAZyme/megan6_output/bacteria_archea_R11P4_BS_phylum.fasta prok --out_dir output23_megan -t diamond eCAMI --dia_cpu 8

output:
image

Thank you

@linnabrown
Copy link
Owner

Hi @sumitra20 . If there is no output diamond column, it means that your input sequences do not have matched annotations from databases searched against by DIAMOND software.

This command is quite right. You can try this command in the example and it prompts that our program is only running diamond and eCAMI software currently.

run_dbcan EscheriaColiK12MG1655.fna prok --out_dir output_EscheriaColiK12MG1655 -t diamond eCAMI


***************************1. DIAMOND start*************************************************


diamond v2.0.15.153 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 4
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: output_EscheriaColiK12MG1655
#Target sequences to report alignments for: 1
Opening the database...  [0.078s]
Database: db/CAZy.dmnd (type: Diamond database, sequences: 2161786, letters: 1024499689)
Block size = 2000000000
Opening the input file...  [0s]
Opening the output file...  [0s]
Loading query sequences...  [0.004s]
Masking queries...  [0.012s]
Building query seed set...  [0.067s]
Algorithm: Query-indexed
Building query histograms...  [0.001s]
Allocating buffers...  [0s]
Loading reference sequences...  [1.649s]
Initializing temporary storage...  [0s]
Building reference histograms...  [3.259s]
Allocating buffers...  [0s]
Processing query block 1, reference block 1/1, shape 1/2.
Building reference seed array...  [1.902s]
Building query seed array...  [0.002s]
Computing hash join...  [0.363s]
Searching alignments...  [2.014s]
Processing query block 1, reference block 1/1, shape 2/2.
Building reference seed array...  [1.811s]
Building query seed array...  [0.002s]
Computing hash join...  [0.349s]
Searching alignments...  [2.235s]
Deallocating buffers...  [0.055s]
Clearing query masking...  [0s]
Computing alignments...  [4.846s]
Deallocating reference...  [0.025s]
Loading reference sequences...  [0s]
Deallocating buffers...  [0s]
Deallocating queries...  [0s]
Loading query sequences...  [0s]
Closing the input file...  [0s]
Closing the output file...  [0s]
Cleaning up...  [0s]
Total time = 18.736s
Reported 197 pairwise alignments, 197 HSPs.
197 queries aligned.


***************************1. DIAMOND end***************************************************




***************************3. eCAMI start***************************************************


Using CAZyme db in eCAMI
total time:346.363926s


***************************3. eCAMI end***************************************************


Preparing overview table from hmmer, eCAMI and diamond output...
overview table complete. Saved as output_EscheriaColiK12MG1655/overview.txt

@sumitra20
Copy link
Author

Hey @yinlabniu ,

Yes, running the test file EscheriaColiK12MG1655.fna works. But i just find it strange as to why i can get diamond annotation output when i run

diamond blastx -q /R33P5_R_sortme_non_rRNA.fq -o ./R33P5_R_cazyme.out -b 20.0 -f 6 -e 1e-3
But no any matched annotation when i use the exact same input file with run_dbcan. Is it bcs of the difference in databse used?

@yinlabniu
Copy link
Collaborator

yinlabniu commented Sep 28, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants