dolphinsuite.Rmd

---
title: "DolphinSuite"
---

<style>
table, th, td {
  border: 1px solid black;
  border-collapse: collapse;
}
body {
text-align: justify}
</style>

All raw sequencing data (scRNA-Seq, scATAC-Seq and CAGE-Seq) were processed by pipelines developed within the interactive pipeline manager [DolphinNext](./dolphinsuite.html) ([Yukselen et. al 2020](https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-6714-x)). The metadata as well as the processed datasets are currently being hosted in [DolphinSuite](./dolphinsuite.html), an end-to-end bioinformatics analysis platform.

## DolphinSuite Platforms

<a href="https://www.umassmed.edu/biocore/" target="_blank">UMass Chan Medical School BioCore</a> has developed **DolphinSuite**, a bioinformatics platform to support the analysis of high throughput data. DolphinSuite tracks samples from sample collection to data processing (sequencing, proteomics, metabolomics) and provides interactive analysis using an intuitive web interface. DolphinSuite is built to ensure secure access to the processed data using 3rd party applications for tailor-made analysis and data sharing. 

DolphinSuite is comprised of three major components that supports distinct aspects of high throughput data analysis. 
      
<table>
<tbody>
  <tr>
   <td style="text-align:center;width:33%; vertical-align: text-top;"> 
      <p> <a href="https://dnext.dolphinnext.com" target="_blank" style="font-size: 1.5em;">DolphinNext (Dnext)</a> </p>
      <img width="300" height="220" src="./images/dolphinnext.png" class="center">
      <br>
      <p> **Dnext**: a distributed data processing platform for high throughput genomics </p>
   </td>
   <td style="text-align:center;width:33%; vertical-align: text-top;"> 
      <p> <a href="https://dmeta.dolphinnext.com" target="_blank" style="font-size: 1.5em;">DolphinMeta (Dmeta)</a> </p>
      <img width="300" height="200" src="./images/dmeta.png" class="center">
      <br>
      <p> **Dmeta**: automated data processing and management using ontology-based metadata tracking system </p>
   </td>
   <td style="text-align:center; width:33%; vertical-align: text-top;"> 
      <p> <a href="https://dportal.dolphinnext.com" target="_blank" style="font-size: 1.5em;">DolphinPortal (Dportal)</a> </p>
      <img width="300" height="200" src="./images/dportal.png" class="center">
      <br>
      <p> **Dportal**: a user-friendly and interactive data browser for filtering and querying</p>
   </td>
  </tr>
</tbody>
</table>

<br>
<br>

## Preprocessing Pipelines

DolphinNext is a revolutionary pipeline management system that allows users to <strong>interactively customize data pre-processing</strong> workflows and ensures <strong>reproducibility</strong> of analysis results. Below, we give a summary of three pipelines that are used to process scRNA-Seq, scATAC-Seq and CAGE-Seq reads incorporated in this project. 

<table>
<tbody>
  <tr>
   <td style="text-align:center;width:40%; vertical-align: text-top;"> 
      <p> <a href="https://dnext.dolphinnext.com/index.php?np=1&id=12" target="_blank" style="font-size: 1.2em;">scRNA-Seq (Tasic et. al 2018) Pipeline</a> </p>
      <img width="400" height="240" src="./images/scRNA-Seq pipeline.png" class="center">
      <br>
   </td>
   <td style="width:60%; vertical-align: text-top; padding: 10px"> 
      <p> The <strong>scRNA-Seq (Tasic et. al 2018) pipeline</strong> includes Quality Control, rRNA filtering, Genome Alignment using STAR and estimates gene expression levels by GenomeAlignment Package in R. </p>
      <p> <strong>Steps</strong>: </p>
      <ol>
         <li>For Quality Control, we use FastQC to create qc outputs. There are optional read quality filtering (trimmomatic), read quality trimming (trimmomatic), adapter removal (cutadapt) processes available.</li>
         <li>STAR is used to count or filter out common RNAs (eg. rRNA, miRNA, tRNA, piRNA etc.).</li>
         <li>STAR is used to align RNA-Seq reads to a genome.</li>
         <li>SummarizeOverlaps is used to calculate gene counts.</li>
      </ol>
      <p style="margin : 0; padding-top:0;">**Docker**: <a href="https://hub.docker.com/r/dolphinnext/tasic2018_scrnaseq_pipeline" target="_blank">https://hub.docker.com/r/dolphinnext/tasic2018_scrnaseq_pipeline</a></p>
      <p>**GitHub**: <a href="https://github.com/dolphinnext/tasic2018_scrnaseq_pipeline" target="_blank">https://github.com/dolphinnext/tasic2018_scrnaseq_pipeline</a> </li></p>
   </td>
  </tr>
  <tr>
   <td style="text-align:center;width:40%; vertical-align: text-top;"> 
      <p> <a href="https://dnext.dolphinnext.com/index.php?np=1&id=18" target="_blank" style="font-size: 1.2em;">scATAC-Seq (Graybuck et. al 2021) Pipeline</a> </p>
      <img width="400" height="180" src="./images/scATAC-Seq pipeline.png" class="center">
      <br>
   </td>
   <td style="width:60%; vertical-align: text-top; padding: 10px"> 
      <p> The <strong>scATAC-Seq (Graybuck et. al 2021) pipeline</strong> includes a set of tools necessary for pre-processing ATAC-Seq libraries prepared as in (Graybuck et. al 2021).</p>
      <p> <strong>Steps</strong>: </p>
      <ol>
         <li>Trim Galore tool is used to remove adapters from reads.</li>
         <li>Trimmed and untrimmed reads are aligned to the reference using Bowtie.</li>
         <li>Duplicates reads are deleted using Samtools.</li>
      </ol>
      <p style="margin : 0; padding-top:0;">**Docker**: <a href="https://hub.docker.com/r/dolphinnext/graybuck2021_scatacseq_pipeline " target="_blank">https://hub.docker.com/r/dolphinnext/graybuck2021_scatacseq_pipeline </a></p>
      <p>**GitHub**: <a href="https://github.com/dolphinnext/graybuck2021_scatacseq_pipeline " target="_blank">https://github.com/dolphinnext/graybuck2021_scatacseq_pipeline </a> </li></p>
   </td>
  </tr>
    <tr>
   <td style="text-align:center;width:40%; vertical-align: text-top;"> 
      <p> <a href="https://dnext.dolphinnext.com/index.php?np=1&id=38" target="_blank" style="font-size: 1.2em;">scATAC-Seq (Mich et. al 2021) Pipeline</a> </p>
      <img width="400" height="180" src="./images/mich pipeline.png" class="center">
      <br>
   </td>
   <td style="width:60%; vertical-align: text-top; padding: 10px"> 
      <p> The <strong>scATAC-Seq (Mich et. al 2021) pipeline</strong> includes a set of tools necessary for pre-processing ATAC-Seq libraries prepared as in (Mich et. al 2021).</p>
      <p> <strong>Steps</strong>: </p>
      <ol>
         <li>Reads are aligned to the reference using Bowtie2.</li>
         <li>Low quality, secondary and unmapped reads are deleted using Samtools.</li>
      </ol>
      <p style="margin : 0; padding-top:0;">**Docker**: <a href="https://hub.docker.com/r/dolphinnext/mich2021_scatacseq_pipeline " target="_blank">https://hub.docker.com/r/dolphinnext/mich2021_scatacseq_pipeline </a></p>
      <p>**GitHub**: <a href="https://github.com/dolphinnext/mich2021_scatacseq_pipeline " target="_blank">https://github.com/dolphinnext/mich2021_scatacseq_pipeline </a> </li></p>
   </td>
  </tr>
  
  <tr>
   <td style="text-align:center;width:40%; vertical-align: text-top;"> 
      <p> <a href="https://dnext.dolphinnext.com/index.php?np=1&id=21" target="_blank" style="font-size: 1.2em;">CAGE-Seq Pipeline</a> </p>
      <img width="400" height="160" src="./images/CAGE-Seq pipeline.png" class="center">
      <br>
   </td>
   <td style="width:60%; vertical-align: text-top; padding: 10px"> 
      <p> The <strong>CAGE-seq pipeline</strong> includes rRNA filtering, Genome Alignment using STAR, and peak calling using MACS2. </p>
      <p> <strong>Steps</strong>: </p>
      <ol>
         <li>STAR is used to count or filter out common RNAs (eg. rRNA, miRNA, tRNA, piRNA etc.).</li>
         <li>STAR is used to align RNA-Seq reads to a genome.</li>
         <li>MACS2 is used for calling peaks in aligned bam files.</li>
      </ol>
      <p style="margin : 0; padding-top:0;">**Docker**: <a href="https://hub.docker.com/r/dolphinnext/cageseq_pipeline" target="_blank">https://hub.docker.com/r/dolphinnext/cageseq_pipeline</a></p>
      <p>**GitHub**: <a href="https://github.com/dolphinnext/cageseq_pipeline" target="_blank">https://github.com/dolphinnext/cageseq_pipeline</a> </li></p>
   </td>
  </tr>
</tbody>
</table>

<br>

## Downstream Analysis Pipelines

DolphinNext can also be used to design downstream analysis pipelines that may incorporate processed sequencing datasets to further investigate 
cell types of interests and build necessary files for online applications such as [Cellxgene](./cellxgenebrowser.html) and [UCSC Track Hub](./ucscbrowser.html). 

All **Aggregate Analysis** pipelines incorporate a few number of common processes as well as steps geared towards each individual sequencing technology or dataset. 

<p> <strong>Steps</strong>: </p>
<ol>
   <li>Aggregating Bam Files with unique mapped reads to Cell Types.</li>
   <li>Peak Calling with MACS2.</li>
   <li>Creating BigWig files for UCSC Genome Browser.</li>
   <li>Creating auxiliary files for building custom UCSC track hubs.</li>
   <li>Single cell RNA downstream analysis using Seurat (only for scRNA-Seq pipeline)</li>
   <li>Making h5ad files for visualization in Cellxgene (only for scRNA-Seq pipeline)</li>
</ol>

We have designed an additional downstream analysis pipeline (see <strong>Integrate-MultiOmics-Tracks</strong>) to build a large custom UCSC Track Hub of combined scRNA-Seq, scATAC-Seq and CAGE-Seq analysis. The resulting UCSC track hub can be accessed from here: [mouse](./ucscbrowser_mouse.html) and [human](./ucscbrowser_human.html). 

<br>
<table>
<tbody>
  <tr>
    <td style="text-align:center;width:33%; vertical-align: text-top;"> 
      <p> <a href="https://dnext.dolphinnext.com/index.php?np=1&id=20" target="_blank" style="font-size: 1.2em;">Aggregate Analysis (Tasic 2018)</a> </p>
      <img width="300" height="180" src="./images/aggregate Tasic.png" class="center">
   </td>
   <td style="text-align:center; width:33%; vertical-align: text-top;"> 
      <p> <a href="https://dnext.dolphinnext.com/index.php?np=1&id=19" target="_blank" style="font-size: 1.2em;">scATAC-Seq Aggregate Analysis</a> </p>
      <img width="300" height="180" src="./images/aggregate Graybuck.png" class="center">
   </td>
   <td style="text-align:center;width:33%; vertical-align: text-top;"> 
      <p> <a href="https://dnext.dolphinnext.com/index.php?np=1&id=22" target="_blank" style="font-size: 1.2em;">Cage-Seq Aggregate Analysis</a> </p>
      <img width="300" height="130" src="./images/aggregate CAGE-Seq.png" class="center">
   </td>
  </tr>
  <tr>
   <td style="text-align:center;width:33%; vertical-align: text-top;"> 
      <p> <a href="https://dnext.dolphinnext.com/index.php?np=1&id=23" target="_blank" style="font-size: 1.2em;">Integrate-MultiOmics-Tracks</a> </p>
      <img width="300" height="280" src="./images/multi-omics pipeline.png" class="center">
      <br>
   </td>
   <td colspan=2 style="width:66%; vertical-align: text-top; padding: 10px"> 
      <p> The <strong>Integrate-MultiOmics-Tracks</strong> pipeline generates a custom <strong>USCS Track Hub</strong> and auxiliary tables for interrogating RNA, ATAC and CAGE seq reads from following studies: </p>
      <!-- <ol> -->
      <!--    <li>ATAC-seq (Graybuck et. al 2021): Enhancer viruses for combinatorial cell-subclass- specific labeling.</li> -->
      <!--    <li>RNA-seq (Tasic et. al 2018): Shared and distinct transcriptomic cell types across neocortical areas.</li> -->
      <!--    <li>CAGE-seq: FANTOM5 CAGE profiles of human and mouse samples.</li> -->
      <!-- </ol> -->
      <p> <strong>Steps</strong>: </p>
      <ol>
         <li>Collect bigWig files from multiple DolphinNext runs.</li>
         <li>Build a custom UCSC track hub for all tracks of scRNA-Seq, scATAC-Seq and CAGE-Seq reads</li>
         <li>Aggregate counts from scRNA-seq data for each cell type.</li>
      </ol>
   </td>
  </tr>
</tbody>
</table>
<br>