Skip to content

Commit

Permalink
udpate hicpro2juicebox for dnase
Browse files Browse the repository at this point in the history
  • Loading branch information
Nicolas Servant committed Mar 30, 2016
1 parent 5061f3b commit a3d1844
Show file tree
Hide file tree
Showing 4 changed files with 48 additions and 25 deletions.
35 changes: 14 additions & 21 deletions bin/utils/hicpro2juicebox.sh
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
## The current script should but '0' if no restriction fragments are available, but juicebox does not recognize it

function usage {
echo -e "usage : hicpro2juicebox -i VALIDPAIRS -g GENOME -j JUICEBOXJAR -r RESFRAG [-t TEMP] [-o OUT] [-h]"
echo -e "usage : hicpro2juicebox -i VALIDPAIRS -g GENOME -j JUICEBOXJAR [-r RESFRAG] [-t TEMP] [-o OUT] [-h]"
echo -e "Use option -h|--help for more information"
}

Expand All @@ -27,7 +27,7 @@ function help {
echo " -i|--input VALIDPAIRS : allValidPairs file generated by HiC-Pro >= 2.7.5"
echo " -g|--genome GENOME : genome information required by Juicebox, must be one of hg18, hg19, hg38, dMel, mm9, mm10, anasPlat1, bTaurus3, canFam3, equCab2, galGal4, Pf3D7, sacCer3, sCerS288c, susScr3, or TAIR10"
echo " -j|--jar JUICEBOXJAR : path to juicebox_clt.jar file"
echo " -r|--resfrag RESFRAG : restriction fragment file used by HiC-Pro"
echo " [-r|--resfrag] RESFRAG : restriction fragment file used by HiC-Pro"
echo " [-t|--temp] TEMP : path to tmp folder. Default is current path"
echo " [-o|--out] OUT : output path. Default is current path"
echo " [-h|--help]: help"
Expand Down Expand Up @@ -91,13 +91,6 @@ if [[ -z $VALIDPAIRS || -z $GENOME || -z $JUICEBOXJAR ]]; then
exit
fi

if [[ -z $RESFRAG ]]; then
echo "hicpro2juicebox is currently only available for restriction fragment Hi-C. Please specify -r option"
usage
exit
fi


## Create output folder
mkdir -p ${TEMP}

Expand All @@ -114,32 +107,32 @@ fi

echo "Generating Juicebox input files ..."

#if [[ ! -z $RESFRAG ]]; then
if [[ ! -z $RESFRAG ]]; then

## The restriction fragment sites file needs to be converted in order to be used in Juicebox command line tool (see attached script).
## They expect one line per chromosome with restriction sites separated by tabs and sorted by coordinate.
awk 'BEGIN{OFS="\t"; prev_chr=""}$1=="chrM"{$1="chrMT"}$1!=prev{print ""; prev=$1; sub(/chr/, "", $1); printf "%s\t", $1; printf "%s\t", $3}$1==prev{printf "%s\t",$3}END{print ""}' $RESFRAG | sed "s/\t\n/\n/" | sed "/^$/d" > ${TEMP}/$$_resfrag.juicebox
awk 'BEGIN{OFS="\t"; prev_chr=""}$1=="chrM"{$1="chrMT"}$1!=prev{print ""; prev=$1; sub(/chr/, "", $1); printf "%s\t", $1; printf "%s\t", $3}$1==prev{printf "%s\t",$3}END{print ""}' $RESFRAG | sed "s/\t\n/\n/" | sed "/^$/d" > ${TEMP}/$$_resfrag.juicebox

## The “pre” command needs the contact map to be sorted by chromosome and grouped so that all reads for one chromosome (let’s say, chr1) appear in the same column.
## Also, chromosomes should not have the ‘chr” substring and the strand is coded as 0 for positive and anything else for negative (in practice, 1).
awk '$2=="chrM"{$2="chrMT"}$5=="chrM"{$5="chrMT"}{$4=$4!="+"; $7=$7!="+"; sub(/chr/, "", $2); sub(/chr/, "", $5); split($9, frag1, "_"); split($10, frag2, "_"); } $2<=$5{print $1, $4, $2, $3, frag1[3], $7, $5, $6, frag2[3], $11, $12 }$5<$2{ print $1, $7, $5, $6, frag2[3], $4, $2, $3, frag1[3], $12, $11 }' $VALIDPAIRS | sort -k3,3d -k7,7d -S 90 > ${TEMP}/$$_allValidPairs.pre_juicebox_sorted
awk '$2=="chrM"{$2="chrMT"}$5=="chrM"{$5="chrMT"}{$4=$4!="+"; $7=$7!="+"; sub(/chr/, "", $2); sub(/chr/, "", $5); split($9, frag1, "_"); split($10, frag2, "_"); } $2<=$5{print $1, $4, $2, $3, frag1[3], $7, $5, $6, frag2[3], $11, $12 }$5<$2{ print $1, $7, $5, $6, frag2[3], $4, $2, $3, frag1[3], $12, $11 }' $VALIDPAIRS | sort -k3,3d -k7,7d -S 90 > ${TEMP}/$$_allValidPairs.pre_juicebox_sorted

#else
# awk '$2=="chrM"{$2="chrMT"}$5=="chrM"{$5="chrMT"}{$4=$4!="+"; $7=$7!="+"; sub(/chr/, "", $2); sub(/chr/, "", $5)} $2<=$5{print $1, $4, $2, $3, 0, $7, $5, $6, 0, $11, $12 }$5<$2{ print $1, $7, $5, $6, "", $4, $2, $3, "", $12, $11 }' $VALIDPAIRS | sort -k3,3d -k7,7d -S 90 > ${TEMP}/$$_allValidPairs.pre_juicebox_sorted
#fi
else
awk '$2=="chrM"{$2="chrMT"}$5=="chrM"{$5="chrMT"}{$4=$4!="+"; $7=$7!="+"; sub(/chr/, "", $2); sub(/chr/, "", $5)} $2<=$5{print $1, $4, $2, $3, 0, $7, $5, $6, 1, $11, $12 }$5<$2{ print $1, $7, $5, $6, 0, $4, $2, $3, 1, $12, $11 }' $VALIDPAIRS | sort -k3,3d -k7,7d -S 90 > ${TEMP}/$$_allValidPairs.pre_juicebox_sorted
fi

echo -e "Running Juicebox ..."
OUTPUTFILE=${OUT}/$(basename $VALIDPAIRS).hic

## This is the command to generate the hic file that can be visualized with Juicebox with restriction fragment information

#if [[ ! -z $RESFRAG ]]; then
java -jar ${JUICEBOXJAR} pre -f ${TEMP}/$$_resfrag.juicebox ${TEMP}/$$_allValidPairs.pre_juicebox_sorted ${OUTPUTFILE} ${GENOME}
#else
# java -jar ${JUICEBOXJAR} pre ${TEMP}/$$_allValidPairs.pre_juicebox_sorted ${OUTPUTFILE} ${GENOME}
#fi
if [[ ! -z $RESFRAG ]]; then
java -jar ${JUICEBOXJAR} pre -f ${TEMP}/$$_resfrag.juicebox ${TEMP}/$$_allValidPairs.pre_juicebox_sorted ${OUTPUTFILE} ${GENOME}
else
java -jar ${JUICEBOXJAR} pre ${TEMP}/$$_allValidPairs.pre_juicebox_sorted ${OUTPUTFILE} ${GENOME}
fi

## Clean
##/bin/rm -f ${TEMP}/$$_resfrag.juicebox ${TEMP}/$$_allValidPairs.pre_juicebox_sorted
/bin/rm -f ${TEMP}/$$_resfrag.juicebox ${TEMP}/$$_allValidPairs.pre_juicebox_sorted

echo "done !"
4 changes: 2 additions & 2 deletions doc/ERRORS.rst
Original file line number Diff line number Diff line change
Expand Up @@ -24,5 +24,5 @@ HiC-Pro is based on the pysam python library. Old pysam version does not support
Error in Paring of R1 and R2 tags
-------------------------------------------------------------------------------

Reads from the R1 and R2 input files should be have the name.
BAM files generated before the pairing step should be sorted by name, as the pairing is done line by line.
| Reads from the R1 and R2 input files should be have the name.
| BAM files generated before the pairing step should be sorted by name, as the pairing is done line by line.
2 changes: 1 addition & 1 deletion doc/FAQ.rst
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,7 @@ Here is a small example of how to use HiCPlotter.
## Plot the chrX at 150Kb resolution
python HiCPlotter.py -f hic_results/matrix/sample1/iced/150000/sample1_150000_iced.matrix -o Exemple -r 150000 -tri 1 -bed hic_results/matrix/sample1/raw/150000/sample1_150000_ord.bed -n Test -chr chrX -ptr 1
Since version 2.7.6 HiC-Pro should be compatible with the Juicebox viewer. See the hicpro2juicebox utility to generate Juicebox input file from the list of valid interactions.
Since version 2.7.6 HiC-Pro is compatible with the Juicebox viewer. See the hicpro2juicebox utility to generate Juicebox input file from the list of valid interactions.

How much disk space is require for running HiC-Pro
--------------------------------------------------
Expand Down
32 changes: 31 additions & 1 deletion doc/UTILS.rst
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ or **How can I extract SNPs information from phasing data ?**
2- digest_genome.py
3- digest_genome.py
-------------------
or **How can I generate the list of restriction fragments after genome digestion ?**

Expand All @@ -64,4 +64,34 @@ or **How can I generate the list of restriction fragments after genome digestion
4- make_viewpoints.py
---------------------
or **How can I generate a BED profile from a given viewpoints ?**

| The make_viewpoints.py script was initially designed in the context of capture-C data.
| For a given anchor (i.e. capture site), it allows to generate a track file with all interacting fragments.
| Regions around the capture are usually excluded from the profile ('-e' parameter)
.. code-block:: bash
## Generate a viewpoints from capture site
HICPRO_PATH/bin/utils/make_viewpoints -i hicpro_res/hic_results/data/dixon_2M/dixon_2M_allValidPairs -f HICPRO_PATH/data_info/HindIII_resfrag_hg19.bed -t mycapture.bed -e 1000 -d -v > capture.bedgraph
4- hicpro2juicebox.sh
---------------------
or **How can I generate load my HiC-Pro data into Juicebox visualization software ?**

| The hicpro2juicebox.sh utility allows to generate input file for Juicebox.
| It can be use both for restriction fragment Hi-C or Dnase Hi-C by specifying the -f FRAGMENT_FILE option. Note that in this case, it is advice to use the HiC-Pro annotation file, as the fragment name is expected to be HiC_CHROMOSOME_FRAGMENTNUMBER.
| This utility requires HiC-Pro version 2.7.6 or later, and the installation of Juicebox command line tools (https://github.com/theaidenlab/juicebox)

.. code-block:: bash
## Convert HiC-Pro output to Juicebox input
HICPRO_PATH/bin/utils/hicpro2juicebox.sh -i hicpro_res/hic_results/data/dixon_2M/dixon_2M_allValidPairs -g hg19 -j /usr/local/juicebox/juicebox_clt_1.4.jar
## Convert HiC-Pro output to Juicebox input up to restriction fragment resolution
HICPRO_PATH/bin/utils/hicpro2juicebox.sh -i hicpro_res/hic_results/data/dixon_2M/dixon_2M_allValidPairs -g hg19 -j /usr/local/juicebox/juicebox_clt_1.4.jar -f HICPRO_PATH/data_info/HindIII_resfrag_hg19.bed

0 comments on commit a3d1844

Please sign in to comment.