Merge branch 'release-0.14.0'
yannickspill committed Nov 9, 2018
2 parents f00fb4b + 0845e23 commit 13c7041
Showing 274 changed files with 97 additions and 1,536 deletions.
2 changes: 2 additions & 0 deletions .gitattributes
@@ -1,2 +1,4 @@
*.tsv.gz filter=lfs diff=lfs merge=lfs -text
*.tsv filter=lfs diff=lfs merge=lfs -text
+*.dat.gz filter=lfs diff=lfs merge=lfs -text
+*.RData filter=lfs diff=lfs merge=lfs -text
10 changes: 9 additions & 1 deletion CHANGELOG.md
@@ -4,6 +4,13 @@ All notable changes to *binless* will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
for versions 0.x of binless, minor releases might break backwards compatibility.

+## [0.14.0]
+### Changed
+- Tutorials now use the smaller SEMA3C dataset, which can be run quickly on a
+laptop
+### Removed
+- Old article/ folder
+
## [0.13.0]
### Added
- use 64 bit integers in gfl_graph_fl to expand the index limit in lasso.
@@ -121,7 +128,8 @@ for versions 0.x of binless, minor releases might break backwards compatibility.
- Initial commit


-[0.13.0]: ../../compare/v0.12.0...HEAD
+[0.14.0]: ../../compare/v0.13.0...HEAD
+[0.13.0]: ../../compare/v0.12.0...v0.13.0
[0.12.0]: ../../compare/v0.11.0...v0.12.0
[0.11.0]: ../../compare/v0.10.2...v0.11.0
[0.10.2]: ../../compare/v0.10.1...v0.10.2
21 changes: 12 additions & 9 deletions README.md
@@ -14,6 +14,8 @@ install.packages("devtools")
devtools::install_github("3DGenomes/binless",subdir="binless")
```

+Installation should take about 10 minutes.
+
#### Manual installation
You can also install it manually as follows:

@@ -29,11 +31,16 @@ Binless uses the following packages: `data.table`, `Hmisc`, `foreach`,
`doParallel`, `MASS`, `matrixStats`, `ggplot2`, `dplyr`, `Matrix`, `quadprog`,
`scales`, `utils`

+Binless has been developed and tested on a MacBook Pro (2015) and on a CentOS 7
+linux workstation with 128Gb RAM and 32 cores. Resource usage can go from modest
+(fast binless will run on a laptop for loci <10Mb) to huge (fast binless on
+human chromosome 1 at 5kb base resolution requires about 500Gb of RAM).
+
### How does it work?

In the `example/` folder, we provide plots and files to perform a normalization,
taken from publicly available data (Rao *et al.*, 2014). Alternatively you can
-use your own data. Start with something not too large, for example 2Mb. If you
+use your own data. Start with something not too large, for example 1Mb. If you
want a quick and dirty overview, skip to the *Fast binless* section. Otherwise,
read on.

@@ -55,16 +62,15 @@ on the `CSnorm` object you built at the previous step. Once normalized, datasets
can be combined, and signal and difference detection can be performed. **This
is the full-blown version of the algorithm, with statistically
significant output**. Note that this is a beta version, so check for updates
-frequently.
+regularly.

### Fast binless

See `fast_binless.R`. Here, we implemented a fast approximation with fixed
-fusion penalty and an approximate decay. You can either use a `CSnorm` object
+fusion penalty and approximate decay and bias terms. You can either use a `CSnorm` object
produced at the preprocessing stage, or directly provide the binned raw matrix.
**This is a fast and approximate version of the full algorithm, so you will not
-get statistically significant output**, and it might not look as *smooth* as the
-full-blown algorithm. But you can try out a whole chromosome ;)
+get statistically significant output**. But you can try out a whole chromosome ;)

### Base-resolution (arrow) plots

@@ -108,7 +114,7 @@ the following columns
1. `re.up2`
1. `re.dn2`

-Binned raw matrix (used for fast binless): tab or space-separated text file
+Binned raw matrix (used for fast binless): tab, comma or space-separated text file
containing multiple datasets. The first line is a header that must start with
`"name" "bin1" "pos1" "bin2" "pos2" "distance" "observed" "nobs"`. Optionally, more columns
can be added but make sure their column names are different.
@@ -126,6 +132,3 @@ provide them as integers starting at 1 (i.e. use 1 for the first dataset, 2 for
Also, **you must have pos2 >= pos1, and the data must be sorted by name, pos1 and pos2, in that order**.
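
The constraints on the binned raw matrix stated above (required leading columns, distinct column names, pos2 >= pos1, rows sorted by name, pos1 and pos2) can be checked before running fast binless. The following is a minimal sketch in base R. The function name `check_binned_matrix` is hypothetical and is not part of the binless package, and the example values are arbitrary placeholders, not real data.

```r
# Hypothetical helper, NOT part of the binless package: checks the
# binned raw matrix constraints described in the README text.
check_binned_matrix <- function(mat) {
  required <- c("name", "bin1", "pos1", "bin2", "pos2",
                "distance", "observed", "nobs")
  # the header must start with the required columns, in this order
  if (ncol(mat) < length(required) ||
      !identical(names(mat)[seq_along(required)], required)) {
    stop("header must start with: ", paste(required, collapse = " "))
  }
  # any additional columns must have names different from the others
  if (anyDuplicated(names(mat)) > 0) stop("column names must be unique")
  # pos2 must never be smaller than pos1
  if (any(mat$pos2 < mat$pos1)) stop("found rows with pos2 < pos1")
  # rows must be sorted by name, then pos1, then pos2
  # (order() keeps ties in their original order, so sorted input maps to 1..n)
  ord <- order(mat$name, mat$pos1, mat$pos2)
  if (!identical(ord, seq_len(nrow(mat)))) {
    stop("rows must be sorted by name, pos1 and pos2, in that order")
  }
  invisible(TRUE)
}

# Arbitrary placeholder values, for illustration only
example <- data.frame(
  name = c(1L, 1L, 1L),
  bin1 = c(1L, 1L, 2L),
  pos1 = c(2500, 2500, 7500),
  bin2 = c(1L, 2L, 2L),
  pos2 = c(2500, 7500, 7500),
  distance = c(0, 5000, 0),
  observed = c(10L, 4L, 12L),
  nobs = c(1L, 1L, 1L)
)
check_binned_matrix(example)
```

The function stops with a message at the first violated constraint and returns `TRUE` invisibly otherwise, so it can be placed directly before the call that loads the matrix.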





10 changes: 9 additions & 1 deletion arrow_plot.R
@@ -16,11 +16,14 @@ library(binless)
#if you use a Mac, use gzcat instead of zcat, or provide the path to an uncompressed file
#refer to README.md for a description of the tsv file format
#Use nrows optional argument if you only want to read parts of the file
-data=read_tsv("zcat example/GM12878_MboI_HICall_FOXP1ext.tsv.gz")
+data=read_tsv("zcat example/GM12878_MboI_HICall_SEMA3C.tsv.gz")

#plot the whole region at 10kb resolution
plot_binned(data, resolution=10000, b1=data[,min(rbegin1)], e1=data[,max(rend2)])

+#plot the whole region at 5kb resolution
+plot_binned(data, resolution=5000, b1=data[,min(rbegin1)], e1=data[,max(rend2)])
+
#arrow plots need a category column. we can add a dummy one
data[,category:="NA"]
#plot a 20kb subset of it with base resolution (arrow plot)
@@ -41,3 +44,8 @@ data = categorize_by_new_type(data, dangling.L = c(0), dangling.R = c(3), maxlen
#plot the same region as before, with the new colours
plot_raw(data, b1=data[,min(rbegin1)+50000], e1=data[,min(rbegin1)+70000])

+#plot a region that's further away from the diagonal
+plot_raw(data, b1=data[,min(rbegin1)+120000], e1=data[,min(rbegin1)+130000],
+b2=data[,min(rbegin1)+900000], e2=data[,min(rbegin1)+910000])
+
+