hart-lab/drugz


drugz

DrugZ software from the Hart Lab

DrugZ detects synergistic and suppressor drug-gene interactions in CRISPR screens.

usage: drugz.py [-h] [-i sgRNA_count.txt] [-o drugz-output.txt]  
                [-f drugz-foldchange.txt] -c control samples -x drug samples  
                [-r remove genes] [-p pseudocount] [-I INDEX_COLUMN]  
                [--minobs minObs] [--half_window_size half_window_size] [-q]  
  
-i      	Readcount file, tab-delimited text (input)  
-o      	DrugZ results file, tab-delimited text (output)  
-f      	DrugZ Z-transformed fold change file (optional)  
-c      	Control samples: comma-delimited list of column headers in readcount file  
-x      	Treated samples: comma-delimited list of column headers in readcount file  
-r      	Comma-delimited list of genes to remove before analysis  
-p      	Pseudocount to add to all readcounts; prevents log(0) problems (default=5) 
-I      	Index column (default=0)  
--minobs   	Ignore genes with fewer observations (gRNA/gene x replicates) (default=1)
--half_window_size  Size of the first bin and half the size of the initial sample
    (window) used to estimate std (default=500)
-unpaired	Unpaired approach: compares mean(treated samples) to mean(control samples) (default=False)
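
The pseudocount matters because a guide with zero reads would otherwise lead to log(0). The following is a minimal sketch of that idea only: the function name `log2_fold_change` is hypothetical, and the sketch does not reproduce DrugZ's own normalization or Z-score calculation.

```python
import math


def log2_fold_change(treated, control, pseudocount=5):
    """Log2 ratio of treated to control read counts after adding a pseudocount.

    Hypothetical helper for illustration; not DrugZ's implementation.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Zero reads in both samples still gives a finite value instead of an error.
print(log2_fold_change(0, 0))

# A low-count guide, e.g. 26 treated reads against 8 control reads.
print(log2_fold_change(26, 8))
```

A larger pseudocount pulls the fold changes of low-count guides toward zero, which is the usual trade-off when choosing a value for -p.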

The input file should be a tab-delimited file with the following format:

sgRNA	Gene	T0	T15_A_control	T15_B_control	T15_C_control	T15_A_olaparib	T15_B_olaparib	T15_C_olaparib
A1BG_CACCTTCGAGCTGCTGCGCG	A1BG	313	235	47	337	428	115	340
A1BG_AAGAGCGCCTCGGTCCCAGC	A1BG	99	8	1	13	26	5	28
A1BG_TGGACTTCCAGCTACGGCGC	A1BG	650	336	74	185	392	193	304
A1BG_CACTGGCGCCATCGAGAGCC	A1BG	718	192	34	296	178	69	185
A1BG_GCTCGGGCTTGTCCACAGGA	A1BG	180	230	29	122	394	148	364
A1BG_CAAGAGAAAGACCACGAGCA	A1BG	428	300	158	294	366	184	489
A1CF_CGTGGCTATTTGGCATACAC	A1CF	677	452	74	423	585	446	434
A1CF_GGTATACTCTCCTTGCAGCA	A1CF	138	69	43	109	96	184	127
A1CF_GACATGGTATTGCAGTAGAC	A1CF	396	183	38	106	193	120	198
(etc)

Critically, the "gene" column must be the first non-index column in the file, and the sample column headers are the names passed to -c and -x on the command line. For example, to run DrugZ on just the A and B replicates of this file, the command line would be:

drugz.py -i [input_file] -o drugz-output.txt -c T15_A_control,T15_B_control -x T15_A_olaparib,T15_B_olaparib
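
Because the sample names on the command line must match the column headers exactly, it can help to check them before starting a run. The sketch below is not part of DrugZ; `missing_samples` is a hypothetical helper that compares the comma-delimited -c and -x values against the header line of the readcount file.

```python
def missing_samples(header_line, control, treated):
    """Return the requested sample names that are absent from the header.

    header_line: first line of the tab-delimited readcount file
    control, treated: comma-delimited strings, as passed to -c and -x
    """
    columns = header_line.rstrip("\n").split("\t")
    requested = control.split(",") + treated.split(",")
    return [name for name in requested if name not in columns]


# Header line of the example input file shown above.
header = (
    "sgRNA\tGene\tT0\tT15_A_control\tT15_B_control\tT15_C_control\t"
    "T15_A_olaparib\tT15_B_olaparib\tT15_C_olaparib"
)

print(missing_samples(
    header,
    "T15_A_control,T15_B_control",
    "T15_A_olaparib,T15_B_olaparib",
))
```

An empty list means every requested sample name was found in the header.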

To save the intermediate gRNA-level raw and normalized fold changes for other analyses, add the -f flag:

drugz.py -i [input_file] -o drugz-output.txt -f drugz-foldchange.txt -c T15_A_control,T15_B_control -x T15_A_olaparib,T15_B_olaparib

To run DrugZ with the unpaired approach, add the -unpaired flag:

drugz.py -i [input_file] -o drugz-output.txt -f drugz-foldchange.txt -c T15_A_control,T15_B_control -x T15_A_olaparib,T15_B_olaparib -unpaired
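
The difference between the two modes can be sketched as follows. This README states only that the unpaired approach compares mean(treated samples) to mean(control samples). The paired calculation shown here, with each treated replicate compared to its matching control, is an assumption added for contrast, and both function names are hypothetical. Normalization and Z-scoring are omitted.

```python
import math
from statistics import mean


def paired_log2fc(control, treated, pseudocount=5):
    """Assumed paired mode: one fold change per matched replicate pair."""
    return [
        math.log2((t + pseudocount) / (c + pseudocount))
        for c, t in zip(control, treated)
    ]


def unpaired_log2fc(control, treated, pseudocount=5):
    """Unpaired mode: one fold change of mean(treated) against mean(control)."""
    return math.log2(
        (mean(treated) + pseudocount) / (mean(control) + pseudocount)
    )


# Read counts of the first A1BG guide, replicates A and B, from the example input.
control = [235, 47]
treated = [428, 115]

print(paired_log2fc(control, treated))    # one value per replicate pair
print(unpaired_log2fc(control, treated))  # a single value
```

The unpaired form is the natural choice when treated and control replicates have no one-to-one correspondence.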

To run the DrugZ analysis in a Jupyter notebook and save the output as a variable:

# Define the Args class (more convenient, since IPython doesn't recognize argparse arguments).
# These are user-specified arguments:

# infile = input read count matrix
# drugz_output_file = name of the file to which the DrugZ results are written
# fc_outfile = name of the fold change output file (cf. the -f option)
# control_samples = the names of the control samples (must match column names)
# drug_samples = the names of the drug-treated samples (must match column names)
# remove_genes = genes to remove before analysis (cf. the -r option)
# unpaired = unpaired approach: compares mean(treated samples) to mean(control samples)
# pseudocount = count added to the observed read counts, default = 5
# half_window_size = size of the first bin and half the size of the initial sample (window) used to estimate std, default = 500 (for whole-genome screens)

import drugz as dz  # assumption: drugz.py is importable, e.g. from the working directory

class Args:
    infile = "./sgRNA_count.txt"
    drugz_output_file = "./drugz_results.txt"
    fc_outfile = "./fc_results.txt"
    control_samples = 'T15_A_control,T15_B_control,T15_C_control'
    drug_samples = 'T15_A_olaparib,T15_B_olaparib,T15_C_olaparib'
    remove_genes = 'LacZ,luciferase,EGFR'
    unpaired = False
    pseudocount = 5
    half_window_size = 5  # 5 because the test data set is small (sgRNA_count.txt has 9 guides, i.e. rows)

drugz_results = dz.drugZ_analysis(Args())

For more options, see drugZ_in_jupyter_notebook_tutorial.html