forked from avilella/phageterm
PROGRAM
=======
PhageTerm.py - run from the command line in a shell
UPDATES
=======
Bug fix:
When the repeat region (of DTR phages) wraps around the reference contig ends, the first and last bases of the repeat region were missing in the reported sequence.
Bug fixed thanks to Matthew Lueder (Naval Medical Research Center-Frederick).
Bug fix: the cohesive sequence of COS 3' phages is now correctly determined.
Thanks to Dr Wei Shen.
VERSION
=======
Version 1.0.12
INTRODUCTION
============
PhageTerm is a tool that determines phage termini and packaging mode from
high-throughput sequencing data produced by methods that rely on the random
fragmentation of DNA (e.g. Illumina TruSeq). Phage sequencing reads from a fastq
file are aligned to the phage reference genome in order to calculate two types of
coverage values: whole genome coverage and starting position coverage. The starting
position coverage is used to perform a detailed termini analysis. If the user
provides the host sequence, reads that do not match the phage genome are tested
against the host using the same mapping function.
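The two coverage types described above can be illustrated with a toy sketch. It is not PhageTerm's implementation: it assumes forward-strand alignments given as hypothetical (start, length) pairs with 0-based positions, and it ignores reverse-strand reads, paired reads and genome circularity.

```python
def coverage_profiles(genome_length, alignments):
    """Return (whole_coverage, start_coverage) as two lists of counts.

    alignments: iterable of (start, length) pairs, 0-based start
    (hypothetical input format chosen for this illustration).
    """
    whole = [0] * genome_length
    starts = [0] * genome_length
    for start, length in alignments:
        # Starting position coverage: count reads that begin at this base.
        starts[start] += 1
        # Whole genome coverage: count every base the read overlaps.
        for pos in range(start, min(start + length, genome_length)):
            whole[pos] += 1
    return whole, starts


# Toy data: a 10-base genome with three reads, two of them starting at position 2.
whole, starts = coverage_profiles(10, [(2, 4), (2, 3), (5, 4)])
print(whole)   # [0, 0, 2, 2, 2, 2, 1, 1, 1, 0]
print(starts)  # [0, 0, 2, 0, 0, 1, 0, 0, 0, 0]
```

In this toy data the whole genome coverage is fairly flat over positions 2 to 5, while the starting position coverage singles out position 2, where two reads begin.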
The PhageTerm program and related information are available at https://sourceforge.net/projects/phageterm/
A Galaxy wrapper version is also available at https://galaxy.pasteur.fr
PREREQUISITES
=============
Unix/Linux
- Python 2.7
- matplotlib 2.0.2
- numpy 1.11
- pandas 0.19.1
- scikit-learn 0.18.1
- scipy 0.19.0
- statsmodels 0.8.0
- reportlab 3.4.0
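Before a first run it can help to see which of the libraries listed above are importable in the current environment. The following is a minimal sketch, not part of PhageTerm: the REQUIRED list and the installed_versions helper are names made up for this example. It only reports what it finds and does not compare against the version numbers above.

```python
import importlib

# Import names for the packages listed above (scikit-learn imports as "sklearn").
REQUIRED = ["matplotlib", "numpy", "pandas", "sklearn",
            "scipy", "statsmodels", "reportlab"]


def installed_versions(module_names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Missing package, or an installation too broken to import.
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in sorted(installed_versions(REQUIRED).items()):
    print("%-12s %s" % (name, version if version is not None else "NOT INSTALLED"))
```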
COMMAND LINE
============
./PhageTerm.py -f reads.fastq -r phage_sequence.fasta [-n phage_name -p reads_paired
-s seed_length -d surrounding -t installation_test -c nbr_core -m mean_coverage
-g host.fasta (warning: increases processing time)]
Help:
./PhageTerm.py -h
./PhageTerm.py --help
Options:
Raw reads file in fastq format:
-f INPUT_FILE, --fastq=INPUT_FILE
Fastq reads
(NGS sequences from randomly fragmented DNA only,
e.g. Illumina TruSeq)
Paired reads file in fastq format:
-p INPUT_FILE, --paired=INPUT_FILE
Paired fastq reads
(NGS sequences from randomly fragmented DNA only,
e.g. Illumina TruSeq)
Phage genome in fasta format:
-r INPUT_FILE, --ref=INPUT_FILE
Reference phage genome as a single contig in fasta format
Name of the phage being analyzed by the user:
-n PHAGE_NAME, --phagename=PHAGE_NAME
Manually enter the name of the phage being analyzed.
Used as prefix for output files.
Length of the seed used for reads in the mapping process:
-s SEED_LENGHT, --seed=SEED_LENGHT
Manually enter the length of the seed used for reads
in the mapping process (Default: 20).
Length of the surrounding region used to merge close peaks:
-d SUROUNDING_LENGHT, --surrounding=SUROUNDING_LENGHT
Manually enter the length of the surrounding region used to
merge close peaks in the analysis process (Default: 20).
Host genome in fasta format:
-g INPUT_FILE, --host=INPUT_FILE
Reference host genome as a single contig in fasta format
Warning: drastically increases processing time
Number of processor cores to use:
-c CORE_NBR, --core=CORE_NBR
Number of processor cores to use (Default: 1).
Define phage mean coverage:
-m MEAN_NBR, --mean=MEAN_NBR
Phage mean coverage to use (Default: 250).
Software run test:
-t TEST_VALUE, --test=TEST_VALUE
TEST_VALUE=C5 : Test run for a 5' cohesive end (e.g. Lambda)
TEST_VALUE=C3 : Test run for a 3' cohesive end (e.g. HK97)
TEST_VALUE=DS : Test run for a short Direct Terminal Repeats end (e.g. T7)
TEST_VALUE=DL : Test run for a long Direct Terminal Repeats end (e.g. T5)
TEST_VALUE=H : Test run for a Headful packaging (e.g. P1)
TEST_VALUE=M : Test run for a Mu-like packaging (e.g. Mu)
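For scripted or batch runs, the options above can be assembled into an argument list from Python. This is a minimal sketch that uses only the flags documented in this file. The helper name build_phageterm_command, its keyword names and the phage name "my_phage" are made up for this example.

```python
def build_phageterm_command(fastq, ref, paired=None, name=None, seed=None,
                            surrounding=None, host=None, cores=None,
                            mean=None, test=None, script="./PhageTerm.py"):
    """Return the argument list for a PhageTerm run (flags as listed above)."""
    command = [script, "-f", fastq, "-r", ref]
    optional = [("-p", paired), ("-n", name), ("-s", seed),
                ("-d", surrounding), ("-g", host), ("-c", cores),
                ("-m", mean), ("-t", test)]
    for flag, value in optional:
        # Only pass options the caller set, so program defaults still apply.
        if value is not None:
            command.extend([flag, str(value)])
    return command


command = build_phageterm_command("reads.fastq", "phage_sequence.fasta",
                                  name="my_phage", cores=4)
print(" ".join(command))
# ./PhageTerm.py -f reads.fastq -r phage_sequence.fasta -n my_phage -c 4
```

The resulting list can be handed to subprocess.check_call(command) to launch the run. Passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.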
OUTPUT FILES
============
(i) Report (.pdf)
(ii) Statistical table (.csv)
(iii) Sequence files (.fasta)
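After a run, the three kinds of output can be collected by extension. The sketch below assumes that every output file name starts with the phage name given with -n (described above as the prefix for output files). The exact file names are not documented here, so the glob pattern is an assumption, and find_outputs is a name made up for this example.

```python
import glob
import os

OUTPUT_EXTENSIONS = (".pdf", ".csv", ".fasta")


def find_outputs(prefix, directory="."):
    """Return {extension: sorted list of matching paths} for one phage name."""
    found = {}
    for ext in OUTPUT_EXTENSIONS:
        # Assumed naming scheme: <prefix><anything><extension>
        pattern = os.path.join(directory, prefix + "*" + ext)
        found[ext] = sorted(glob.glob(pattern))
    return found


for ext, paths in sorted(find_outputs("my_phage").items()):
    print("%-7s %s" % (ext, ", ".join(paths) if paths else "(none found)"))
```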
CONTACT
=======
Julian Garneau <[email protected]>
Marc Monot <[email protected]>
David Bikard <[email protected]>