How to run the code :
=====================
- Run assignment2_driver.py. [We used Anaconda with the Spyder IDE for development (Python 3.5).]
- When prompted, enter the path of the folder that contains the 'train' folder.
- Output is stored under the folder path entered above. The driver also creates 3 folders there to store the preprocessed files and the baseline files.
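
The steps above can be sketched roughly as follows. This is only an illustration, not the actual code in assignment2_driver.py: the function names are made up for this example, and the name of the preprocessed-files folder is a guess, because only the two baseline folder names appear in the console output.

```python
import os
import tempfile

# The two baseline folder names are taken from the console output.
# PREPROCESSED_DIR is a hypothetical name: the README does not say what
# the third folder is called.
PREPROCESSED_DIR = "train-preprocessed"
OUTPUT_DIRS = [PREPROCESSED_DIR, "test-public-baseline1", "test-private-baseline1"]


def resolve_train_dir(base_path):
    """Return the path of the 'train' folder inside base_path (with trailing separator)."""
    return os.path.join(base_path, "train", "")


def make_output_dirs(base_path, names=OUTPUT_DIRS):
    """Create one output folder per name under base_path and return their paths."""
    created = []
    for name in names:
        path = os.path.join(base_path, name)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created


# Demo on a throwaway directory so nothing is written to a real project folder.
base = tempfile.mkdtemp()
print("will start reading at:", resolve_train_dir(base))
for path in make_output_dirs(base):
    print("created:", path)
```

In the real driver the base path comes from the user prompt and is confirmed with yes/no before any reading starts, as shown in the console output.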
Console Output :
================
Input path to the train folder : /Users/Deekshith/Desktop/Cornell/2_NLP/assignment_2/nlp_project2_uncertainty/
will start reading at: /Users/Deekshith/Desktop/Cornell/2_NLP/assignment_2/nlp_project2_uncertainty/train/
If that's right enter yes else no: yes
Total number of words in /Users/Deekshith/Desktop/Cornell/2_NLP/assignment_2/nlp_project2_uncertainty/test-public-baseline1/ : 55758
Total number of sentences in /Users/Deekshith/Desktop/Cornell/2_NLP/assignment_2/nlp_project2_uncertainty/test-public-baseline1/ : 2006
Total number of words in /Users/Deekshith/Desktop/Cornell/2_NLP/assignment_2/nlp_project2_uncertainty/test-private-baseline1/ : 55663
Total number of sentences in /Users/Deekshith/Desktop/Cornell/2_NLP/assignment_2/nlp_project2_uncertainty/test-private-baseline1/ : 2003
>>>
Code Organization :
===================
We created the following modules to carry out the tasks in this project.
1) assignment2_driver.py
Driver for the entire project. Imports the other modules to carry out the required tasks.
Interacts with the user and requests input. Also creates 3 directories to save the preprocessing output and the baseline output for the test-public and test-private folders.
2) file_reader.py
Takes a folder path, reads all the file names in that folder and stores them in a list.
3) Preprocess_BIO.py
Replaces each instance of CUE* in the train folder with a sequence of B, I and O tags.
4) baseline1.py and baseline.py
Generates a weasel dictionary and outputs the baseline files. There are two baseline implementations, and one of them is chosen at a time to perform the sequence tagging task.
5) kaggle_op.py
Generates the Kaggle output based on the baseline files generated for the test-public and test-private folders.
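
To illustrate the B/I/O conversion described for Preprocess_BIO.py above, here is a minimal sketch for a single sentence. It is not the code from that module. It assumes that each token carries either a tag starting with "CUE" (for example "CUE-1") or some other placeholder such as "_", and that consecutive tokens with the same CUE tag belong to one cue. The function name to_bio is hypothetical.

```python
def to_bio(tags):
    """Map the raw cue tags of one sentence to B/I/O labels.

    Assumption: a tag starting with "CUE" marks a cue token, and a run of
    identical CUE tags forms one cue (B for its first token, I for the rest).
    Every other tag becomes O.
    """
    bio = []
    previous = None  # CUE tag of the previous token, or None if it was not a cue
    for tag in tags:
        if not tag.startswith("CUE"):
            bio.append("O")
            previous = None
        elif tag == previous:
            bio.append("I")  # continues the cue started earlier
        else:
            bio.append("B")  # first token of a new cue
            previous = tag
    return bio


print(to_bio(["_", "CUE-1", "CUE-1", "_", "CUE-2"]))
# prints ['O', 'B', 'I', 'O', 'B']
```

If the training files mark cues differently (for example, cues that are interrupted by other tokens), the rule for choosing between B and I would need to be adjusted.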