This is an implementation of an encoder-decoder LSTM with Bahdanau attention for the automatic question generation task. SQuAD is used for training and testing.
First, download SQuAD from here. Then, download GloVe from here.
The code consists of several modules. Run the following command to see the parameters of any script:
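For readers new to Bahdanau (additive) attention, here is a minimal pure-Python sketch of the scoring step. It is not the code in this repository: the function name `additive_attention` and the parameter names `w_query`, `w_key`, and `v` are illustrative assumptions, and a real model would learn these weights and use tensors rather than lists.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def additive_attention(query, keys, w_query, w_key, v):
    """Additive attention: score_i = v . tanh(W_q q + W_k k_i).

    query: decoder state; keys: one encoder state per source position.
    Returns the attention weights and the weighted sum of the keys.
    """
    projected_query = matvec(w_query, query)
    scores = []
    for key in keys:
        projected_key = matvec(w_key, key)
        hidden = [math.tanh(q + k) for q, k in zip(projected_query, projected_key)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # Softmax over source positions (shifted by the max for numerical stability).
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted average of the encoder states.
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context


# Toy inputs with identity projections (assumed values, for illustration only).
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, context = additive_attention(query, keys, identity, identity, [1.0, 1.0])
```

At each decoding step the decoder state plays the role of `query`, and the resulting `context` is fed to the decoder together with the previous token.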
python3 modules/{filename}.py --help
Run the modules in the following order:
Preprocessing the dataset:
python3 modules/squad.py --input path-to-squad-json-file --out desired-output-path --out_format pkl_or_csv
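As a rough idea of what this step involves, the sketch below flattens the nested SQuAD JSON layout (`data` > `paragraphs` > `qas`) into one row per question. It is an assumption about the general approach, not the actual contents of `modules/squad.py`; the function name `flatten_squad` and the output column names are hypothetical.

```python
import json


def flatten_squad(squad_dict):
    """Turn a SQuAD-style dictionary into a flat list of row dictionaries."""
    rows = []
    for article in squad_dict["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # Some questions may have no answer; keep an empty string then.
                answers = qa.get("answers", [])
                answer_text = answers[0]["text"] if answers else ""
                rows.append(
                    {
                        "context": context,
                        "question": qa["question"],
                        "answer": answer_text,
                    }
                )
    return rows


# A tiny hand-written sample in the SQuAD layout (not real dataset content).
sample_json = """
{"data": [{"title": "Example", "paragraphs": [{
    "context": "The LSTM was introduced in 1997.",
    "qas": [
        {"id": "q1", "question": "When was the LSTM introduced?",
         "answers": [{"text": "1997", "answer_start": 27}]},
        {"id": "q2", "question": "Who funded it?", "answers": []}
    ]}]}]}
"""
rows = flatten_squad(json.loads(sample_json))
```

A list of flat rows like this is straightforward to save as either a pickle or a CSV file.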
Building the tokenizers:
python3 modules/tokenizer.py --input path-to-preprocessed-squad-file
You can customize the tokenization process; run the script with --help to see the available parameters.
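To illustrate what building a tokenizer typically means here, the following sketch builds a word-level vocabulary and encodes a sentence into ids. It is a simplified assumption, not the logic of `modules/tokenizer.py`: the special tokens, the whitespace splitting, and the `max_size` and `min_freq` options are hypothetical choices.

```python
from collections import Counter

# Assumed special tokens; the repository may use different ones.
SPECIAL_TOKENS = ["<pad>", "<unk>", "<sos>", "<eos>"]


def build_vocab(texts, max_size=None, min_freq=1):
    """Map each sufficiently frequent lowercase token to an integer id."""
    counts = Counter(token for text in texts for token in text.lower().split())
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    candidates = sorted(
        ((token, count) for token, count in counts.items() if count >= min_freq),
        key=lambda pair: (-pair[1], pair[0]),
    )
    if max_size is not None:
        candidates = candidates[:max_size]
    tokens = SPECIAL_TOKENS + [token for token, _ in candidates]
    return {token: index for index, token in enumerate(tokens)}


def encode(text, vocab):
    """Convert text to ids, wrapping it in <sos>/<eos> and mapping unknowns to <unk>."""
    unk_id = vocab["<unk>"]
    ids = [vocab.get(token, unk_id) for token in text.lower().split()]
    return [vocab["<sos>"]] + ids + [vocab["<eos>"]]


vocab = build_vocab(["what is the capital", "what is attention"], min_freq=2)
encoded = encode("What is love", vocab)
```

Raising `min_freq` or lowering `max_size` shrinks the vocabulary at the cost of more `<unk>` tokens.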
Training:
python3 modules/train.py --train_config path-to-train-config-file
You can define the training parameters (batch size, learning rate, etc.) by modifying the train_config.json file.
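The sketch below shows one way a JSON training config can be merged with defaults. The key names (`batch_size`, `learning_rate`, `epochs`) and default values are hypothetical; check the train_config.json shipped with the code for the keys it actually expects.

```python
import json

# Hypothetical defaults; the real train_config.json may use other keys and values.
DEFAULTS = {"batch_size": 32, "learning_rate": 0.001, "epochs": 10}


def load_train_config(json_text):
    """Parse a JSON config string and overlay it on the defaults."""
    overrides = json.loads(json_text)
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Fail early on misspelled keys instead of silently ignoring them.
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    config = dict(DEFAULTS)
    config.update(overrides)
    return config


example_text = '{"batch_size": 64, "learning_rate": 0.0005}'
config = load_train_config(example_text)
```

Rejecting unknown keys is a design choice that catches typos such as `learning_rat` before a long training run starts.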
Inference:
python3 modules/inference.py --input path-to-tokenized-test-data
You can choose the decoding method and other inference settings through the script's parameters; run it with --help to list them.
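Greedy decoding and beam search are two common decoding methods for sequence generation; which ones `modules/inference.py` supports is not stated here, so treat the sketch below as a generic illustration. The toy model, the function names, and the `beam_width` parameter are all assumptions made for this example.

```python
import math

# Toy "model": next-token probabilities depend only on the previous token.
TOY_MODEL = {
    "<sos>": {"what": 0.6, "who": 0.4},
    "what": {"is": 0.5, "was": 0.5},
    "who": {"wrote": 0.9, "<eos>": 0.1},
    "is": {"<eos>": 1.0},
    "was": {"<eos>": 1.0},
    "wrote": {"<eos>": 1.0},
}


def toy_step(tokens):
    """Return the next-token distribution given the tokens generated so far."""
    return TOY_MODEL[tokens[-1]]


def greedy_decode(step_fn, max_len=10):
    """Pick the single most probable token at every step."""
    tokens = ["<sos>"]
    for _ in range(max_len):
        probs = step_fn(tokens)
        next_token = max(probs, key=probs.get)
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens[1:]


def beam_search_decode(step_fn, beam_width=2, max_len=10):
    """Keep the beam_width best partial sequences by total log-probability."""
    beams = [(0.0, ["<sos>"])]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for token, prob in step_fn(tokens).items():
                candidate = (score + math.log(prob), tokens + [token])
                # Sequences ending in <eos> are complete and leave the beam.
                if token == "<eos>":
                    finished.append(candidate)
                else:
                    candidates.append(candidate)
        if not candidates:
            break
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    _, best_tokens = max(finished or beams, key=lambda c: c[0])
    return [t for t in best_tokens if t not in ("<sos>", "<eos>")]


greedy_result = greedy_decode(toy_step)
beam_result = beam_search_decode(toy_step, beam_width=2)
```

In this toy model greedy decoding commits to "what" because it is the best first token, while beam search keeps "who" alive and ends with a more probable full sequence (0.4 × 0.9 = 0.36 versus 0.6 × 0.5 = 0.30). Beam search costs more computation per step in exchange for this wider search.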