This is a repository for the End-to-end Dialogue Transformer project for the Statistical Dialogue Systems course.
- Improve Sequicity comments
- Use PyTorch's `nn.Transformer` to implement a Sequicity-style dialogue system
- Try to run Sequicity as is - this should be quite easy.
- Rewrite the classes `SimpleDynamicEncoder`, `BSpanDecoder`, and `ResponseDecoder` from `tsd_net.py` to use a transformer instead of RNNs (a rough encoder sketch follows this list). This will probably also involve adjusting the `TSD` class.
- Compare it with existing dialogue systems (mainly Sequicity)
- Improve performance by utilizing a pre-trained LM.
- Implement it in TensorFlow
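The encoder rewrite could look roughly like the sketch below. This is illustrative only, assuming a plain `nn.TransformerEncoder` with sinusoidal positional encodings; the class name, dimensions, and interface are placeholders, not the actual `tsd_net.py` API.

```python
import math

import torch
import torch.nn as nn


class TransformerDialogueEncoder(nn.Module):
    """Sketch of a transformer replacement for the RNN-based SimpleDynamicEncoder."""

    def __init__(self, vocab_size, d_model=128, nhead=4, num_layers=3, dropout=0.1, max_len=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model, padding_idx=0)
        # Fixed sinusoidal positional encodings, as in "Attention Is All You Need".
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=4 * d_model, dropout=dropout
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, input_ids, pad_mask=None):
        # input_ids: (batch, seq_len); pad_mask: (batch, seq_len), True at padded positions.
        x = self.embedding(input_ids) + self.pe[:, : input_ids.size(1)]
        x = x.transpose(0, 1)  # nn.TransformerEncoder expects (seq_len, batch, d_model)
        out = self.encoder(x, src_key_padding_mask=pad_mask)
        return out.transpose(0, 1)  # (batch, seq_len, d_model)
```

The two decoders would presumably be rewritten analogously with `nn.TransformerDecoder`, with the copy mechanism applied on top of the decoder outputs.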
We evaluated our system on the CamRest676 dataset.
System | Success F1 | BLEU |
---|---|---|
Transformer | 0.770 | 0.327 |
Transformer without copy mechanism | 0.710 | 0.315 |
Sequicity | 0.854 | 0.253 |
We have shown that the transformer with a copy mechanism (sketched below) achieves performance comparable to Sequicity. We believe the system could be improved by utilizing a pre-trained language model (BERT, GPT-{2|3}, MASS, XLNet, ...).
Although the success F1 score did not surpass our baseline, our model's BLEU score on responses is higher than Sequicity's by 7.4 points (0.327 vs. 0.253). We think that the worse performance of the Transformer compared to recurrent neural networks may be caused by the small amount of data available, the relatively small batch size, and the generally lower training stability of transformers (see Training Tips for the Transformer Model).
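For reference, the copy mechanism mixes a generation distribution over the vocabulary with a copy distribution over the source tokens. The sketch below uses a pointer-generator style mixture as a simplified stand-in for the CopyNet formulation used in Sequicity; all names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F


def copy_augmented_distribution(gen_logits, attn_weights, src_ids, p_gen):
    """Mix a generation distribution with a copy distribution over source tokens.

    gen_logits:   (batch, vocab_size) decoder output logits
    attn_weights: (batch, src_len)    attention over the encoded source sequence
    src_ids:      (batch, src_len)    source token ids, used to map copy scores onto the vocabulary
    p_gen:        (batch, 1)          probability of generating vs. copying, e.g. from a sigmoid
    """
    gen_dist = F.softmax(gen_logits, dim=-1)          # P_vocab(w)
    copy_dist = torch.zeros_like(gen_dist)
    copy_dist.scatter_add_(1, src_ids, attn_weights)  # accumulate attention weights onto vocabulary ids
    return p_gen * gen_dist + (1.0 - p_gen) * copy_dist
```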
Papers related to this work
- Sequicity
- Incorporating Copying Mechanism in Sequence-to-Sequence Learning - the copy mechanism referenced by Sequicity; quite an interesting paper
- Attention Is All You Need - the transformer architecture
- Hello, It's GPT-2
- ALBERT: A Lite BERT - IMHO (Ondrej) the methods described in this paper might be easier to use with limited computational resources than other pretrained transformers (BERT, GPT-2, XLNet, Transformer-XL, ...)
- Training Tips for the Transformer Model - a nice paper from UFAL with practical tips for training transformers; might be useful
- On Layer Normalization in the Transformer Architecture - they stabilize the training by placing layer normalization inside the residual block, before the multi-head attention (Pre-LN). This lets them remove the warm-up and use a larger learning rate (a minimal Pre-LN block is sketched after this list).
- The transformer - the official TensorFlow implementation of the Transformer architecture
- Sequicity implementation from the authors' repository
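To make the Pre-LN idea from the paper above concrete: the only change is where layer normalization sits relative to the residual connection. The block below is a minimal, illustrative sketch (not the code used in our experiments).

```python
import torch.nn as nn


class PreLNEncoderBlock(nn.Module):
    """Pre-LN transformer encoder block: LayerNorm is applied before self-attention
    and before the feed-forward sublayer, i.e. inside each residual branch."""

    def __init__(self, d_model=128, nhead=4, dim_ff=512, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, dim_ff), nn.ReLU(), nn.Dropout(dropout), nn.Linear(dim_ff, d_model)
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # x: (seq_len, batch, d_model), matching nn.MultiheadAttention's default layout.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, key_padding_mask=key_padding_mask)
        x = x + self.dropout(attn_out)                 # residual around the attention sublayer
        x = x + self.dropout(self.ff(self.norm2(x)))   # residual around the feed-forward sublayer
        return x
```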