# v0.5.0: GPT Neo + misc fixes
aitextgen has been updated to support GPT Neo and fix a few outstanding generation issues! However, a few breaking changes were introduced in the process.
## Breaking Changes

### Loading Models
While making model loading architecture-agnostic for GPT Neo support, it turned out that aitextgen had been loading models in an unofficial way; this has now been addressed. The user must now specify the `model_folder` where the `pytorch_model.bin` and `config.json` are located (with those exact filenames). Assuming the model is located in `trained_model`:
Old:

```python
ai2 = aitextgen(model="trained_model/pytorch_model.bin",
                tokenizer_file="aitextgen.tokenizer.json",
                config="trained_model/config.json")
```
New:

```python
ai2 = aitextgen(model_folder="trained_model",
                tokenizer_file="aitextgen.tokenizer.json")
```
All notebooks and documentation have been updated with this new workflow, and an assert will be raised if the old behavior is still used.
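For reference, the expected on-disk layout looks like this (a sketch based on the filenames above; the tokenizer file can live anywhere, since its path is passed separately):

```
trained_model/
├── pytorch_model.bin        # model weights (this exact filename)
└── config.json              # model config (this exact filename)
aitextgen.tokenizer.json     # tokenizer, passed via tokenizer_file
```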
### Incorrect tokenization for Colab-trained GPT-2 tokenizers
There was an underlying issue due to a recent change in `tokenizers` which broke the implementation of the default GPT-2 tokenizer by preventing it from tokenizing `<|endoftext|>` tokens correctly. As a result, truncation on `<|endoftext|>` was broken as well.

Only models trained line-by-line with the Colab GPT-2 Notebook were affected by this; unfortunately, the only fix now is to retrain the model with v0.5.0.
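As an illustration (not from the release itself), one quick way to check whether a tokenizer file suffers from this problem is to confirm that `<|endoftext|>` encodes to a single token:

```python
from tokenizers import Tokenizer

# Load the serialized tokenizer (filename follows the examples above).
tokenizer = Tokenizer.from_file("aitextgen.tokenizer.json")

# A correctly configured GPT-2 tokenizer treats <|endoftext|> as one
# special token; a broken one splits it into several subword tokens.
ids = tokenizer.encode("<|endoftext|>").ids
assert len(ids) == 1, "<|endoftext|> is being split: tokenizer is affected"
```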
## Other Major Changes/Fixes

### GPT Neo support
GPT Neo is now supported! The Colab Notebook has been updated to show how to finetune the smaller versions of the model.

Out of the box, all variants of GPT Neo have a 2048-token context window (versus GPT-2's 1024), allowing double the generation length, and the pretrained models are trained on much more recent data. Finetuning a GPT Neo model takes only about 2x as long per step as a GPT-2 model: notable, since increasing the context window normally causes training to scale quadratically rather than linearly. Training also appears to converge faster.

However, in terms of text-generation quality, it's currently unclear whether GPT Neo is "better", especially on short-form content. Future releases of aitextgen will analyze this more closely.
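For example, loading a pretrained GPT Neo variant works through the same constructor (a minimal sketch; the Hugging Face model name `EleutherAI/gpt-neo-125M` is assumed here for illustration rather than taken from the release notes):

```python
from aitextgen import aitextgen

# Download and load a small GPT Neo model from Hugging Face.
# (Model name assumed for illustration; other GPT Neo variants should work.)
ai = aitextgen(model="EleutherAI/gpt-neo-125M")

# GPT Neo's 2048-token context window allows longer generation than GPT-2.
ai.generate(max_length=256)
```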
### DeepSpeed support [BETA] (#103)
Thanks to the team at pytorch-lightning, DeepSpeed support has been added to aitextgen, allowing training of larger models (>1.5B params) across multiple GPUs. However, this isn't fully tested, so more documentation is pending!
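Since the aitextgen-side interface is still being documented, the sketch below only shows how DeepSpeed is enabled at the underlying pytorch-lightning `Trainer` level (assuming a recent pytorch-lightning version and a hypothetical `LightningModule` named `model`; aitextgen's own training entry point may expose this differently):

```python
import pytorch_lightning as pl

# Enable DeepSpeed ZeRO stage 2 via pytorch-lightning's Trainer.
# (Strategy name per recent pytorch-lightning releases.)
trainer = pl.Trainer(
    strategy="deepspeed_stage_2",  # shard optimizer state across GPUs
    accelerator="gpu",
    devices=2,                     # multi-GPU training, as targeted here
    precision=16,                  # DeepSpeed is typically run in fp16
)
trainer.fit(model)  # `model` is a hypothetical LightningModule
```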
## Misc changes
- Added a `nonempty_output` param to `generate()` (default `True`): if the output is empty (possible on short-form content), it is skipped when generating multiple texts, or regenerated when generating a single text. If `min_length` is specified, the same behavior applies to texts below the minimum length after processing. (See the sketch after this list.)
- Bumped minimum versions of `transformers` and `pytorch-lightning`.
- Completed another pass of notebooks and documentation.
- Forced single-GPU training on Windows to avoid bugs (#116).
- Calling the aitextgen instance will now print the model type and number of params to the console, which is helpful for debugging.
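As an illustration of the new parameter (a sketch assuming `ai` is a loaded aitextgen instance; the other parameters shown are pre-existing `generate()` options):

```python
# Generate several short texts; empty outputs are skipped rather than
# returned, and outputs below min_length get the same treatment.
texts = ai.generate(
    n=5,
    max_length=60,
    min_length=20,         # skip/retry outputs below this length after processing
    nonempty_output=True,  # new in v0.5.0; default True
    return_as_list=True,
)
print(texts)
```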