Train fastai models faster with fastxtend’s fused optimizers, Progressive Resizing callback, integrated FFCV DataLoader, and integrated PyTorch Compile support.
Train Models Faster
- Drop-in fused optimizers, which are 21 to 293 percent faster than fastai native optimizers.
- Up to 75% optimizer memory savings with integrated bitsandbytes 8-bit optimizers.
- Increase GPU throughput and decrease training time with the Progressive Resizing callback.
- Use the highly optimized FFCV DataLoader, fully integrated with fastai.
- Integrated support for torch.compile via the Compile callbacks.
General Features
- Fused implementations of modern optimizers, such as Adan, Lion, & StableAdam.
- Hugging Face Transformers compatibility with fastai.
- Flexible metrics which can log on train, valid, or both. Backwards compatible with fastai metrics.
- Easily use multiple losses and log each individual loss on train and valid.
- Multiple profilers for profiling training and identifying bottlenecks.
- A fast Exponential Moving Average callback for smoother training.
Vision
- Apply MixUp, CutMix, or Augmentations at once with CutMixUp or CutMixUpAugment.
- Additional image augmentations.
- Support for running fastai batch transforms on CPU.
- More attention and pooling modules.
- A flexible implementation of fastai’s XResNet.
Check out the documentation for additional splitters, callbacks, schedulers, utilities, and more.
https://fastxtend.benjaminwarner.dev
fastxtend is available on PyPI:
pip install fastxtend
fastxtend can be installed with task-specific dependencies for vision, ffcv, text, audio, or all:
pip install "fastxtend[all]"
To easily install most prerequisites for all fastxtend features, use Conda or Miniconda:
conda create -n fastxtend python=3.11 "pytorch>=2.1" torchvision torchaudio \
pytorch-cuda=12.1 fastai nbdev pkg-config libjpeg-turbo "opencv<4.8" tqdm psutil \
terminaltables numpy "numba>=0.57" librosa timm kornia rich typer wandb \
"transformers>=4.34" "tokenizers>=0.14" "datasets>=2.14" ipykernel ipywidgets \
"matplotlib<3.8" -c pytorch -c nvidia -c fastai -c huggingface -c conda-forge
conda activate fastxtend
pip install "fastxtend[all]"
Replace pytorch-cuda=12.1 with your preferred supported version of CUDA.
To create an editable development install:
git clone https://github.com/warner-benjamin/fastxtend.git
cd fastxtend
pip install -e ".[dev]"
Like fastai, fastxtend provides safe wildcard imports using Python’s __all__.
from fastai.vision.all import *
from fastxtend.vision.all import *
from fastxtend.ffcv.all import *
In general, import fastxtend after all fastai imports, as fastxtend modifies fastai. Any method modified by fastxtend is backwards compatible with the original fastai code.
Use a fused ForEach optimizer:
Learner(..., opt_func=adam(foreach=True))
Or a bitsandbytes 8-bit optimizer:
Learner(..., opt_func=adam(eightbit=True))
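As a rough illustration of where a figure like "up to 75%" can come from: Adam-style optimizers keep two state values per parameter, and storing each one in 8 bits instead of 32 cuts state memory to a quarter. The sketch below is back-of-the-envelope arithmetic only. The helper `optimizer_state_bytes` and the model size are hypothetical, and the calculation ignores the quantization metadata that a real 8-bit optimizer also stores.

```python
def optimizer_state_bytes(n_params: int, states_per_param: int = 2, bits: int = 32) -> int:
    """Approximate optimizer state memory in bytes (hypothetical helper)."""
    return n_params * states_per_param * bits // 8


# Assumed model size, chosen only for illustration.
n_params = 25_000_000

full = optimizer_state_bytes(n_params, bits=32)     # two 32-bit states per parameter
eightbit = optimizer_state_bytes(n_params, bits=8)  # two 8-bit states per parameter
savings = 1 - eightbit / full

print(f"32-bit state: {full / 2**20:.0f} MiB")
print(f"8-bit state:  {eightbit / 2**20:.0f} MiB")
print(f"savings:      {savings:.0%}")
```

Actual savings depend on the optimizer and on which parameters are quantized, so treat this as an upper-bound estimate.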
Speed up image training using Progressive Resizing:
Learner(..., cbs=ProgressiveResize())
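The idea behind progressive resizing is to train on small images first and grow them toward the final size, so early steps are cheaper. The sketch below shows one possible size schedule in plain Python. It is not the ProgressiveResize implementation: `size_schedule`, its parameters, and the linear growth rule are all assumptions made for illustration.

```python
def size_schedule(start: int, final: int, n_steps: int, increase_by: int = 32) -> list[int]:
    """Image side length per step, growing from `start` to `final`
    in multiples of `increase_by` (hypothetical schedule)."""
    sizes = []
    last_step = max(n_steps - 1, 1)
    for step in range(n_steps):
        # Linear growth in integer arithmetic, then snap down to the size increment.
        grown = (final - start) * step // last_step
        size = start + (grown // increase_by) * increase_by
        sizes.append(min(size, final))
    return sizes


sizes = size_schedule(start=128, final=224, n_steps=7)
print(sizes)
```

Snapping to a fixed increment keeps the number of distinct input shapes small, which tends to matter for cuDNN autotuning and compiled models.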
Log accuracy on the training set as a smoothed metric and on the validation set as usual:
Learner(..., metrics=[Accuracy(log_metric=LogMetric.Train, metric_type=MetricType.Smooth),
Accuracy()])
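A smoothed training metric is commonly computed as a debiased exponential moving average of the per-batch values, similar to how fastai smooths the training loss. The sketch below shows that technique in plain Python. The class `SmoothedValue` and the `beta` default are assumptions for illustration, not fastxtend's implementation of MetricType.Smooth.

```python
class SmoothedValue:
    """Debiased exponential moving average of a stream of values (hypothetical helper)."""

    def __init__(self, beta: float = 0.98):
        self.beta = beta
        self.count = 0
        self.avg = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        self.avg = self.beta * self.avg + (1 - self.beta) * value
        # Divide out the bias toward zero caused by initializing avg at 0.
        return self.avg / (1 - self.beta ** self.count)


# Made-up per-batch accuracies, for illustration only.
batch_accuracies = [0.8, 0.6, 0.9, 0.7]
smoother = SmoothedValue(beta=0.98)
smoothed = [smoother.update(a) for a in batch_accuracies]
print(smoothed)
```

The debiasing step makes the first smoothed value equal the first batch value instead of starting near zero.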
Log multiple losses as individual metrics on train and valid:
mloss = MultiLoss(loss_funcs=[nn.MSELoss, nn.L1Loss],
weights=[1, 3.5], loss_names=['mse_loss', 'l1_loss'])
Learner(..., loss_func=mloss, metrics=RMSE(), cbs=MultiLossCallback)
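To make the weights argument concrete, the sketch below computes a weighted sum of two losses in plain Python and keeps each individual value for logging. It assumes the combined loss is a simple weighted sum. The functions `mse`, `l1`, and `weighted_multi_loss` are hypothetical stand-ins, not the MultiLoss implementation.

```python
def mse(preds, targets):
    """Mean squared error over paired values."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def l1(preds, targets):
    """Mean absolute error over paired values."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


def weighted_multi_loss(preds, targets, loss_funcs, weights, loss_names):
    """Return the weighted total and a dict of unweighted individual losses."""
    individual = {name: fn(preds, targets) for name, fn in zip(loss_names, loss_funcs)}
    total = sum(w * individual[name] for w, name in zip(weights, loss_names))
    return total, individual


# Made-up predictions and targets, for illustration only.
total, individual = weighted_multi_loss(
    preds=[1.0, 2.0, 4.0],
    targets=[1.0, 3.0, 2.0],
    loss_funcs=[mse, l1],
    weights=[1, 3.5],
    loss_names=["mse_loss", "l1_loss"],
)
print(total, individual)
```

Tracking the unweighted values separately shows which loss is driving the total, which the weighted sum alone would hide.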
Compile a model with torch.compile:
from fastxtend.callback import compiler
learn = Learner(...).compile()
Profile a fastai training loop:
from fastxtend.callback import simpleprofiler
learn = Learner(...).profile()
learn.fit_one_cycle(2, 3e-3)
To replicate the benchmark on your own machine, see the example scripts.