Training

To train minGPT on the direct dataset, run the following on a single node with 4 GPUs:

torchrun --nproc_per_node=4 train.py\
 --file ${DATA_DIR}\
 --folder ${TASK}\
 --output_dir ${OUTPUT_DIR}\
 --maxlen ${MAXLEN}\
 --maxdata ${MAXDATA}\
 --vocab ${VOCAB_SIZE}\
 --num_range ${NUM_RANGE}\
 --weight_decay 0.01\
 --learning_rate 1e-4\
 --drop 0.1\
 --batch_size 128\
 --epoch 100\
 --warmup 5\
 --dmodel 256\
 --head 4\
 --num_layer ${LAYER}
  • The folder argument must be one of the following: arithmetic, equation, LIS or ED.
  • For the direct dataset, maxdata always equals maxlen.
  • The effective batch size is 128 (batch_size per GPU) * 4 (GPUs per node) = 512.
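The batch-size arithmetic in the last bullet can be checked with a minimal sketch like the one below. The variable names (`BATCH_SIZE`, `GPUS_PER_NODE`, `NNODES`) are illustrative assumptions and are not read by `train.py`; only `--batch_size` and `--nproc_per_node` appear in the command above.

```bash
# Illustrative variable names (assumptions, not options of train.py).
BATCH_SIZE=128      # value passed to --batch_size (per GPU)
GPUS_PER_NODE=4     # value passed to --nproc_per_node
NNODES=1            # single-node training

# Effective batch size = per-GPU batch size * GPUs per node * nodes.
EFFECTIVE_BATCH_SIZE=$((BATCH_SIZE * GPUS_PER_NODE * NNODES))
echo "effective batch size: ${EFFECTIVE_BATCH_SIZE}"   # prints 512
```

If you train on fewer GPUs, scaling `--batch_size` so that this product stays at 512 is one way to keep the effective batch size unchanged.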

The values of MAXLEN, VOCAB_SIZE and NUM_RANGE for each direct dataset are listed in the following table.

| Dataset | MAXLEN | VOCAB_SIZE | NUM_RANGE |
|---|---|---|---|
| Arithmetic_len4 | 19 | 21 | 11 |
| Arithmetic_len5 | 23 | 21 | 11 |
| Arithmetic_len6 | 27 | 21 | 11 |
| Equation_len3 | 46 | 22 | 11 |
| Equation_len4 | 73 | 22 | 11 |
| Equation_len5 | 106 | 22 | 11 |
| LIS_len50 | 54 | 253 | 250 |
| LIS_len80 | 84 | 253 | 250 |
| LIS_len100 | 104 | 253 | 250 |
| ED_len12 | 31 | 91 | 60 |
| ED_len16 | 39 | 91 | 60 |
| ED_len20 | 47 | 91 | 60 |
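To avoid copying values by hand, the direct-dataset table can be encoded as a small lookup. This is only a sketch: `direct_params` is a hypothetical helper that is not part of the repository, and it fills in only the variables the training command reads.

```bash
# Hypothetical helper (not part of the repository): sets MAXLEN, MAXDATA,
# VOCAB_SIZE and NUM_RANGE for a direct-dataset name from the table above.
direct_params() {
  # MAXLEN differs per dataset.
  case "$1" in
    Arithmetic_len4) MAXLEN=19 ;;
    Arithmetic_len5) MAXLEN=23 ;;
    Arithmetic_len6) MAXLEN=27 ;;
    Equation_len3)   MAXLEN=46 ;;
    Equation_len4)   MAXLEN=73 ;;
    Equation_len5)   MAXLEN=106 ;;
    LIS_len50)       MAXLEN=54 ;;
    LIS_len80)       MAXLEN=84 ;;
    LIS_len100)      MAXLEN=104 ;;
    ED_len12)        MAXLEN=31 ;;
    ED_len16)        MAXLEN=39 ;;
    ED_len20)        MAXLEN=47 ;;
    *) echo "unknown dataset: $1" >&2; return 1 ;;
  esac
  # VOCAB_SIZE and NUM_RANGE depend only on the task family.
  case "$1" in
    Arithmetic_*) VOCAB_SIZE=21  NUM_RANGE=11 ;;
    Equation_*)   VOCAB_SIZE=22  NUM_RANGE=11 ;;
    LIS_*)        VOCAB_SIZE=253 NUM_RANGE=250 ;;
    ED_*)         VOCAB_SIZE=91  NUM_RANGE=60 ;;
  esac
  # For the direct dataset, maxdata equals maxlen.
  MAXDATA=${MAXLEN}
}

direct_params LIS_len80
echo "maxlen=${MAXLEN} maxdata=${MAXDATA} vocab=${VOCAB_SIZE} num_range=${NUM_RANGE}"
# prints: maxlen=84 maxdata=84 vocab=253 num_range=250
```

After calling the helper in the same shell session, the `${MAXLEN}`, `${MAXDATA}`, `${VOCAB_SIZE}` and `${NUM_RANGE}` placeholders in the training command resolve to the table values.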

To train minGPT on the CoT dataset, run the following on a single node with 4 GPUs:

torchrun --nproc_per_node=4 train.py\
 --file ${DATA_DIR}\
 --folder ${TASK}\
 --output_dir ${OUTPUT_DIR}\
 --maxlen ${MAXLEN}\
 --maxdata ${MAXDATA}\
 --vocab ${VOCAB_SIZE}\
 --num_range ${NUM_RANGE}\
 --weight_decay 0.01\
 --learning_rate 1e-4\
 --drop 0.1\
 --batch_size 128\
 --epoch 100\
 --warmup 5\
 --dmodel 256\
 --head 4\
 --num_layer 3\
 --chain
  • For the CoT dataset, maxdata always equals maxlen.

The values of MAXLEN, VOCAB_SIZE and NUM_RANGE for each CoT dataset are listed in the following table.

| Dataset | MAXLEN | VOCAB_SIZE | NUM_RANGE |
|---|---|---|---|
| Arithmetic_len4 | 43 | 21 | 11 |
| Arithmetic_len5 | 63 | 21 | 11 |
| Arithmetic_len6 | 87 | 21 | 11 |
| Equation_len3 | 91 | 22 | 11 |
| Equation_len4 | 181 | 22 | 11 |
| Equation_len5 | 316 | 22 | 11 |
| LIS_len50 | 105 | 253 | 250 |
| LIS_len80 | 165 | 253 | 250 |
| LIS_len100 | 205 | 253 | 250 |
| ED_len12 | 212 | 91 | 60 |
| ED_len16 | 344 | 91 | 60 |
| ED_len20 | 508 | 91 | 60 |
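As a worked example, the CoT command and table can be combined for one dataset, here Equation_len4. The sketch below only prints the resulting command (a dry run) so it can be inspected before launching. The paths `./data` and `./output` are placeholders, and mapping Equation_len4 to the folder name `equation` is an assumption based on the folder list given for the direct dataset.

```bash
# Values for Equation_len4 on the CoT dataset, taken from the table above.
MAXLEN=181
MAXDATA=${MAXLEN}   # maxdata equals maxlen for the CoT dataset
VOCAB_SIZE=22
NUM_RANGE=11
TASK=equation       # assumption: folder name for the Equation datasets

# Placeholder paths (assumptions); existing values are kept if already set.
DATA_DIR=${DATA_DIR:-./data}
OUTPUT_DIR=${OUTPUT_DIR:-./output}

# Build the command as a string and print it instead of running it (dry run).
CMD="torchrun --nproc_per_node=4 train.py \
--file ${DATA_DIR} --folder ${TASK} --output_dir ${OUTPUT_DIR} \
--maxlen ${MAXLEN} --maxdata ${MAXDATA} \
--vocab ${VOCAB_SIZE} --num_range ${NUM_RANGE} \
--weight_decay 0.01 --learning_rate 1e-4 --drop 0.1 \
--batch_size 128 --epoch 100 --warmup 5 \
--dmodel 256 --head 4 --num_layer 3 --chain"
echo "${CMD}"
```

Once the printed command looks right, paste it into the shell to start training.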

To train minGPT on the CoT dataset with ALiBi relative positional embeddings, run the following on a single node with 4 GPUs:

torchrun --nproc_per_node=4 train.py\
 --file ${DATA_DIR}\
 --folder arithmetic\
 --output_dir ${OUTPUT_DIR}\
 --maxlen 1000\
 --maxdata 435\
 --vocab 21\
 --num_range 11\
 --weight_decay 0.01\
 --learning_rate 1e-4\
 --drop 0.1\
 --batch_size 128\
 --epoch 100\
 --warmup 5\
 --dmodel 256\
 --head 4\
 --num_layer 3\
 --chain\
 --rpe