The Training module is a PyTorch pipeline that trains the Factorized Mixture-of-Experts (MoE) neural network on the binary datasets produced by the Preprocessing module. Its dataset loaders memory-map those files, so training can draw from hundreds of millions of positions without loading them all into RAM.
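
To illustrate the memory-mapping idea, here is a minimal, dependency-free sketch. It is not the project's dataset.py: the record layout (four float32 features plus one float32 target), the class name MemoryMappedPositions, and the sample_chunk strategy are all assumptions made for the example. The point is only that records are decoded on demand from a read-only map, so the operating system pages data in lazily.

```python
import mmap
import random
import struct
import tempfile

# Hypothetical record layout (not taken from the project): four little-endian
# float32 features followed by one float32 target, 20 bytes per position.
RECORD = struct.Struct("<5f")


class MemoryMappedPositions:
    """Read fixed-size records from a binary file through a read-only memory map."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; the OS loads pages only when touched.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._count = len(self._map) // RECORD.size

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if not 0 <= index < self._count:
            raise IndexError(index)
        values = RECORD.unpack_from(self._map, index * RECORD.size)
        return values[:4], values[4]

    def sample_chunk(self, chunk_size, rng):
        """Pick a random contiguous chunk, then shuffle the records inside it."""
        start = rng.randrange(0, max(1, self._count - chunk_size + 1))
        indices = list(range(start, min(start + chunk_size, self._count)))
        rng.shuffle(indices)
        return [self[i] for i in indices]

    def close(self):
        self._map.close()
        self._file.close()


def write_demo_file(n_records):
    """Write synthetic records to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
        for i in range(n_records):
            f.write(RECORD.pack(float(i), 0.0, 0.0, 0.0, float(i % 3)))
        return f.name
```

Reading a contiguous chunk and shuffling inside it is one plausible way to keep disk access mostly sequential while still randomising batch order. A PyTorch Dataset could wrap the same __len__ and __getitem__ methods.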

Training is organised into three phases, designed to prevent expert collapse and to ensure that each expert develops genuine specialisation for its assigned position type.

Detailed documentation:

Multi-Phase Training Curriculum — Three-phase curriculum for expert specialisation
Training Infrastructure — Dataset loading, weight export, and experiment tracking

Source Files

File	Purpose
train.py	Main training loop with multi-phase curriculum
model.py	Network architecture — Factorized MoE with 1×1 mixer and expert bodies
dataset.py	Memory-mapped binary dataset with chunked random sampling
loss.py	Custom loss functions: WDL cross-entropy, eval MSE
export.py	Converts PyTorch state dict to the engine's flat binary weight format
train.sh	End-to-end shell script that runs training phases sequentially
utils.py	Logging, colour-coded console output, and training utilities
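
The two loss terms named for loss.py can be written out in a few lines. The following is a plain-Python sketch of the formulas only, not the project's implementation. How the terms are combined is an assumption: combined_loss and its eval_weight parameter are hypothetical. In PyTorch the same quantities would typically come from torch.nn.functional.cross_entropy and torch.nn.functional.mse_loss.

```python
import math


def wdl_cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a (win, draw, loss) target distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    log_probs = [x - log_norm for x in logits]
    return -sum(t * lp for t, lp in zip(target, log_probs))


def eval_mse(predicted, expected):
    """Squared error between a predicted and a target evaluation."""
    return (predicted - expected) ** 2


def combined_loss(logits, wdl_target, eval_pred, eval_target, eval_weight=0.5):
    # eval_weight is a hypothetical mixing coefficient, chosen for illustration.
    return (wdl_cross_entropy(logits, wdl_target)
            + eval_weight * eval_mse(eval_pred, eval_target))
```

With uniform logits the WDL term equals ln 3 whatever the target, which makes a convenient sanity check for an untrained network.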

Quick Start

# Run the training pipeline end-to-end
cd training
bash train.sh
 
# Or run individual phases manually
python train.py --phase 1 --data_dirs /data/chess
python train.py --phase 2 --data_dirs /data/chess --checkpoint checkpoints/model_base_phase1_final.pt
python train.py --phase 4 --data_dirs /data/chess --checkpoint checkpoints/model_experts_phase2.pt
 
# Export trained weights for the C++ engine
python export.py --checkpoint checkpoints/model_finetune_phase4_final.pt --output weights.bin
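
The export step turns a state dict into a flat binary file. The engine's actual weight format is not described here, so the sketch below only illustrates the general idea under loud assumptions: parameters are written back to back in insertion order, as little-endian float32, with no header. The function names and the parameter names in the demo are hypothetical.

```python
import array
import os
import sys
import tempfile


def export_flat_weights(state, path):
    """Write every parameter's values back to back as little-endian float32.

    `state` maps parameter names to flat sequences of floats. The layout
    (insertion order, float32, no header) is an assumption for illustration.
    """
    total = 0
    with open(path, "wb") as f:
        for _name, values in state.items():
            buf = array.array("f", values)
            if sys.byteorder == "big":
                buf.byteswap()  # force little-endian output on big-endian hosts
            buf.tofile(f)
            total += len(buf)
    return total


def load_flat_weights(path):
    """Read the file back as a flat list of floats, mirroring the writer."""
    buf = array.array("f")
    with open(path, "rb") as f:
        buf.frombytes(f.read())
    if sys.byteorder == "big":
        buf.byteswap()
    return list(buf)


# Round-trip demo with made-up parameter names.
demo_path = os.path.join(tempfile.mkdtemp(), "weights_demo.bin")
count = export_flat_weights(
    {"mixer.weight": [0.5, -1.0], "expert0.bias": [2.0]}, demo_path
)
restored = load_flat_weights(demo_path)
```

Each tensor of a real state dict could be flattened first with tensor.flatten().tolist(). A headerless layout like this works only if the reader hard-codes the same parameter order and shapes as the writer.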