io-chess
UCI chess engine
The Writers module serialises extracted features and labels into a custom binary `.bin` format optimised for fast random access by the PyTorch dataloader. Each record in the file contains:

| Field | Type | Size | Description |
|---|---|---|---|
| Feature planes | float32[N × 8 × 8] | 256 × N bytes | Dense spatial features (N planes of 8 × 8 squares) |
| Evaluation target | int16 | 2 bytes | Stockfish centipawn evaluation |
| WDL target | float32[3] | 12 bytes | Win/Draw/Loss probabilities |
| Expert label | uint8 | 1 byte | Routing category (0-3) |

Records are fixed-size, so the dataloader can reach any record in O(1) time through memory-mapped I/O. No per-record headers or delimiters are used.
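To make the fixed-size layout concrete, the following is a minimal sketch of how such a file could be memory-mapped from Python with NumPy. It is not the project's dataloader. The plane count `N_PLANES = 12`, the field names, the little-endian byte order, the absence of padding and of a file-level header, and the helper `open_records` are all assumptions made for illustration.

```python
import numpy as np

N_PLANES = 12  # hypothetical plane count; the real N depends on the feature set

# Assumed layout: fields in table order, little-endian, packed with no padding.
RECORD_DTYPE = np.dtype([
    ("planes", "<f4", (N_PLANES, 8, 8)),  # N x 8 x 8 float32 feature planes
    ("eval_cp", "<i2"),                   # centipawn evaluation (int16)
    ("wdl", "<f4", (3,)),                 # win/draw/loss probabilities
    ("expert", "u1"),                     # routing category, 0-3
])


def open_records(path):
    """Memory-map a .bin file as a read-only array of fixed-size records.

    Assumes the file holds only records (no file-level header), so its
    size must be a whole multiple of RECORD_DTYPE.itemsize.
    """
    return np.memmap(path, dtype=RECORD_DTYPE, mode="r")
```

Under these assumptions each record occupies 256 × N + 15 bytes, so `open_records(path)[i]` resolves to a single offset computation (`i * RECORD_DTYPE.itemsize`) without scanning the file.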
Raw Stockfish centipawn evaluations span a wide range (beyond ±10,000) and contain outliers such as forced-mate and tablebase scores. The preprocessing pipeline applies two normalisation steps:
The preprocessing binary reads CSV lines from stdin, each pairing a FEN with its evaluation, and writes packed `.bin` files to disk:
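As an illustration of the conversion from one CSV line to one packed record, here is a minimal Python sketch. It is not the preprocessing binary itself. The `FEN,eval` column order, the 12-plane one-hot piece encoding, the field order, and the helper names `parse_line`, `fen_to_planes`, and `pack_record` are assumptions; the WDL target and expert label are passed in by the caller because how the real pipeline derives them is not shown here.

```python
import struct

N_PLANES = 12            # hypothetical: one plane per piece type and colour
PIECES = "PNBRQKpnbrqk"  # plane order is an assumption
# Little-endian, no padding: planes, int16 eval, float32[3] WDL, uint8 label.
RECORD_FMT = f"<{N_PLANES * 64}fh3fB"


def parse_line(line):
    """Split 'FEN,eval' on the last comma (a FEN contains no commas)."""
    fen, cp = line.rstrip("\n").rsplit(",", 1)
    return fen, int(cp)


def fen_to_planes(fen):
    """Encode the piece-placement field as 12 one-hot 8x8 planes (flat list)."""
    planes = [0.0] * (N_PLANES * 64)
    # FEN lists ranks from 8 down to 1, so row 0 here is rank 8.
    for row, rank_str in enumerate(fen.split()[0].split("/")):
        col = 0
        for ch in rank_str:
            if ch.isdigit():
                col += int(ch)  # run of empty squares
            else:
                planes[PIECES.index(ch) * 64 + row * 8 + col] = 1.0
                col += 1
    return planes


def pack_record(fen, cp, wdl, expert):
    """Pack one position into a fixed-size binary record."""
    cp = max(-32768, min(32767, cp))  # keep the value inside the int16 range
    return struct.pack(RECORD_FMT, *fen_to_planes(fen), cp, *wdl, expert)
```

A caller would loop over `sys.stdin`, pass each line through `parse_line`, and append the bytes returned by `pack_record` to an output file opened in binary mode.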
For very large datasets (100M+ positions), the included `split_preprocess_two_drives.sh` script splits the input across two output drives to maximise I/O throughput.
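The general idea of spreading input across several drives can be sketched in Python as below. This is only one plausible approach and does not describe what `split_preprocess_two_drives.sh` actually does. The function `split_round_robin`, its `chunk_size` parameter, and the round-robin policy are hypothetical.

```python
import itertools


def split_round_robin(lines, out_paths, chunk_size=100_000):
    """Write consecutive chunks of input lines to the output files in turn."""
    outs = [open(p, "w") for p in out_paths]
    try:
        targets = itertools.cycle(outs)
        it = iter(lines)
        while True:
            chunk = list(itertools.islice(it, chunk_size))
            if not chunk:
                break  # input exhausted
            next(targets).writelines(chunk)
    finally:
        for f in outs:
            f.close()
```

Each output file, placed on a different drive, could then be fed to its own preprocessing process so that the two drives are written in parallel.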