io-chess
UCI chess engine
The Data Pipeline provides utilities for acquiring, filtering, and shuffling the chess evaluation dataset used by the Preprocessing and Training modules. All tools are designed for out-of-core operation and can handle datasets that exceed available RAM.
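The source does not describe how the shuffling tool works internally. One standard way to shuffle a file that is larger than RAM is a two-pass bucket shuffle. In the first pass, each record is scattered into a uniformly random temporary file. In the second pass, each file is loaded, shuffled in memory, and emitted in turn. Because every record is assigned to a bucket independently and each bucket is shuffled uniformly, the combined output is a uniformly random permutation. The sketch below is a hypothetical illustration, not the project's implementation. `external_shuffle` and `num_buckets` are invented names, and the sketch assumes one record per line (for example, JSONL).

```python
import os
import random
import tempfile
from typing import Iterable, Iterator, Optional


def external_shuffle(lines: Iterable[str], num_buckets: int = 16,
                     seed: Optional[int] = None) -> Iterator[str]:
    """Hypothetical out-of-core shuffle (not the project's actual tool).

    Pass 1 scatters each line into a uniformly random temporary bucket file.
    Pass 2 loads one bucket at a time, shuffles it in memory, and yields it.
    Peak memory is roughly one bucket, i.e. about total_size / num_buckets.
    """
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket_{i}.txt") for i in range(num_buckets)]
        handles = [open(p, "w", encoding="utf-8") for p in paths]
        try:
            for line in lines:
                # Normalise the trailing newline so each record is exactly one line.
                handles[rng.randrange(num_buckets)].write(line.rstrip("\n") + "\n")
        finally:
            for h in handles:
                h.close()
        for p in paths:
            # Only this single bucket is held in memory at any one time.
            with open(p, encoding="utf-8") as f:
                bucket = f.read().splitlines()
            rng.shuffle(bucket)
            yield from bucket
```

Filtering composes with this naturally. If you pass a generator that skips unwanted rows as `lines`, those rows never reach disk. Choose `num_buckets` so that a single bucket fits comfortably in RAM.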
The data flow proceeds in two stages:
The project uses the `mateuszgrzyb/lichess-stockfish-normalized` dataset hosted on Hugging Face. It contains millions of chess positions from Lichess games. Each position is annotated with a Stockfish evaluation, and the search depth used for that evaluation varies from position to position.
| Field | Type | Description |
|---|---|---|
| `fen` | string | Board position in Forsyth-Edwards Notation (FEN) |
| `cp` | int / null | Stockfish evaluation in centipawns (`null` when the position has a forced mate) |
| `mate` | int / null | Mate-in-N score (`null` when there is no forced mate) |
| `depth` | int | Stockfish search depth used for the evaluation |
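The following is a minimal sketch of how a consumer might validate rows against this schema. It assumes that exactly one of `cp` and `mate` is non-null in every row, which the descriptions of those two fields imply. `Evaluation` and `parse_row` are hypothetical names and are not part of the project or of the `datasets` API.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Evaluation:
    """Hypothetical typed view of one dataset row, per the field table above."""
    fen: str
    cp: Optional[int]    # centipawns; None when the position has a forced mate
    mate: Optional[int]  # mate-in-N; None when there is no forced mate
    depth: int           # Stockfish search depth


def parse_row(row: Mapping[str, Any]) -> Evaluation:
    """Validate a raw row and convert it to an Evaluation.

    Assumption (implied by the table, not stated outright): exactly one of
    cp / mate is non-null in every row.
    """
    cp, mate = row.get("cp"), row.get("mate")
    if (cp is None) == (mate is None):
        raise ValueError(f"expected exactly one of cp/mate, got cp={cp!r}, mate={mate!r}")
    return Evaluation(
        fen=str(row["fen"]),
        cp=None if cp is None else int(cp),
        mate=None if mate is None else int(mate),
        depth=int(row["depth"]),
    )
```

Rejecting malformed rows early prevents later stages from silently treating a missing `cp` as zero.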
The dataset is loaded with the Hugging Face `datasets` library through memory-mapped access to its on-disk cache, so the full dataset is never loaded into RAM at once.
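The `datasets` library achieves this by memory-mapping Apache Arrow files in its local cache. The sketch below illustrates the same principle with Python's standard `mmap` module. It is not how `datasets` works internally, and it is not the project's code. It memory-maps a hypothetical JSONL export of the dataset and decodes one record at a time. `iter_jsonl_mmap` is an invented name.

```python
import json
import mmap
from typing import Any, Dict, Iterator


def iter_jsonl_mmap(path: str) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield one decoded JSON object per line via mmap.

    The OS pages the file in on demand, so only the pages currently being
    read need to be resident, regardless of the file's total size.
    """
    with open(path, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            return  # mmap refuses empty files; there is nothing to yield
        with mm:
            # mm.readline() returns b"" at end of file, which stops iter().
            for raw in iter(mm.readline, b""):
                raw = raw.strip()
                if raw:  # skip blank lines
                    yield json.loads(raw)
```

Records are decoded lazily, so a loop such as `for row in iter_jsonl_mmap(path): ...` keeps Python-side memory roughly constant as the file grows. Any per-row filtering can happen inside that loop.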