Here we give examples of how to launch training runs similar to those in our papers. The examples should give you a sense of how to launch your own training runs or fine-tune our models.
For warmstarting, you will need to download our models and training data hosted on Google Drive. You may find gdown useful for downloading files from Google Drive.
The victims listed in the curriculum files need to be available at
/shared/victims and are mostly publicly available KataGo
networks.
Use the commands in this repo's root README without any warmstarting to launch a training run similar to the original cyclic adversary.
In addition to using the configs/examples/pass-adversary-curriculum.json
curriculum config, launch the curriculum without the -harden-below-visits
flag. All the other adversary training runs use -harden-below-visits to
avoid degenerating into a pass attack.
attack-may23 is the fine-tuned adversary from "Adversarial Policies Beat
Superhuman Go AIs".
Here we need to warmstart from the original cyclic adversary:
- From Google Drive, download the cyclic-adversary-s545m model directory and the training-data/cyclic-adversary.tar.gz training data.
- Untar the training data.
- Pass the flag --initial-weights <path to models>/cyclic-adversary-s545m to the train script.
- Pass the flag --preseed <path to training data>/cyclic-adversary to the shuffle-and-export script. (In our original attack-may23 run we did not use data pre-seeding, but in later training runs we found that pre-seeding made training more stable.)
- Pass the flag --warmstart to the victim-play script.
Use the configs/examples/attack-may23-curriculum.json curriculum config.
This is the general recipe for fine-tuning: warmstart from the starting adversary using the aforementioned flags and change the curriculum to point at a different victim.
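The warmstart flags from the recipe above can be collected in one place. The following is a minimal sketch and not part of the repo: the helper name warmstart_flags and the example paths under /data are hypothetical, and the actual launch commands remain the ones in the root README.

```shell
# Sketch: print the extra flags each script needs for a warmstarted run.
# warmstart_flags is a hypothetical helper, not a script from this repo.
warmstart_flags() {
  model_dir="$1"    # e.g. <path to models>/cyclic-adversary-s545m
  preseed_dir="$2"  # e.g. <path to training data>/cyclic-adversary
  echo "train: --initial-weights $model_dir"
  echo "shuffle-and-export: --preseed $preseed_dir"
  echo "victim-play: --warmstart"
}

# Example with placeholder paths (assumptions; substitute your own).
warmstart_flags /data/models/cyclic-adversary-s545m /data/training-data/cyclic-adversary
```

To fine-tune from a different starting adversary, only the two paths change; the curriculum config is swapped separately.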
The continuous adversary needs to be warmstarted from attack-may23:
- Download the attack-may23 model, the training-data/attack-may23.tar.gz training data, and the training-data/cyclic-adversary.tar.gz training data.
- Untar both training data tarballs.
- Since attack-may23 itself was warmstarted from cyclic-adversary, symlink cyclic-adversary into attack-may23 as ln -s <path to training data>/cyclic-adversary <path to training data>/attack-may23/prev-selfplay.
- Pass the flag --initial-weights <path to models>/attack-may23-s168m to the train script.
- Pass the flag --preseed <path to training data>/attack-may23 to the shuffle-and-export script.
- Pass the flag --warmstart to the victim-play script.
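The symlink step is easy to repeat by accident when relaunching. Below is a small sketch that wraps the ln -s command from the list so that it can be rerun safely; the helper name link_prev_selfplay is hypothetical and not part of the repo.

```shell
# Sketch: link the parent run's data into the child run as prev-selfplay.
# link_prev_selfplay is a hypothetical helper name.
link_prev_selfplay() {
  parent="$1"  # e.g. <path to training data>/cyclic-adversary
  child="$2"   # e.g. <path to training data>/attack-may23
  link="$child/prev-selfplay"
  # Do nothing if the link (or anything else) already exists at that path.
  if [ -e "$link" ] || [ -L "$link" ]; then
    echo "already present: $link"
    return 0
  fi
  ln -s "$parent" "$link"
}
```

Pass the parent directory as an absolute path. A relative symlink target is resolved relative to the directory containing the link, which is usually not what you want here.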
For the curriculum, we started with a curriculum similar to
configs/examples/continuous-adversary-curriculum.json that only lists one
victim model with increasing victim visits. Then we periodically checked the
KataGo website for new networks and
manually updated the curriculum.
For example, suppose the curriculum has progressed to 512 victim visits
against kata1-b18c384nbt-s8526915840-d3929217702.bin.gz, and the KataGo
website lists a newer network, kata1-b18c384nbt-s8617907712-d3952620469.
We then download the new network into /shared/victims, replace all
instances of kata1-b18c384nbt-s8526915840-d3929217702 with
kata1-b18c384nbt-s8617907712-d3952620469 in the curriculum config, remove all
visit counts below 512 from the curriculum config, and relaunch the curriculum
with the new config.
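The name replacement in the example above can be done with sed. This is a sketch only: retarget_curriculum is a hypothetical helper, it treats the config as plain text, and it does not remove the lower visit counts, which still has to be done by hand because that depends on the config's layout.

```shell
# Sketch: write a copy of a curriculum config with the victim name swapped.
# retarget_curriculum is a hypothetical helper name. The network names are
# used as sed patterns, which is safe here because they contain only
# letters, digits, and hyphens.
retarget_curriculum() {
  old="$1"; new="$2"; in_file="$3"; out_file="$4"
  sed "s/$old/$new/g" "$in_file" > "$out_file"
  # Print how many lines mention the new network, as a quick sanity check.
  grep -c "$new" "$out_file" || true
}
```

Writing to a new file rather than editing in place keeps the old config around for comparison before relaunching the curriculum.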
The gift adversary is warmstarted from an early checkpoint of the original cyclic adversary:
- From Google Drive, download the cyclic-adversary-early-s227m model directory and the training-data/cyclic-adversary.tar.gz training data.
- Untar the training data.
- Pass the flag --initial-weights <path to models>/cyclic-adversary-early-s227m to the train script.
- Pass the flag --preseed <path to training data>/cyclic-adversary/t0-s227013120-d57265857 to the shuffle-and-export script.
- Pass the flag --warmstart to the victim-play script.
Use the configs/examples/gift-adversary-curriculum.json curriculum config. You
will also need to pass the flag --config /go_attack/configs/gift-adversary-experiment.cfg before the positional
arguments to the victim-play script. Moreover, when running the resulting
adversary (including with kubernetes/evaluate-loop.sh), you will need to set
the KataGo config parameters forceAllowNoResultPredictions = true and
noResultUtility = -1.6 for the adversary.
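One way to set the two parameters is to append them to a copy of the adversary's KataGo config. The sketch below assumes you have such a copy; the helper name add_gift_overrides is hypothetical, and which config file the adversary actually reads depends on how you run it.

```shell
# Sketch: append the gift-adversary parameters to an adversary KataGo config.
# add_gift_overrides is a hypothetical helper name.
add_gift_overrides() {
  cfg="$1"
  for key in forceAllowNoResultPredictions noResultUtility; do
    # Refuse to add a second, possibly conflicting definition of a key.
    if grep -Eq "^[[:space:]]*$key[[:space:]]*=" "$cfg"; then
      echo "$key is already set in $cfg; edit it by hand" >&2
      return 1
    fi
  done
  printf '%s\n' \
    'forceAllowNoResultPredictions = true' \
    'noResultUtility = -1.6' >> "$cfg"
}
```

The helper stops without modifying the file if either key is already present, so an existing value is never silently shadowed.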
The big adversary is warmstarted from attack-may23, so follow the same
instructions as for continuous-adversary, but use the
configs/examples/big-adversary-curriculum.json curriculum config.
The model is warmstarted from the KataGo network
kata1-b40c256-s11840935168-d2898845681.
- Download both the network file and the TF weights from the KataGo website.
- Move the network file to /shared/victims.
- Untar the TF weights.
- Pass the flag --initial-weights <TF weights path> to the train script. Reduce the learning rate by changing the last positional argument from 1.0 to 0.1.
- Download the network's training data. Instead of downloading all the data, you can download the data from 2022-04-25 to 2022-06-24 (the model's release date) inclusive. Untar all of the tarballs and place them in some directory. Then point the --preseed shuffle-and-export flag to the directory, and add a flag -add-to-window 2852985812 after the positional arguments to tell the shuffler that 2.85 billion data rows have been omitted. Downloading 2022-04-25 to 2022-06-24 data is enough to populate the shuffler's sliding window.
- Pass the flags --warmstart --config /go_attack/configs/iterated-training/alternate-experiment.cfg to the victim-play script.
- Use the curriculum config /go_attack/configs/iterated-training/alternate-curriculum.json.
- If using evaluate-loop.sh, pass it the flag --config /go_attack/configs/iterated-training/alternate-match-1gpu.cfg.
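When picking which training data to download, the inclusive date window from the list can be applied with a small filter. This sketch assumes you have a text listing in which every line starts with a YYYY-MM-DD date; that layout is an assumption about your own listing, not a documented naming scheme, and filter_preseed_window is a hypothetical helper.

```shell
# Sketch: keep only lines whose leading YYYY-MM-DD date falls in the window
# 2022-04-25 to 2022-06-24 inclusive. Reads stdin, writes stdout.
# Assumes each input line starts with the date (an assumption).
filter_preseed_window() {
  awk '{
    d = substr($0, 1, 10)
    # ISO dates sort correctly as plain strings.
    if (d >= "2022-04-25" && d <= "2022-06-24") print
  }'
}
```

The comparison is a plain string comparison, which is valid only because ISO-formatted dates sort in chronological order.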
Warmstart from the cyclic adversary (see attack-may23 instructions). After
symlinking the final model of the first defense iteration into /shared/victims
where the curriculum script looks for victim models, use a curriculum config
that points towards the final model from the first defense iteration (e.g., copy
configs/examples/continuous-adversary-curriculum.json but replace the victim
model name with the defense iteration's model).
On subsequent [defense/attack] iterations, update the curriculum to point at the latest [attack/defense] iteration's model. Then either warmstart from the previous [defense/attack] iteration (use the warmstart flags and give a new run name to create a new experiment directory for the subsequent iteration) or resume it (use the same launch commands as the latest iteration to continue using the same experiment run directory).
(Although kubernetes/iterated-training contains experimental scripts for
automating switching between attack and defense iterations, for our published
experiments we switched iterations manually and did not use those scripts.)
Like the gift-adversary, the atari adversary should be warmstarted from the early cyclic adversary checkpoint using the same instructions.
Use the configs/examples/atari-adversary-curriculum.json curriculum config.
The victim v9 is available on Google Drive.
Warmstart from the original cyclic adversary (see attack-may23 instructions),
and use the configs/examples/stall-adversary-curriculum.json curriculum
config.
The ViT model is trained with self-play rather than victim-play, so
- do not launch the curriculum.
- launch self-play workers by passing the --selfplay flag to the victim-play script. (The script still saves the experiment in /shared/victimplay because other scripts in kubernetes/ assume experiments are located in /shared/victimplay.)
- pass a self-play config (e.g., any config in engines/KataGo-custom/cpp/configs/training/selfplay*.cfg) with the --config flag to the victim-play script. (In our ViT training run, we modified the configs to disable rule variation and rectangular boards, but this is not necessary.)
kubernetes/evaluate-loop.sh also will not work since it reads from a directory
created by the curriculum, though it is straightforward to modify it to evaluate
trained models against some fixed KataGo network instead of against curriculum
victim models.
The ViT model is trained and exported as a PyTorch model, so
- training needs the extra flag --use-pytorch before its positional arguments. You will also need to specify the model architecture with the flag --model-kind vitp2b4c384. (vitp2b4c384 is a 4-block ViT; see engines/KataGo-custom/python/modelconfigs.py to see available model kinds or to define your own.)
- shuffle-and-export needs the extra flags --use-pytorch --use-torchscript.
- the bot will need to have KataGo config params useNHWC = false and inputsUseNHWC = false.
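Before running a TorchScript model, it can help to confirm that the bot's config really sets both NHWC parameters to false. The following is a sketch of such a check; check_nhwc_disabled is a hypothetical helper and the config path is whatever file your bot is launched with.

```shell
# Sketch: succeed only if a KataGo config sets both NHWC params to false.
# check_nhwc_disabled is a hypothetical helper name.
check_nhwc_disabled() {
  cfg="$1"
  for key in useNHWC inputsUseNHWC; do
    # The key is anchored at the start of the line, so useNHWC does not
    # accidentally match the inputsUseNHWC line.
    if ! grep -Eq "^[[:space:]]*$key[[:space:]]*=[[:space:]]*false" "$cfg"; then
      echo "missing or not false: $key" >&2
      return 1
    fi
  done
}
```

The same check applies to any victim that is a TorchScript model, since it needs the same two parameters.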
If you first train a smaller ViT and then want to use its generated data to pre-train a larger ViT, then
- launch only shuffle-and-export and the training process. To speed up training, you may want to train with multiple GPUs by appending the flag -multi-gpus <GPU indices, e.g., "0,1,2,3"> to the training script.
- slowly copy the smaller ViT's training data into the larger ViT's directory, using a script like vit-link-pretraining.sh.
- only launch the self-play workers after the pre-training has finished.
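To illustrate what slowly feeding data could look like, here is a sketch that symlinks entries in batches with a pause between batches. It is not vit-link-pretraining.sh and may differ from it; link_gradually, the batch size, and the pause length are all hypothetical, and it links only the top-level entries of the source directory.

```shell
# Sketch: symlink entries from src into dst a few at a time.
# link_gradually is a hypothetical helper and is not vit-link-pretraining.sh.
link_gradually() {
  src="$(cd "$1" && pwd)" || return 1  # absolute path so the links resolve
  dst="$2"; batch="$3"; pause="$4"
  mkdir -p "$dst"
  count=0
  for f in "$src"/*; do
    [ -e "$f" ] || continue
    target="$dst/$(basename "$f")"
    # Skip entries that were linked on a previous run.
    if [ -e "$target" ] || [ -L "$target" ]; then continue; fi
    ln -s "$f" "$target"
    count=$((count + 1))
    # Pause after every full batch so the data arrives gradually.
    if [ $((count % batch)) -eq 0 ]; then sleep "$pause"; fi
  done
  echo "$count"  # number of new links created
}
```

Because existing links are skipped, the function can be interrupted and rerun without duplicating work.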
If the ViT training is overfitting, try increasing the shuffler sliding window
size by adding the flag -min-rows <some number, e.g., 10000000> after the
positional arguments to the shuffle-and-export script.
Warmstart from the cyclic adversary (see attack-may23 instructions). Use the
vit-adversary-curriculum.json curriculum config. The victim will need the
KataGo config params useNHWC = false and inputsUseNHWC = false since it is a
Torchscript model.