This directory is dedicated to the Mimick algorithm itself. Starting with an embedding dictionary and (optionally) a target vocabulary, the tools here will provide you with:
- A model that can be loaded to perform inference on new words downstream; and
- (If needed) an embedding dictionary for the target vocabulary.
For help with any specific script in this directory, run it with `--help`, which also describes its parameters.
- `make_dataset.py` to create a training regimen for the model. It only needs to be called once per input embeddings table.
- `model.py` to train the model, save it, and output embeddings. The default is an LSTM; a CNN (1 layer) is available via the `--use-cnn` parameter.
- If needed, `nearest_vecs.py` and `inter_nearest_vecs.py` can be used for querying the model for nearest vectors in any embeddings dictionary. `inter_` is the interactive version.
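
To illustrate what a nearest-vector query does conceptually, here is a minimal sketch in plain Python. It is not the code used by the scripts in this directory: the function names, the toy embedding dictionary, and the choice of cosine similarity as the ranking metric are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_words(query_vec, embeddings, k=3):
    """Return up to k (word, similarity) pairs, most similar first.

    `embeddings` is a hypothetical in-memory dict mapping word -> list of floats.
    """
    scored = [(word, cosine(query_vec, vec)) for word, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data, invented for this example only.
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

# Stand-in for a vector that a trained model might predict for an unseen word.
predicted = [1.0, 0.1, 0.0]

for word, score in nearest_words(predicted, toy_embeddings, k=2):
    print(word, round(score, 3))
```

In practice the query vector would come from the trained model and the dictionary from an embeddings file; a real table with many entries would usually call for a vectorized implementation rather than this pure-Python loop.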