Describe the issue as clearly as possible:
In the GPT-2 alphabet, TokenId(216), whose token string is "\u011c", has only the single byte 28 in its Vec in the Vocabulary.
Byte 28 is '\x1C', so the alphabet may be converted incorrectly when it is loaded.
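For context, here is a minimal sketch of the two byte values in play. It assumes the vocabulary uses GPT-2's byte-level mapping (the `bytes_to_unicode` table from OpenAI's GPT-2 encoder), in which bytes with no printable form are shifted to code points starting at U+0100. The function names `gpt2_byte_to_char` and `utf8_bytes` are hypothetical helpers written for this report, not part of the outlines API.

```rust
/// Rebuilds GPT-2's byte-to-character table (assumption: same rule as the
/// `bytes_to_unicode` helper in OpenAI's GPT-2 encoder).
/// Index = raw byte value, element = character used in the vocabulary.
fn gpt2_byte_to_char() -> Vec<char> {
    // Bytes that GPT-2 keeps as-is: '!'..='~', '¡'..='¬', '®'..='ÿ'.
    let keeps_own_char = |b: u8| matches!(b, 0x21..=0x7E | 0xA1..=0xAC | 0xAE..=0xFF);

    let mut table = Vec::with_capacity(256);
    let mut shifted = 0u32; // how many bytes have been remapped so far
    for b in 0u8..=255 {
        if keeps_own_char(b) {
            table.push(b as char);
        } else {
            // Remaining bytes are assigned U+0100, U+0101, ... in byte order.
            table.push(char::from_u32(256 + shifted).expect("valid code point"));
            shifted += 1;
        }
    }
    table
}

/// Plain UTF-8 encoding of a character, as a byte vector.
fn utf8_bytes(ch: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    ch.encode_utf8(&mut buf).as_bytes().to_vec()
}
```

Under that assumption, `gpt2_byte_to_char()[28]` is '\u{011C}', while `utf8_bytes('\u{011C}')` is `[0xC4, 0x9C]`. So the single byte 28 would be what the byte-level mapping yields, and `[0xC4, 0x9C]` is what direct UTF-8 encoding of the token string yields. Which of the two the Vocabulary is meant to store is the open question here.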
Steps/code to reproduce the bug:
Expected result:
TokenId(216) = vec![0xC4, 0x9C];
Error message:
Outlines/Python version information:
Context for the issue:
No response