Vocabulary/GPT2: Bad interpretation of tokenId = 216 #190

### Describe the issue as clearly as possible:

TokenId(216) in the GPT2 alphabet, whose value is "\u011c", has only the single byte 28 in its `Vec<u8>` in the Vocabulary.
Byte 28 is '\x1C', so the alphabet may be handled incorrectly when it is loaded.
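
For context, here is a minimal Python sketch of the two ways "\u011c" can be turned into bytes. It assumes the byte-to-character table from GPT-2's published byte-level BPE encoder, reimplemented below from that scheme and not taken from the Outlines code base. The names `bytes_to_unicode`, `byte_to_char` and `char_to_byte` are illustrative only. Under that assumption, non-printable bytes are displayed as code points starting at U+0100, so byte 28 appears as "\u011c", which would match the single byte observed. Encoding the same string as UTF-8 gives the two bytes listed in the expected result. The sketch does not settle which reading the Vocabulary is meant to use.

```python
# Sketch only: reimplements the byte-to-character table used by GPT-2's
# byte-level BPE. It does not call any Outlines API.


def bytes_to_unicode() -> dict[int, str]:
    """Map every byte value 0..255 to the character GPT-2 displays for it."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    mapping = {b: chr(b) for b in printable}
    # Every other byte is shifted to 256, 257, ... in increasing byte order.
    offset = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + offset)
            offset += 1
    return mapping


byte_to_char = bytes_to_unicode()
char_to_byte = {c: b for b, c in byte_to_char.items()}

token = "\u011c"
utf8_bytes = token.encode("utf-8")   # reading the token as plain UTF-8 text
alphabet_byte = char_to_byte[token]  # reading it through the GPT-2 alphabet

print(list(utf8_bytes))  # [196, 156], i.e. 0xC4, 0x9C
print(alphabet_byte)     # 28, i.e. 0x1C
```

If the second reading is the intended one, a lone byte 28 for this token would be consistent with the byte-level alphabet and would not indicate a loading error. That is for the maintainers to confirm.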


### Steps/code to reproduce the bug:

```python
//
```

### Expected result:

```shell
TokenId(216) = vec![0xC4, 0x9C];
```

### Error message:

```shell

```

### Outlines/Python version information:

Version information
<details>
```
(command output here)
```
</details>


### Context for the issue:

_No response_
