Open
Labels: bug (Something isn't working)
Description
Describe the issue as clearly as possible:
In #90 an error was fixed regarding the Salamandra and OpenCoder tokenizers. The same error has now returned, and I see that the fix from that PR is nowhere to be seen in the code base anymore. Was it replaced by something else?
Steps/code to reproduce the bug:
from outlines_core.fsm.regex import reduced_vocabulary
from outlines.models.vllm import adapt_tokenizer
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("BSC-LT/salamandra-2b")
tokenizer = adapt_tokenizer(tokenizer)
vocabulary = reduced_vocabulary(tokenizer)
Expected result:
No error message.
Error message:
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/alex-admin/euroeval/.venv/bin/euroeval", line 8, in <module>
[rank0]: sys.exit(benchmark())
[rank0]: ^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
[rank0]: return self.main(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/click/core.py", line 1082, in main
[rank0]: rv = self.invoke(ctx)
[rank0]: ^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
[rank0]: return ctx.invoke(self.callback, **ctx.params)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/click/core.py", line 788, in invoke
[rank0]: return __callback(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/euroeval/cli.py", line 277, in benchmark
[rank0]: benchmarker.benchmark(model=models)
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/euroeval/benchmarker.py", line 461, in benchmark
[rank0]: benchmark_output_or_err = self._benchmark_single(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/euroeval/benchmarker.py", line 768, in _benchmark_single
[rank0]: scores = generate(
[rank0]: ^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/euroeval/generation.py", line 84, in generate
[rank0]: test_scores = generate_single_iteration(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/euroeval/generation.py", line 163, in generate_single_iteration
[rank0]: model_output = model.generate(inputs=batch)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/euroeval/benchmark_modules/vllm.py", line 361, in generate
[rank0]: logits_processor = JSONLogitsProcessor(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines/processors/structured.py", line 187, in __init__
[rank0]: super().__init__(regex_string=regex_string, tokenizer=tokenizer)
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines/processors/structured.py", line 151, in __init__
[rank0]: guide = RegexGuide.from_regex(regex_string, tokenizer)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines/fsm/guide.py", line 92, in from_regex
[rank0]: return super().from_regex(
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines_core/fsm/guide.py", line 212, in from_regex
[rank0]: ) = _create_states_mapping(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines/fsm/guide.py", line 76, in cached_create_states_mapping
[rank0]: return uncached_create_states_mapping(regex_string, tokenizer, *args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines_core/fsm/guide.py", line 141, in create_states_mapping
[rank0]: return create_states_mapping_from_fsm(regex_fsm, tokenizer, frozen_tokens)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines_core/fsm/guide.py", line 178, in create_states_mapping_from_fsm
[rank0]: states_to_token_maps, empty_token_ids = create_fsm_index_tokenizer(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines_core/fsm/regex.py", line 473, in create_fsm_index_tokenizer
[rank0]: tokens_to_token_ids, empty_token_ids = reduced_vocabulary(tokenizer)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/alex-admin/euroeval/.venv/lib/python3.12/site-packages/outlines_core/fsm/regex.py", line 426, in reduced_vocabulary
[rank0]: raise RuntimeError(
[rank0]: RuntimeError: Cannot convert token `�?` (217017) to bytes: �?
Outlines/Python version information:
Outlines version: 0.2.3
Python version: 3.12.3
Packages installed:
absl-py==2.2.2 accelerate==1.4.0 aiofiles==23.2.1 aiohappyeyeballs==2.4.8 aiohttp==3.11.13 aiosignal==1.3.2 airportsdata==20250224 annotated-types==0.7.0 anyio==4.8.0 astor==0.8.1 attrs==25.1.0 bert-score==0.3.13 bitsandbytes==0.45.3 blake3==1.0.4 cachetools==5.5.2 certifi==2025.1.31 charset-normalizer==3.4.1 chex==0.1.89 click==8.1.8 cloudpickle==3.1.1 compressed-tensors==0.9.3 contourpy==1.3.1 cupy-cuda12x==13.4.1 cycler==0.12.1 datasets==3.5.0 demjson3==3.0.6 Deprecated==1.2.18 depyf==0.18.0 dill==0.3.8 diskcache==5.6.3 distro==1.9.0 dnspython==2.7.0 einops==0.8.1 email_validator==2.2.0 etils==1.12.2 EuroEval @ git+https://github.com/EuroEval/EuroEval@6db11af4ee14cb832065f312d78266b5c1b46a26 evaluate==0.4.3 fastapi==0.115.11 fastapi-cli==0.0.7 fastrlock==0.8.3 fbgemm_gpu==1.1.0 ffmpy==0.5.0 filelock==3.17.0 flash_attn==2.7.4.post1 flax==0.10.4 fonttools==4.56.0 frozenlist==1.5.0 fsspec==2024.12.0 genson==1.3.0 gguf==0.16.0 googleapis-common-protos==1.70.0 gradio==5.20.0 gradio_client==1.7.2 groovy==0.1.2 grpcio==1.71.0 h11==0.14.0 hf-xet==1.0.2 httpcore==1.0.7 httptools==0.6.4 httpx==0.28.1 huggingface-hub==0.30.1 humanize==4.12.2 idna==3.10 importlib_metadata==8.0.0 importlib_resources==6.5.2 interegular==0.3.3 iso3166==2.1.1 jax==0.5.3 jaxlib==0.5.3 Jinja2==3.1.5 jiter==0.8.2 joblib==1.4.2 jsonschema==4.23.0 jsonschema-specifications==2024.10.1 kiwisolver==1.4.8 lark==1.2.2 Levenshtein==0.27.1 litellm==1.65.1 llguidance==0.7.11 llvmlite==0.44.0 lm-format-enforcer==0.10.11 markdown-it-py==3.0.0 MarkupSafe==2.1.5 matplotlib==3.10.1 mdurl==0.1.2 mistral_common==1.5.4 ml_dtypes==0.5.1 more-itertools==10.6.0 mpmath==1.3.0 msgpack==1.1.0 msgspec==0.19.0 multidict==6.1.0 multiprocess==0.70.16 nanobind==2.6.1 nest-asyncio==1.6.0 networkx==3.4.2 ninja==1.11.1.4 nltk==3.9.1 numba==0.61.2 numpy==1.26.4 nvidia-cublas-cu12==12.4.5.8 nvidia-cuda-cupti-cu12==12.4.127 nvidia-cuda-nvrtc-cu12==12.4.127 nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70 nvidia-cufft-cu12==11.2.1.3 nvidia-curand-cu12==10.3.5.147 nvidia-cusolver-cu12==11.6.1.9 nvidia-cusparse-cu12==12.3.1.170 nvidia-cusparselt-cu12==0.6.2 nvidia-ml-py==12.570.86 nvidia-nccl-cu12==2.21.5 nvidia-nvjitlink-cu12==12.4.127 nvidia-nvtx-cu12==12.4.127 ollama==0.4.7 openai==1.70.0 opencv-python-headless==4.11.0.86 opentelemetry-api==1.26.0 opentelemetry-exporter-otlp==1.26.0 opentelemetry-exporter-otlp-proto-common==1.26.0 opentelemetry-exporter-otlp-proto-grpc==1.26.0 opentelemetry-exporter-otlp-proto-http==1.26.0 opentelemetry-proto==1.26.0 opentelemetry-sdk==1.26.0 opentelemetry-semantic-conventions==0.47b0 opentelemetry-semantic-conventions-ai==0.4.3 opt_einsum==3.4.0 optax==0.2.4 orbax-checkpoint==0.11.10 orjson==3.10.15 outlines==0.2.3 outlines_core==0.1.26 packaging==24.2 pandas==2.2.3 partial-json-parser==0.2.1.1.post5 peft==0.15.0 pillow==11.1.0 prometheus-fastapi-instrumentator==7.0.2 prometheus_client==0.21.1 propcache==0.3.0 protobuf==3.20.3 psutil==7.0.0 py-cpuinfo==9.0.0 pyairports==2.1.1 pyarrow==19.0.1 pycountry==24.6.1 pydantic==2.10.6 pydantic_core==2.27.2 pydub==0.25.1 Pygments==2.19.1 pyinfer==0.0.3 pyparsing==3.2.1 python-dateutil==2.9.0.post0 python-dotenv==1.0.1 python-json-logger==3.3.0 python-multipart==0.0.20 pytz==2025.1 PyYAML==6.0.2 pyzmq==26.2.1 RapidFuzz==3.12.2 ray==2.43.0 referencing==0.36.2 regex==2024.11.6 requests==2.32.3 rich==13.9.4 rich-toolkit==0.13.2 rouge_score==0.1.2 rpds-py==0.23.1 ruff==0.9.9 sacremoses==0.1.1 safehttpx==0.1.6 safetensors==0.5.3 ScandEval==14.0.0 scikit-learn==1.5.2 scipy==1.15.2 semantic-version==2.10.0 sentencepiece==0.2.0 seqeval==1.2.2 setuptools==75.8.2 shellingham==1.5.4 simplejson==3.20.1 six==1.17.0 sniffio==1.3.1 starlette==0.46.0 sympy==1.13.1 tabulate==0.9.0 tenacity==9.0.0 tensorstore==0.1.72 termcolor==2.5.0 threadpoolctl==3.5.0 tiktoken==0.9.0 tokenizers==0.21.1 tomlkit==0.13.2 toolz==1.0.0 torch==2.6.0 torchaudio==2.6.0
torchvision==0.21.0 tqdm==4.67.1 transformers==4.51.3 treescope==0.1.9 triton==3.2.0 typer==0.15.2 typing_extensions==4.12.2 tzdata==2025.1 urllib3==2.3.0 uvicorn==0.34.0 uvloop==0.21.0 vllm==0.8.5.post1 watchfiles==1.0.4 websockets==15.0 wheel==0.45.1 wrapt==1.17.2 xformers==0.0.29.post2 xgrammar==0.1.18 xxhash==3.5.0 yarl==1.18.3 zipp==3.21.0
Context for the issue:
No response
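For readers unfamiliar with this class of error, here is a minimal, self-contained sketch of the kind of token-to-bytes check that can produce the `RuntimeError` in the traceback above. It is an illustration under assumptions, not the actual `outlines_core` code: `token_to_bytes` and `gpt2_bytes_to_unicode` are hypothetical names, and the sketch assumes only two byte encodings are recognised (Llama-style `<0xXX>` byte tokens and the GPT-2 byte-to-unicode table). A token whose text contains U+FFFD (`�`) literally, such as `�?`, fits neither scheme, so the conversion fails.

```python
import re

# Llama-style tokenizers expose raw bytes as tokens such as "<0xE2>".
LLAMA_BYTE_TOKEN = re.compile(r"^<0x[0-9A-F]{2}>$")


def gpt2_bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2 byte -> printable-character table."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points >= 256.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


UNICODE_TO_BYTE = {c: b for b, c in gpt2_bytes_to_unicode().items()}


def token_to_bytes(token: str, decoded: str) -> bytes:
    """Hypothetical helper: map a vocabulary token to its raw bytes.

    `token` is the vocabulary entry; `decoded` is what the tokenizer
    returns when converting that token to a string.
    """
    if "\ufffd" not in decoded:
        # The token decodes cleanly, so UTF-8 encoding is enough.
        return decoded.encode("utf-8")
    if LLAMA_BYTE_TOKEN.match(token):
        # "<0xE2>" -> single byte 0xE2
        return bytes([int(token[3:5], 16)])
    # Otherwise assume GPT-2 style byte-level characters.
    mapped = [UNICODE_TO_BYTE.get(c) for c in token]
    if None in mapped:
        raise RuntimeError(f"Cannot convert token `{token}` to bytes: {decoded}")
    return bytes(mapped)


if __name__ == "__main__":
    print(token_to_bytes("<0xE2>", "\ufffd"))  # Llama-style byte token
    print(token_to_bytes("â", "\ufffd"))       # GPT-2 style partial UTF-8 byte
    try:
        token_to_bytes("\ufffd?", "\ufffd?")   # literal U+FFFD in the vocabulary
    except RuntimeError as err:
        print(err)
```

Under these assumptions, a fix would need either to skip such tokens or to treat vocabulary entries that literally contain `�` as ordinary text, which is presumably what the change in #90 addressed.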