Skip to content

[BUG] AssertionError in sliding_attn.py unstash() after vision input with Gemma4 #228

Description

@Juelsman3343

There is a chat completions error caused when running Gemma4. Possibly with other vision models but I haven't tested it. Sending images now crashes the generator.

  • Regression Introduced in: v0.0.35
  • Last known working version: v0.0.34
  • Still broken v0.0.43
  • Error: AssertionError: self.position == stashed["position"] in sliding_attn.py:69

Reproduction:
Running this model Gemma4-26b-A4B-it 5.10bpw works mostly fine. However, when you send an image to the vision model, it will first respond accordingly. But the message immediately following will throw the error. A retry will fix the issue an the model will begin chatting once more.

However, sending a second image will throw the error once more and this error will NOT go away until the entire backend is restarted.

Confirmed by:
Running TabbyAPI on the latest version (using exllama v0.0.43). I tested it on both Sillytavern and OpenWebUI to confirm. Rolling back TabbyAPI to commit #fef811d (which uses exl3 v0.0.34) confirms vision works properly without issues.

Traceback:

2026-06-17 22:22:54.725 INFO:     Received chat completion streaming request 77a7704e400d48b6957e39e5963a80e4
2026-06-17 22:22:54.727 ERROR:    FATAL ERROR with generation. Attempting to recreate the generator. If this fails, please restart the 
server.

2026-06-17 22:22:54.728 WARNING:  Immediately terminating all jobs. Clients will have their requests cancelled.

2026-06-17 22:22:54.728 ERROR:    Error during chat completion 

2026-06-17 22:22:54.732 ERROR:    Error Traceback (most recent call last):
2026-06-17 22:22:54.732 ERROR:      File "./tabbyAPI/endpoints/OAI/utils/chat_completion.py", line 573, in 
stream_generate_chat_completion
2026-06-17 22:22:54.732 ERROR:        raise generation
2026-06-17 22:22:54.732 ERROR:      File "./tabbyAPI/endpoints/OAI/utils/chat_completion.py", line 396, in 
_chat_stream_collector
2026-06-17 22:22:54.732 ERROR:        async for generation in new_generation:
2026-06-17 22:22:54.732 ERROR:      File "./tabbyAPI/backends/exllamav3/model.py", line 885, in stream_generate
2026-06-17 22:22:54.732 ERROR:        async for generation_chunk in self.generate_gen(
2026-06-17 22:22:54.732 ERROR:      File "./tabbyAPI/backends/exllamav3/model.py", line 1258, in generate_gen
2026-06-17 22:22:54.732 ERROR:        raise ex
2026-06-17 22:22:54.732 ERROR:      File "./tabbyAPI/backends/exllamav3/model.py", line 1188, in generate_gen
2026-06-17 22:22:54.732 ERROR:        async for result in job:
2026-06-17 22:22:54.732 ERROR:      File 
"./tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/async_generator.py", line 109, in __aiter__
2026-06-17 22:22:54.732 ERROR:        raise result
2026-06-17 22:22:54.732 ERROR:      File 
"./tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/async_generator.py", line 27, in 
_run_iteration
2026-06-17 22:22:54.732 ERROR:        results = self.generator.iterate()
2026-06-17 22:22:54.732 ERROR:                  ^^^^^^^^^^^^^^^^^^^^^^^^
2026-06-17 22:22:54.732 ERROR:      File 
"./tabbyAPI/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
2026-06-17 22:22:54.732 ERROR:        return func(*args, **kwargs)
2026-06-17 22:22:54.732 ERROR:               ^^^^^^^^^^^^^^^^^^^^^
2026-06-17 22:22:54.732 ERROR:      File 
"./tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/generator.py", line 341, in iterate
2026-06-17 22:22:54.732 ERROR:        self.iterate_start_jobs(results)
2026-06-17 22:22:54.732 ERROR:      File 
"./tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/generator.py", line 943, in iterate_start_jobs
2026-06-17 22:22:54.732 ERROR:        job.allocate_pages()
2026-06-17 22:22:54.732 ERROR:      File 
"./tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/generator/job.py", line 1209, in allocate_pages
2026-06-17 22:22:54.732 ERROR:        self.recurrent_state = self.generator.cache.new_from_stashed(
2026-06-17 22:22:54.732 ERROR:                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-06-17 22:22:54.732 ERROR:      File 
"./tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/cache/cache.py", line 323, in new_from_stashed
2026-06-17 22:22:54.732 ERROR:        return self.recurrent_state_cls(
2026-06-17 22:22:54.732 ERROR:               ^^^^^^^^^^^^^^^^^^^^^^^^^
2026-06-17 22:22:54.732 ERROR:      File 
"./tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/modules/sliding_attn.py", line 39, in __init__
2026-06-17 22:22:54.732 ERROR:        self.unstash(stashed)
2026-06-17 22:22:54.732 ERROR:      File 
"./tabbyAPI/venv/lib/python3.12/site-packages/exllamav3/modules/sliding_attn.py", line 69, in unstash
2026-06-17 22:22:54.732 ERROR:        assert self.position == stashed["position"]
2026-06-17 22:22:54.732 ERROR:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-06-17 22:22:54.732 ERROR:    AssertionError
2026-06-17 22:22:54.740 ERROR:    Sent to request: Chat completion aborted. Please check the server console.

Tested on: Ubuntu version Release 26.04.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions