Uh-oh Spaghettio: Behemoth synchronization timeout in native TP on dev. #202

Description

@Ph0rk0z

Yeah, dev is WIP, but I figure that if nobody reports this, it might slip under the radar with all the MoE changes. I tested v.28, master and dev. On dev, non-TP gave me 1.2 t/s on 4x3090, about what I'd expect from a model running on CPU. Then I tried native TP as usual and got the error log below. NCCL TP still produces correct output.

Also, for some reason encode_special_tokens still adds a BOS token to every tokenization. That means all SillyTavern token bans, and anything else that depends on them, are wrong, even though the request now explicitly sends add_bos_token: False (see the request body in the log below). With encode_special_tokens disabled, <s> is tokenized correctly, so I don't get what this setting is for. I can try to PR them a parameter to set it to false, but I feel like this can burn any front end that tokenizes via tabby.


2026-05-03 07:42:37.761 INFO:     Headers: {'accept': '*/*', 'accept-encoding': 'gzip, deflate, br', 'authorization': 'Bearer befed36be355afb56f593ff82e18dd93', 'content-length': '36', 'content-type': 'application/json', 'user-agent': 'node-fetch', 'x-api-key': 'befed36be355afb56f593ff82e18dd93', 'host': '192.168.1.211:5000', 'connection': 'close'}
2026-05-03 07:42:37.761 INFO:     Body: {'text': '<s>', 'add_bos_token': False}
 ## Exception in child process
Traceback (most recent call last):
  File "/home/supermicro/miniconda3/envs/cuda12/lib/python3.11/site-packages/exllamav3/model/model_tp_fn.py", line 81, in mp_model_worker
    result = func(local_context, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/supermicro/miniconda3/envs/cuda12/lib/python3.11/site-packages/exllamav3/model/model_tp_fn.py", line 215, in mp_model_forward
    x = module.forward(x, params)
        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/supermicro/miniconda3/envs/cuda12/lib/python3.11/site-packages/exllamav3/modules/transformer.py", line 167, in forward
    y = self.mlp.forward(y, params)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/supermicro/miniconda3/envs/cuda12/lib/python3.11/site-packages/exllamav3/modules/mlp.py", line 675, in forward
    params["backend"].all_reduce(d)
  File "/home/supermicro/miniconda3/envs/cuda12/lib/python3.11/site-packages/exllamav3/model/model_tp_backend.py", line 333, in all_reduce
    ext.pg_all_reduce_cpu(
RuntimeError: Synchronization timeout

----------------------------------------
 ## Synchronization timeout in kernel: pg_all_reduce_cpu_kernel

----------------------------------------
 ## Exception in child process
Traceback (most recent call last):
  File "/home/supermicro/miniconda3/envs/cuda12/lib/python3.11/site-packages/exllamav3/model/model_tp_fn.py", line 81, in mp_model_worker
    result = func(local_context, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/supermicro/miniconda3/envs/cuda12/lib/python3.11/site-packages/exllamav3/model/model_tp_fn.py", line 215, in mp_model_forward
    x = module.forward(x, params)
        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/supermicro/miniconda3/envs/cuda12/lib/python3.11/site-packages/exllamav3/modules/gather.py", line 65, in forward
    backend.gather(x, out_tensor, self.gather_devices, self.output_device, self.ldims)
  File "/home/supermicro/miniconda3/envs/cuda12/lib/python3.11/site-packages/exllamav3/model/model_tp_backend.py", line 372, in gather
    ext.pg_gather(
RuntimeError: Synchronization timeout
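
For anyone who wants to reproduce the BOS part without SillyTavern, here is a minimal sketch that rebuilds the same request body as in the log (`{'text': '<s>', 'add_bos_token': False}`, 36 bytes). Everything else is an assumption on my side and not taken from the log: the `/v1/token/encode` path, the `tokens` field in the response, the placeholder API key, and the helper names `build_encode_request`, `fetch_tokens` and `has_leading_bos`. Adjust them to whatever your install actually exposes.

```python
import json
import urllib.request

# Host and port are taken from the log above.
BASE_URL = "http://192.168.1.211:5000"
# ASSUMPTION: the encode route; the log does not show the request path.
ENCODE_PATH = "/v1/token/encode"


def build_encode_request(text, add_bos_token=False, api_key="YOUR_API_KEY"):
    """Build (but do not send) a POST carrying the same JSON body as the log."""
    # Compact separators reproduce the 36-byte body reported as content-length.
    body = json.dumps(
        {"text": text, "add_bos_token": add_bos_token},
        separators=(",", ":"),
    ).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENCODE_PATH,
        data=body,
        headers={"content-type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def fetch_tokens(request, timeout=10):
    """Send the request and return the token ID list.

    ASSUMPTION: the response is JSON with a "tokens" list.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp).get("tokens", [])


def has_leading_bos(tokens, bos_id):
    """True if the first token ID is the BOS ID (bos_id depends on the model)."""
    return len(tokens) > 0 and tokens[0] == bos_id
```

Usage would be something like `tokens = fetch_tokens(build_encode_request("<s>"))` followed by `has_leading_bos(tokens, bos_id)`, with `bos_id` looked up from the model's tokenizer config. Comparing the result for `<s>` against a plain string like `hello` should show whether a BOS is being prepended despite `add_bos_token` being false.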

