Skip to content

SpeechLM2 : Add support for offset key in Multimodal conversation#15281

Merged
pzelasko merged 11 commits intoNVIDIA-NeMo:mainfrom
linagora-labs:multimodalconversation_offset
Jan 16, 2026
Merged

SpeechLM2 : Add support for offset key in Multimodal conversation#15281
pzelasko merged 11 commits intoNVIDIA-NeMo:mainfrom
linagora-labs:multimodalconversation_offset

Conversation

@AudranBert
Copy link
Copy Markdown
Contributor

@AudranBert AudranBert commented Jan 9, 2026

Important

The Update branch button must only be pressed in very rare occassions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Add support for offset key in Multimodal conversation. Also make that it uses the duration key to load audios. Goal is to be able to have multiples segments per audio.

Collection: speechlm2

Changelog

  • Offset key support (if present)
  • Duration key now used (if present) for audio loading
  • Export_conversations_to_tar.py now write segments (use duration and offset) in .tar instead of the full audio

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

Signed-off-by: AudranBert <bert.audran@gmail.com>
Signed-off-by: AudranBert <bert.audran@gmail.com>
@github-actions github-actions Bot added the common label Jan 9, 2026
Signed-off-by: AudranBert <bert.audran@gmail.com>
@AudranBert
Copy link
Copy Markdown
Contributor Author

AudranBert commented Jan 9, 2026

Hi, @pzelasko what do you think of the code logic? Any concerns or warnings?
From my testings, it works with tar and jsonl and does not load the whole file like it did before.
I will write automated test when I can.

Copy link
Copy Markdown
Collaborator

@pzelasko pzelasko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Yeah I think that will work. Please make sure to cover both reading and writing + reading cases with the new offset field.

Comment thread nemo/collections/common/data/lhotse/text_adapters.py Outdated
Signed-off-by: AudranBert <bert.audran@gmail.com>
Signed-off-by: AudranBert <bert.audran@gmail.com>
@AudranBert AudranBert force-pushed the multimodalconversation_offset branch from 246b07f to 09ddb4c Compare January 12, 2026 15:17
@AudranBert
Copy link
Copy Markdown
Contributor Author

Thanks! Yeah I think that will work. Please make sure to cover both reading and writing + reading cases with the new offset field.

Except for NeMoMultimodalConversationTarWriter I do not see functions to write NeMoMultimodalConversation, is there some somewhere else?

@AudranBert
Copy link
Copy Markdown
Contributor Author

Thanks! Yeah I think that will work. Please make sure to cover both reading and writing + reading cases with the new offset field.

Do I need to add offset/duration support to NeMoMultimodalConversationShareGPTJsonlAdapter ? It seems like an established format.

Signed-off-by: AudranBert <bert.audran@gmail.com>
Signed-off-by: AudranBert <bert.audran@gmail.com>
Signed-off-by: AudranBert <bert.audran@gmail.com>
@AudranBert AudranBert marked this pull request as ready for review January 13, 2026 15:14
@AudranBert
Copy link
Copy Markdown
Contributor Author

AudranBert commented Jan 13, 2026

tests/collections/common/test_lhotse_multimodal_dataloading.py are passing @pzelasko. Is it good? For now, I didn't touch NeMoMultimodalConversationShareGPTJsonlAdapter but I can update it if needed (but it is missing the duration key already?). NeMoMultimodalConversationJsonlAdapter is fully working : loading jsonl or tar and writing tar with NeMoMultimodalConversationTarWriter.

The CI job Isort and Black Formatting / reformat_with_isort_and_black (pull_request_target) is not working, it seems to be because the PR is from a forked repository, I don't understand what I should do and if I should do something?
I can't add the ``run CICD` label too.

Signed-off-by: AudranBert <bert.audran@gmail.com>
@AudranBert AudranBert changed the title [WIP] SpeechLM2 : Add support for offset key in Multimodal conversation SpeechLM2 : Add support for offset key in Multimodal conversation Jan 14, 2026
@pzelasko
Copy link
Copy Markdown
Collaborator

Except for NeMoMultimodalConversationTarWriter I do not see functions to write NeMoMultimodalConversation, is there some somewhere else?

I don't think there are - it's OK to just test it with TarWriter

Do I need to add offset/duration support to NeMoMultimodalConversationShareGPTJsonlAdapter ? It seems like an established format.

Could you do that while you're at it? It will be definitely helpful.

Don't worry about the CI job, I just approved it, will probably work on re-trigger.

Signed-off-by: AudranBert <bert.audran@gmail.com>
Signed-off-by: AudranBert <bert.audran@gmail.com>
@github-actions github-actions Bot added the ASR label Jan 15, 2026
@AudranBert
Copy link
Copy Markdown
Contributor Author

Except for NeMoMultimodalConversationTarWriter I do not see functions to write NeMoMultimodalConversation, is there some somewhere else?

I don't think there are - it's OK to just test it with TarWriter

Do I need to add offset/duration support to NeMoMultimodalConversationShareGPTJsonlAdapter ? It seems like an established format.

Could you do that while you're at it? It will be definitely helpful.

Don't worry about the CI job, I just approved it, will probably work on re-trigger.

Hi @pzelasko, I added a line in the doc to acknowledge the offset param. I think the PR is finished, except if you have some feedbacks. Just a side question, the duration param is mandatory for NeMo multimodal conversations? It is used to estimate the number of tokens? Because it was unused in the text_adapters.py file (now it is).

@pzelasko
Copy link
Copy Markdown
Collaborator

Just a side question, the duration param is mandatory for NeMo multimodal conversations? It is used to estimate the number of tokens? Because it was unused in the text_adapters.py file (now it is).

Yes, it was unused, and present there as useful metadata for reading the manifest files in other contexts (EDA etc).

@AudranBert
Copy link
Copy Markdown
Contributor Author

Hi @pzelasko, (sorry for the ping) the PR needs a review

Copy link
Copy Markdown
Collaborator

@pzelasko pzelasko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @AudranBert !

@pzelasko pzelasko merged commit e777163 into NVIDIA-NeMo:main Jan 16, 2026
255 checks passed
v4xsh pushed a commit to v4xsh/NeMo that referenced this pull request Jan 17, 2026
…IDIA-NeMo#15281)

* Add offset support to multimodal conversation adapter

Signed-off-by: AudranBert <bert.audran@gmail.com>

* Working conversation tar writer

Signed-off-by: AudranBert <bert.audran@gmail.com>

* Fix files with same name in tar

Signed-off-by: AudranBert <bert.audran@gmail.com>

* unitest adapter loading jsonl

Signed-off-by: AudranBert <bert.audran@gmail.com>

* add duration in names in tar

Signed-off-by: AudranBert <bert.audran@gmail.com>

* Deterministic cut ID after truncation

Signed-off-by: AudranBert <bert.audran@gmail.com>

* update lhotse multimodal test with offset

Signed-off-by: AudranBert <bert.audran@gmail.com>

* remove useless tests

Signed-off-by: AudranBert <bert.audran@gmail.com>

* apply formatting

Signed-off-by: AudranBert <bert.audran@gmail.com>

* more robust ids + offset/duration added to sharegpt class

Signed-off-by: AudranBert <bert.audran@gmail.com>

* upd test + add doc

Signed-off-by: AudranBert <bert.audran@gmail.com>

---------

Signed-off-by: AudranBert <bert.audran@gmail.com>
v4xsh pushed a commit to v4xsh/NeMo that referenced this pull request Jan 17, 2026
…IDIA-NeMo#15281)

* Add offset support to multimodal conversation adapter

Signed-off-by: AudranBert <bert.audran@gmail.com>

* Working conversation tar writer

Signed-off-by: AudranBert <bert.audran@gmail.com>

* Fix files with same name in tar

Signed-off-by: AudranBert <bert.audran@gmail.com>

* unitest adapter loading jsonl

Signed-off-by: AudranBert <bert.audran@gmail.com>

* add duration in names in tar

Signed-off-by: AudranBert <bert.audran@gmail.com>

* Deterministic cut ID after truncation

Signed-off-by: AudranBert <bert.audran@gmail.com>

* update lhotse multimodal test with offset

Signed-off-by: AudranBert <bert.audran@gmail.com>

* remove useless tests

Signed-off-by: AudranBert <bert.audran@gmail.com>

* apply formatting

Signed-off-by: AudranBert <bert.audran@gmail.com>

* more robust ids + offset/duration added to sharegpt class

Signed-off-by: AudranBert <bert.audran@gmail.com>

* upd test + add doc

Signed-off-by: AudranBert <bert.audran@gmail.com>

---------

Signed-off-by: AudranBert <bert.audran@gmail.com>
Signed-off-by: v4xsh <vanshdobhal11@gmail.com>
AkCodes23 pushed a commit to AkCodes23/NeMo that referenced this pull request Jan 28, 2026
…IDIA-NeMo#15281)

* Add offset support to multimodal conversation adapter

Signed-off-by: AudranBert <bert.audran@gmail.com>

* Working conversation tar writer

Signed-off-by: AudranBert <bert.audran@gmail.com>

* Fix files with same name in tar

Signed-off-by: AudranBert <bert.audran@gmail.com>

* unitest adapter loading jsonl

Signed-off-by: AudranBert <bert.audran@gmail.com>

* add duration in names in tar

Signed-off-by: AudranBert <bert.audran@gmail.com>

* Deterministic cut ID after truncation

Signed-off-by: AudranBert <bert.audran@gmail.com>

* update lhotse multimodal test with offset

Signed-off-by: AudranBert <bert.audran@gmail.com>

* remove useless tests

Signed-off-by: AudranBert <bert.audran@gmail.com>

* apply formatting

Signed-off-by: AudranBert <bert.audran@gmail.com>

* more robust ids + offset/duration added to sharegpt class

Signed-off-by: AudranBert <bert.audran@gmail.com>

* upd test + add doc

Signed-off-by: AudranBert <bert.audran@gmail.com>

---------

Signed-off-by: AudranBert <bert.audran@gmail.com>
Signed-off-by: Akhil Varanasi <akhilvaranasi23@gmail.com>
@AudranBert AudranBert deleted the multimodalconversation_offset branch January 30, 2026 09:25
nune-tadevosyan pushed a commit to nune-tadevosyan/NeMo that referenced this pull request Mar 13, 2026
…IDIA-NeMo#15281)

* Add offset support to multimodal conversation adapter

Signed-off-by: AudranBert <bert.audran@gmail.com>

* Working conversation tar writer

Signed-off-by: AudranBert <bert.audran@gmail.com>

* Fix files with same name in tar

Signed-off-by: AudranBert <bert.audran@gmail.com>

* unitest adapter loading jsonl

Signed-off-by: AudranBert <bert.audran@gmail.com>

* add duration in names in tar

Signed-off-by: AudranBert <bert.audran@gmail.com>

* Deterministic cut ID after truncation

Signed-off-by: AudranBert <bert.audran@gmail.com>

* update lhotse multimodal test with offset

Signed-off-by: AudranBert <bert.audran@gmail.com>

* remove useless tests

Signed-off-by: AudranBert <bert.audran@gmail.com>

* apply formatting

Signed-off-by: AudranBert <bert.audran@gmail.com>

* more robust ids + offset/duration added to sharegpt class

Signed-off-by: AudranBert <bert.audran@gmail.com>

* upd test + add doc

Signed-off-by: AudranBert <bert.audran@gmail.com>

---------

Signed-off-by: AudranBert <bert.audran@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants