SpeechLM2 : Add support for offset key in Multimodal conversation#15281
SpeechLM2 : Add support for offset key in Multimodal conversation#15281pzelasko merged 11 commits intoNVIDIA-NeMo:mainfrom
Conversation
Signed-off-by: AudranBert <bert.audran@gmail.com>
Signed-off-by: AudranBert <bert.audran@gmail.com>
Signed-off-by: AudranBert <bert.audran@gmail.com>
|
Hi, @pzelasko what do you think of the code logic? Any concerns or warnings? |
pzelasko
left a comment
There was a problem hiding this comment.
Thanks! Yeah I think that will work. Please make sure to cover both reading and writing + reading cases with the new offset field.
Signed-off-by: AudranBert <bert.audran@gmail.com>
Signed-off-by: AudranBert <bert.audran@gmail.com>
246b07f to
09ddb4c
Compare
Except for |
Do I need to add offset/duration support to |
Signed-off-by: AudranBert <bert.audran@gmail.com>
Signed-off-by: AudranBert <bert.audran@gmail.com>
Signed-off-by: AudranBert <bert.audran@gmail.com>
|
The CI job Isort and Black Formatting / reformat_with_isort_and_black (pull_request_target) is not working, it seems to be because the PR is from a forked repository, I don't understand what I should do and if I should do something? |
Signed-off-by: AudranBert <bert.audran@gmail.com>
I don't think there are - it's OK to just test it with TarWriter
Could you do that while you're at it? It will be definitely helpful. Don't worry about the CI job, I just approved it, will probably work on re-trigger. |
Signed-off-by: AudranBert <bert.audran@gmail.com>
Signed-off-by: AudranBert <bert.audran@gmail.com>
Hi @pzelasko, I added a line in the doc to acknowledge the offset param. I think the PR is finished, except if you have some feedbacks. Just a side question, the duration param is mandatory for NeMo multimodal conversations? It is used to estimate the number of tokens? Because it was unused in the text_adapters.py file (now it is). |
Yes, it was unused, and present there as useful metadata for reading the manifest files in other contexts (EDA etc). |
|
Hi @pzelasko, (sorry for the ping) the PR needs a review |
pzelasko
left a comment
There was a problem hiding this comment.
Thanks for the contribution @AudranBert !
…IDIA-NeMo#15281) * Add offset support to multimodal conversation adapter Signed-off-by: AudranBert <bert.audran@gmail.com> * Working conversation tar writer Signed-off-by: AudranBert <bert.audran@gmail.com> * Fix files with same name in tar Signed-off-by: AudranBert <bert.audran@gmail.com> * unitest adapter loading jsonl Signed-off-by: AudranBert <bert.audran@gmail.com> * add duration in names in tar Signed-off-by: AudranBert <bert.audran@gmail.com> * Deterministic cut ID after truncation Signed-off-by: AudranBert <bert.audran@gmail.com> * update lhotse multimodal test with offset Signed-off-by: AudranBert <bert.audran@gmail.com> * remove useless tests Signed-off-by: AudranBert <bert.audran@gmail.com> * apply formatting Signed-off-by: AudranBert <bert.audran@gmail.com> * more robust ids + offset/duration added to sharegpt class Signed-off-by: AudranBert <bert.audran@gmail.com> * upd test + add doc Signed-off-by: AudranBert <bert.audran@gmail.com> --------- Signed-off-by: AudranBert <bert.audran@gmail.com>
…IDIA-NeMo#15281) * Add offset support to multimodal conversation adapter Signed-off-by: AudranBert <bert.audran@gmail.com> * Working conversation tar writer Signed-off-by: AudranBert <bert.audran@gmail.com> * Fix files with same name in tar Signed-off-by: AudranBert <bert.audran@gmail.com> * unitest adapter loading jsonl Signed-off-by: AudranBert <bert.audran@gmail.com> * add duration in names in tar Signed-off-by: AudranBert <bert.audran@gmail.com> * Deterministic cut ID after truncation Signed-off-by: AudranBert <bert.audran@gmail.com> * update lhotse multimodal test with offset Signed-off-by: AudranBert <bert.audran@gmail.com> * remove useless tests Signed-off-by: AudranBert <bert.audran@gmail.com> * apply formatting Signed-off-by: AudranBert <bert.audran@gmail.com> * more robust ids + offset/duration added to sharegpt class Signed-off-by: AudranBert <bert.audran@gmail.com> * upd test + add doc Signed-off-by: AudranBert <bert.audran@gmail.com> --------- Signed-off-by: AudranBert <bert.audran@gmail.com> Signed-off-by: v4xsh <vanshdobhal11@gmail.com>
…IDIA-NeMo#15281) * Add offset support to multimodal conversation adapter Signed-off-by: AudranBert <bert.audran@gmail.com> * Working conversation tar writer Signed-off-by: AudranBert <bert.audran@gmail.com> * Fix files with same name in tar Signed-off-by: AudranBert <bert.audran@gmail.com> * unitest adapter loading jsonl Signed-off-by: AudranBert <bert.audran@gmail.com> * add duration in names in tar Signed-off-by: AudranBert <bert.audran@gmail.com> * Deterministic cut ID after truncation Signed-off-by: AudranBert <bert.audran@gmail.com> * update lhotse multimodal test with offset Signed-off-by: AudranBert <bert.audran@gmail.com> * remove useless tests Signed-off-by: AudranBert <bert.audran@gmail.com> * apply formatting Signed-off-by: AudranBert <bert.audran@gmail.com> * more robust ids + offset/duration added to sharegpt class Signed-off-by: AudranBert <bert.audran@gmail.com> * upd test + add doc Signed-off-by: AudranBert <bert.audran@gmail.com> --------- Signed-off-by: AudranBert <bert.audran@gmail.com> Signed-off-by: Akhil Varanasi <akhilvaranasi23@gmail.com>
…IDIA-NeMo#15281) * Add offset support to multimodal conversation adapter Signed-off-by: AudranBert <bert.audran@gmail.com> * Working conversation tar writer Signed-off-by: AudranBert <bert.audran@gmail.com> * Fix files with same name in tar Signed-off-by: AudranBert <bert.audran@gmail.com> * unitest adapter loading jsonl Signed-off-by: AudranBert <bert.audran@gmail.com> * add duration in names in tar Signed-off-by: AudranBert <bert.audran@gmail.com> * Deterministic cut ID after truncation Signed-off-by: AudranBert <bert.audran@gmail.com> * update lhotse multimodal test with offset Signed-off-by: AudranBert <bert.audran@gmail.com> * remove useless tests Signed-off-by: AudranBert <bert.audran@gmail.com> * apply formatting Signed-off-by: AudranBert <bert.audran@gmail.com> * more robust ids + offset/duration added to sharegpt class Signed-off-by: AudranBert <bert.audran@gmail.com> * upd test + add doc Signed-off-by: AudranBert <bert.audran@gmail.com> --------- Signed-off-by: AudranBert <bert.audran@gmail.com>
Important
The
Update branchbutton must only be pressed in very rare occassions.An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Add support for offset key in Multimodal conversation. Also make that it uses the duration key to load audios. Goal is to be able to have multiples segments per audio.
Collection: speechlm2
Changelog
Usage
# Add a code snippet demonstrating how to use thisGitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information