feat(TaskProcessing): add AudioToTextSubtitles TaskType #61127

Open
edward-ly wants to merge 1 commit into master from feat/noid/subtitles-task

Conversation

@edward-ly edward-ly commented Jun 10, 2026

Contributor

Summary

Adds a core:audio2text:subtitles task type which outputs a subtitles file for speech-to-text tasks instead of plain text transcripts (as is the case for core:audio2text).
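To illustrate the difference between the two outputs, here is a minimal sketch that renders the same timed segments once as a plain transcript and once as a subtitles file. It assumes SubRip (SRT) as the subtitle format, which this PR does not specify, and the `Segment` type and both helper functions are hypothetical names that are not part of the Nextcloud API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical timed transcript segment; times are in seconds."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp of the form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_plain_text(segments: list[Segment]) -> str:
    """Plain transcript: the timing information is discarded."""
    return " ".join(seg.text.strip() for seg in segments)


def to_srt(segments: list[Segment]) -> str:
    """Subtitles: each segment keeps its index and time range."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}"
        blocks.append(f"{index}\n{time_range}\n{seg.text.strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [Segment(0.0, 2.5, "Hello"), Segment(2.5, 4.0, "world")]
    print(to_plain_text(demo))
    print(to_srt(demo))
```

The point of the sketch is only that a subtitles output has to carry per-segment timing, which a plain-text transcript drops.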

TODO

  • Ideally, this task should accept both audio and video files as input, but I'm not sure if EShapeType::Audio covers video files in addition to audio files. If not, maybe we should have a separate input shape that does cover both?

Checklist

AI (if applicable)

  • The content of this PR was partly or fully generated using AI

@edward-ly edward-ly added this to the Nextcloud 35 milestone Jun 10, 2026
@edward-ly edward-ly force-pushed the feat/noid/subtitles-task branch from a6604f7 to 86c9847 Compare June 10, 2026 15:51
@edward-ly edward-ly marked this pull request as ready for review June 10, 2026 15:51
@edward-ly edward-ly requested a review from a team as a code owner June 10, 2026 15:51
@edward-ly edward-ly requested review from ArtificialOwl, icewind1991, leftybournes and salmart-dev and removed request for a team June 10, 2026 15:51
@edward-ly edward-ly force-pushed the feat/noid/subtitles-task branch from 86c9847 to df27584 Compare June 13, 2026 00:27
@marcelklehr

marcelklehr commented Jun 15, 2026

Member

We could use the File shape type, that covers more than only audio and video, but should be ok, perhaps? The downside is that the assistant will not display a mic record button. But that should be fine as the feature is more geared towards transcribing existing files, I'd say, no?

@edward-ly

Contributor Author

> We could use the File shape type, that covers more than only audio and video, but should be ok, perhaps? The downside is that the assistant will not display a mic record button. But that should be fine as the feature is more geared towards transcribing existing files, I'd say, no?

The one potential use case I could see with the record button would be for recording and transcribing a podcast at the same time, although I think recording locally before uploading and transcribing would still be preferred for some people anyway. Let's switch to the File type, then.

In that case, would we still be able to validate eligible files by MIME type (either in the file action or in the Assistant modal)? If so, how?

@marcelklehr

Member

> In that case, would we still be able to validate eligible files by MIME type (either in the file action or in the Assistant modal)? If so, how?

The assistant modal would not be able to do that, as it doesn't know anything about what the provider wants except what the task type tells it. The file action might be able to do so. But we should definitely check in the provider implementation and raise a user-facing error if the MIME type is not supported.

@edward-ly

Contributor Author

> The assistant modal would not be able to do that, as it doesn't know anything about what the provider wants except what the task type tells it. The file action might be able to do so. But we should definitely check in the provider implementation and raise a user-facing error if the MIME type is not supported.

Different providers might support different audio/video types too, so yeah, it's probably best to push the validation responsibility to the providers anyway.
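As a rough illustration of provider-side validation, the sketch below checks an input file's MIME type against a per-provider allow-list and raises an error whose message is meant to be shown to the user. It is written in Python on the assumption that the provider runs as an external app. `UnsupportedMediaError`, `validate_input_mime`, and both allow-lists are hypothetical and do not correspond to any Nextcloud API.

```python
class UnsupportedMediaError(Exception):
    """Hypothetical error type whose message is safe to show to the user."""


# Hypothetical allow-lists: each provider declares what it can decode.
PROVIDER_A_TYPES = frozenset({"audio/mpeg", "audio/wav", "video/mp4"})
PROVIDER_B_TYPES = frozenset({"audio/ogg", "video/webm"})


def validate_input_mime(mime_type: str, supported: frozenset[str]) -> str:
    """Return the normalized MIME type, or raise if the provider rejects it."""
    # Drop parameters such as "; codecs=opus" and normalize the case.
    normalized = mime_type.split(";", 1)[0].strip().lower()
    if normalized not in supported:
        label = normalized or "unknown"
        raise UnsupportedMediaError(
            f"File type '{label}' is not supported by this provider."
        )
    return normalized


if __name__ == "__main__":
    print(validate_input_mime("Audio/MPEG", PROVIDER_A_TYPES))
    try:
        validate_input_mime("audio/mpeg", PROVIDER_B_TYPES)
    except UnsupportedMediaError as err:
        print(err)
```

Keeping the allow-list inside each provider means the task type itself can stay on the generic File shape while every provider rejects only what it cannot handle.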

Signed-off-by: Edward Ly <contact@edward.ly>
@edward-ly edward-ly force-pushed the feat/noid/subtitles-task branch from df27584 to 3692d1a Compare June 15, 2026 15:01
