solve zipformer streaming gpu inference #961

Open
whaozl wants to merge 1 commit into k2-fsa:master from whaozl:patch-1

Conversation

@whaozl

whaozl commented Mar 23, 2023

No description provided.

@yaozengwei

Collaborator

The script jit_trace_export.py exports the model with torch.jit.trace. Why replace it with torch.jit.script?

yfyeung left a comment

Collaborator

Please explain the reason for this modification.

yfyeung

This comment was marked as duplicate.

@whaozl

whaozl commented Mar 23, 2023

Author

@yaozengwei Because exporting the encoder model with torch.jit.trace leads to this error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

see k2-fsa/sherpa#346

So only the encoder needs to be exported with torch.jit.script.
The decoder and joiner are still exported with torch.jit.trace, as before.
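
The split described above (script the encoder, trace the rest) can be sketched roughly as follows. This is a minimal illustration, not the actual change in this PR: `ToyEncoder` and `ToyJoiner` are hypothetical stand-ins, not the icefall Zipformer classes, and the shapes are arbitrary.

```python
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for the streaming encoder (not the icefall Zipformer)."""

    def __init__(self, dim: int = 80) -> None:
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The index tensor is created on x.device; under torch.jit.script this
        # lookup happens at run time instead of being fixed at export time.
        pos = torch.arange(x.size(1), device=x.device).unsqueeze(0).unsqueeze(-1)
        return self.proj(x) + pos.to(x.dtype)


class ToyJoiner(nn.Module):
    """Hypothetical stand-in for the joiner; it has no device-dependent code."""

    def __init__(self, dim: int = 80) -> None:
        super().__init__()
        self.out = nn.Linear(dim, 10)

    def forward(self, enc, dec):
        return self.out(torch.tanh(enc + dec))


encoder = ToyEncoder().eval()
joiner = ToyJoiner().eval()

x = torch.zeros(1, 16, 80)
with torch.no_grad():
    # Encoder: compiled from source with torch.jit.script.
    scripted_encoder = torch.jit.script(encoder)
    enc_out = scripted_encoder(x)
    # Joiner: recorded from example inputs with torch.jit.trace.
    traced_joiner = torch.jit.trace(joiner, (enc_out, torch.zeros(1, 16, 80)))
    logits = traced_joiner(enc_out, torch.zeros(1, 16, 80))
```

Both objects can then be saved with their `.save(...)` method; the file names and the real module arguments are left out here because they depend on the export script.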

@yaozengwei

Collaborator

Could you try moving the model to the cuda device instead of the cpu when exporting with jit.trace?
We would also need to create the example inputs on the cuda device in this case (see x = torch.zeros(1, T, 80, dtype=torch.float32)).
I wonder whether we need to export the model on the cuda device when we want to run it on the cuda device.

See https://pytorch.org/docs/stable/jit.html#frequently-asked-questions
[Image: Screenshot 2023-03-23 at 17 14 28]
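
The suggestion above (model and example inputs on the same device before tracing) could look roughly like this. It is a sketch under assumptions: the `nn.Sequential` model is a hypothetical stand-in for the encoder, `T = 32` is an arbitrary frame count, and the code falls back to the cpu when no cuda device is present.

```python
import torch
import torch.nn as nn

# Use cuda:0 when available, otherwise fall back to the cpu so the sketch still runs.
device = torch.device("cuda", 0) if torch.cuda.is_available() else torch.device("cpu")

T = 32  # hypothetical number of input frames

# Hypothetical stand-in model; the real script builds the Zipformer encoder instead.
model = nn.Sequential(nn.Linear(80, 64), nn.ReLU(), nn.Linear(64, 32)).eval()
model.to(device)

# The example input is created directly on the same device as the model.
x = torch.zeros(1, T, 80, dtype=torch.float32, device=device)

with torch.no_grad():
    traced = torch.jit.trace(model, x)
    y = traced(x)
```

This only removes mismatches between the model parameters and the inputs; tensors created inside `forward` without a device argument are a separate problem.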

@whaozl

whaozl commented Mar 23, 2023

Author

@yaozengwei

There are two scenarios:

1. Move the model to the cuda device instead of the cpu when exporting with jit.trace:
model.to("cuda:0")
This fails with RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
or
torch.jit._trace.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
The cause is https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming/zipformer.py#L2280-L2281, where torch.arange is called without a device argument. I also tried changing

rows = torch.arange(start=time1 - 1, end=-1, step=-1)

to

rows = torch.arange(start=time1 - 1, end=-1, step=-1).cuda()

2. Export on the cpu. When sherpa online (https://github.com/k2-fsa/sherpa/blob/master/sherpa/cpp_api/bin/online-recognizer.cc) then runs inference on the gpu, it fails with
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

So the best approach is to change how the encoder is exported and use torch.jit.script.
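
The device mismatch described above comes from index tensors that are created without a device argument. The sketch below contrasts that pattern with a device-following variant. The function names and the `cols` tensor are hypothetical and only illustrate the idea; they are not copied from zipformer.py.

```python
import torch


def relative_index_default(time1: int, seq_len: int) -> torch.Tensor:
    # No device argument: both tensors land on the default device (cpu by default),
    # which clashes with cuda tensors later in the computation.
    rows = torch.arange(start=time1 - 1, end=-1, step=-1)
    cols = torch.arange(seq_len)  # hypothetical second index tensor
    return rows.unsqueeze(-1) + cols


def relative_index_like(x: torch.Tensor, time1: int, seq_len: int) -> torch.Tensor:
    # device=x.device keeps the index tensors on the same device as the input,
    # without hard-coding cuda the way .cuda() does.
    rows = torch.arange(start=time1 - 1, end=-1, step=-1, device=x.device)
    cols = torch.arange(seq_len, device=x.device)
    return rows.unsqueeze(-1) + cols


x = torch.zeros(1, 4, 80)
idx_default = relative_index_default(3, 2)
idx_like = relative_index_like(x, 3, 2)
```

Note that this helps in eager mode and under torch.jit.script, where `x.device` is read at run time. Under torch.jit.trace the device seen during tracing is, as far as I understand, recorded as a constant in the graph, which is consistent with the two failure scenarios listed above.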

@yaozengwei

Collaborator

Could you successfully export the model if you make the change rows = torch.arange(start=time1 - 1, end=-1, step=-1).cuda()?

We export with jit.trace instead of jit.script because some inference frameworks require it.
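
For readers less familiar with the two export modes, here is a small, generic sketch of how they differ. It is unrelated to the Zipformer code; the function `shift` is hypothetical. torch.jit.trace records only the operations executed for the example input, while torch.jit.script compiles the Python source, including its control flow.

```python
import warnings

import torch


def shift(x: torch.Tensor) -> torch.Tensor:
    # Data-dependent branch: tracing records only the branch taken for the example input.
    if x.sum() > 0:
        return x + 10
    return x - 10


with warnings.catch_warnings():
    # Tracing a data-dependent branch emits a TracerWarning; silence it for the demo.
    warnings.simplefilter("ignore")
    traced = torch.jit.trace(shift, torch.ones(3))

scripted = torch.jit.script(shift)

neg = -torch.ones(3)
traced_out = traced(neg)      # replays the recorded "x + 10" branch
scripted_out = scripted(neg)  # evaluates the condition and takes "x - 10"
```

A traced graph is a flat list of tensor operations, which is one plausible reason why some downstream tools prefer it; the trade-off is that anything decided in Python during tracing is fixed in the exported graph.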

@whaozl

whaozl commented Mar 23, 2023

Author

With rows = torch.arange(start=time1 - 1, end=-1, step=-1).cuda(), the export failed.

So I tried torch.jit.script for the encoder model and then ran sherpa online. It runs successfully with use_gpu.

@yaozengwei

Collaborator

> With rows = torch.arange(start=time1 - 1, end=-1, step=-1).cuda(), the export failed.
>
> So I tried torch.jit.script for the encoder model and then ran sherpa online. It runs successfully with use_gpu.

OK. So the exported encoder that you are running on the cuda device is the jit.script version.
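
As a rough Python-side illustration of that outcome, the sketch below scripts a toy encoder on the cpu, saves it, and loads it onto whichever device is available. `ToyEncoder` is a hypothetical stand-in, an in-memory buffer replaces the exported file, and sherpa itself loads the model from C++, so this only mirrors the idea.

```python
import io

import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for the scripted encoder."""

    def __init__(self) -> None:
        super().__init__()
        self.proj = nn.Linear(80, 80)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frame indices follow the input device at run time.
        frames = torch.arange(x.size(1), device=x.device)
        return self.proj(x) + frames.to(x.dtype).unsqueeze(-1)


# Export on the cpu with torch.jit.script and save to an in-memory buffer.
buffer = io.BytesIO()
torch.jit.save(torch.jit.script(ToyEncoder().eval()), buffer)
buffer.seek(0)

# Load onto the inference device; falls back to the cpu when cuda is unavailable.
device = torch.device("cuda", 0) if torch.cuda.is_available() else torch.device("cpu")
loaded = torch.jit.load(buffer, map_location=device)

x = torch.zeros(1, 16, 80, device=device)
with torch.no_grad():
    y = loaded(x)
```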

3 participants