Skip to content

legacy_apps: xlnx: add api to poll apps on vdev status#100

Open
tnmysh wants to merge 2 commits intoOpenAMP:mainfrom
tnmysh:add_poll_on_vdev_reset_api
Open

legacy_apps: xlnx: add api to poll apps on vdev status#100
tnmysh wants to merge 2 commits intoOpenAMP:mainfrom
tnmysh:add_poll_on_vdev_reset_api

Conversation

@tnmysh
Copy link
Copy Markdown
Collaborator

@tnmysh tnmysh commented Mar 17, 2026

In case of the remoteproc repeat attach-detach, RPU firmware should be able to re-create the virtio devices. For this, the firmware should poll on the vdev status, and if the status is 0, then it should destroy the virtio and rpmsg devices. Then it should re-create them and wait for next driver ready status from the linux kernel. The vdev status 0 is usually done during remoteproc detach operation. Mailbox notification is expected from the Linux side on vdev status reset.

@tnmysh tnmysh requested review from arnopo, edmooring and wmamills March 17, 2026 23:44
break;
}
}
platform_poll_on_vdev_reset(priv, rdev);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This imposes that platform_poll_on_vdev_reset is defined in all machine. For instance this breaks build on Linux PC.

/usr/bin/ld: CMakeFiles/rpmsg-echo-shared.dir/rpmsg-echo.c.o: in function `rpmsg_echo_app':
rpmsg-echo.c:(.text+0x37f): undefined reference to `platform_poll_on_vdev_reset'

then platform_poll seems deprecated

Why not just keeping platform_poll but adding struct rpmsg_device *rpdev parameter?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... But that means, all the current support will move to the virtio reset polling. Is that okay?

Sorry I missed the Linux build on PC.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... But that means, all the current support will move to the virtio reset polling. Is that okay?

Not sure to understand your point, the implementation of platform_poll is already machine dependent.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean all for all places where platform_poll is used, is it okay to use platform_poll_on_vdev_reset? Can we deprecate use of platform_poll ? I don't see any reason to keep platform_poll honestly. What do you think?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without platform_poll, there is no notification of new incoming rpmsg (see the remoteproc_get_notification() function). It seems to me that your platform_poll_on_vdev_reset function is an extension of platform_poll with added management of the vdev status reset.

From my point of view, platform_poll is a more generic name. I would update it instead of creating a new function.

In the openamp-system-reference, I don’t think there is a need to apply the deprecation process, as the objective is only to provide reference code.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arnopo Thanks for the suggestion. Even if rpmsg_retarget is an example, it is a part of the open-amp library. It is still not part of the system-reference. It is possible, that people have defined their own interfaces like platform_poll and passing into rpmsg_retarget. IMHO if we decide to change the rpmsg_retarget interface, then we should treat it just same as any other library interface that is exported to the users and go through the deprecation process of an interface. Because as of now, it is still part of the library.

I think we should aim for long-term solution instead, and remove polling from the service, and do it in the system-referece example instead.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arnopo Thanks for the suggestion. Even if rpmsg_retarget is an example, it is a part of the open-amp library. It is still not part of the system-reference. It is possible, that people have defined their own interfaces like platform_poll and passing into rpmsg_retarget. IMHO if we decide to change the rpmsg_retarget interface, then we should treat it just same as any other library interface that is exported to the users and go through the deprecation process of an interface. Because as of now, it is still part of the library.

I think we should aim for long-term solution instead, and remove polling from the service, and do it in the system-referece example instead.

If implementing the long-term solution directly is ok with you, then I am also in favor of this approach.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @arnopo,

I did analysis on rpmsg_retarget.c file as well. I think the implementation standpoint it's correct. So, when open call is made, it waits for the confirmation from the Host that the file was created. So the polling happens to wait for that confirmation. We can't expect the polling happen in the system-reference app code, because it is required as part of open call. Without it the file fd won't be received from the host.

This means, our only option is to change the interface of rpc_poll, and include second argument there.
That also means to go through the deprecation process for that interfaces/typedef.

I think the easiest option instead is, to maintain two polling API in the system reference for now, and deprecate rpc_poll later.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @arnopo,

I might have a way to achieve the same result, without changing the platform_poll interfaces. I will test it next week and update you. Please ignore above message till then.

Thank you.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arnopo I pushed the latest changes. Please check. I don't have to change platfrom_poll interface now, and still can poll on vdev flag.

The trick is, I can retrieve vdev from the list of remoteproc vdev devices. Today we have only one remoteproc_vdev for each remoteproc, so I just get the first remoteproc_vdev from the list, and from there I can get virtio device.
Let me know if this looks good.

Thanks.

tnmysh added 2 commits April 8, 2026 12:18
In case of the remoteproc repeat attach-detach, RPU firmware should be
able to re-create the virtio devices. For this, the firmware should
poll on the vdev status, and if the status is 0, then it should destroy
the virtio and rpmsg devices. Then it should re-create them and wait for
next driver ready status from the linux kernel. The vdev status 0 is
usually done during remoteproc detach operation. Mailbox notification is
expected from the Linux side on vdev status reset.

Signed-off-by: Tanmay Shah <tanmay.shah@amd.com>
echo and matrix multiply apps were not able to run after
attach()->detach()->attach() command sequence from linux side. Solve
this problem, by not stopping app, and re-creating rpmsg devices after
detection of vdev reset. This allows the app to run again on next
callback when linux sets DRIVER_OK bit.

Signed-off-by: Tanmay Shah <tanmay.shah@amd.com>
@tnmysh tnmysh force-pushed the add_poll_on_vdev_reset_api branch from 252144f to 4608802 Compare April 8, 2026 19:22
if (!rpdev) {
LPERROR("Failed to create rpmsg virtio device.\r\n");
break;
ret = 0;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ret = 0;
ret = -1;

NULL, NULL);
if (!rpdev) {
LPERROR("Failed to create rpmsg virtio device.\r\n");
ret = -EINVAL;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sometime -1 is returned, sometime -EINVAL. What about using EPERM or ENXIO everywhere

metal_err("failed to get vdev status %d\n", ret);
return ret;
}
} while (vdev_status);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except if I miss something this breaks the virtio spec protocol. The virtio driver must not set the status to 0.

2.4 Device Reset
The driver may want to initiate a device reset at various times; notably, it is required to do so during device initialization and device cleanup.

The mechanism used by the driver to initiate the reset is transport specific.

2.4.1 Device Requirements: Device Reset
A device MUST reinitialize device status to 0 after receiving a reset.

A device MUST NOT send notifications or interact with the queues after indicating completion of the reset by reinitializing device status to 0, until the driver re-initializes the device.

2.4.2 Driver Requirements: Device Reset
The driver SHOULD consider a driver-initiated reset complete when it reads device status as 0.

here the sequence should be

  • a reset request should come from the other side
  • The machine detect the reset in platform_poll() function and a a specif error ( perhaps -EPIPE).
  • generic application stop the rpmsg and restart it

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as per this virtio standard, this behavior is already broken in the linux kernel side. Because today we see virtio status set to 0 on detach() call. Perhaps rproc_reset_rsc_table_on_detach this function is restoring the default resource table in the linux kernel which sets vdev_status to 0 on detach.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right, the Linux should not clean the vdev, status

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arnopo

How about this. We keep polling up until DRIVER_OKAY bit is set by the linux. If it's not setup anymore, then we stop polling. I think that's not breaking the virtio standard.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arnopo

How about this. We keep polling up until DRIVER_OKAY bit is set by the linux. If it's not setup anymore, then we stop polling. I think that's not breaking the virtio standard.

Actually, that's not a good idea too. Because as per the spec, driver can't choose specific bits to reset.

Instead to reset the device, driver will set the status 0. So, are we not resetting the device when we issue detach() command?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my point of view, we should think with a system-level approach rather than going with a temporary solution.

What about a solution based on a new status in the resource table, as discussed during the last OpenAMP RP call?

We should probably discuss the solution on the Linux kernel side before moving forward with this PR. The fact that Linux does not respect the virtio specification should help us reach a solution.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arnopo Yes that's what I was thinking too.

The only problem is the backward compatibility from the linux standpoint.

If we implement the new status based on the resource table, and the remote firmware is polling based on current vdev status, then we may break the existing firmwares.

So, I was thinking in the linux side:

if (new_vdev_status in rsc tbl) then:
do not reset vdev status on detach, instead rely on fw to reset vdev status
else:
reset vdev status, and notify firmware.
endif

We can discuss this in the next system-reference or rp call. Which comes first.

break;
}
}
ret = platform_poll(priv);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This make break all machines implementation not only the AMD ones.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how it can break. Remote firmware will still work as it is.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nathalie-ckc
Copy link
Copy Markdown
Collaborator

From 4/15/2026 OpenAMP System Reference call:

  • Tanmay will notify Mathieu in the cover letter of the patch that this is violation of virtio spec & continue discussion on mailing list
  • It's remote's responsibility when it sees the detach instruction from Linux it should reload clean resource table & set status to 0
  • @glneo to review this PR and comment from TI perspective

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants