soundwire: cadence: Fixes for three interrupt races#5659
Draft
rfvirgil wants to merge 3 commits intothesofproject:topic/sof-devfrom
Draft
soundwire: cadence: Fixes for three interrupt races#5659rfvirgil wants to merge 3 commits intothesofproject:topic/sof-devfrom
rfvirgil wants to merge 3 commits intothesofproject:topic/sof-devfrom
Conversation
…thread Clear the CDNS_MCP_INT_RX_WL interrupt before signalling completion. This is to prevent the possible race: - completion is signalled. - waiting thread is immediately scheduled and starts a new message. - new message completes and CDNS_MCP_INT_RX_WL is asserted. - reschedule to sdw_cdns_irq() thread, which then clears the new interrupt. - interrupt is never handled. Before this change, this error message was sometimes seen on kernels that have large amounts of debugging enabled: SCP Msg trf timed out This error indicates that the completion has not been signalled after 500ms. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Fixes: 956baa1 ("soundwire: cdns: Add sdw_master_ops and IO transfer support")
Remove the masking of CDNS_MCP_INT_SLAVE_MASK while the work thread runs. This is to avoid losing interrupts for state changes that happen while the work is running. The code assumed that any status change events that happened while CDNS_MCP_INT_SLAVE_MASK is masked would be latched and would assert the main interrupt when CDNS_MCP_INT_SLAVE_MASK is unmasked. We can avoid relying on this assumption by leaving CDNS_MCP_INT_SLAVE_MASK unmasked. There is no need to mask the interrupt while waiting for the work to run. The kernel will already filter out schedule_work() for work that is already pending. The worst that can happen is that the work can be re-queued while it is running, and so will run again when there are no more status changes. But the core code already has to filter out duplicate states because sdw_handle_slave_status() is given the current status of all devices, so some of those will usually be unchanged. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Fixes: 4a98a6b ("soundwire: intel/cadence: merge Soundwire interrupt handlers/threads")
Clear interrupts before handling them in sdw_cdns_irq() to prevent loasing events that re-triggered before they can be handled. Clearing interrupts at the end of sdw_cdns_irq() is a potential race. If the event re-triggered while sdw_cdns_irq() is still running it would be cleared when sdw_cdns_irq() ends, so the irq handler would not be re-triggered to handle it. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Fixes: 2f52a51 ("soundwire: cdns: Add cadence library")
Author
|
@bardliao @shumingfan Do you have soak/stress tests you could run on Realtek so we can get more confidence this doesn't cause any new problems? |
Collaborator
|
@rfvirgil There are some
That means Device 0 is attached. Not sure if it is a real issue. |
bardliao
reviewed
Feb 2, 2026
| CDNS_MCP_INT_SLAVE_MASK, 0); | ||
|
|
||
| int_status &= ~CDNS_MCP_INT_SLAVE_MASK; | ||
|
|
Collaborator
There was a problem hiding this comment.
@rfvirgil @shumingfan Is it possible that a Peripheral can generate a new event while handling a special event and lead to an infinite loop in a corner case?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This series is to fix three places in the cadence IRQ handling where interrupt evens could be lost.
Two are variations of the same problem that the code cleared interrupts AFTER it handled them. This is not the correct way, because any interrupt that re-triggered while the irq handler function is running would then be cleared at the end of the function without ever having been handled.
The other case is to avoid masking the PING status change interrupt while waiting for the work function to handle it. There is a potential problem here if state changes that happen while masked are lost,