CA-395093: sync between tapback slave process and xenopsd#6825
CA-395093: sync between tapback slave process and xenopsd#6825chunjiez wants to merge 1 commit intoxapi-project:masterfrom
Conversation
Signed-off-by: Chunjie Zhu <chunjie.zhu@cloud.com>
|
The tapback side code update, xapi-project/blktap#435 |
|
add @MarkSymsCtx @TimSmithCtx @LunfanZhang @minglumlu @BengangY to review |
| wait_tapback_ready() | ||
| { | ||
| local statefile="/var/run/tapback.${DOMID}.statefile" | ||
| while true; do |
There was a problem hiding this comment.
Is it possible the while never exits if statefile fails to be created? Consider to add a timeout for max number of retries.
There was a problem hiding this comment.
A simple way would be:
seq 120 | while read i; do
...
sleep 1
done
This would iterate at most 120 times or (2min)
There was a problem hiding this comment.
I've already commented on the related PR to tapdisk that I don't think this is the correct approach and it just adds even more complexity and fargility to the system which will induce even more cost of maintenance going forward.
There was a problem hiding this comment.
@MarkSymsCtx could you link to that PR. What is a generally better approach?
There was a problem hiding this comment.
There was a problem hiding this comment.
It seems correct to me that this side needs to be sure that tapback is ready as otherwise tapback won't be able to process events. So checking for readiness for a limited time and failing if tapback is not ready seems to me not fragile and I would be curious how it could be avoided @MarkSymsCtx.
There was a problem hiding this comment.
If we generally expect tapback to be available and ready, we should only wait briefly before we fail. I agree with @BengangY that we should not wait indefinitely.
In some corner case, physical-device-path xenstore watch event is fired before slave tapback process ready to process xenstore watch event, thus, slave tapback process would miss xenstore watch event, then blktap io datapath fails to establish.
In xenopsd side, the vbd-script waits for tapback slave process ready by checking /var/run/tapback..statefile, if the file is present and file contains "ping" string, then vbd-script updates the file, writes "pong" to the file and continues to update xenstore, otherwise, just wait.
In tapback slave process side, once it get prepared to process xenstore watch event, it writes "ping" string to /var/run/tapback..statefile, then waits for acknowledge by checking if the file contains "pong" string, after seeing "pong" string, it removes /var/run/tapback..statefile and continues to work.