stride_kz, stride_kh, stride_kn, stride_kk,
stride_vz, stride_vh, stride_vk, stride_vn,
Going by these parameter names, the key has the seq dimension first and the dim dimension second, while the value has them in the opposite order.
However, where the arguments are actually passed in, including in the test code, both the key and the value have the seq dimension first.
I am confused by this inconsistency. Could you explain the reason for it?
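
To make the observation concrete, here is a minimal sketch of what I mean. It assumes both tensors are contiguous with the layout (Z, H, N_CTX, HEAD_DIM); the sizes and the helper `contiguous_strides` are made up for illustration and are not taken from the kernel. It only shows which stride each parameter name would receive if the strides are passed in dimension order.

```python
def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements, for the given shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


# Hypothetical sizes: batch, heads, sequence length, head dimension.
Z, H, N_CTX, HEAD_DIM = 2, 4, 128, 64
shape = (Z, H, N_CTX, HEAD_DIM)  # assumed layout for BOTH key and value

# Strides passed positionally, in dimension order, to the kernel parameters.
k_names = ("stride_kz", "stride_kh", "stride_kn", "stride_kk")
v_names = ("stride_vz", "stride_vh", "stride_vk", "stride_vn")
k_strides = dict(zip(k_names, contiguous_strides(shape)))
v_strides = dict(zip(v_names, contiguous_strides(shape)))

# The seq-dimension stride lands in the `n` slot for the key
# but in the `k` slot for the value.
print(k_strides["stride_kn"], k_strides["stride_kk"])
print(v_strides["stride_vk"], v_strides["stride_vn"])
```

Under these assumptions the third position carries the seq stride for both tensors, so the `n` and `k` suffixes appear to swap meaning between the key and the value, which is the part I do not understand.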