Thank you for the impressive work and for releasing the code.
I was looking through the code and had a question about semantic scale repetition. In `ar_infer_infinity_elegant`, each scale is repeated as described in the paper. However, the training `forward` function doesn't seem to apply the same repetition.
Is there a specific reason for this difference between inference and training, or am I missing something in how the training pass is set up?