Thank you for a great piece.
I have a question about the positional encoding. Here you seem to interleave sine and cosine curves across the dimensions for a given position. The indexing in the original paper suggests the same, but the code at https://github.com/tensorfl... (and the discussion at http://jalammar.github.io/i... ) suggests concatenating them instead: sines in the left half of the dimensions and cosines in the right half.
I'm not sure it makes a substantive difference, but would you agree there's a difference in implementation here?
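For concreteness, here is a minimal NumPy sketch of the two layouts. It assumes the sinusoid formula from the original paper with base 10000. The function names are hypothetical and not taken from either linked codebase. The sketch suggests the concatenated version is just the interleaved one with its columns permuted, so dot products between position vectors come out the same:

```python
import numpy as np

def angles(max_len: int, d_model: int) -> np.ndarray:
    # Angle for position pos and frequency index i: pos / 10000^(2i / d_model)
    pos = np.arange(max_len)[:, None]                # shape (max_len, 1)
    i = np.arange(d_model // 2)[None, :]             # shape (1, d_model / 2)
    return pos / np.power(10000.0, 2 * i / d_model)  # shape (max_len, d_model / 2)

def interleaved_pe(max_len: int, d_model: int) -> np.ndarray:
    # Paper-style indexing: sin on even dimensions, cos on odd dimensions
    a = angles(max_len, d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(a)
    pe[:, 1::2] = np.cos(a)
    return pe

def concatenated_pe(max_len: int, d_model: int) -> np.ndarray:
    # Concatenated layout: all sines in the left half, all cosines in the right half
    a = angles(max_len, d_model)
    return np.concatenate([np.sin(a), np.cos(a)], axis=1)

if __name__ == "__main__":
    max_len, d_model = 50, 16
    inter = interleaved_pe(max_len, d_model)
    concat = concatenated_pe(max_len, d_model)
    # Reorder interleaved columns: even indices first, then odd indices
    perm = np.concatenate([np.arange(0, d_model, 2), np.arange(1, d_model, 2)])
    print(np.allclose(inter[:, perm], concat))             # True
    # A column permutation leaves position-to-position dot products unchanged
    print(np.allclose(inter @ inter.T, concat @ concat.T))  # True
```

If this holds, the two layouts differ only in which dimension each sinusoid lands on. A model whose first layer is a learned linear projection could absorb that fixed permutation, which may be why the difference doesn't seem to matter much in practice.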