If Attention Were a Study Group
While reviewing the Transformer model, I took some time to deeply reconsider the meaning of queries (Q), keys (K), and values (V) in the attention layer—concepts I thought I already understood. I've tried to explain them here using an analogy of a study group.
#ML