• Arxiv link

  • The use of self-attention inside both the encoder and the decoder themselves, not only the “normal” encoder-decoder attention (see the sketch after this note).

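A minimal sketch of scaled dot-product self-attention, where the same sequence supplies the queries, keys, and values. The function name, projection matrices, and sizes are illustrative assumptions, not the paper's code.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a single sequence.

    x: (seq_len, d_model) -- the same sequence provides Q, K, and V.
    w_q, w_k, w_v: (d_model, d_k) projection matrices (illustrative).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project the sequence itself
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ v                               # weighted sum of values

# Toy usage: 5 tokens, model width 8, head width 4 (arbitrary sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)               # shape (5, 4)
```
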
  • The clever positional encoding built from sine and cosine waves of different frequencies, with residual connections used to propagate that positional information up through the layers. You can think of it as a continuous, float-valued analogue of binary position encoding (see the sketch below).

Transformer Architecture: The Positional Encoding
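
A small sketch of the sinusoidal encoding from the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); the function name and array sizes below are just examples.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
       PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    Assumes an even d_model for simplicity."""
    pos = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)  # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

# Each dimension oscillates at a different wavelength, much like the bits of a
# binary counter flipping at different rates -- but smoothly, with real values.
pe = sinusoidal_positional_encoding(seq_len=50, d_model=16)
```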

  • Multi-head attention is somewhat like general attention, where a learned linear layer projects the keys and queries before they are combined, but probably more powerful, since we now have multiple sets of keys, values, and queries (one per head); see the sketch below.

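A compact multi-head attention sketch that reuses the self_attention function and the toy input x from the earlier snippet; the head count and widths are arbitrary assumptions, and the paper's final output projection W_O is omitted for brevity.

```python
import numpy as np

def multi_head_attention(x, heads):
    """heads: list of (w_q, w_k, w_v) projection triples, one per head.
    Each head attends with its own learned projections; outputs are concatenated.
    """
    outputs = [self_attention(x, w_q, w_k, w_v) for (w_q, w_k, w_v) in heads]
    return np.concatenate(outputs, axis=-1)          # (seq_len, n_heads * d_k)

# Toy usage with 2 heads of width 4 on the same 5x8 input as before.
rng = np.random.default_rng(1)
heads = [tuple(rng.normal(size=(8, 4)) for _ in range(3)) for _ in range(2)]
out = multi_head_attention(x, heads)                 # shape (5, 8)
```
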
Master Positional Encoding: Part I