Weight tying

Weight tying is known as one way to regularize the token weight matrix (nn.Embedding).
A representative paper is Beyond Weight Tying: Learning Joint Input-Output Embeddings for Neural Machine Translation.
A figure in that paper illustrates the idea well.
The general concept is as follows.

Weight Tying : Sharing the weight matrix between input-to-embedding layer and output-to-softmax layer; That is, instead of using two weight matrices, we just use only one weight matrix. The intuition behind doing so is to combat the problem of overfitting. Thus, weight tying can be considered as a form of regularization.
In other words, the input embedding and the output embedding used for prediction are the same matrix, with one used as the transpose of the other.
This helps prevent overfitting and acts as a regularizer.
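
A minimal PyTorch sketch of this sharing is shown here. It is an illustration only, not the paper's model: the class name TiedLM, the GRU in the middle, and the sizes vocab_size and d_model are all assumptions made for the example. The one line that matters is the assignment that makes the output projection reuse the embedding's parameter.

```python
import torch
import torch.nn as nn


class TiedLM(nn.Module):
    """Toy language model (hypothetical) whose output layer reuses the embedding matrix."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        # Input embedding E, weight shape (vocab_size, d_model).
        self.embed = nn.Embedding(vocab_size, d_model)
        # Any sequence encoder would do here; a GRU is an arbitrary choice.
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        # Output projection, weight shape (vocab_size, d_model); no bias for simplicity.
        self.out = nn.Linear(d_model, vocab_size, bias=False)
        # Weight tying: both modules now hold the same Parameter object.
        self.out.weight = self.embed.weight

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.rnn(self.embed(tokens))
        # nn.Linear computes hidden @ weight.T, i.e. hidden @ E^T.
        return self.out(hidden)


model = TiedLM(vocab_size=100, d_model=16)
logits = model(torch.randint(0, 100, (2, 5)))  # shape: (batch, seq_len, vocab_size)
```

Because nn.Embedding stores its weight as (vocab_size, d_model) and nn.Linear stores its weight as (out_features, in_features), the two shapes already match, so no explicit transpose is needed. The transpose relationship appears inside the linear layer's forward pass.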

Tying the weights of the target word embeddings makes the target word classifier of an NMT model train faster and improves its quality.
Given the success of this parameter sharing, we investigate other forms of sharing in between no sharing and hard equality of parameters.
In particular, we propose a structure-aware output layer which captures the semantic structure of the output space of words within a joint input-output embedding.
The model is a generalized form of weight tying which shares parameters but allows learning a more flexible relationship with input word embeddings and allows the effective capacity of the output layer to be controlled.
In addition, the paper shows that the model can make better use of prior knowledge by sharing weights between the output classifier and the translation context.
Our evaluation on English-to-Finnish and English-to-German datasets shows the effectiveness of the method against strong encoder-decoder baselines trained with or without weight tying.
