Short-004, Mask and Infill: Applying Masked Language Model to Sentiment Transfer (2019-IJCAI)

0. Introduction

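RNN-based encoder-decoder models suffer from the long-range dependency problem between words, so earlier work struggled to generate satisfactory sentences from scratch.
If we consider how a person changes only the sentiment attribute of a sentence, a simple yet effective approach is the following two steps: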

In the mask step, we separate style from content by masking the positions of sentimental tokens.
In the infill step, we retrofit MLM to Attribute Conditional MLM, to infill the masked positions by predicting words or phrases conditioned on the context and target sentiment.

The task has broad applications, such as review sentiment transformation, news rewriting, and so on.

Example: converting the positive sentence “I highly recommend this movie” to a negative one, “I regret watching this movie”.

Previous approaches: [Shen et al., 2017; Fu et al., 2018; Xu et al., 2018; Li et al., 2018].
Some of them [Shen et al., 2017; Prabhumoye et al., 2018; Fu et al., 2018] try to learn the disentangled representation of content and attribute of a sentence in a hidden space,
while the others [Xu et al., 2018; Li et al., 2018] explicitly separate style from content in feature-based ways and encode them into hidden representations respectively.

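RNNs are limited in their ability to generate high-quality text. Their predictive power is also restricted to short ranges (long generated sentences are not realistic).
When people perform sentiment transfer, they do not have to create a new sentence from scratch.

In other words, there is no need to build the sentence from the ground up!
Instead, it is enough to replace only the sentimental tokens.
Example: “terrible scenery and poor service” → “beautiful scenery and good service”.
This is similar to text infilling, or the Cloze task.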
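
The mask-then-fill idea in the example above can be sketched in a few lines of Python. This is only an illustration: the marker set is hand-picked and the replacement words are passed in by hand, whereas in the paper the markers are identified automatically and the fillers are predicted by the language model. The names mask_sentiment_tokens and infill are hypothetical.

```python
MASK = "[MASK]"

def mask_sentiment_tokens(tokens, markers):
    """Replace every token found in `markers` with the mask symbol."""
    return [MASK if tok.lower() in markers else tok for tok in tokens]

def infill(masked_tokens, fillers):
    """Fill masked positions left to right; leftover masks stay as they are."""
    it = iter(fillers)
    return [next(it, MASK) if tok == MASK else tok for tok in masked_tokens]

source = "terrible scenery and poor service".split()
markers = {"terrible", "poor"}                   # assumption: hand-picked markers
masked = mask_sentiment_tokens(source, markers)
target = infill(masked, ["beautiful", "good"])   # stand-in for model predictions
print(" ".join(masked))
print(" ".join(target))
```

The content words ("scenery", "service") are never touched, which is the point of the approach: only the masked positions are regenerated.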

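The attention-based method uses self-attention and learns the attention weights.
The attention-based method is preferred over the n-gram method, and it allows the MLM to produce more varied sentence expressions. (Why is variety desirable here?)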

Therefore, this paper proposes a simple fused method to utilize the merits of both methods.
The two methods are combined so that false attribute markers are filtered out.
In addition, when the frequency-ratio method identifies too many attribute markers or fails to identify any, the attention-based method is reportedly applied directly.
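
A rough sketch of how such a fused marker selection could look is given below. Everything here is an assumption made for illustration: the smoothed frequency ratio follows the general form used by frequency-ratio methods but is computed on unigrams only, the attention scores are supplied as a plain list rather than produced by a trained classifier, and the thresholds (sal_threshold, attn_threshold, max_ratio) are arbitrary. The notes do not give the exact filtering rule, so this is not the authors' implementation.

```python
from collections import Counter

def salience(corpus_target, corpus_other, smoothing=1.0):
    """Smoothed frequency ratio of each word: target corpus vs. the other corpus."""
    count_t = Counter(w for s in corpus_target for w in s.split())
    count_o = Counter(w for s in corpus_other for w in s.split())
    return {w: (count_t[w] + smoothing) / (count_o[w] + smoothing) for w in count_t}

def fused_markers(tokens, sal, attn, sal_threshold=2.0, attn_threshold=0.2, max_ratio=0.5):
    """Return indices of tokens to mask (hypothetical fusion rule)."""
    freq_idx = [i for i, t in enumerate(tokens) if sal.get(t, 0.0) >= sal_threshold]
    attn_idx = [i for i, a in enumerate(attn) if a >= attn_threshold]
    # Fallback: frequency ratio found nothing, or flagged too much of the sentence.
    if not freq_idx or len(freq_idx) > max_ratio * len(tokens):
        return attn_idx
    # Otherwise keep only the frequency-ratio candidates that attention agrees with.
    return [i for i in freq_idx if attn[i] >= attn_threshold]

negative = ["terrible food", "poor service", "terrible service"]
positive = ["good food", "great service", "good place"]
sal = salience(negative, positive)

tokens = "terrible scenery and poor service".split()
attn = [0.4, 0.05, 0.05, 0.4, 0.1]   # assumption: made-up attention weights
print(fused_markers(tokens, sal, attn))
```

With this toy data the frequency ratio flags "terrible" and "poor", and the attention scores confirm both, so positions 0 and 3 are selected for masking.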

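To fill the masked positions in a way that matches a specific sentiment, the MLM is modified into an Attribute Conditional Masked Language Model (AC-MLM).
In addition, a pre-trained sentiment classifier is used to constrain the generated sentence so that it carries the intended sentiment.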

To deal with the discrete nature of language generation, soft-sampling is used.

Soft-sampling replaces the sampling process with an approximation of the sampled word vector, so that gradients can be back-propagated through it during training.
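
One common way to realize this kind of approximation is to take a temperature-scaled softmax over the vocabulary and use the probability-weighted average of the word embeddings in place of a hard sample. The sketch below assumes that form; the function names, the toy two-word vocabulary, and the temperature values are illustrative and not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_sample(logits, embeddings, temperature=1.0):
    """Probability-weighted average of word embeddings (a 'soft' word vector)."""
    probs = softmax(logits, temperature)
    dim = len(embeddings[0])
    return [sum(p * emb[d] for p, emb in zip(probs, embeddings)) for d in range(dim)]

# Toy vocabulary of two words with 2-dimensional embeddings (assumption).
embeddings = [[1.0, 0.0], [0.0, 1.0]]
uniform = soft_sample([0.0, 0.0], embeddings)               # equal mix of both words
peaked = soft_sample([5.0, 0.0], embeddings, temperature=0.1)  # close to the first word
print(uniform, peaked)
```

Because the output is a smooth function of the logits, a classifier loss computed on it can be differentiated with respect to the language model's parameters; a lower temperature pushes the soft vector toward the embedding of the most likely word.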

Experiments are run on the Yelp and Amazon datasets and assessed with quantitative, qualitative, and human evaluations.
The contributions can be summarized as follows.

We propose a two-stage “Mask and Infill” approach to sentiment transfer task, capable of identifying both simple and complex sentiment markers and producing high quality sentences.
Experimental results show that our approach outperforms most state-of-the-art models in terms of both BLEU and accuracy scores.
We retrofit MLM to AC-MLM for labeled sentence generation. To the best of our knowledge, it is the first work to apply a pre-trained masked language model to labeled sentence generation task.

This paper proposes a two-stage method, “mask and infill”.
In the experiments, the method performs well on both transfer accuracy and semantic preservation.
As future work, the authors plan to study settings with more than two sentiments and how to apply the masked language model to other tasks of natural language generation beyond style transfer.

Reference