Short-003, A Dual Reinforcement Learning Framework for Unsupervised Text Style Transfer (2019-IJCAI)

0. Abstract

First step of method 1: separating the content from the original style → fusing the content with the desired style
However, separating content from style in that first step is quite difficult,
because content and style interact in how a sentence is composed.

Specifically, we consider the learning of the source-to-target and target-to-source mappings as a dual task.
Two rewards are assigned to this dual task, one based on style accuracy and one based on content preservation.
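
A minimal sketch of how the two rewards could be folded into one scalar. The notes above only say that there is a style-accuracy reward and a content-preservation reward; the weighted harmonic mean used here, the function name `combined_reward`, and the `beta` parameter are assumptions made for illustration.

```python
def combined_reward(style_reward: float, content_reward: float, beta: float = 1.0) -> float:
    """Combine two rewards in [0, 1] with a weighted harmonic mean.

    ASSUMPTION: the harmonic mean is an illustrative choice; the notes do not
    state how the two rewards are combined.
    """
    for name, value in (("style_reward", style_reward), ("content_reward", content_reward)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    denominator = beta ** 2 * content_reward + style_reward
    if denominator == 0.0:
        # Both terms are zero, so the combined reward is zero as well.
        return 0.0
    return (1.0 + beta ** 2) * content_reward * style_reward / denominator


if __name__ == "__main__":
    # A transfer that changes the style but loses the content scores low overall.
    print(combined_reward(style_reward=0.9, content_reward=0.1))
    # A transfer that does both reasonably well scores higher.
    print(combined_reward(style_reward=0.7, content_reward=0.7))
```

A harmonic mean stays low whenever either reward is low, so a model cannot score well by changing the style while discarding the content, or by copying the input unchanged.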

first separating the content from the original style and then fusing the content with the desired style (the approach that separates content and style, then recombines them)
directly removes the specific style attribute words in the input, and then feeds the neutralized sequence which only contains content words to a style-dependent generation model. (the approach that removes attribute words, then generates with the target style applied)

The former line of research tends to only change the style but fail in keeping the content, since it is hard to get a style independent content vector without parallel data
Without parallel data, it is hard to obtain a content representation that is independent of style.

To address this weakness of the first approach and strengthen content preservation, the second approach neutralizes at the token level, a discrete space, instead of in a continuous vector space.

As a result, it struggles with sentences whose sentiment is expressed implicitly, such as “The only thing I was offered was a free dessert!!!”.
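
The limitation can be illustrated with a toy version of token-level neutralization. This is only a sketch: the lexicon `ATTRIBUTE_WORDS` and the function `neutralize` are hypothetical, and real systems select attribute words with learned or statistical methods rather than a fixed word list.

```python
import string

# Hypothetical attribute lexicon, for illustration only.
ATTRIBUTE_WORDS = {"delicious", "terrible", "great", "awful", "rude", "friendly"}


def neutralize(sentence: str, attribute_words=ATTRIBUTE_WORDS) -> str:
    """Drop every token whose normalized form is in the attribute lexicon."""
    kept = []
    for token in sentence.split():
        # Compare on a lowercased token with surrounding punctuation removed.
        normalized = token.strip(string.punctuation).lower()
        if normalized not in attribute_words:
            kept.append(token)
    return " ".join(kept)


if __name__ == "__main__":
    # Explicit sentiment: the attribute word is removed.
    print(neutralize("The food was delicious!"))
    # Implicit sentiment: no single token carries the style, so nothing changes.
    print(neutralize("The only thing I was offered was a free dessert!!!"))
```

In the second sentence the negative sentiment comes from the sentence as a whole, so deleting tokens leaves nothing "neutral" for the generator to restyle.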

one-step mapping model between the source corpora and the target corpora of different styles
Because parallel data is scarce, the paper treats the learning of the source-to-target and target-to-source mapping models as a dual task (isn't that the obvious thing to do?)

The forward one-step mapping model f transfers an informal sentence x into a formal sentence y', while the backward one-step mapping model g transfers a formal sentence y into an informal sentence x'.
In other words, as the figure shows, there is a separate model for each direction of style change:
model 1 turns positive into negative, model 2 turns negative into positive, and so on.
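
The two directions can be sketched as a round trip: f maps x to y', and g maps y' back to a reconstruction of x. The word-swap "models" below are toy stand-ins and not the paper's neural models; the tables and the names `make_word_swap_model` and `round_trip` are hypothetical.

```python
# Toy stand-ins for the two mapping models (ASSUMPTION: illustration only).
POSITIVE_TO_NEGATIVE = {"love": "hate", "good": "bad", "friendly": "rude"}
NEGATIVE_TO_POSITIVE = {neg: pos for pos, neg in POSITIVE_TO_NEGATIVE.items()}


def make_word_swap_model(table):
    """Return a function that rewrites a sentence token by token using `table`."""
    def model(sentence: str) -> str:
        return " ".join(table.get(token, token) for token in sentence.split())
    return model


f = make_word_swap_model(POSITIVE_TO_NEGATIVE)  # model 1: positive -> negative
g = make_word_swap_model(NEGATIVE_TO_POSITIVE)  # model 2: negative -> positive


def round_trip(x: str, forward, backward):
    """Transfer x with the forward model, then map the result back."""
    y_prime = forward(x)
    x_reconstructed = backward(y_prime)
    return y_prime, x_reconstructed


if __name__ == "__main__":
    y_prime, x_reconstructed = round_trip("i love the good food", f, g)
    print(y_prime)
    print(x_reconstructed)
```

How closely the reconstruction matches the original x gives a content-preservation signal without any parallel pairs, which is what makes the dual setup usable in the unsupervised setting.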

We propose a dual reinforcement learning framework DualRL for unsupervised text style transfer, without separating content and style.
We resolve two daunting (difficult) problems (pre-training and generation quality) when the model is trained via RL without any parallel data.
Experiments on two benchmark datasets show our model outperforms the state-of-the-art systems by a large margin in both automatic and human evaluation.
The proposed architecture is generic and simple, which can be adapted to other sequence-to-sequence generation tasks which lack parallel data.

This paper takes a one-step mapping approach, with a model for the source-to-target style transfer and a dual mapping model for the target-to-source style transfer.
Because parallel data is lacking, it proposes the DualRL training scheme and trains the two mapping models with automatically generated supervision signals.
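
To make "training from reward signals" concrete, here is a toy policy-gradient sketch. It is not the paper's training procedure: the policy is a softmax over a handful of fixed candidate outputs, the rewards are made-up numbers, and the update uses the exact expected gradient instead of sampled outputs so that the run is deterministic. All names (`softmax`, `policy_gradient_step`, `train`) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(logits, rewards, learning_rate=1.0):
    """One gradient-ascent step on the expected reward of a softmax policy.

    For logits theta and probabilities pi, the gradient of sum_i pi_i * R_i
    with respect to theta_i is pi_i * (R_i - expected_reward).
    """
    probs = softmax(logits)
    expected_reward = sum(p * r for p, r in zip(probs, rewards))
    return [
        logit + learning_rate * p * (r - expected_reward)
        for logit, p, r in zip(logits, probs, rewards)
    ]


def train(rewards, steps=200, learning_rate=1.0):
    """Start from a uniform policy and return the probabilities after training."""
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        logits = policy_gradient_step(logits, rewards, learning_rate)
    return softmax(logits)


if __name__ == "__main__":
    # Made-up rewards for three candidate outputs (ASSUMPTION: illustration only).
    print(train([0.1, 0.9, 0.3]))
```

The policy shifts probability toward the candidate with the highest reward, which is the sense in which the rewards act as supervision when no reference outputs exist.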
In this way, we do not need to do any explicit separation of content and style, which is hard to achieve in practice even with parallel data.
Experiments on sentiment transfer and formality transfer datasets show that the model performs well.
Although pre-training and annealing pseudo teacher forcing are effective, they make the training process complicated.
Therefore, how to get rid of them and train the generative model purely based on RL from scratch is an interesting direction we would like to pursue.
Moreover, since the proposed architecture DualRL is generic and simple, future work may extend to other unsupervised sequence-to-sequence generation tasks which lack parallel data.

Reference