Short-008, Pragmatically Informative Text Generation (2019-NAACL)

0. Abstract

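We improve the informativeness of models for conditional text generation using techniques from computational pragmatics.
These techniques formulate language production as a game between speakers and listeners.

In this game, the speaker should generate output text that a listener can use to correctly identify the original input that the text describes.
(This feels somewhat like a GAN?)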

While such approaches are widely used in cognitive science and grounded language learning, they have received less attention for more standard language generation tasks.
We consider two pragmatic modeling methods for text generation:

one where pragmatics is imposed by information preservation
another where pragmatics is imposed by explicit modeling of distractors.

We find that these methods improve the performance of strong existing systems for abstractive summarization and for generation from structured meaning representations (MRs).

1 Introduction

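Computational approaches to pragmatics cast language generation and interpretation as game-theoretic or Bayesian inference procedures.
While such approaches can model a variety of pragmatic phenomena, their main application in NLP has been to improve the informativeness of generated text in grounded language learning problems.
In this paper, we show that pragmatic reasoning can similarly be used to improve performance in more traditional language generation tasks, such as summarization and generation from structured MRs.
Our work builds on a line of learned Rational Speech Acts (RSA) models, in which generated strings are selected to optimize the behavior of an embedded listener model.
The canonical presentation of the RSA framework (Frank and Goodman, 2012) is grounded in reference resolution: models of speakers attempt to describe referents in the presence of distractors, and models of listeners attempt to resolve descriptors to referents.
Recent work has extended these models to more complex groundings, including images and trajectories.
The techniques used in these settings are similar, and the primary intuition of the RSA framework is preserved.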

From the speaker's perspective, a good description is one that picks out, as discriminatively as possible, the content the speaker intends the listener to identify.
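The speaker/listener intuition can be made concrete with a toy reference game. The sketch below is not the paper's model: it is the textbook RSA recursion (literal listener L0, pragmatic speaker S1), and the objects, utterances, and function names are illustrative assumptions.

```python
# Toy reference game in the spirit of RSA; objects and words are illustrative.
OBJECTS = ["blue square", "blue circle", "green square"]
UTTERANCES = ["blue", "green", "square", "circle"]


def is_true(utterance: str, obj: str) -> bool:
    """An utterance is literally true of an object if it names one of its features."""
    return utterance in obj.split()


def normalize(scores: dict) -> dict:
    """Rescale non-negative scores so that they sum to one."""
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}


def literal_listener(utterance: str) -> dict:
    """L0(obj | utterance): uniform over the objects the utterance is true of."""
    return normalize({obj: float(is_true(utterance, obj)) for obj in OBJECTS})


def pragmatic_speaker(obj: str, rationality: float = 1.0) -> dict:
    """S1(utterance | obj), proportional to L0(obj | utterance) ** rationality."""
    return normalize(
        {utt: literal_listener(utt)[obj] ** rationality for utt in UTTERANCES}
    )


if __name__ == "__main__":
    # "circle" singles out the blue circle, so the speaker prefers it over "blue".
    print(pragmatic_speaker("blue circle"))
```

For the blue circle, "blue" is also true of a distractor (the blue square), so the pragmatic speaker puts more probability on "circle", the more discriminative word.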

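(From here, the passage is translated directly.)
Outside of grounding, cognitive modeling (Frank et al., 2009), and targeted analysis of linguistic phenomena (Orita et al., 2015), Rational Speech Acts models have seen limited application in the natural language processing literature.
In this work, we show that they can be extended to a distinct class of language generation problems that use as referents structured descriptions of linguistic content, or other natural language texts.
In accordance with the maxim of quantity (Grice, 1970) or the Q-principle (Horn, 1984), pragmatic approaches naturally correct the underinformativeness problems observed in state-of-the-art language generation systems (S0 in Figure 1).
(End of the directly translated passage; I did not fully follow it.)
We present experiments on two language generation tasks: generation from meaning representations (Novikova et al., 2017) and summarization.
For each task, we evaluate two models of pragmatics: the reconstructor-based model of Fried et al. (2018) and the distractor-based model of Cohn-Gordon et al. (2018).
Both models achieve state-of-the-art results on CNN/Daily Mail abstractive summarization and on E2E generation.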

2 Tasks

Omitted.

3 Pragmatic Models

To produce informative outputs, we consider pragmatic methods that extend the base speaker models, S0, using listener models, L,

which produce a distribution L(i | o) over possible inputs given an output.
(So o is the output and i is a possible input?)

The listener models are used to derive pragmatic speakers S1(o | i), which produce outputs that have a high probability of making the listener model L identify the correct input.
There is a large space of possible choices for designing L and deriving S1; we follow two lines of past work, which we categorize as reconstructor-based and distractor-based.
We tailor each of these pragmatic methods to both of our tasks by developing reconstructor models and methods of choosing distractors.

3.1 Reconstructor-Based Pragmatics

Pragmatic approaches in this category (Dušek and Jurčíček, 2016; Fried et al., 2018) rely on a reconstructor listener model L^R defined independently of the speaker.
This listener model produces a distribution L^R(i | o) over all possible input contexts i ∈ I, given an output description o.
We use sequence-to-sequence or structured classification models for L^R (described below), and train these models on the same data used to supervise the S0 models.
The listener model and the base speaker model together define a pragmatic speaker, with output score given by:

S1^R(o | i) = L^R(i | o)^λ · S0(o | i)^(1−λ)    (1)

where λ is a rationality parameter that controls how much the model optimizes for discriminative outputs (see Monroe et al. (2017) and Fried et al. (2018) for a discussion).
We select an output text sequence o for a given input i by choosing the highest scoring output under Eq. 1 from a set of candidates obtained by beam search in S0(· | i).
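(This seems to describe autoregressive prediction? But there appear to be two models, a listener and a speaker.
Going by the notation alone, the listener produces a distribution over inputs conditioned on an output, while the speaker generates an output conditioned on an input.
In other words, the listener behaves like a classification model and the speaker like a generator model.)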
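The selection step can be sketched as reranking beam-search candidates. This is a minimal sketch, assuming the pragmatic score is the weighted product L^R(i | o)^λ · S0(o | i)^(1−λ) used by Fried et al. (2018), computed in log space; the toy listener, the candidate sentences, and all function names are hypothetical stand-ins for trained models.

```python
import math
from typing import Callable, List, Tuple


def pragmatic_log_score(log_s0: float, log_listener: float, lam: float) -> float:
    """lam * log L^R(i | o) + (1 - lam) * log S0(o | i)."""
    return lam * log_listener + (1.0 - lam) * log_s0


def rerank(
    candidates: List[Tuple[str, float]],
    listener_log_prob: Callable[[str], float],
    lam: float = 0.5,
) -> str:
    """Return the candidate text with the highest pragmatic score.

    candidates holds (text, log S0(text | input)) pairs, e.g. from beam search.
    """
    best_text, _ = max(
        candidates,
        key=lambda cand: pragmatic_log_score(cand[1], listener_log_prob(cand[0]), lam),
    )
    return best_text


def toy_listener_log_prob(text: str) -> float:
    """Hypothetical listener: recovers the input more reliably if the food type is mentioned."""
    return math.log(0.9 if "Italian" in text else 0.3)


# Hypothetical beam-search output: the base speaker prefers the less informative text.
CANDIDATES = [
    ("Aromi is a restaurant.", math.log(0.6)),
    ("Aromi is an Italian restaurant.", math.log(0.4)),
]

if __name__ == "__main__":
    print(rerank(CANDIDATES, toy_listener_log_prob, lam=0.5))
```

With λ = 0 the choice reduces to the base speaker's favorite candidate; raising λ shifts weight toward candidates from which the listener can recover the input.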

A Supplemental Material

A.1 Reconstructor Model Details

For the reconstructor-based speaker on E2E, we first follow the same preprocessing steps as Puzikov and Gurevych (2018).

In these steps, a delexicalization module maps the values of the sparsely occurring MR attributes (NAME, NEAR) to placeholder tokens.
(The remaining attributes do not seem to use placeholders.)
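The delexicalization step described above can be sketched as a simple string substitution. This is an illustration under assumptions, not the preprocessing code of Puzikov and Gurevych (2018): the placeholder tokens `<NAME>` and `<NEAR>`, the lowercase attribute keys, and the example MR are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical placeholder tokens for the two sparse attributes.
PLACEHOLDERS: Dict[str, str] = {"name": "<NAME>", "near": "<NEAR>"}


def delexicalize(mr: Dict[str, str], text: str) -> Tuple[Dict[str, str], str]:
    """Replace NAME and NEAR values with placeholders in both the MR and the text."""
    delex_mr = dict(mr)
    for attribute, token in PLACEHOLDERS.items():
        value = mr.get(attribute)
        if value:
            text = text.replace(value, token)
            delex_mr[attribute] = token
    return delex_mr, text


def relexicalize(mr: Dict[str, str], text: str) -> str:
    """Put the original NAME and NEAR values back into a generated text."""
    for attribute, token in PLACEHOLDERS.items():
        if attribute in mr:
            text = text.replace(token, mr[attribute])
    return text


EXAMPLE_MR = {"name": "Aromi", "eatType": "coffee shop", "near": "Crowne Plaza Hotel"}
EXAMPLE_TEXT = "Aromi is a coffee shop near Crowne Plaza Hotel."

if __name__ == "__main__":
    delex_mr, delex_text = delexicalize(EXAMPLE_MR, EXAMPLE_TEXT)
    print(delex_mr)
    print(delex_text)
```

Attributes outside `PLACEHOLDERS` (here `eatType`) pass through unchanged, matching the observation that only NAME and NEAR are replaced.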

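MRs have only a few possible values for most attributes.

Six of the eight attributes have at most seven unique values, and the remaining two attributes (NAME, NEAR) are handled by S0 and S1 using delexicalized placeholders.

In this way, the reconstructor only needs to predict the presence of these two attributes with boolean variables, and the other attributes with the corresponding categorical variables.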
We use a one layer bi-directional GRU (Cho et al., 2014) for the shared sentence encoder.
We concatenate the latent vectors from both directions to construct a bi-directional encoded vector h_i for every single word vector d_i as:

h_i = [GRU_fwd(d_i) ; GRU_bwd(d_i)]

(That is, a BiGRU structure.)

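Since not all words contribute equally to predicting each MR attribute, we use an attention mechanism (Bahdanau et al., 2014) to determine the importance of every single word.
The aggregated sentence vector for task k is computed as an attention-weighted sum of the encoded vectors h_i.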

The task-specific sentence representation is then used as input to k layers with softmax outputs, returning a probability vector Y(k) for each of the k MR attributes.
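The attention pooling and per-attribute softmax heads can be sketched in plain Python. This is a sketch under assumptions rather than the paper's exact architecture: the encoded vectors stand in for BiGRU outputs, the attention score is a simple dot product with a task-specific query vector (not the additive form of Bahdanau et al.), and all names and numbers are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(encoded: List[Vector], query: Vector) -> Vector:
    """Attention-weighted sum of the encoded word vectors for one task."""
    weights = softmax([dot(h, query) for h in encoded])
    dim = len(encoded[0])
    return [sum(w * h[j] for w, h in zip(weights, encoded)) for j in range(dim)]


def predict_attribute(
    encoded: List[Vector], query: Vector, class_weights: List[Vector]
) -> Vector:
    """Probability vector over the values of one MR attribute."""
    sentence = attend(encoded, query)
    return softmax([dot(row, sentence) for row in class_weights])


# Stand-ins for BiGRU outputs h_1, h_2 of a two-word sentence.
ENCODED = [[1.0, 0.0], [0.0, 1.0]]
# One softmax head with two classes, shared here only to keep the example small.
CLASS_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    # Two tasks with different query vectors attend to different words.
    print(predict_attribute(ENCODED, [5.0, 0.0], CLASS_WEIGHTS))
    print(predict_attribute(ENCODED, [0.0, 5.0], CLASS_WEIGHTS))
```

Because each task has its own query vector, the same encoded sentence yields a different pooled representation, and therefore a different prediction, for each MR attribute.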

4 Experiments

6 Conclusion

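Our results show that S0 models from previous work, while strong, still imperfectly capture the behavior that people exhibit when generating text.
An explicit pragmatic modeling procedure can improve results.
Both pragmatic methods evaluated in this paper encourage the prediction of outputs that can be used to identify their inputs, either by reconstructing inputs in their entirety or by distinguishing true inputs from distractors, so it is perhaps unsurprising that both methods obtain similar improvements in performance.
Future work might allow finer-grained modeling of the trade-off between under- and over-informativity within the sequence generation pipeline (e.g., with a learned communication cost model), or explore applications of pragmatics to content selection earlier in the generation pipeline.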

Reference

https://arxiv.org/pdf/1904.01301.pdf
