Short-007, Learning Neural Templates for Text Generation (2018-EMNLP)

■ Comment

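This paper proposes a model that uses an HSMM decoder so that generation can be interpreted and explained.
According to the authors, the HSMM lets the model produce templates that are interpretable and controllable.
As Figure 1 shows, a template is not a single fixed pattern. Instead, there is a pool of candidate templates.
So the approach appears to be that the HSMM learns how to compose a template for a given input, and the same model is then used to generate at test time.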

0. Abstract

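Neural encoder-decoder models have achieved significant empirical success in language generation, but this style of generation leaves several problems unaddressed.
Encoder-decoder models are largely (a) uninterpretable and (b) difficult to control in terms of their phrasing or content.
This work proposes a neural generation system that uses a hidden semi-Markov model (HSMM) decoder.

The HSMM learns latent, discrete templates jointly with learning to generate.

We show that the model learns useful templates, and that these templates make generation both more interpretable and more controllable.
Furthermore, we show that the approach scales to real datasets and achieves strong performance approaching that of encoder-decoder text generation models.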

1 Introduction

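Following the success of encoder-decoder models in machine translation, general-purpose, data-driven NLG systems have attracted a great deal of interest.
These encoder-decoder models use a neural encoder to represent a source knowledge base and then, conditioned on the source encoding, use a decoder to generate a textual description word by word.
This style of generation contrasts with the traditional division of labor in NLG, which emphasizes addressing two questions separately: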

what to say
how to say it
Separating these questions leads to systems with explicit content selection, macro- and micro-planning, and surface realization components.

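Encoder-decoder generation systems reduce the human effort required to build NLG systems and improve the fluency of their output.
However, because of their black-box nature, generic encoder-decoder models tend to sacrifice two important desiderata: (a) interpretable outputs, and (b) outputs whose content and form can be easily controlled.
This work considers how to build interpretable and controllable neural generation systems and proposes a specific first step: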

a new data-driven generation model that learns discrete, template-like structures for conditional text generation.

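The core system uses a novel neural hidden semi-Markov model (HSMM) decoder, which provides a principled approach to template-like text generation.
We further describe efficient methods for training this model in an entirely data-driven way by backpropagating through inference.
Generating with the template-like structures induced by the neural HSMM allows for an explicit representation of: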

what the system intends to say (in the form of a learned template)
how the system says it (in the form of an instantiated template)

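We show that this achieves performance competitive with other neural NLG approaches while satisfying the two desiderata above.
Specifically, our experiments show that we can induce explicit templates while achieving competitive automatic scores, and that we can interpret and control generation by manipulating these templates.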

Finally, although our experiments focus on the data-to-text setting, the proposed approach is a general way to learn discrete, latent-variable representations of conditional text.

2 Related Work

(Omitted)

3 Overview: Data-Driven NLG

We focus on generating a textual description of a meaning representation (MR) or knowledge base.
The input $x = \{ r_1, \ldots, r_J \}$ is a collection of records, where each record consists of:

a type (r.t), an entity (r.e), and a value (r.m).

For example, a knowledge base of restaurants might contain the record:

r.t = Cuisine, r.e = Denny’s, and r.m = American
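As a rough illustration of this input format (not code from the paper), the knowledge base $x$ can be held as a plain list of (type, entity, value) records. Only the Cuisine record comes from the example above; the other two records, the `Record` class, and the `records_of_type` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Record:
    """One record r = (r.t, r.e, r.m)."""
    t: str  # type, e.g. "Cuisine"
    e: str  # entity the record is about, e.g. "Denny's"
    m: str  # value, e.g. "American"


# x = {r_1, ..., r_J}: the knowledge base is just a collection of records.
# Only the Cuisine record is from the text; the other two are hypothetical.
x: List[Record] = [
    Record(t="Cuisine", e="Denny's", m="American"),
    Record(t="Area", e="Denny's", m="riverside"),
    Record(t="CustomerRating", e="Denny's", m="5 out of 5"),
]


def records_of_type(kb: List[Record], rtype: str) -> List[Record]:
    """Return every record in the knowledge base with the given type."""
    return [r for r in kb if r.t == rtype]


print(records_of_type(x, "Cuisine"))
```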

The goal is to generate an adequate and fluent textual description of $x$, written $\hat{y}_{1:T} = \hat{y}_1, \ldots, \hat{y}_T$.

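Concretely, we consider the E2E and WikiBio datasets.
The top row of Figure 1 shows an E2E knowledge base $x$.
The top of Figure 2 shows a WikiBio knowledge base $x$.

Each dataset pairs a knowledge base $x$ with a reference text $y = y_{1:T}$, shown in the lower part of each figure.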

The dominant approach in neural NLG is to pass $x$ through an encoder and then generate $y$ with a decoder, training the whole system end to end.

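To generate an example, a black-box network (such as an RNN) produces a distribution over the next word, a choice is made from that distribution and fed back into the system, and the process repeats for each subsequent word.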
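A minimal sketch of this generate, choose, and feed-back loop, assuming a hypothetical bigram table `NEXT_WORD` as a stand-in for the black-box network. A real encoder-decoder computes the next-word distribution from its hidden state and the encoding of $x$; the table only makes the loop runnable. Greedy choice is one option among several (sampling and beam search are common alternatives).

```python
from typing import Dict, List

# Hypothetical stand-in for the black-box network: previous word -> distribution
# over the next word. A real decoder would condition on its hidden state and on x.
NEXT_WORD: Dict[str, Dict[str, float]] = {
    "<s>": {"denny's": 0.9, "it": 0.1},
    "it": {"serves": 1.0},
    "denny's": {"serves": 0.8, "is": 0.2},
    "is": {"american": 1.0},
    "serves": {"american": 0.7, "food": 0.3},
    "american": {"food": 0.9, "</s>": 0.1},
    "food": {"</s>": 1.0},
}


def greedy_decode(max_len: int = 10) -> List[str]:
    """Generate word by word, feeding each chosen word back in as the next input."""
    output: List[str] = []
    word = "<s>"
    for _ in range(max_len):
        dist = NEXT_WORD[word]          # distribution over the next word
        word = max(dist, key=dist.get)  # make a (greedy) choice
        if word == "</s>":              # end-of-sentence token stops generation
            break
        output.append(word)             # the choice becomes the next input
    return output


print(" ".join(greedy_decode()))
```

Note that nothing in this loop records which record of $x$ a given word is meant to realize, which is exactly the interpretability problem discussed next.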

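The entire distribution is driven by the internal states of the neural network.
While effective, relying on a neural decoder makes it difficult to understand which aspects of $x$ affect the system's output.
This makes it hard to control fine-grained aspects of the generation process and to understand the model's mistakes.
As an example of why controllability matters, consider the records in Figure 1.
Given these inputs, an end-user might want to generate an output that satisfies a specific constraint,

for example, that no information relating to customer rating is mentioned.

In a standard encoder-decoder style model, one could filter out this information in the encoder or the decoder, but in practice this can cause unexpected changes in the output that propagate through the whole system.
As an example of how hard it is to interpret mistakes, consider the following actual output of an encoder-decoder style system for the records in Figure 2: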

“frederick parker-rhodes (21 november 1914 - 2 march 1987) was an english mycology and plant pathology, mathematics at the university of uk.”
In addition to not being fluent, it is unclear what the end of this sentence is even attempting to convey:
it may be attempting to convey a fact not actually in the knowledge base (e.g., where Parker-Rhodes studied), or perhaps it is simply failing to fluently realize information that is in the knowledge base (e.g., Parker-Rhodes’s country of residence).
Put simply, this is an example where it is hard to tell what the system is trying to say.

Traditional NLG systems (Kukich, 1983; McKeown, 1992; Belz, 2008; Gatt and Reiter, 2009), in contrast, largely avoid these problems.
Since they typically employ an explicit planning component, which decides which knowledge base records to focus on, and a surface realization component, which realizes the chosen records, the intent of the system is always explicit, and it may be modified to meet constraints.
The goal of this work is to propose an approach to neural NLG that addresses these issues in a principled way.
We target this goal by proposing a new model that generates with template-like objects induced by a neural HSMM (see Figure 1).
Templates are useful here because they represent a fixed plan for the generation’s content, and because they make it clear what part of the generation is associated with which record in the knowledge base.
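To summarize: the authors propose a new model that generates with template-like objects induced by a neural HSMM. Templates are useful because they make explicit what content will be generated and which part of the generation is associated with which record in the knowledge base.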
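To make the "fixed plan" idea concrete, here is a toy, hand-written template sketch. It is a heavy simplification and an assumption on our part: in the paper a template is a learned sequence of latent HSMM states and each segment's words come from the neural emission model, whereas here each state is paired with a fixed surface pattern. The state ids, the `Name` record type, and the `instantiate` helper are all hypothetical. Because every segment is tied to a record type, a constraint such as "do not mention customer rating" reduces to dropping one segment.

```python
from typing import AbstractSet, Dict, List, Optional, Tuple

# (latent state id, record type the segment realizes or None, surface pattern)
Segment = Tuple[int, Optional[str], str]

# Hypothetical template: a fixed plan of typed segments ("what to say").
TEMPLATE: List[Segment] = [
    (55, "Name", "{}"),
    (59, "Cuisine", "is an {} restaurant"),
    (12, "CustomerRating", "with a {} customer rating"),
    (3, None, "."),
]


def instantiate(template: List[Segment], kb: Dict[str, str],
                exclude: AbstractSet[str] = frozenset()) -> str:
    """Realize a template against a knowledge base ("how to say it").

    Segments tied to a record type that is missing from kb or listed in
    exclude are dropped, so every output span stays traceable to a record.
    """
    parts: List[str] = []
    for _state, rtype, pattern in template:
        if rtype is None:
            parts.append(pattern)  # record-independent filler text
        elif rtype in kb and rtype not in exclude:
            parts.append(pattern.format(kb[rtype]))
    return " ".join(parts).replace(" .", ".")


kb = {"Name": "Denny's", "Cuisine": "American", "CustomerRating": "5 out of 5"}
print(instantiate(TEMPLATE, kb))
# -> Denny's is an American restaurant with a 5 out of 5 customer rating.
print(instantiate(TEMPLATE, kb, exclude={"CustomerRating"}))
# -> Denny's is an American restaurant.
```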

4 Background: Semi-Markov Models

(Omitted)

5 A Neural HSMM Decoder

We specify the likelihood with a novel neural parameterization of an HSMM.
The full model, shown in Figure 3, lets us incorporate modeling components such as LSTMs and attention, which make neural text generation effective, while maintaining the HSMM structure.
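The semi-Markov structure is what makes exact inference a dynamic program. Below is a minimal sketch of the standard HSMM forward algorithm in log space, not the paper's neural parameterization: we assume a uniform length distribution over 1..`max_len` and, in place of the paper's LSTM/attention emissions, a per-state unigram table (`EMIT`; all parameter values are made up). If `init`, `trans`, and `emit` were produced by neural networks and this recursion were written in an autodiff framework, it is the kind of computation one would backpropagate through during training.

```python
import math
from typing import Dict, List, Sequence

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    return math.log(p) if p > 0 else NEG_INF


def logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in xs))


def hsmm_log_likelihood(y: Sequence[str], init: List[float],
                        trans: List[List[float]], emit: List[Dict[str, float]],
                        max_len: int) -> float:
    """log p(y), summed over all segmentations and state sequences.

    alpha[t][k] = log p(y[:t], and a segment in state k ends exactly at t).
    Assumptions: uniform segment-length distribution over 1..max_len, and the
    tokens of a segment are emitted independently from state k's unigram table.
    """
    K, T = len(init), len(y)
    log_len = -math.log(max_len)
    alpha = [[NEG_INF] * K for _ in range(T + 1)]
    for t in range(1, T + 1):
        for k in range(K):
            terms = []
            for length in range(1, min(max_len, t) + 1):
                s = t - length  # this segment covers y[s:t]
                if s == 0:
                    prev = safe_log(init[k])  # first segment of the sequence
                else:  # marginalize over the state of the previous segment
                    prev = logsumexp([alpha[s][j] + safe_log(trans[j][k])
                                      for j in range(K)])
                seg = sum(safe_log(emit[k].get(w, 0.0)) for w in y[s:t])
                terms.append(prev + log_len + seg)
            alpha[t][k] = logsumexp(terms)
    return logsumexp(alpha[T])


# Toy parameters (made up): state 0 acts like a "name" state, state 1 like a "description" state.
Y = ["denny's", "serves", "american", "food"]
INIT = [0.8, 0.2]
TRANS = [[0.1, 0.9], [0.5, 0.5]]
EMIT = [{"denny's": 0.9, "serves": 0.1},
        {"serves": 0.4, "american": 0.3, "food": 0.3}]
print(hsmm_log_likelihood(Y, INIT, TRANS, EMIT, max_len=3))
```

The cost is roughly O(T · L · K²) for sequence length T, maximum segment length L, and K states, instead of enumerating exponentially many segmentations.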

5.1 Parameterization

5.2 Learning

5.3 Extracting Templates and Generating

5.4 Discussion

Returning to the discussion of controllability and interpretability, the proposed model makes the following possible:

(a) it is possible to explicitly force the generation to use a chosen template $z^{(i)}$, which is itself automatically learned from training data, and
(b) every segment in the generated $\hat{y}^{(i)}$ is typed by its corresponding latent variable.
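Property (a) presupposes that a template can be read off the model. One simple way to do that for an HSMM is Viterbi decoding over segmentations: find the single best sequence of (state, segment) pairs. Its state sequence plays the role of a template $z^{(i)}$, and each segment is "typed" by its state, as in (b). This sketch uses the same toy assumptions as a plain HSMM (uniform lengths up to `max_len`, per-state unigram emissions, made-up parameters) and is an illustration rather than the paper's exact extraction procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    return math.log(p) if p > 0 else NEG_INF


def viterbi_segmentation(y: Sequence[str], init: List[float],
                         trans: List[List[float]], emit: List[Dict[str, float]],
                         max_len: int) -> List[Tuple[int, List[str]]]:
    """Return the highest-scoring list of (state, segment) pairs covering y."""
    K, T = len(init), len(y)
    log_len = -math.log(max_len)  # uniform length prior (assumption)
    best = [[NEG_INF] * K for _ in range(T + 1)]
    back = [[None] * K for _ in range(T + 1)]  # (segment start, previous state)
    for t in range(1, T + 1):
        for k in range(K):
            for length in range(1, min(max_len, t) + 1):
                s = t - length
                seg = log_len + sum(safe_log(emit[k].get(w, 0.0)) for w in y[s:t])
                if s == 0:
                    preds = [(safe_log(init[k]), None)]
                else:
                    preds = [(best[s][j] + safe_log(trans[j][k]), j) for j in range(K)]
                for score, j in preds:
                    if score + seg > best[t][k]:
                        best[t][k] = score + seg
                        back[t][k] = (s, j)
    state = max(range(K), key=lambda q: best[T][q])
    if best[T][state] == NEG_INF:
        raise ValueError("y has zero probability under this model")
    segments: List[Tuple[int, List[str]]] = []
    t = T
    while t > 0:  # follow back-pointers from the end of the sequence
        s, prev = back[t][state]
        segments.append((state, list(y[s:t])))
        t, state = s, prev
    return segments[::-1]


# Toy parameters (made up), same shapes as a 2-state HSMM.
Y = ["denny's", "serves", "american", "food"]
INIT = [0.8, 0.2]
TRANS = [[0.1, 0.9], [0.5, 0.5]]
EMIT = [{"denny's": 0.9, "serves": 0.1},
        {"serves": 0.4, "american": 0.3, "food": 0.3}]
print(viterbi_segmentation(Y, INIT, TRANS, EMIT, max_len=3))
# -> [(0, ["denny's"]), (1, ['serves', 'american', 'food'])]
```

Here the extracted "template" is the state sequence [0, 1]. Forcing generation to follow a chosen template would then mean fixing the states and only choosing the words of each segment.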

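These properties are useful beyond this application, and they offer an additional perspective on how to approach latent-variable modeling of text.
There has been much recent interest in learning continuous latent-variable representations of text (see Section 2), but it is not always clear what the learned latent variables will capture.
The latent, template-like structures induced here, by contrast, offer a plausible, probabilistic latent-variable story and allow for a more controllable way of generating.
Finally, we highlight one important possible issue with this model: the assumption that, given the corresponding latent variables and $x$, the segments are independent of each other.
Here we note that the ability to condition on $x$ is very powerful. Indeed, a clever encoder can capture in its encoding much of the necessary interdependence between the segments to be generated (e.g., the correct determiner for an upcoming noun phrase), allowing the segments themselves to be decoded more or less independently given $x$.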
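The independence assumption can be written out in generic HSMM notation (ours, which may not match the paper's indexing exactly). With $N$ segments, states $z_1, \ldots, z_N$, lengths $l_1, \ldots, l_N$, and segment $i$ spanning positions $b_i$ to $e_i = b_i + l_i - 1$ (so $b_1 = 1$, $b_{i+1} = e_i + 1$, $e_N = T$), and with $p(z_1 \mid z_0, x)$ standing for the initial-state distribution:

$$
p(y_{1:T}, z, l \mid x) = \prod_{i=1}^{N} p(z_i \mid z_{i-1}, x)\, p(l_i \mid z_i)\, p\big(y_{b_i:e_i} \mid z_i, l_i, x\big),
\qquad
p(y_{1:T} \mid z, l, x) = \prod_{i=1}^{N} p\big(y_{b_i:e_i} \mid z_i, l_i, x\big).
$$

The second product is the assumption in question: the words of segment $i$ depend on $z_i$, $l_i$, and $x$, but not on the words of any other segment, so any cross-segment agreement (such as the determiner example) has to be carried by $x$ or by the states.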

6 Data and Methods

7 Results

8 Conclusion and Future Work

We have presented a neural, template-like generation model based on an HSMM decoder,

which can be learned tractably by backpropagating through a dynamic program.

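The method allows us to extract template-like latent objects, in the form of state sequences, in a principled way and then to generate with them.
The approach scales to large-scale text datasets and is nearly competitive with encoder-decoder models.
More importantly, it allows for controllable diversity in generation and produces interpretable states during generation.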
We view this work as a first step

toward learning discrete latent-variable template models for more difficult generation tasks, and toward learning latent-variable text models more generally.

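Future work will examine encouraging the model to learn maximally different (or minimal) templates, which the current objective does not explicitly encourage, as well as templates of larger textual phenomena such as paragraphs and documents, and hierarchical templates.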

Reference

https://arxiv.org/pdf/1808.10122.pdf
