NL-036, A Neural Conversational Model (2015-ICMLW)

0. Abstract

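Conversational modeling is an important task in natural language understanding and machine intelligence.
Previous approaches were restricted to specific domains and relied on hand-crafted rules. (Is this paper the first attempt using deep learning?)
This paper uses seq2seq and models a conversation by predicting the next sentence given the previous sentence(s).
The strength of the model is that it is trained end-to-end, so it needs far fewer hand-crafted rules.
The model itself is simple and is trained on a large conversational dataset.
As a result, it converses reasonably well despite optimizing the "wrong" objective function (which seems to mean that the objective is not about holding a conversation).
It can extract knowledge both from a domain-specific dataset and from a large, noisy, general-domain dataset.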

Domain-specific dataset: an IT helpdesk dataset
Open-domain dataset: a movie transcript dataset

As a result, the model can perform simple common-sense reasoning, but a lack of consistency is a common failure mode.

1. Introduction

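The paper describes recent progress in deep learning (as of 2015).
According to the paper, such (deep learning) methods let researchers work on tasks where domain knowledge is not readily available, or where rules are too hard to design by hand.
Conversational modeling can directly benefit from the seq2seq formulation, because it requires mapping a query to a response.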

Previously, because of the complexity of this mapping, conversational models were confined to very narrow domains and required substantial feature engineering.

In this work, seq2seq is applied by predicting the next sentence given a sentence, and it works surprisingly well.
Experiments were run on two datasets.

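Experiments on the IT helpdesk dataset

The model gives users useful answers.

Experiments on movie subtitles

It produces a natural conversation flow.
Sometimes it gives answers that require simple common-sense reasoning.

On both datasets, the model achieves lower (better) perplexity than an n-gram model.

Qualitatively, the model can hold natural conversations.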

2. Related Work

Background on seq2seq

Shown to be effective on English-French translation
Also used for parsing, image captioning, etc.
Vanilla RNNs suffer from vanishing gradients, so the trend is to use LSTMs instead.

Decades of prior research on conversational agents are outside the scope of this paper.

Instead, the model takes an end-to-end approach that can be applied even when domain knowledge is lacking.
The probabilistic model is trained to maximize the probability of the answer given some context.
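Written as an equation, this is the standard seq2seq maximum-likelihood objective. The notation is our own, not copied from the paper: $x$ is the context, $y = (y_1, \dots, y_T)$ the reply, $\mathcal{D}$ the set of (context, reply) pairs, and $\theta$ the model parameters.

```latex
\theta^{*} = \arg\max_{\theta} \sum_{(x, y) \in \mathcal{D}} \log p_{\theta}(y \mid x),
\qquad
\log p_{\theta}(y \mid x) = \sum_{t=1}^{T} \log p_{\theta}(y_t \mid y_{<t}, x)
```

Maximizing this log-likelihood is the same as minimizing the per-token cross-entropy loss $-\sum_{t=1}^{T} \log p_{\theta}(y_t \mid y_{<t}, x)$.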

3. Model

A seq2seq model is used.

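Given the tokens so far, the model is trained to predict the next token; that is, it maximizes the log-likelihood of the correct reply given its context (equivalently, it minimizes the cross-entropy loss). During training, the true output sequence is fed to the model (teacher forcing).
At inference time the true output is unknown, so each predicted token is fed back in as the next input. Decoding can be greedy or use beam search (less greedy). (Here greedy decoding is used, i.e., the highest-probability token is output at each step.)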
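To make the difference between greedy decoding and beam search concrete, here is a sketch over a hypothetical toy next-token table (`NEXT`) standing in for the decoder. A real seq2seq decoder conditions on the encoded context and the whole prefix, not just the previous token; only the search logic is meant to carry over.

```python
import math

# Hypothetical toy "decoder": next-token probabilities conditioned only on the
# previous token (a bigram table). A real seq2seq decoder would condition on the
# encoded context and the whole prefix; this only illustrates the search logic.
NEXT = {
    "<s>": {"a": 0.55, "b": 0.45},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "x": 0.1},
    "x": {"<eos>": 1.0},
    "y": {"<eos>": 1.0},
    "z": {"<eos>": 1.0},
}
EOS = "<eos>"


def greedy_decode(max_len=10):
    """Feed back the single most probable token at each step."""
    prev, out, logp = "<s>", [], 0.0
    for _ in range(max_len):
        tok, p = max(NEXT[prev].items(), key=lambda kv: kv[1])
        logp += math.log(p)
        if tok == EOS:
            break
        out.append(tok)
        prev = tok
    return out, logp


def beam_decode(beam_size=2, max_len=10):
    """Keep the beam_size best partial replies by total log-probability."""
    beams = [([], "<s>", 0.0)]  # (tokens so far, last token, log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, prev, logp in beams:
            for tok, p in NEXT[prev].items():
                new_logp = logp + math.log(p)
                if tok == EOS:
                    finished.append((tokens, new_logp))
                else:
                    candidates.append((tokens + [tok], tok, new_logp))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[2], reverse=True)
        beams = candidates[:beam_size]
    # Unfinished beams at max_len are ignored in this sketch.
    return max(finished, key=lambda f: f[1])
```

With this table, greedy decoding commits to `a` (0.55 > 0.45) and ends at `a x` with probability 0.275, while a beam of 2 keeps `b` alive and finds `b z` with probability 0.405.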

Training follows the setup shown in Figure 1 of the paper.

In the original seq2seq paper, reversing the source (input) sentence worked better, but this paper does not seem to mention doing so.
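A minimal sketch of the Figure 1 layout for one (context, reply) pair. The function name `figure1_layout` and the use of `None` for unscored positions are illustrative assumptions; `reverse_source` only shows what the input-reversal trick from the original seq2seq paper would look like, not something this paper is known to use.

```python
EOS = "<eos>"


def figure1_layout(context, reply, reverse_source=False):
    """Lay out one (context, reply) pair as one LSTM input stream.

    The LSTM reads the context tokens, then <eos>, then the reply tokens
    (teacher forcing). Only predictions made from <eos> onward are scored:
    each position predicts the next reply token and the last one predicts
    <eos>. None marks positions that contribute no loss.
    reverse_source=True applies the input-reversal trick from the original
    seq2seq paper (Sutskever et al., 2014); whether NCM used it is not stated.
    """
    source = list(reversed(context)) if reverse_source else list(context)
    inputs = source + [EOS] + list(reply)
    targets = [None] * len(source) + list(reply) + [EOS]
    return inputs, targets
```

For the figure's example, context `A B C` and reply `W X Y Z` give inputs `A B C <eos> W X Y Z` and targets `- - - W X Y Z <eos>`.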

The model is simple and general, so it can be applied to other tasks (translation, QA, etc.) without major changes to the architecture.

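The input can also be the concatenation of the previous conversation turns.
However, the paper does not say how this input was constructed.
For comparison, SDNet on CoQA also built its input with a max_length setting.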
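Since the paper does not say how previous turns were concatenated, here is one hypothetical scheme: append an end-of-turn marker after each turn and keep only the most recent `max_length` tokens, dropping the oldest first. The `<eot>` marker, the function name, and the left-truncation policy are all assumptions for illustration.

```python
def build_context(history, max_length, sep="<eot>"):
    """Concatenate previous turns into one encoder input (hypothetical scheme).

    history: list of turns, each a list of tokens, oldest first.
    sep: hypothetical end-of-turn marker appended after every turn.
    Keeps only the most recent max_length tokens.
    """
    tokens = []
    for turn in history:
        tokens.extend(turn)
        tokens.append(sep)
    # Truncate from the left so the latest turns survive; guard max_length == 0
    # because tokens[-0:] would return the whole list.
    return tokens[-max_length:] if max_length > 0 else []


history = [["hi"], ["hello"], ["what", "os", "?"]]
print(build_context(history, 4))  # → ['what', 'os', '?', '<eot>']
```

Truncating from the left is a design choice: the most recent turns are usually the most relevant for predicting the next reply.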

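However, unlike translation, a seq2seq model cannot truly "solve" conversation, i.e., make the model talk like a human.

This is because the objective function is next-sentence prediction, whereas real human communication is longer-term and based on an exchange of information.
Likewise, next-step prediction alone cannot ensure consistency or general world knowledge.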

4. Datasets

Two datasets are used:

Closed-domain IT helpdesk troubleshooting dataset
Open-domain movie transcript dataset

4.1 IT Helpdesk Troubleshooting dataset

A typical conversation (session) is about 400 words long.
Training set: 30M tokens / validation set: 3M tokens
Preprocessing: removing common names, numbers, and full URLs
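A rough sketch of this scrubbing step with regular expressions. The patterns, the function name, and the `names` list are assumptions: the paper does not describe how names, numbers, or URLs were detected, and a real pipeline would likely need a much larger name list or a named-entity tagger.

```python
import re

# Hypothetical patterns: full URLs and standalone numbers (e.g. "555", "1.2").
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUM_RE = re.compile(r"\b\d+(?:[.,]\d+)*\b")


def scrub(text, names):
    """Remove full URLs, numbers and the given names from one utterance.

    names: hypothetical list of common names to drop (matched case-insensitively).
    """
    text = URL_RE.sub("", text)
    text = NUM_RE.sub("", text)
    if names:
        name_re = re.compile(
            r"\b(?:" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE
        )
        text = name_re.sub("", text)
    # Collapse the whitespace left behind by the deletions.
    return " ".join(text.split())


print(scrub("hi john , go to https://corp.example.com/vpn and call 555 1234", ["john"]))
# → hi , go to and call
```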

4.2 OpenSubtitles dataset

Movie subtitles in XML format.
Simple preprocessing removed the XML tags and obvious non-conversational text (e.g., hyperlinks).
Training set: 62M sentences (923M tokens)
Validation set: 26M sentences (395M tokens)
The data is somewhat noisy; for example, consecutive sentences may be spoken by the same character.
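As far as we understand the paper, it trains on consecutive subtitle sentences as (previous sentence, next sentence) pairs, so each interior sentence is used both as a context and as a target. A minimal sketch of that pairing (the function name is ours); speaker identity is ignored, which is exactly why two lines from the same character become noise.

```python
def consecutive_pairs(sentences):
    """Turn a subtitle stream into (context, target) pairs.

    Every sentence is the target for the one before it and the context for
    the one after it, so interior sentences appear in two pairs. Speaker
    identity is ignored, so two consecutive lines may be the same character.
    """
    return [(sentences[i], sentences[i + 1]) for i in range(len(sentences) - 1)]


print(consecutive_pairs(["where are you ?", "at home .", "good ."]))
# → [('where are you ?', 'at home .'), ('at home .', 'good .')]
```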

5. Experiments

The model is compared against CleverBot, a rule-based bot.

5.1 IT Helpdesk Troubleshooting experiments

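Single-layer LSTM with 1024 memory cells
Trained with SGD (with gradient clipping)
Vocabulary: the 20K most common words (including special tokens, e.g., for turn taking and actor)
Model perplexity: 8
N-gram model perplexity: 18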
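Perplexity is the exponential of the average per-token negative log-likelihood, so the numbers above can be read as an effective branching factor. A minimal sketch (the function name is ours):

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(average negative log-likelihood per token).

    token_log_probs: natural-log probabilities the model assigned to each
    reference token.
    """
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)


# Two tokens predicted with probabilities 0.5 and 0.125.
print(perplexity([math.log(0.5), math.log(0.125)]))  # → about 4.0
```

A perplexity of 8 means the model is, on average, about as uncertain as a uniform choice among 8 tokens, versus 18 for the n-gram baseline.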
(Example) Conversation 1: VPN issues.
Describe your problem: i am having issues accessing vpn
Machine: hi
Human: hello
Machine: could you please let me know what are the operating systems you are using ?
Human: i am using linux
Machine: ok it was in the ssl vpn on which you stated it was broken
Machine: please check if you are logged in to corp network
Machine: what is the error that you are running please
Human: i am seeing an error related to vpn
Machine: what is the error message that you are getting when connecting to vpn using network connect ?
Human: connection refused or something like that
Machine: may i know the version of network connect you connect ?
Human: i am not sure i know that
Machine: is the network connect prompting for any user certificate
Machine: ?
Human: yes
Machine: are you at home or at work at the moment ?
Machine: office ?
Human: i am at home
Machine: try this
Machine: goto page and enter your name and password
Human: done
Machine: try logging in with and check if its working fine now
Human: yes , now it works !
Machine: great
Machine: anything else that i can help
Human: no , that was very useful
Machine: you are other than : )
Machine: have a nice night
See the paper for more examples.

5.2 OpenSubtitles experiments

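Two-layer LSTM with 4096 memory cells
Trained with AdaGrad
Vocabulary: the 100K most frequent words
To speed up the softmax, the memory cells are projected to 2048 linear units before being fed to the classifier (so the 100K-way output layer works on 2048-dimensional instead of 4096-dimensional vectors).
Validation perplexity: 17
Smoothed 5-gram perplexity: 28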
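A back-of-the-envelope count of why the projection helps, using the sizes above (4096 memory cells, a 2048-unit projection, and a vocabulary taken as exactly 100,000 words, which is an assumption). It counts multiply-accumulates per time step for the output layer only; the function name is ours.

```python
def output_layer_macs(hidden, vocab, proj=None):
    """Multiply-accumulates per time step for the output layer.

    Without a projection the classifier weight matrix is hidden x vocab.
    With a linear projection to `proj` units it is hidden x proj + proj x vocab.
    Biases and the softmax normalization itself are ignored.
    """
    if proj is None:
        return hidden * vocab
    return hidden * proj + proj * vocab


full = output_layer_macs(4096, 100_000)             # 409,600,000
projected = output_layer_macs(4096, 100_000, 2048)  # 213,188,608
print(f"{projected / full:.2f}")  # → 0.52
```

The projection roughly halves the output-layer cost (and the number of output weights), because the large matrix becomes 2048 x 100K instead of 4096 x 100K, plus a small 4096 x 2048 projection.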

5.2.1 Model Samples

Conversation 3: General knowledge Q&A
Human: who is skywalker ?
Machine: he is a hero .
Human: who is bill clinton ?
Machine: he ’s a billionaire .
Human: is sky blue or black ?
Machine: blue .
Human: does a cat have a tail ?
Machine: yes .
Human: does a cat have a wing ?
Machine: no
Human: can a cat fly ?
Machine: no .
Human: how many legs does a cat have ?
Machine: four , i think .
Human: how many legs does a spider have ?
Machine: three , i think .
Human: how many legs does a centipede have ?
Machine: eight .
Human: what is the color of the sky ?
Machine: blue .
Human: what is the color of water ?
Machine: water .
Human: what is the color of blood ?
Machine: it is the same as a black eye .
Human: what is the usual color of a leaf ?
Machine: it is a green one .
Human: what is the color of a yellow car ?
Machine: yellow .
Human: how much is two plus two ?
Machine: four .
Human: how much is ten minus two ?
Machine: seventy - two .
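See the paper for more examples.
According to the paper, apart from the first conversation, most of these questions do not appear in the training set.
One drawback is that the answers are simple, short, and sometimes unsatisfying (i.e., not perfect).
The model also lacks a consistent personality: when asked questions that are semantically similar but not identical, it can answer inconsistently.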
This is expected due to the simplicity of our model and the dataset in our experiments.

In other words, the authors seem to attribute these problems to the simplicity of the data and the model.

5.2.2 Human Evaluation

Four human raters compared answers from this model (NCM) and from CleverBot on 200 questions.
NCM: preferred on 97 of 200
CleverBot: preferred on 60 of 200
Tie: 20
Raters disagreed: 23
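A minimal sketch of how per-question judgments could be aggregated, assuming a strict-majority rule across the four raters (the exact rule is not stated in this review, so `aggregate` is hypothetical), followed by a check that the reported counts cover all 200 questions.

```python
from collections import Counter

# Reported outcomes over 200 questions (numbers from the list above).
REPORTED = {"NCM": 97, "CB": 60, "tie": 20, "disagreement": 23}


def aggregate(votes):
    """Hypothetical rule: an outcome ("NCM", "CB" or "tie") needs a strict
    majority of the raters' votes; otherwise the raters disagree."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else "disagreement"


total = sum(REPORTED.values())
shares = {k: v / total for k, v in REPORTED.items()}
print(total, shares["NCM"], shares["CB"])  # → 200 0.485 0.3
```

With four raters, a strict majority means at least three agreeing votes, so a 2-2 split counts as a disagreement.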
Below is a sample from this comparison.
Human: What’s thirty plus forty ?

NCM: Sixty ?
CB: What should we talk about ?
Judged a tie

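However, designing a good metric to quickly measure the quality of a conversational model remains an open research problem. (People are still saying the same thing today...)

That is, human evaluation is slow and does not give immediate feedback.

The authors also say that, since the model has obvious drawbacks, they hope research will also explore other directions!

Very candid...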

6. Discussion

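The model still has limitations, so much would have to change before it can hold truly realistic conversations.
Still, it is significant that this was done in a purely data-driven way, without hand-written rules.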

Reference

Paper: https://arxiv.org/pdf/1506.05869.pdf
