◼️ Comment

I skimmed only a few parts of this paper, mainly so I could use it as a reference.
In my view, its biggest contribution is improving model performance by incorporating commonsense knowledge.

Two knowledge sources are used: ConceptNet and NRC_VAD.
NRC_VAD is an emotion lexicon, while ConceptNet is a general-purpose commonsense knowledge graph.
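To make the role of NRC_VAD a bit more concrete, here is a minimal sketch of turning valence/arousal/dominance scores into a per-word affectiveness score. It assumes VAD values in [0, 1] and scores each word by the L2 norm of [V - 1/2, A/2], min-max normalized over the candidate words. That is my reading of the paper's affectiveness term, so check Section 3 of the paper before relying on it. The lexicon entries below are invented placeholders, not real NRC_VAD values.

```python
import numpy as np

# Hypothetical toy lexicon: word -> (valence, arousal, dominance), each in [0, 1].
# These numbers are invented for illustration, NOT real NRC_VAD entries.
TOY_VAD = {
    "happy": (0.95, 0.70, 0.75),
    "table": (0.50, 0.10, 0.40),
    "furious": (0.05, 0.95, 0.60),
}


def emotion_intensity(v, a):
    """Distance of (V, A) from a neutral point: ||[V - 1/2, A / 2]||_2."""
    return float(np.linalg.norm([v - 0.5, a / 2.0]))


def affectiveness(words, lexicon=TOY_VAD):
    """Min-max normalized emotion intensity for the given words.

    Words missing from the lexicon fall back to a neutral (V=0.5, A=0.0)
    pair; this fallback is an assumption of the sketch.
    """
    raw = np.array([emotion_intensity(*lexicon.get(w, (0.5, 0.0, 0.5))[:2])
                    for w in words])
    span = raw.max() - raw.min()
    if span == 0:
        return np.zeros_like(raw)
    return (raw - raw.min()) / span
```

With this toy lexicon, `affectiveness(["happy", "table", "furious"])` gives "table" the lowest score (neutral valence, low arousal) and "furious" the highest.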

The model is designed somewhat like an MRC (machine reading comprehension) model and includes a graph-based component.

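The overall idea is to model both the context and commonsense knowledge.
Personally, I wonder whether unidirectional context modeling would be closer to real conversations than bidirectional modeling.
After all, a conversation flows in one direction... but this paper uses bidirectional information.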
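To illustrate the unidirectional vs. bidirectional point (my own illustration, not something KET does), the difference can be expressed as an attention mask over context utterances. With a causal mask, utterance i can only attend to utterances 0..i, whereas a bidirectional setup lets each utterance attend to all the others. The function name and the use of a raw score matrix are hypothetical simplifications.

```python
import numpy as np


def attention_weights(scores, causal=False):
    """Row-wise softmax over an (n, n) matrix of utterance-to-utterance scores.

    With causal=True, utterance i may only attend to utterances j <= i,
    i.e. the "unidirectional" reading of a conversation.
    """
    scores = np.array(scores, dtype=float)
    if causal:
        n = scores.shape[0]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)  # positions j > i
        scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # exp(-inf) == 0 for masked positions
    return weights / weights.sum(axis=1, keepdims=True)
```

With uniform scores over four utterances, the bidirectional version gives every utterance weight 0.25 on every other one, while the causal version makes the first utterance attend only to itself.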

According to the authors, the model was SoTA at the time of submission.

0 Abstract

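In human conversations, messages inherently convey emotions.
Detecting emotions in textual conversations enables a wide range of applications, such as opinion mining in social networks.
However, enabling machines to analyze emotions in conversations is challenging, partly because humans often rely on context and commonsense knowledge to express emotions.
In this paper, we address these challenges by proposing the Knowledge-Enriched Transformer (KET).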

In KET, contextual utterances are interpreted using hierarchical self-attention, and external commonsense knowledge is dynamically leveraged using a context-aware affective graph attention mechanism.

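Experiments on multiple textual conversation datasets demonstrate that both context and commonsense knowledge are beneficial to emotion detection performance.
In addition, the experimental results show that our KET model outperforms the state-of-the-art models in F1 score on most of the tested datasets.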
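Since the results are reported as F1 scores, a quick reminder of how a support-weighted F1 is computed may help when reading the result tables. This is a plain-Python sketch of one common averaging scheme (per-class F1 weighted by how often each class occurs in the gold labels). The averaging actually used for each dataset is described in Section 4.1 of the paper and may differ.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label in sorted(support):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += f1 * support[label] / total  # weight by class frequency
    return score
```

For example, gold labels `["happy", "happy", "sad", "angry"]` against predictions `["happy", "sad", "sad", "angry"]` give per-class F1 of 2/3, 2/3, and 1, which average to 0.75 with weights 0.5, 0.25, and 0.25.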

1 Introduction

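Emotions are "generated states" in humans that reflect evaluative judgments of the environment, the self, and other social agents (Hudlicka, 2011).
Messages in human conversations inherently convey emotions.
With the prevalence of social media platforms such as Facebook Messenger and conversational agents such as Amazon Alexa, there is a growing need for machines to understand human emotions in natural conversations.
This work addresses the task of detecting emotions (e.g., happy, sad, angry, etc.) in textual conversations, where the emotion of an utterance is detected in the conversational context.
Being able to detect emotions in conversations effectively enables a wide range of applications, from opinion mining on social media platforms (Chatterjee et al., 2019) to building emotion-aware conversational agents (Zhou et al., 2018a).
However, enabling machines to analyze emotions in human conversations is challenging, partly because humans often rely on context and commonsense knowledge to express emotions, which makes those emotions hard to capture.
Figure 1 shows an example illustrating the importance of context and commonsense knowledge when detecting emotions in conversations.
Several recent studies detect emotions in conversations by modeling contextual information.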

Poria et al. (2017) and Majumder et al. (2019) model contextual utterances as a sequence using RNNs.
Each utterance is represented by a feature vector extracted in an earlier stage by a convolutional neural network (CNN).

Similarly, Hazarika et al. (2018a,b) use memory networks to model contextual utterances represented by CNN-extracted features.
However, these methods separate feature extraction from tuning, which may not be ideal for real-time applications.
In addition, to the best of our knowledge, no prior work has incorporated commonsense knowledge from external knowledge bases to detect emotions in textual conversations.
Commonsense knowledge is fundamental to understanding conversations and generating appropriate responses (Zhou et al., 2018b).
To this end, we propose KET, which effectively incorporates contextual information and external knowledge bases to address the challenges above.
The Transformer has proven to be a powerful representation learning model in many NLP tasks, such as machine translation and language understanding.
Its self-attention and cross-attention modules capture intra-sentence and inter-sentence correlations, respectively.
The shorter path of information flow in these two modules lets KET model contextual information more efficiently than gated RNNs and CNNs.
In addition, we propose a hierarchical self-attention mechanism that allows KET to model the hierarchical structure of conversations.
Unlike other Transformer-based models, our model feeds the context and the response separately into the encoder and the decoder, respectively.
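A stripped-down NumPy sketch of how hierarchical self-attention and cross-attention fit together: word-level self-attention inside each context utterance, mean pooling to one vector per utterance, utterance-level self-attention across the context, and finally cross-attention from the response words to the encoded context. It assumes single-head attention without learned projections, residual connections, layer norm, or feed-forward layers, and the mean-pooling step is my own simplification. It shows the data flow only, not the paper's exact layers.

```python
import numpy as np


def scaled_dot_attention(q, k, v):
    """softmax(q k^T / sqrt(d)) v for q: (m, d), k: (n, d), v: (n, d_v)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def hierarchical_context_encoding(context):
    """context: list of (num_words_i, d) arrays, one per context utterance.

    Level 1: word-level self-attention inside each utterance, then mean pooling.
    Level 2: utterance-level self-attention across the pooled vectors.
    Returns a (num_utterances, d) context memory.
    """
    pooled = []
    for words in context:
        attended = scaled_dot_attention(words, words, words)  # intra-utterance
        pooled.append(attended.mean(axis=0))
    utterances = np.stack(pooled)
    return scaled_dot_attention(utterances, utterances, utterances)  # inter-utterance


def response_cross_attention(response, context_memory):
    """Decoder side: response self-attention, then cross-attention to the context."""
    self_attended = scaled_dot_attention(response, response, response)
    return scaled_dot_attention(self_attended, context_memory, context_memory)
```

For example, two context utterances of 5 and 3 words with d = 8 produce a (2, 8) context memory, and a 4-word response attending over it yields a (4, 8) output.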

By contrast, BERT directly concatenates the context and the response as a single input and trains a language model using only the encoder.
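The input difference can be sketched as follows. The BERT-side function builds the usual sentence-pair format ([CLS] A [SEP] B [SEP], with segment ids 0 and 1). Flattening all context utterances into segment A and whitespace tokenization are assumptions for illustration. The KET-side function simply keeps the context (as a list of utterances) and the response apart, which is what lets the encoder preserve utterance boundaries for hierarchical attention.

```python
def bert_pair_input(context_utterances, response):
    """BERT-style single sequence: [CLS] context [SEP] response [SEP].

    Context utterances are flattened into one token list (an assumption of
    this sketch). Segment id 0 marks the context side, 1 the response side.
    """
    context_tokens = [tok for utt in context_utterances for tok in utt.split()]
    response_tokens = response.split()
    tokens = ["[CLS]"] + context_tokens + ["[SEP]"] + response_tokens + ["[SEP]"]
    segment_ids = [0] * (len(context_tokens) + 2) + [1] * (len(response_tokens) + 1)
    return tokens, segment_ids


def ket_style_input(context_utterances, response):
    """KET-style split: context goes to the encoder, response to the decoder."""
    encoder_input = [utt.split() for utt in context_utterances]  # keeps utterance boundaries
    decoder_input = response.split()
    return encoder_input, decoder_input
```

For example, `bert_pair_input(["hi there", "how are you"], "i am fine")` returns 11 tokens with segment ids `[0] * 7 + [1] * 4`, while `ket_style_input` on the same data keeps the two context utterances as separate lists.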

Moreover, to exploit commonsense knowledge, we leverage external knowledge bases to facilitate the understanding of each word in an utterance by referring to related knowledge entities.
The referring process is dynamic and balances the relatedness and affectiveness of the retrieved knowledge entities using a context-aware affective graph attention mechanism.
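A minimal sketch of this context-aware affective graph attention for a single word, under these assumptions: relatedness is the min-max-normalized ConceptNet confidence times the absolute cosine similarity between a concept embedding and a conversation-level vector, a single scalar `lam` mixes relatedness with affectiveness, and a softmax over the mixed scores weights the concept embeddings. The function and parameter names are hypothetical, and how the conversation vector is built is left to the caller. See Section 3 of the paper for the exact formulation.

```python
import numpy as np


def min_max(x):
    """Scale values to [0, 1]; returns zeros if all values are equal."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def concept_representation(concept_vecs, confidences, affect, conv_repr, lam=0.5):
    """Graph attention over the ConceptNet neighbours of one word (sketch).

    concept_vecs: (K, d) embeddings of the retrieved concepts
    confidences:  (K,) ConceptNet edge confidence scores
    affect:       (K,) affectiveness scores in [0, 1] (e.g. derived from NRC_VAD)
    conv_repr:    (d,) vector summarising the conversation context (hypothetical input)
    lam:          weight on relatedness vs. affectiveness
    Returns (weighted concept vector of shape (d,), attention weights of shape (K,)).
    """
    conv_repr = np.asarray(conv_repr, dtype=float)
    concept_vecs = np.asarray(concept_vecs, dtype=float)
    if concept_vecs.shape[0] == 0:  # word has no retrieved concepts
        return np.zeros_like(conv_repr), np.zeros(0)
    cos = concept_vecs @ conv_repr / (
        np.linalg.norm(concept_vecs, axis=1) * np.linalg.norm(conv_repr) + 1e-12)
    relatedness = min_max(confidences) * np.abs(cos)
    scores = lam * relatedness + (1.0 - lam) * np.asarray(affect, dtype=float)
    weights = np.exp(scores - scores.max())  # softmax over concepts
    weights = weights / weights.sum()
    return weights @ concept_vecs, weights
```

Setting `lam=1.0` ranks concepts purely by relatedness and `lam=0.0` purely by affectiveness; in this sketch, the relatedness/affectiveness tradeoff is simply the value of `lam`.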
Contribution

For the first time, we apply the Transformer to analyze conversations and detect emotions. Our hierarchical self-attention and cross-attention modules allow our model to exploit contextual information more efficiently than existing gated RNNs and CNNs.
We derive dynamic, context-aware, and emotion-related commonsense knowledge from external knowledge bases and emotion lexicons to facilitate the emotion detection in conversations.
We conduct extensive experiments demonstrating that both contextual information and commonsense knowledge are beneficial to the emotion detection performance. In addition, our proposed KET model outperforms the state-of-the-art models on most of the tested datasets across different domains.

2 Related Work

Emotion Detection in Conversations:

(Skipped)

Knowledge Base in Conversations:

There is a growing number of studies on incorporating knowledge bases into generative conversation systems, such as open-domain dialogue systems (Han et al., 2015; Asghar et al., 2018; Ghazvininejad et al., 2018; Young et al., 2018; Parthasarathi and Pineau, 2018; Liu et al., 2018; Moghe et al., 2018; Dinan et al., 2019; Zhong et al., 2019), task-oriented dialogue systems (Madotto et al., 2018; Wu et al., 2019; He et al., 2019), and question answering systems (Kiddon et al., 2016; Hao et al., 2017; Sun et al., 2018; Mihaylov and Frank, 2018).

Zhou et al. (2018b) adopted structured knowledge graphs to enrich the interpretation of input sentences and to help generate knowledge-aware responses using graph attentions.
The graph attention in their knowledge interpreter (Zhou et al., 2018b) is static and relates only to the recognized entity of interest.
By contrast, our graph attention mechanism is dynamic and selects context-aware knowledge entities that balance relatedness and affectiveness.

Emotion Detection in Text:

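For emotion detection in text, there is a trend of moving from traditional machine learning methods (Pang et al., 2002; Wang and Manning, 2012; Seyeditabari et al., 2018) to deep learning methods (Abdul-Mageed and Ungar, 2017; Zhang et al., 2018b).
Khanpour and Caragea (2018) investigated emotion detection in health-related posts from online health communities using both deep learning features and lexicon-based features.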

Incorporating Knowledge in Sentiment Analysis:

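Traditional lexicon-based methods detect emotions or sentiments in a piece of text based on the emotions or sentiments of the words or phrases that compose it (Hu et al., 2009; Taboada et al., 2011; Bandhakavi et al., 2017).
Few studies have investigated the use of knowledge bases in deep learning methods.
Kumar et al. (2018) proposed using knowledge from WordNet (Fellbaum, 2012) to enrich the text representations produced by an LSTM, which improved performance.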
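As a toy illustration of the lexicon-based idea, the sketch below scores a text by averaging the polarity of the words it finds in a lexicon. The lexicon and its values are invented, and real lexicon-based systems usually add rules for negation, intensifiers, and multi-word phrases that this sketch ignores.

```python
# Hypothetical toy lexicon: word -> polarity in [-1, 1]; values are invented.
TOY_LEXICON = {"love": 0.9, "great": 0.8, "bad": -0.7, "terrible": -0.9}


def lexicon_sentiment(text, lexicon=TOY_LEXICON):
    """Average polarity of the words found in the lexicon (0.0 if none match)."""
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0
```

For example, "I love this great phone" averages the scores of "love" and "great" to 0.85, while a text with no lexicon words scores 0.0.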

Transformer

(Skipped)

3 Our Proposed KET Model

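I'll skip the details; the architecture figure in the paper gives a good rough sense of it.

Word embeddings and knowledge base (KB) features are extracted separately for the context and the response.
The knowledge comes from ConceptNet (Speer et al., 2017) and an emotion lexicon, NRC_VAD.
These pieces of information are combined and passed through a graph attention step to produce a concept representation.
The concept representation is then combined with the word embedding again, and the result goes through multi-head self-attention and a feed-forward layer (i.e., a Transformer).
As mentioned in the introduction, the context and response are processed separately rather than concatenated, so self-attention runs over the context on its own.
The final outputs of the two sides are then combined with cross-attention to predict the emotion.
The whole setup feels somewhat like MRC. I wonder whether sequential information is captured well this way. The Transformer will encode some of it, of course, but a conversation is unidirectional, so isn't this setup effectively bidirectional?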
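Going back to the step where the concept representation is combined with the word embedding, one simple way to do this is to concatenate the two vectors for each word and project back to the model dimension before the Transformer layers. The concatenate-then-project form is my assumption about what "combined" means here, and `proj` is a hypothetical stand-in for a learned weight matrix.

```python
import numpy as np


def enrich_words(word_embs, concept_reprs, proj):
    """Combine each word embedding with its concept representation (sketch).

    word_embs:     (T, d) word embeddings of one utterance
    concept_reprs: (T, d) graph-attention outputs, one per word
    proj:          (2d, d) projection matrix (learned in a real model)
    Returns (T, d) concept-enriched embeddings to feed into the Transformer layers.
    """
    combined = np.concatenate([word_embs, concept_reprs], axis=-1)  # (T, 2d)
    return combined @ proj
```

With `proj` set to an identity block stacked on a zero block, the output is just the word embeddings; a trained projection would instead mix in the concept information.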

4 Experimental Settings

4.1 Datasets and Evaluations

4.2 Baselines and Model Variants

cLSTM
CNN (Kim, 2014)
CNN+cLSTM (Poria et al., 2017)
BERT_BASE (Devlin et al., 2018)
DialogueRNN (Majumder et al., 2019)
KET_SingleSelfAttn
KET_StdAttn

5 Result Analysis

6 Conclusion

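We propose a Knowledge-Enriched Transformer (KET) to detect emotions in textual conversations.
Our model learns structured conversation representations via hierarchical self-attention and dynamically refers to external, context-aware, and emotion-related knowledge entities from knowledge bases.
Experimental analysis demonstrates that both contextual information and commonsense knowledge are beneficial to model performance.
The tradeoff between relatedness and affectiveness also plays an important role.
In addition, our model outperforms the state-of-the-art models on most of the tested datasets, which vary in size and domain.
Given emotion lexicons similar to NRC_VAD in other languages, together with the multilingual knowledge base ConceptNet, our model can easily be adapted to other languages.
Furthermore, since NRC_VAD is the only emotion-specific component, our model can be readily applied to general conversation analysis.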

Reference

https://arxiv.org/pdf/1909.10681.pdf


Short-013, Knowledge-Enriched Transformer for Emotion Detection in Textual Conversations (2019-EMNLP)
