◼ Comment

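This is a new paper in the ERC (emotion recognition in conversation) area, so I took a look. It takes a graph-based approach and uses a GCN.
ERC does not seem to be the main task: the paper focuses on relation extraction and then also applies the method to ERC.
The setup: given a dialogue history, the subject is the speaker, the object is the utterance, and the relation is the emotion, so S, O, and R are defined this way.

An argument here means the subject or the object.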

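On the modeling side, the speaker and the text are not simply concatenated and fed to a classifier.

Instead, a representation for each node is first extracted with an encoder such as BERT.
There are four node types: dialogue node, turn node, subject node, and object node.
Edges between the nodes are then defined, and a GCN is run over the resulting graph.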

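However, the biggest problem I see is that it uses future utterances.

Looking at the ERC example, the idea is to use the entire dialogue history when predicting the emotion of the current utterance.
If it were not used, the connections in the graph would not mean much.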

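I do not know GCNs in depth, but after reading up on them and applying the concept here:

Edges express the connectivity between nodes (= adjacency).
The representation of a node (it differs by node type) is the output at the [CLS] token of the text input, or the average of the token outputs.
In other words, node outputs that are related to each other are passed through one more neural network layer together.
But doesn't an attention network learn this kind of thing automatically?
Running a GCN probably does make training easier and gives the connections an explicit meaning, but if performance is all that matters, I wonder whether it is really necessary.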
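To make the "edges = adjacency" idea concrete, here is a minimal sketch of one graph convolution step in plain Python. It is a simplified variant (mean over neighbours plus a self-loop, then a linear map and ReLU), not the exact normalization from Kipf and Welling and not the paper's layer; `gcn_layer`, `adj`, `feats`, and `weight` are names I made up for illustration.

```python
def gcn_layer(adj, feats, weight):
    """One simplified GCN step: average each node with its neighbours, then linear map + ReLU."""
    n = len(adj)
    dim_in = len(feats[0])
    dim_out = len(weight[0])
    out = []
    for u in range(n):
        # Neighbours of u according to the adjacency matrix, plus a self-loop.
        nbrs = [v for v in range(n) if adj[u][v] or v == u]
        # Mean of the neighbour features (row-normalised aggregation).
        agg = [sum(feats[v][k] for v in nbrs) / len(nbrs) for k in range(dim_in)]
        # Linear projection followed by ReLU.
        out.append([
            max(0.0, sum(agg[k] * weight[k][j] for k in range(dim_in)))
            for j in range(dim_out)
        ])
    return out


# Toy graph: nodes 0 and 1 are connected, node 2 is isolated.
adj = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity, so only the aggregation is visible
print(gcn_layer(adj, feats, weight))
```

Nodes 0 and 1 end up with the same mixed vector because they only see each other, while node 2 keeps its own features. Self-attention would have to learn that sparsity pattern; here it is fixed by the adjacency matrix.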

Flow summary: https://docs.google.com/presentation/d/1hj0opk8wXgiQeg8ee5Yyg1V9qUus-2df/edit?usp=sharing&ouid=115755261540686011862&rtpof=true&sd=true

0 Abstract

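Dialogue-based relation extraction (RE) aims to extract the relations between two arguments that appear in a dialogue.
Because dialogues have a high frequency of personal pronouns and low information density, most relational facts in dialogues are not supported by any single sentence.

Dialogue-based relation extraction therefore requires a comprehensive understanding of the dialogue.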

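In this paper, we propose the TUrn COntext awaRE Graph Convolutional Network (TUCORE-GCN), which focuses on the way people understand dialogues.
In addition, we propose a novel approach that treats the ERC task as dialogue-based RE.
Experiments on a dialogue-based RE dataset and three ERC datasets demonstrate that our model is effective across various dialogue-based natural language understanding (NLU) tasks.
In these experiments, TUCORE-GCN achieves state-of-the-art results.
Our code is available at https://github.com/BlackNoodle/TUCORE-GCN.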

1 Introduction

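The relation extraction (RE) task aims to identify semantic relations between arguments from text (such as a sentence, a document, or a dialogue).
However, a large number of relational facts are expressed across multiple sentences, so sentence-level RE suffers from unavoidable limitations.
Cross-sentence RE, which aims to identify relations between two arguments that are not mentioned in the same sentence, or relations that cannot be supported by any single sentence, is therefore an essential step in automatically building knowledge bases from large-scale corpora.
In this respect, dialogues readily exhibit cross-sentence relations (Yu et al., 2020), so extracting relations from dialogues is necessary.
To support the prediction of relations between two arguments that appear in a dialogue, Yu et al. (2020) recently proposed DialogRE, a human-annotated dialogue-based RE dataset.
Table 1 shows an example from DialogRE.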

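In conversational text such as DialogRE, because of its higher personal pronoun frequency and lower information density compared with formal written text, many relational triples require reasoning over multiple sentences in the dialogue.
65.9% of the relational triples in DialogRE involve arguments that never appear in the same turn.
Multi-turn information therefore plays an important role in dialogue-based RE.
Extracting relations from dialogue effectively involves several challenges, which are inspired by how people actually understand dialogues.
First, a dialogue has speakers, and who speaks each utterance matters.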

This is because the subject and object of a relational triple depend on who speaks the utterance.
For example, in Table 1, if S3 had answered "Hey" to “Hey Pheebs.”, the relational triple (S2, per:alternate_names, Pheebs) would change to (S3, per:alternate_names, Pheebs).

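Second, when understanding the meaning of each turn in a dialogue, it is important to know the meaning of the surrounding turns.

For example, if we look only at "No, but he is always late" in the table, we cannot tell who is late.
However, if we look at the previous turn, we can tell that S2's brother is always late.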

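Third, a dialogue consists of multiple turns.

Turns are sequential, and the arguments may appear in different turns.

Therefore, capturing multi-turn information is important for identifying the relation between two arguments.
This can be done by exploiting the sequential nature of the dialogue.
We therefore aim to address these challenges in order to better extract relations from dialogues.
In this paper, we propose the TUrn COntext awaRE Graph Convolutional Network (TUCORE-GCN) for dialogue-based RE.

It is designed to address the challenges described above.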

TUCORE-GCN encodes the input sequence by applying BERTs (Yu et al., 2020) together with the speaker embedding of SA-BERT (Gu et al., 2020), so that speaker information in the dialogue is reflected.

Then, to better extract the representation of each turn from the encoded input sequence, Masked Multi-Head Self-Attention is applied with a surrounding turn mask.
Next, TUCORE-GCN constructs a heterogeneous dialogue graph to capture the relational information between arguments in the dialogue.

The graph consists of four types of nodes (dialogue node, turn node, subject node, object node) and three types of edges (speaker edge, dialogue edge, argument edge).

The sequential characteristics of the turn nodes must then be taken into account.

To obtain a surrounding turn-aware representation for each node, we apply a BiLSTM to the turn nodes and Graph Convolutional Networks to the heterogeneous dialogue graph.

Finally, we classify the relations between the arguments from the obtained features.

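The ERC task aims to identify the emotions expressed in a dialogue.

ERC is a challenging task that has recently gained popularity because of its potential applications (Poria et al., 2019).
It can be used to analyze user behavior (Lee and Hong, 2016) and to detect fake news (Guo et al., 2019).
Table 2 shows an example from EmoryNLP (Zahiri and Choi, 2018), a dataset widely used for the ERC task.
We propose a novel approach that treats the ERC task as dialogue-based RE.
If we define the emotional relation of each utterance as the particular emotion (e.g., joyful, neutral, scared) with which the subject says the object, then the emotion of each utterance in a dialogue can be expressed as a triple (speaker of the utterance, emotion, utterance), as shown in Table 3.
To the best of our knowledge, this approach has not been introduced in previous work.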
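As a rough illustration of this reformulation, the sketch below turns an emotion-labelled dialogue into one RE-style instance per utterance, with subject = speaker, object = utterance, and relation = emotion. The `REInstance` type, the `erc_to_re` helper, and the toy dialogue are all made up for this note; they are not taken from the paper or its code.

```python
from typing import List, NamedTuple, Tuple


class REInstance(NamedTuple):
    dialogue: List[Tuple[str, str]]  # full dialogue as (speaker, utterance) pairs
    subject: str                     # speaker of the target utterance
    obj: str                         # the target utterance itself
    relation: str                    # emotion label, used as the relation type


def erc_to_re(turns: List[Tuple[str, str, str]]) -> List[REInstance]:
    """turns: list of (speaker, utterance, emotion). Returns one RE instance per utterance."""
    dialogue = [(speaker, text) for speaker, text, _ in turns]
    return [REInstance(dialogue, speaker, text, emotion) for speaker, text, emotion in turns]


# Hypothetical three-turn dialogue with made-up labels.
turns = [
    ("S1", "Guess what, I got the job!", "Joyful"),
    ("S2", "Which one?", "Neutral"),
    ("S1", "The one I was afraid to apply for.", "Scared"),
]
for inst in erc_to_re(turns):
    print((inst.subject, inst.relation, inst.obj))
```

Note that every instance carries the whole dialogue, which is exactly where the "future utterances" concern from my comment at the top comes from.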

In summary, our main contributions are as follows:

We propose a novel method, TUrn COntext awaRE Graph Convolutional Network (TUCORE-GCN), to better cope with a dialogue-based RE task.
We introduce a surrounding turn mask to better capture the representation of the turns.
We introduce a heterogeneous dialogue graph to model the interaction among elements (e.g., speakers, turns, arguments) across the dialogue and propose a GCN mechanism combined with BiLSTM.
We propose a novel approach to treat the ERC task as a dialogue-based RE.

2 Related Work

3 Model

TUCORE-GCN mainly consists of four modules: encoding module (Sec 3.1), turn attention module (Sec 3.2), dialogue graph with sequential nodes module (Sec 3.3), and classification module (Sec 3.4), as shown in Figure 1.

3.1 Encoding Module

We follow BERTs for the input sequence of the encoding module.
Given a dialogue $d = s_1 : t_1, s_2 : t_2, ..., s_M : t_M$ and its associated argument pair ( $a_1$ , $a_2$ ), where $s_i$ and $t_i$ denote the speaker ID and the text of the i-th turn and M is the total number of turns, BERTs constructs $\hat{d} = \hat{s_1} : t_1, \hat{s_2} : t_2, ..., \hat{s_M} : t_M$ , where $\hat{s_i}$ is [S1] if $s_i = a_1$ , [S2] if $s_i = a_2$ , and $s_i$ otherwise.

[S1] and [S2] are special tokens.
In addition, $\hat{a_k}$ (k ∈ {1, 2}) is [ $S_k$ ] if there exists an i such that $s_i$ = $a_k$ , and $a_k$ otherwise.
Then, we concatenate $\hat{d}$ and ( $\hat{a_1}$ , $\hat{a_2}$ ) with the classification token [CLS] and the separator token [SEP] (as in BERT) and use the result as the input sequence.
That is: [CLS] $\hat{d}$ [SEP] $\hat{a_1}$ [SEP] $\hat{a_2}$ [SEP]
Here, the arguments are the subject and the object.
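The construction of the input sequence can be sketched as plain string handling. This is my own minimal version, assuming the rule that a speaker ID equal to the first or second argument is replaced by [S1] or [S2]; `build_input` and the toy turns are hypothetical, and a real implementation would work on tokenizer output rather than raw strings.

```python
def build_input(turns, a1, a2):
    """turns: list of (speaker, text). a1, a2: the argument pair (subject, object)."""
    speakers = {speaker for speaker, _ in turns}

    def mark(speaker):
        # Replace a speaker ID by a special token when it is one of the arguments.
        if speaker == a1:
            return "[S1]"
        if speaker == a2:
            return "[S2]"
        return speaker

    d_hat = " ".join(f"{mark(speaker)} : {text}" for speaker, text in turns)
    # An argument becomes [S_k] only if it is a speaker in this dialogue.
    a1_hat = "[S1]" if a1 in speakers else a1
    a2_hat = "[S2]" if a2 in speakers else a2
    return f"[CLS] {d_hat} [SEP] {a1_hat} [SEP] {a2_hat} [SEP]"


turns = [("Speaker 1", "Hey Pheebs."), ("Speaker 2", "Hey!")]
print(build_input(turns, "Speaker 2", "Pheebs"))
```

Here "Speaker 2" is the subject and also a speaker, so it is rewritten to [S1] both inside the dialogue and in the argument slot, while "Pheebs" is not a speaker ID and stays as plain text.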

To model speaker change information, following SA-BERT, we add additional speaker embeddings to the token representations.
The speaker embedding $E_s(\hat{s_i})$ is added to each token representation of $\hat{s_i}:t_i$ ; furthermore, if $\hat{a_k}=[S_k]$ , $E_s(\hat{a_k})$ (k ∈ {1, 2}) is added to each token representation of $\hat{a_k}$ .

Here, $E_s(\cdot )$ denotes the speaker embedding layer.

$E_s(\#)$ is added to the token representations that carry no speaker information.
A visual architecture of our input representation is illustrated in Appendix.

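For emotion recognition, the input seems to look like this:
[CLS] speaker1, utterance1, speaker2, utterance2 [SEP] speaker1 [SEP] utterance3
To classify the emotion of utterance3, the input is fed in this form and the output at the [CLS] token is used --> as input to the graph.
Additionally, speaker embeddings are added, and for [CLS], [SEP], or the trailing subject tokens, Es(#) is added to indicate that there is no speaker information.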
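One way to picture the speaker embedding is as an extra ID per token that indexes an embedding table, with a reserved "#" entry for tokens without speaker information. The sketch below only builds those IDs. The segment layout, the `speaker_ids` helper, and the choice of index 0 for "#" are my assumptions, not the paper's implementation.

```python
def speaker_ids(segments):
    """segments: list of (speaker_or_None, n_tokens) in input order.

    Returns one speaker index per token plus the lookup table.
    Index 0 is reserved for '#', meaning 'no speaker information'.
    """
    table = {"#": 0}
    ids = []
    for speaker, n_tokens in segments:
        key = "#" if speaker is None else speaker
        # Assign the next free index the first time a speaker is seen.
        index = table.setdefault(key, len(table))
        ids.extend([index] * n_tokens)
    return ids, table


# Hypothetical layout: [CLS], turn of [S1] (4 tokens), turn of Speaker 2 (3 tokens),
# [SEP], subject [S1], [SEP], object (2 tokens, not a speaker), [SEP].
segments = [(None, 1), ("[S1]", 4), ("Speaker 2", 3), (None, 1),
            ("[S1]", 1), (None, 1), (None, 2), (None, 1)]
ids, table = speaker_ids(segments)
print(ids)
print(table)
```

Each index would then select a row of the speaker embedding layer, and that row is added to the corresponding token embedding before the encoder.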

Then, token representations containing speaker change information are fed into an encoder to extract the speaker-sensitive token representations.
The encoder can be BERT or BERT variants (Liu et al., 2019; Conneau and Lample, 2019; Lan et al., 2020).

3.2 Turn Attention Module

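To obtain a turn context-sensitive representation for each turn, we apply Masked Multi-Head Self-Attention with a surrounding turn mask to the output of the encoder.
The range of surrounding turns is called the window, and the number of preceding and following turns treated as surrounding turns is called the surrounding turn window size.

The surrounding turn window size c is a hyper-parameter.

Let X = [x1, x2, x3, ..., xN] be the output of the encoding module, where xj is the j-th token representation and N is the number of tokens.

The explanation in the paper is complicated, but intuitively, as in Figure 3, it just means attending only within the window size before and after each turn. This is controlled through a mask.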
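My intuitive reading of the mask can be written down in a few lines: token i may attend to token j only if their turns are at most c apart. This is a simplification under my own assumptions; in particular it ignores how the paper treats [CLS], [SEP], and the argument tokens, and `surrounding_turn_mask` and `turn_ids` are names I invented.

```python
def surrounding_turn_mask(turn_ids, c):
    """turn_ids[i] is the turn index of token i; c is the surrounding turn window size.

    Returns an N x N boolean matrix where True means 'attention allowed'.
    """
    n = len(turn_ids)
    return [
        [abs(turn_ids[i] - turn_ids[j]) <= c for j in range(n)]
        for i in range(n)
    ]


# Six tokens spread over four turns, window size 1.
turn_ids = [0, 0, 1, 2, 2, 3]
mask = surrounding_turn_mask(turn_ids, c=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In an attention layer the False positions would be set to a large negative value before the softmax, so tokens outside the window get zero weight.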

3.3 Dialogue Graph with Sequential Nodes Module

To model dialogue-level information, interactions between turns and arguments, and interactions among turns, a heterogeneous dialogue graph is constructed.
We form four different types of nodes in the graph:

dialogue node
turn node
subject node
object node

The dialogue node is a node intended to hold the overall information of the dialogue.

In our work, the initial representation of the dialogue node is the feature corresponding to [CLS] in the output of the turn attention module.

Turn nodes represent the information of each turn in the dialogue, and one is created for every turn in the dialogue.
The subject node and the object node represent the information of each argument.

The initial representations of the i-th turn node, the subject node, and the object node are the averages of the token representations corresponding to $\hat{s_i}:t_i$ , $\hat{a_1}$ , and $\hat{a_2}$ , respectively, in the output of the turn attention module.
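The node initialization described above amounts to picking the [CLS] vector and mean-pooling token spans. A small sketch, assuming [CLS] sits at position 0 and that the token spans of each turn and argument are already known; `mean_pool`, `init_nodes`, and the span arguments are hypothetical names.

```python
def mean_pool(token_vecs, span):
    """Average the token vectors in the half-open range [start, end)."""
    start, end = span
    rows = token_vecs[start:end]
    return [sum(column) / len(rows) for column in zip(*rows)]


def init_nodes(token_vecs, turn_spans, subj_span, obj_span):
    """Build initial node features from the turn attention module output."""
    return {
        "dialogue": list(token_vecs[0]),  # assumes [CLS] is the first token
        "turns": [mean_pool(token_vecs, span) for span in turn_spans],
        "subject": mean_pool(token_vecs, subj_span),
        "object": mean_pool(token_vecs, obj_span),
    }


# Toy 2-dimensional token vectors: [CLS], two tokens of turn 1, one token of turn 2,
# one subject token, one object token.
token_vecs = [[1.0, 1.0], [2.0, 0.0], [4.0, 2.0], [0.0, 6.0], [8.0, 8.0], [3.0, 5.0]]
nodes = init_nodes(token_vecs, turn_spans=[(1, 3), (3, 4)], subj_span=(4, 5), obj_span=(5, 6))
print(nodes)
```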

There are three different types of edges:

dialogue edge: every turn node is connected to the dialogue node by a dialogue edge, so that the dialogue node is learned while being aware of the turn-level information.
argument edge: to model the interaction between turns and arguments, the i-th turn node and an argument node (i.e., the subject node or the object node) are connected by an argument edge if the argument is mentioned in $\hat{s_i}:t_i$ .
speaker edge: to model the interaction among different turns of the same speaker, turn nodes uttered by the same speaker are fully connected by speaker edges.
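The three edge rules can be sketched as follows. Node numbering, the `build_edges` helper, and the naive substring test for "the argument is mentioned in the turn" are my own assumptions for illustration; edges are treated as undirected and stored once.

```python
def build_edges(speakers, turn_strings, subject, obj):
    """speakers[i]: (marked) speaker of turn i; turn_strings[i]: the full 's_i : t_i' string.

    Node ids: 0 = dialogue node, 1..M = turn nodes, M+1 = subject node, M+2 = object node.
    """
    m = len(speakers)
    subj_node, obj_node = m + 1, m + 2
    edges = {"dialogue": set(), "argument": set(), "speaker": set()}
    for i in range(m):
        # Dialogue edge: every turn node is linked to the dialogue node.
        edges["dialogue"].add((0, i + 1))
        # Argument edge: link a turn to an argument node if the argument is mentioned.
        for argument, node in ((subject, subj_node), (obj, obj_node)):
            if argument in turn_strings[i]:
                edges["argument"].add((i + 1, node))
        # Speaker edge: fully connect turns spoken by the same speaker.
        for j in range(i + 1, m):
            if speakers[i] == speakers[j]:
                edges["speaker"].add((i + 1, j + 1))
    return edges


speakers = ["[S1]", "Speaker 2", "[S1]"]
turn_strings = ["[S1] : Hey Pheebs.", "Speaker 2 : Hey!", "[S1] : How are you?"]
print(build_edges(speakers, turn_strings, subject="[S1]", obj="Pheebs"))
```

With these edge sets, one adjacency structure per edge type can be built, which is what lets the GCN use different weights for dialogue, argument, and speaker edges.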

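Next, we apply a Graph Convolutional Network (GCN) (Kipf and Welling, 2017) to aggregate the features of each node from the features of its neighbors.
To inject sequential information into the turn nodes, the turn nodes first pass through a bidirectional LSTM (Schuster and Paliwal, 1997) layer, and then the GCN is applied.
Given a node $u$ at the $l$-th GCN layer, $h_u^{(l)}$ and $\hat{h}_u^{(l)}$ denote the representation of the node before and after the sequential information is injected, respectively.
$\hat{h}_u^{(l)}$ can be defined as follows.

where $T_i$ represents the $i$-th turn node and $\dot{h}_{T_i}^{(l)}$ represents the turn node feature injected with sequential information in the dialogue by concatenating the hidden states of the two directions. $W_\alpha^{(l)} \in \mathbb{R}^{d \times 2d}$ , $b_\alpha^{(l)} \in \mathbb{R}^{d}$ , and $d$ is the dimension. Then, the graph convolution operation can be defined as:

where $\kappa$ denotes the different types of edges, $N_k(u)$ denotes the neighbors of node $u$ connected by the $k$-th type of edge, $W_k^{(l)} \in \mathbb{R}^{d \times d}$ , and $b_k^{(l)} \in \mathbb{R}^{d}$ .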
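Based only on the symbol definitions above (one $d \times d$ weight and one bias per edge type, neighbours grouped by edge type), a graph convolution step could look like the sketch below. The exact form is my assumption: I sum $W_k h_v + b_k$ over edge types and neighbours and apply ReLU, with no normalization or self-loop, and I skip the BiLSTM step entirely, so the input `h` stands in for the already sequence-injected features. `typed_gcn_layer` and its arguments are hypothetical names.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def typed_gcn_layer(h, neighbors, weights, biases):
    """h: list of node vectors. neighbors[k][u]: neighbours of node u under edge type k.

    weights[k] is a d x d matrix and biases[k] a length-d vector for edge type k.
    """
    d = len(h[0])
    out = []
    for u in range(len(h)):
        acc = [0.0] * d
        for k in neighbors:
            for v in neighbors[k].get(u, []):
                message = matvec(weights[k], h[v])
                acc = [a + m + b for a, m, b in zip(acc, message, biases[k])]
        out.append([max(0.0, a) for a in acc])  # ReLU
    return out


h = [[1.0, 2.0], [3.0, 4.0], [-5.0, 0.0]]
neighbors = {
    "dialogue": {0: [1], 1: [0]},
    "speaker": {1: [2], 2: [1]},
}
weights = {"dialogue": [[1.0, 0.0], [0.0, 1.0]], "speaker": [[2.0, 0.0], [0.0, 2.0]]}
biases = {"dialogue": [0.0, 0.0], "speaker": [1.0, 1.0]}
print(typed_gcn_layer(h, neighbors, weights, biases))
```

Node 1 receives messages through two different edge types with different weights, which is the main difference from the single-adjacency GCN in my comment at the top.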

3.4 Classification Module

We concatenate the dialogue node, the subject node, and the object node to classify the relation between the arguments.
In addition, to cover features at all the different levels of abstraction from each GCN layer, we concatenate the hidden states of every GCN layer as follows.

where G is the number of GCN layers and d, s, and o denote the dialogue node, subject node, and object node, respectively.

For each relation type $r$, we introduce a vector $W_r \in \mathbb{R}^{3(G+1)d}$ and obtain the probability $P_r$ of the existence of $r$ between the arguments by $P_r = \mathrm{sigmoid}(C W_r^{\top})$ .
We use cross-entropy loss as the classification loss to train our model in an end-to-end way.
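The classification step can be sketched directly from the dimensions: concatenating the dialogue, subject, and object node states from the initial features plus G GCN layers gives a vector C of length 3(G+1)d, and each relation has its own weight vector of that length. The ordering inside the concatenation and the `classify` helper are my assumptions; the relation names and numbers are made up.

```python
import math


def classify(layer_states, relation_vectors):
    """layer_states: list of (h_d, h_s, h_o) tuples, one per layer (G + 1 entries).

    relation_vectors: dict mapping relation name to a weight vector of length 3(G+1)d.
    Returns the sigmoid probability P_r for every relation r.
    """
    c = []
    for h_d, h_s, h_o in layer_states:
        c.extend(h_d)
        c.extend(h_s)
        c.extend(h_o)
    probs = {}
    for relation, w_r in relation_vectors.items():
        assert len(w_r) == len(c), "W_r must have length 3(G+1)d"
        score = sum(ci * wi for ci, wi in zip(c, w_r))
        probs[relation] = 1.0 / (1.0 + math.exp(-score))
    return probs


# G = 1 GCN layer and d = 1, so C has 3 * (1 + 1) * 1 = 6 entries.
layer_states = [([1.0], [0.0], [2.0]), ([0.5], [1.0], [0.0])]
relation_vectors = {
    "joyful": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "neutral": [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
}
print(classify(layer_states, relation_vectors))
```

Because each relation gets an independent sigmoid, several relations can be predicted for the same argument pair, which fits the multi-label nature of dialogue-based RE.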

4 Experiments

5 Conclusion and Future Work

In this paper, we propose TUCORE-GCN for dialogue-based RE. TUCORE-GCN is designed according to the way people understand dialogues in practice to better cope with dialogue-based RE.
In addition, we propose a way to treat the ERC task as dialogue-based RE and showed its effectiveness through experiments.
Experimental results on a dialogue-based RE dataset and three ERC task datasets demonstrate that TUCORE-GCN model significantly outperforms existing models and yields the new state-of-the-art results on both tasks.
Since TUCORE-GCN is modeled for the dialogue text type, we expect it to perform well in dialogue-based natural language understanding tasks.
In future work, we are going to explore the effectiveness of it on other dialogue-based tasks.

Reference

https://arxiv.org/pdf/2109.04008.pdf


NL-124, Graph Based Network with Contextualized Representations of Turns in Dialogue, (2021-EMNLP)
