NL-159, Towards Evaluation of Multi-party Dialogue Systems, INLG 2022

Main Motivation

  • Prolific research in NLG evaluation 
    • Multiple taxonomies presented[1, 2, 3, 4] 
    • Studies on the importance of automatic and human metrics[5, 6, 7, 8] 
    • + Confusion caused by inconsistent use of evaluation methods[9] 
  • However, little work addresses evaluation specifically for Multi-party Conversation (MPC) 
    • = Need to discuss MPC-specific challenges and requirements 

MPC Challenges

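  • The presence of multiple participants introduces new and interesting challenges from a dialogue modeling perspective.
    • Participant roles - speaker-specific and addressee-specific information must be maintained alongside dialogue modeling.
    • Dialogue structure - closer to a graph than to a sequence.
    • Threads within a dialogue - multiple topic threads can coexist within subgroups.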
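
The three challenges above can be made concrete with a small data-structure sketch. This is only an illustration, not a representation proposed by the paper: the `Utterance` class, its field names, the `group_threads` helper, and the sample dialogue are all hypothetical. Each utterance stores its speaker and addressee (participant roles) and a `reply_to` link (graph structure); following those links back to a root recovers the coexisting threads.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    """One turn in a multi-party dialogue (hypothetical schema)."""
    uid: int
    speaker: str
    addressee: Optional[str]  # None means addressed to the whole group
    reply_to: Optional[int]   # uid of the parent turn; None starts a new thread
    text: str


def group_threads(utterances):
    """Group utterance ids by the root turn reached via reply_to links."""
    by_id = {u.uid: u for u in utterances}

    def root_of(u):
        # Walk up the reply chain until a turn with no parent is reached.
        while u.reply_to is not None:
            u = by_id[u.reply_to]
        return u.uid

    threads = defaultdict(list)
    for u in utterances:
        threads[root_of(u)].append(u.uid)
    return dict(threads)


# Two interleaved threads among four participants (made-up example data).
dialogue = [
    Utterance(1, "A", None, None, "Anyone up for lunch?"),
    Utterance(2, "B", "A", 1, "Sure, where?"),
    Utterance(3, "C", "D", None, "Did you fix the build?"),
    Utterance(4, "D", "C", 3, "Yes, pushed a patch."),
    Utterance(5, "A", "B", 2, "The noodle place."),
]

threads = group_threads(dialogue)
print(threads)  # {1: [1, 2, 5], 3: [3, 4]}
```

Note that turns 3 and 4 interrupt the lunch thread in time order, so a purely sequential model would treat turn 5 as a reply to turn 4; the explicit `reply_to` and `addressee` fields are what keep the two threads apart.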

Contributions

  • Propose an expanded taxonomy focusing on the specific challenges introduced by multi-party dialogue, or group conversations 
    • Such as the need to maintain speaker-specific context and recognize the proper addressees 
  • Synthesize evaluation measures utilized in existing MPC research, and relate them to the expanded taxonomy introduced 
    • Report important inconsistencies in current research 

Reference
