Seminar by Emrah İnan

  • Date & Time:
    • September 27, 2022
    • 11:00
  • Venue:
    • IZTECH Computer Engineering Department - Meeting Room

Speaker: Dr. Emrah İnan
Title: DUDU: A TURKISH DIALOGUE CORPUS GENERATOR FOR MEDICAL DOMAIN

Abstract:
Domain-oriented dialogue datasets are essential for building dialogue systems. TV subtitles are a useful source for constructing these datasets because they contain large amounts of dialogue material, but they lack speaker information and turn structure. In this paper, we present a novel method to build a domain-specific Turkish dialogue corpus called DuDu (Dialogue Dataset) from monolingual scripts and their corresponding Turkish subtitles. First, we align each sentence with its translation, taking into account the timeline information in the Turkish subtitles. Then, we apply our similarity method, which uses knowledge base embeddings. This method matches utterances between scripts and subtitles and assigns speakers and dialogue boundaries to the aligned subtitles. Finally, we evaluate the quality of the constructed corpus with existing mainstream methods. We examine a response retrieval method that combines feature-based and embedding-based approaches; in our experiments, this combined method achieves an F1 score of 0.675.

Short Bio:
Emrah İnan received his Ph.D. degree from the Department of Computer Engineering, Ege University. His Ph.D. thesis presents a domain-oriented entity linking method based on sequence-to-sequence learning using knowledge bases. In 2017, he was a research intern at the Data and Web Science Research Group in Mannheim, Germany, where he worked on enriching the DBpedia knowledge base with specific domains. He received his MSc degree from the Department of Computer Engineering, Izmir Institute of Technology, Turkey, in 2012. He worked as a teaching assistant at the Department of Computer Engineering at Ege University, Izmir, Turkey. He is now a postdoctoral researcher at the National Centre for Text Mining at the University of Manchester, UK. His research interests include machine learning, natural language processing, and knowledge graphs.