Ensemble Learning with CNN-LSTM Combination for Speech Emotion Recognition

Publisher

Springer International Publishing AG

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Speech plays the most significant role in communication between people. Beyond carrying emotion, the voice conveys a speaker's unique characteristics, which can be mapped to biometric properties. Emotion is expressed through many non-linguistic signals. Recognizing emotion in human speech is a challenging task with applications in healthcare, services, telecommunications, video conferencing, and human-computer interaction (HCI). Deep learning techniques have become a major focus of recent research in speech emotion recognition (SER). In this paper, we present an ensemble learning approach based on various combinations of CNN and LSTM networks to address the limitations of existing SER models. The proposed system is evaluated on the RAVDESS dataset. Specifically, the LSTM, CNN, and CNN-LSTM models achieved accuracies of 0.64, 0.73, and 0.71, respectively. The simulation results confirm that ensemble learning over the three deep models contributes to the effectiveness of SER.
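
The abstract does not say how the outputs of the three models are combined. As a rough illustration only, the sketch below assumes soft voting, in which per-class probabilities are averaged across models and the highest-scoring class is chosen. The function name `soft_vote` and all probability values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average per-class probabilities across models and pick the argmax.

    prob_sets[m][i][c] is model m's probability for class c on sample i.
    Assumption: every model scores the same samples and the same classes.
    """
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    predictions = []
    for i in range(n_samples):
        n_classes = len(prob_sets[0][i])
        # Mean probability of each class over all models for sample i.
        avg = [
            sum(prob_sets[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        # Index of the class with the highest averaged probability.
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


# Made-up probabilities for 2 samples and 3 emotion classes (illustrative only).
lstm_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
cnn_probs = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
cnn_lstm_probs = [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]

ensemble_labels = soft_vote([lstm_probs, cnn_probs, cnn_lstm_probs])
```

In this toy input the LSTM alone would label the first sample as class 0, while the averaged vote selects class 1, which shows how an ensemble can override a single model's prediction. Other combination rules, such as majority voting or a learned meta-classifier, would be equally compatible with the abstract's description.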

Description

International Conference on Computing and Communication Networks (ICCCN), November 19-20, 2021, Manchester Metropolitan University, Manchester, England

Keywords

Speech Emotion Recognition, Deep Learning, Convolutional Neural Network, Long Short-Term Memory, Ensemble Learning

Source

Proceedings of International Conference on Computing and Communication Networks (ICCCN 2021)

Volume

394
