Author
Wang, B
Liakata, M
Ni, H
Lyons, T
Nevado-Holgado, A
Saunders, K
Journal title
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOI
10.21437/Interspeech.2019-2624
Volume
2019-September
Last updated
2024-04-07T21:36:20.287+01:00
Page
1661-1665
Abstract
Copyright © 2019 ISCA. Automatic speech emotion recognition (SER) remains a difficult task within human-computer interaction, despite increasing interest in the research community. One key challenge is how to effectively integrate short-term characterisation of speech segments with long-term information such as temporal variations. Motivated by the numerical approximation theory of stochastic differential equations (SDEs), we propose the novel use of path signatures, which provide a pathwise definition for solving SDEs, for the integration of short speech frames. Furthermore, we propose a hierarchical tree structure of path signatures to capture both global and local information. A simple tree-based convolutional neural network (TBCNN) is used for learning the structural information stemming from dyadic path-tree signatures. Our experimental results on a widely used benchmark dataset demonstrate comparable performance to complex neural-network-based systems.
Symplectic ID
1073987
Publication type
Conference Paper
Created on 24 Nov 2019 - 22:05.