Author
Sharma, P
Abrol, V
Thakur, A
Journal title
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6
DOI
10.21437/Interspeech.2018-1481
Volume
2018-September
Last updated
2021-03-13T07:45:16.987+00:00
Page
3299-3303
Abstract
© 2018 International Speech Communication Association. All rights reserved. In this paper, we propose a deep learning framework which combines the generalizability of Gaussian mixture models (GMMs) with the discriminative power of deep matrix factorization to learn an acoustic scene embedding (ASe) for the acoustic scene classification task. The proposed approach first builds a Gaussian mixture model-universal background model (GMM-UBM) using frame-wise spectral representations. This UBM is adapted to a waveform, and the likelihood of each spectral frame representation is stored as a feature matrix. This matrix is fed to a deep matrix factorization pipeline (with audio-recording-level max-pooling) to compute a sparse, convex, discriminative representation. The proposed deep factorization model is based on archetypal analysis, a form of convex NMF that has been shown to be well suited to audio analysis. Finally, the obtained representation is mapped to a class label using a dictionary-based auto-encoder consisting of a linear, symmetric encoder and decoder with an efficient learning algorithm. The encoder projects the ASe of a waveform to the label space, while the decoder ensures that the feature can be reconstructed, resulting in better generalization on the test data.
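A rough sketch of the first stages of the pipeline the abstract describes (not the authors' code): fit a GMM-UBM on frame-wise spectral features, store a per-frame, per-component feature matrix, and max-pool over frames to get a recording-level representation. Here scikit-learn's `GaussianMixture` posterior responsibilities stand in for the per-frame likelihoods, all data and dimensions are hypothetical, and the archetypal-analysis factorization and dictionary-based auto-encoder classifier are omitted.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in background spectral frames (e.g. log-mel vectors) pooled from
# many recordings; a real system would use actual audio features.
ubm_frames = rng.standard_normal((2000, 20))

# GMM-UBM with a small number of components, for illustration only.
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(ubm_frames)

def feature_matrix(frames: np.ndarray) -> np.ndarray:
    """Per-frame, per-component posteriors, shape (n_frames, n_components)."""
    return ubm.predict_proba(frames)

def max_pooled_embedding(frames: np.ndarray) -> np.ndarray:
    """Recording-level max-pooling over the frame-wise feature matrix."""
    return feature_matrix(frames).max(axis=0)

# One hypothetical recording of 150 frames; the result is a fixed-size
# vector regardless of recording length.
recording = rng.standard_normal((150, 20))
ase = max_pooled_embedding(recording)
```

In the paper this matrix is further factorized via archetypal analysis before classification; the max-pooling step above only shows how a variable-length recording collapses to a fixed-size input for that stage.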
Symplectic ID
925640
Download URL
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000465363900689&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=4fd6f7d59a501f9b8bac2be37914c43e
Publication type
Conference Paper
ISBN-13
978-1-5108-7221-9
Publication date
2018
Created on 04 Feb 2019 - 14:11.