Author
Thakur, A
Abrol, V
Sharma, P
Rajan, P
Journal title
The Journal of the Acoustical Society of America
DOI
10.1121/1.5042241
Issue
6
Volume
143
Last updated
2020-10-29T18:14:34.63+00:00
Page
3819-3819
Abstract
This paper proposes a multi-layer alternating sparse-dense framework for bird species identification. The framework takes audio recordings of bird vocalizations and produces compressed convex spectral embeddings (CCSE). Temporal and frequency modulations in bird vocalizations are captured by concatenating frames of the spectrogram, resulting in a high-dimensional and highly sparse super-frame-based representation. Random projections are then used to compress these super-frames. Class-specific archetypal analysis is applied to the compressed super-frames for acoustic modeling, yielding the convex-sparse CCSE representation. This representation efficiently captures species-specific discriminative information. However, many bird species exhibit high intra-species variation in their vocalizations, making it hard to model the whole repertoire of vocalizations with only one dictionary of archetypes. To overcome this, each class is clustered using a Gaussian mixture model (GMM), and one dictionary of archetypes is learned for each cluster. To calculate the CCSE for any compressed super-frame, one dictionary from each class is chosen using the responsibilities of the individual GMM components. The CCSE obtained using this GMM-archetypal analysis framework is referred to as local CCSE. Experimental results show that local CCSE either outperforms or performs comparably to existing methods, including support vector machines with dynamic kernels and deep neural networks.
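The first two stages described in the abstract (super-frame construction and random-projection compression) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the context of 5 frames, the target dimension of 40, the Gaussian projection matrix, and the random stand-in spectrogram are all assumptions chosen for illustration. Archetypal analysis and GMM-based dictionary selection are not shown.

```python
import numpy as np


def make_superframes(spec, context):
    """Concatenate `context` consecutive spectrogram frames into super-frames.

    spec: array of shape (n_bins, n_frames).
    Returns an array of shape (n_bins * context, n_frames - context + 1),
    one super-frame per column.
    """
    n_bins, n_frames = spec.shape
    cols = [
        # Column-major flattening stacks frame i, then i+1, ... end to end.
        spec[:, i:i + context].reshape(-1, order="F")
        for i in range(n_frames - context + 1)
    ]
    return np.stack(cols, axis=1)


def gaussian_projection(dim_in, dim_out, seed=0):
    """Hypothetical helper: a dense Gaussian random projection matrix."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim_out, dim_in)) / np.sqrt(dim_out)


# Stand-in magnitude spectrogram (64 bins, 100 frames); real input would
# come from a short-time Fourier transform of a recording.
spec = np.abs(np.random.default_rng(1).standard_normal((64, 100)))

superframes = make_superframes(spec, context=5)       # shape (320, 96)
projection = gaussian_projection(superframes.shape[0], 40)
compressed = projection @ superframes                 # shape (40, 96)
```

In the framework described above, the columns of `compressed` would then be the inputs to class-specific archetypal analysis.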
Symplectic ID
925650
Publication type
Journal Article
Publication date
June 2018
Created on 04 Feb 2019 - 14:11.