Audio Sentiment Analysis using Spectrogram and Bag-of-Visual-Words

Sophina Luitel^a (Author), Mohd Anwar^a (Author)

^a North Carolina Agricultural and Technical State University

Research Output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Audio sentiment analysis has many present-day applications, such as call center environments, conversational agents, and human-robot interaction. However, analyzing sentiment from audio signals is a significant challenge because it is difficult to determine a robust feature set for detecting the sentiment expressed in the signal. Using spectrograms and bag-of-visual-words to represent robust audio features for sentiment analysis is a novel approach. We therefore propose using keypoints of the spectrogram (a 2D image representation of the frequencies in a signal) to classify sentiment. We convert audio signals to spectrograms using the Short-Time Fourier Transform (STFT). The Oriented FAST and Rotated BRIEF (ORB) algorithm extracts salient keypoints scattered on a regular grid over the spectrogram. Then, using the Bag-of-Visual-Words (BoVW) technique, the descriptors from each column of the spectrogram are converted to histograms. We apply hyperparameter tuning while generating the histograms to increase model accuracy. The histograms are then passed to classifiers, which classify the sentiment of each audio signal. We train and test the models on a multilingual dataset. Our experimental results show that Random Forest was the best classifier, with an accuracy of 76% and an F1 score of 78%, and demonstrate the promise of our approach for language-agnostic audio sentiment analysis.
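The first and third stages of the pipeline described in the abstract (STFT spectrogram, then BoVW histogram) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The window length, hop size, codebook size, and the use of Euclidean distance are all assumptions chosen for illustration. The descriptors and codebook below are random stand-ins: in a real pipeline the descriptors would come from an ORB implementation (for example OpenCV's `cv2.ORB_create()` and `detectAndCompute`), and the codebook would typically be learned by clustering training descriptors.

```python
import numpy as np

def stft_spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude STFT spectrogram, shape (n_fft // 2 + 1, n_frames).

    n_fft and hop are illustrative defaults, not values from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # One FFT per frame; transpose so rows are frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return 20.0 * np.log10(magnitude + 1e-10)

def bovw_histogram(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments.

    Euclidean distance is an assumption; binary descriptors such as
    ORB's are often compared with Hamming distance instead.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared distance from every descriptor to every codeword.
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Synthetic one-second 440 Hz tone sampled at 8 kHz (hypothetical input).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)
spec = stft_spectrogram(signal)

# Random stand-ins for 32-byte ORB descriptors and an 8-word codebook.
rng = np.random.default_rng(0)
descriptors = rng.integers(0, 256, size=(50, 32))
codebook = rng.integers(0, 256, size=(8, 32))
hist = bovw_histogram(descriptors, codebook)
```

The resulting `hist` is a fixed-length feature vector, so it could be passed to any standard classifier, such as the Random Forest reported in the abstract.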

Access to documents

10.1109/IRI54793.2022.00052
