Audio Sentiment Analysis using Spectrogram and Bag-of-Visual-Words
- Mohd Anwar^a (Author)
- ^a North Carolina Agricultural and Technical State University
Abstract
Audio sentiment analysis has many present-day applications, such as call center environments, conversational agents, and human-robot interactions. However, analyzing sentiment from audio signals is a significant challenge because it is difficult to determine a robust feature set that captures the sentiment expressed within the signal. Using spectrograms together with bag-of-visual-words to represent robust audio features for sentiment analysis is a novel approach. We therefore propose using keypoints of the spectrogram (a 2D image representation of the frequencies in a signal) to classify sentiment. We converted audio signals to spectrograms using the Short-Time Fourier Transform (STFT). We then used the Oriented FAST and Rotated BRIEF (ORB) algorithm to extract salient keypoints scattered on a regular grid over the spectrogram. Next, using the Bag-of-Visual-Words (BoVW) technique, the descriptors from each column of the spectrogram of the audio signal are converted to histograms. We trained and tested models on a multilingual dataset, and we applied hyperparameter tuning while generating the histograms to increase model accuracy. These histograms are then passed to classifiers, which classify the sentiment of each audio signal. Our experimental results show that Random Forest was the best classifier, with an accuracy of 76% and an F1 score of 78%, and demonstrate the promise of our approach for language-agnostic audio sentiment analysis.
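To make two of the steps in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It computes an STFT magnitude spectrogram and turns a set of descriptors into a bag-of-visual-words histogram. The function names, the frame length, the hop size, and the toy descriptors and codebook are all assumptions chosen for illustration. In particular, the descriptors below are hand-written placeholders rather than ORB output, Euclidean distance is used for simplicity even though binary descriptors such as ORB's are usually compared with Hamming distance, and the codebook would normally be learned by clustering training descriptors.

```python
import numpy as np


def stft_spectrogram(signal, frame_len=256, hop=128):
    """Return a dB magnitude spectrogram of shape (freq_bins, time_frames).

    Assumes len(signal) >= frame_len. frame_len and hop are illustrative
    defaults, not values taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small constant avoids log(0); transpose so rows are frequency bins.
    return 20.0 * np.log10(magnitude.T + 1e-10)


def bovw_histogram(descriptors, codebook):
    """Quantize descriptors against a codebook and return a normalized histogram.

    descriptors: array of shape (n_descriptors, dim)
    codebook:    array of shape (n_words, dim), hypothetical visual words
    """
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every descriptor to every visual word.
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    total = hist.sum()
    return hist / total if total else hist


# Toy usage: one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
spec = stft_spectrogram(np.sin(2 * np.pi * 440 * t))

# Placeholder descriptors and a two-word codebook (illustrative only).
toy_descriptors = [[0, 0], [0, 1], [10, 10], [9, 10]]
toy_codebook = [[0, 0], [10, 10]]
hist = bovw_histogram(toy_descriptors, toy_codebook)
```

In a fuller pipeline, the spectrogram would be rendered as an image, a keypoint detector would supply the descriptors, and one histogram per audio clip would form the feature vector given to a classifier such as Random Forest.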
