Improvement of CNN Network Parameters in Turkish Music Emotion Recognition

Sürücü, Murat

doi:10.54856/jiswa.202212230

AKILLI SİSTEMLER VE UYGULAMALARI DERGİSİ
JOURNAL OF INTELLIGENT SYSTEMS WITH APPLICATIONS
J. Intell. Syst. Appl.

E-ISSN: 2667-6893

This work is licensed under a Creative Commons Attribution 4.0 International License.

Turkish title: Türk Müziği Duygu Tanımasında CNN Ağ Parametrelerinin İyileştirilmesi

How to cite: Sürücü M. Improvement of CNN network parameters in Turkish music emotion recognition. Akıllı Sistemler ve Uygulamaları Dergisi (Journal of Intelligent Systems with Applications) 2022; 5(2): 126-131.

Abstract: Music has been an integral part of human life throughout history. People have expressed their emotions through music, and musical styles have evolved together with the communities that produced them. Whatever the style, music has always carried emotional content, so measuring the emotions that music conveys has become a broad field of study spanning art, science, history, and sociology. With the spread of digital music platforms, automatically identifying the emotional category of a piece of music has also become a feature that end users look for. Although many studies on this task exist for other languages, research on Turkish-language music is scarce. Automating such a task successfully with machine learning depends on several factors, including the choice of data preprocessing methods, the structure and complexity of the model to be trained, and the selection of training and test data. A well-chosen model cannot reach its best performance on poorly preprocessed data, and careful preprocessing cannot compensate for a poorly chosen model. This article aims to improve on one of the few music emotion recognition studies on Turkish music by building a problem-specific network model. To this end, data processed with different normalization methods were analyzed with Convolutional Neural Network (CNN) models of different sizes and complexities, and each model was evaluated with two classifiers, Softmax and SVM, so that the results could be compared with the previous study.

With MinMax normalization, the Softmax classifier reached a success rate of 86.67% and the SVM classifier 80%. With Z-Score normalization, the corresponding rates were 84.17% and 81.67%. All four values exceed 74.2%, the best result reported for the same data in the reference study. The author expects that applying the additional performance-enhancing procedures used in the reference study to the models proposed here would raise these results further.
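The abstract names MinMax and Z-Score normalization but does not spell out their formulas or how they were fitted. A minimal sketch, assuming the standard definitions x' = (x - min) / (max - min) and z = (x - mean) / std (population standard deviation), and assuming the per-feature statistics are computed on the training rows only and then reused for the test rows. The function names and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def fit_minmax(rows):
    """Per-feature (column) minimum and maximum, computed on training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_minmax(rows, mins, maxs):
    """Rescale each feature with x' = (x - min) / (max - min)."""
    return [
        # A constant training column has no range; map it to 0.0 instead of dividing by zero.
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def fit_zscore(rows):
    """Per-feature mean and population standard deviation on training rows."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) for c in cols]


def apply_zscore(rows, mus, sigmas):
    """Standardize each feature with z = (x - mean) / std."""
    return [
        [(x - mu) / s if s > 0 else 0.0 for x, mu, s in zip(row, mus, sigmas)]
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical rows: two features per clip, not real dataset values.
    train = [[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]]
    test = [[4.0, 500.0]]

    mins, maxs = fit_minmax(train)
    print(apply_minmax(test, mins, maxs))  # [[0.75, 0.75]]

    mus, sigmas = fit_zscore(train)
    print(apply_zscore(test, mus, sigmas))
```

Fitting the statistics on the training split only keeps test-set information out of preprocessing; the abstract does not say whether the study followed this practice, so treat it as a design choice of the sketch.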

Keywords: CNN, model selection, hyperparameters, normalization
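The abstract compares CNN models "of different sizes and complexities" whose outputs are classified with either Softmax or SVM. The sketch below is a hypothetical, untrained 1D CNN forward pass in plain Python (convolution, ReLU, global max pooling, dense layer, softmax), assuming each clip is represented by a vector of normalized features. `n_filters` and `kernel_size` are illustrative stand-ins for the size parameters being varied, and the four class names are an assumption about the dataset's labels. The SVM option is represented only by a multiclass (Weston-Watkins) hinge loss on the same linear scores; the paper's actual architecture, SVM setup, and training procedure are not given in the abstract.

```python
import math
import random


def conv1d(x, kernels, bias):
    """Valid 1D cross-correlation of a feature vector with each kernel, followed by ReLU."""
    maps = []
    for k, b in zip(kernels, bias):
        width = len(k)
        maps.append([
            max(0.0, sum(w * x[i + j] for j, w in enumerate(k)) + b)
            for i in range(len(x) - width + 1)
        ])
    return maps


def global_max_pool(maps):
    """Keep the strongest response of each feature map."""
    return [max(m) for m in maps]


def dense(features, weights, bias):
    """Linear layer producing one score per class."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    """Numerically stable softmax: class scores -> probabilities summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multiclass_hinge(scores, label, margin=1.0):
    """Weston-Watkins multiclass SVM loss on the same linear scores."""
    return sum(max(0.0, scores[j] - scores[label] + margin)
               for j in range(len(scores)) if j != label)


def build_model(n_filters, kernel_size, n_classes, seed=0):
    """Hypothetical randomly initialised weights; n_filters and kernel_size set model size."""
    rng = random.Random(seed)
    kernels = [[rng.uniform(-0.5, 0.5) for _ in range(kernel_size)] for _ in range(n_filters)]
    dense_w = [[rng.uniform(-0.5, 0.5) for _ in range(n_filters)] for _ in range(n_classes)]
    return kernels, [0.0] * n_filters, dense_w, [0.0] * n_classes


def class_scores(x, model):
    """Forward pass: conv + ReLU -> global max pool -> dense scores."""
    kernels, conv_b, dense_w, dense_b = model
    return dense(global_max_pool(conv1d(x, kernels, conv_b)), dense_w, dense_b)


if __name__ == "__main__":
    classes = ["happy", "sad", "angry", "relax"]  # assumed class labels
    x = [0.1, 0.4, 0.35, 0.8, 0.6, 0.2, 0.9, 0.5]  # hypothetical normalized features
    for n_filters, kernel_size in [(4, 3), (8, 5)]:  # two illustrative model sizes
        model = build_model(n_filters, kernel_size, len(classes), seed=42)
        scores = class_scores(x, model)
        probs = softmax(scores)
        pred = classes[probs.index(max(probs))]
        print(n_filters, kernel_size, pred, [round(p, 3) for p in probs],
              round(multiclass_hinge(scores, label=0), 3))
```

Training (backpropagation and an optimizer) is omitted here; in practice a deep learning framework would handle it, and each candidate size would be scored on held-out data after training rather than from a single random initialization.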

Bibliography:

Sachs C. The History of Musical Instruments. Mineola: Dover Publications, 2006.
Kim J, André E. Emotion recognition based on physiological changes in music listening. IEEE Transactions on Pattern Analysis and Machine Intelligence 2008; 30(12): 2067-2083.
Music Information Retrieval Evaluation eXchange (MIREX) Wiki [Internet]. Available from: https://www.music-ir.org/mirex/wiki/MIREX_HOME
Last.fm [Internet]. Last.fm. Available from: https://www.last.fm/
Eerola T, Vuoskoski JK. A comparison of the discrete and dimensional models of emotion in music. Psychology of Music 2010; 39(1): 18-49.
Mo S, Niu J. A novel method based on OMPGW method for feature extraction in automatic music mood classification. IEEE Transactions on Affective Computing 2019; 10(3): 313-324.
Kim YE, Schmidt EM, Migneco R, Morton BG, Richardson P, Scott JJ, Speck JA, Turnbull D. Music emotion recognition: A state of the art review. International Society for Music Information Retrieval Conference 2010; 255-266.
Yang YH, Chen HH. Machine recognition of music emotion: A review. ACM Transactions on Intelligent Systems and Technology 2012; 3(3): 1-30.
Er MB, Aydilek IB. Music emotion recognition by using chroma spectrogram and deep visual features. International Journal of Computational Intelligence Systems 2019; 12(2): 1622.
Surucu M, Isler Y, Perc M, Kara R. Convolutional neural networks predict the onset of paroxysmal atrial fibrillation: Theory and applications. Chaos: An Interdisciplinary Journal of Nonlinear Science 2021; 31(11): 113119.
Surucu M, Isler Y, Kara R. Diagnosis of paroxysmal atrial fibrillation from thirty-minute heart rate variability data using convolutional neural networks. Turkish Journal of Electrical Engineering and Computer Sciences 2021; 29(SI-1): 2886-2900.
Narin A, Isler Y. Detection of new coronavirus disease from chest x-ray images using pre-trained convolutional neural networks. Journal of the Faculty of Engineering and Architecture of Gazi University 2021; 36(4): 2095-2107.
Liu X, Chen Q, Wu X, Liu Y, Liu Y. CNN based music emotion classification. arXiv preprint 2017.
Altan G, Kutlu Y, Pekmezci AO, Nural S. The diagnosis of asthma using Hilbert-Huang transform and deep learning on lung sounds. Journal of Intelligent Systems with Applications 2019; 2(2): 100-105.
Balli O, Kutlu Y. Effect of deep learning feature inference techniques on respiratory sounds. Journal of Intelligent Systems with Applications 2020; 3(2): 134-137.
Yang YH, Chen HH. Music Emotion Recognition. CRC Press, 2011.
Russell JA. A circumplex model of affect. Journal of Personality and Social Psychology 1980; 39(6): 1161-1178.
Thayer RE. The Biopsychology of Mood and Arousal. Oxford University Press, 1990.
Lartillot O, Toiviainen P, Eerola T. A Matlab toolbox for music information retrieval. Studies in Classification, Data Analysis, and Knowledge Organization 2008; 261-268.
Feng Y, Zhuang Y, Pan Y. Popular music retrieval by detecting mood. Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2003; 375-376.
Song Y, Dixon S, Pearce M. Evaluation of musical features for emotion classification. International Society for Music Information Retrieval Conference 2012; 523-528.
Turkish Music Emotion Dataset [Internet]. [cited 2021 Aug 30]. Available from: https://www.kaggle.com/datasets/blaler/turkish-music-emotion-dataset
Bergstra J, Yamins D, Cox DD. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. International Conference on Machine Learning, Proceedings of Machine Learning Research 2013; 28(1): 115-123.
Cortes C, Vapnik V. Support-vector networks. Machine Learning 1995; 20(3): 273-297.