2024-06-19T06:46:22Z
https://meral.edu.mm/oai
oai:meral.edu.mm:recid/2576
2021-12-13T03:19:49Z
1582963390870:1582967549708
user-uy
STUDY OF HYBRID METHODS FOR MELODY EXTRACTION OF POLYPHONIC SIGNALS FOR PHILIPPINE INDIGENOUS MUSIC
Disuanco, Jason Edward D.
Tan, Vanessa H.
Leon, Franz A. de
In this paper, we present a hybrid method for extracting the melody from polyphonic signals. Automatic melody extraction could help ethnomusicologists with music transcription and music classification. The method first applies blind source separation and then performs melody extraction.
Blind source separation (BSS) separates a mixed audio track into its individual sources without any prior information about the track. One BSS algorithm that we considered is Harmonic-Percussive Source Separation (HPSS), whose objective is to separate the harmonic components of an audio track from its percussive components. Harmonic signals are defined as sounds with pitch. Hence, a song with a purely harmonic instrument or voice has contours that run parallel to the time axis [1]. A problem arises when a percussive instrument is pitched and is mistaken for a harmonic component. This problem can be addressed by separating the harmonic from the percussive components according to the quality, or timbre, of the sound produced. An algorithm called Shifted Non-negative Matrix Factorization (SNMF) clusters the frequencies of each source to separate the pitch of the percussive instrument from the pitches of the harmonic instrument.
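The idea that harmonic energy is smooth along time while percussive energy is smooth along frequency can be sketched in a few lines. The abstract does not state which HPSS formulation was used, so the median-filtering variant below is an assumption, and the names `median_filter_1d` and `hpss_masks` are hypothetical helpers chosen for illustration only.

```python
from statistics import median


def median_filter_1d(values, width):
    """Median-filter a list, shrinking the window at the edges."""
    half = width // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def hpss_masks(spec, width=3):
    """Return binary (harmonic, percussive) masks for a magnitude spectrogram.

    spec[f][t] is the magnitude at frequency bin f and time frame t.
    Assumption: median-filtering HPSS, not necessarily the authors' variant.
    """
    n_freq, n_time = len(spec), len(spec[0])
    # Harmonic energy is smooth along time: filter each frequency row.
    harm = [median_filter_1d(row, width) for row in spec]
    # Percussive energy is smooth along frequency: filter each time column.
    cols = [
        median_filter_1d([spec[f][t] for f in range(n_freq)], width)
        for t in range(n_time)
    ]
    perc = [[cols[t][f] for t in range(n_time)] for f in range(n_freq)]
    # A bin goes to the harmonic mask when its time-smoothed value is at
    # least as large as its frequency-smoothed value (ties go to harmonic).
    h_mask = [
        [1 if harm[f][t] >= perc[f][t] else 0 for t in range(n_time)]
        for f in range(n_freq)
    ]
    p_mask = [[1 - h_mask[f][t] for t in range(n_time)] for f in range(n_freq)]
    return h_mask, p_mask
```

In a toy spectrogram with one horizontal ridge (a sustained tone) and one vertical ridge (a drum hit), the masks assign the ridge bins to the harmonic and percussive outputs respectively. A pitched percussive instrument would also produce a horizontal ridge, which is the failure case that motivates the timbre-based SNMF alternative.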
We investigate two approaches for melody extraction: salience-based and data-driven. Salience-based approaches start by filtering the audio signal to enhance the frequency content where the melody could be found while attenuating distortions or noise [2]. After preprocessing, a spectral transform is applied to the signal. The two commonly used spectral transforms are the Short-Time Fourier Transform (STFT) and the Multi-Resolution Fast Fourier Transform (MRFFT). Next, a salience function estimates the prominent pitch values over time. The peaks of this function are taken as the candidates for the melody. These peaks are then grouped into pitch contours, or trajectories, based on time, pitch, and salience continuity [3][4]. Given the pitch contours, the melody is selected using tracking techniques. Data-driven approaches use machine learning to train a classifier and estimate the melody directly from the power spectrum [2]. Different features are extracted from the audio track [5][6]. The features are then used to train a Support Vector Machine (SVM) classifier. The last stage of this approach is voicing detection, which uses a global threshold based on the power spectrum [1][5].
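As an illustration of the salience step, the following minimal sketch scores candidate fundamental frequencies for a single frame by harmonic summation. The abstract does not define the salience function, so the weighting scheme and the parameters `n_harmonics`, `decay`, and `tol_cents` are assumptions, and `salience` and `pick_f0` are hypothetical names.

```python
import math


def salience(peaks, candidates_hz, n_harmonics=4, decay=0.8, tol_cents=50.0):
    """Score each candidate f0 by summing weighted peak magnitudes near its harmonics.

    peaks: list of (frequency_hz, magnitude) spectral peaks for one frame.
    Assumption: simple harmonic summation with geometric harmonic weights.
    """
    scores = []
    for f0 in candidates_hz:
        total = 0.0
        for h in range(1, n_harmonics + 1):
            target = h * f0
            for freq, mag in peaks:
                # Distance between the peak and the h-th harmonic, in cents.
                cents = abs(1200.0 * math.log2(freq / target))
                if cents <= tol_cents:
                    total += (decay ** (h - 1)) * mag
        scores.append(total)
    return scores


def pick_f0(peaks, candidates_hz):
    """Return the candidate with the highest salience for this frame."""
    scores = salience(peaks, candidates_hz)
    best = max(range(len(candidates_hz)), key=lambda i: scores[i])
    return candidates_hz[best]
```

A full salience-based system would keep several peaks per frame and link them into contours across frames rather than picking a single maximum; the single-frame selection here only shows how harmonically related peaks reinforce one candidate.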
We developed two hybrid methods for automatic melody extraction from polyphonic music. The first method combines a salience-based approach with a Harmonic-Percussive Source Separation (HPSS) algorithm, using the STFT and the MRFFT as spectral transforms. The second method combines the data-driven approach with the Shifted Non-negative Matrix Factorization (SNMF) algorithm. The algorithms are implemented in MATLAB® and evaluated using samples from the Jose Maceda collection of the University of the Philippines Center for Ethnomusicology. Ten songs were separated using HPSS-STFT, HPSS-MRFFT, and SNMF. The separated tracks were evaluated by listening and determining the dominance of harmonic and percussive components in each separated signal. Results from subjective tests show that SNMF performs better than HPSS for harmonic and percussive source separation. Moreover, objective tests for melody extraction indicate that the salience-based approach has higher accuracy in identifying the melody than the data-driven approach.
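The abstract does not say which objective measure was used. One common choice for melody extraction is raw pitch accuracy, sketched below as an assumption: the fraction of reference-voiced frames whose estimated pitch lies within a tolerance of the reference. The function name `raw_pitch_accuracy` and the 50-cent default are illustrative, not taken from the paper.

```python
import math


def raw_pitch_accuracy(ref_hz, est_hz, tol_cents=50.0):
    """Fraction of reference-voiced frames whose estimate is within tol_cents.

    A value of 0 marks an unvoiced frame in either sequence.
    Assumption: this metric is illustrative; the paper's metric is not stated.
    """
    voiced = [(r, e) for r, e in zip(ref_hz, est_hz) if r > 0]
    if not voiced:
        return 0.0
    correct = sum(
        1
        for r, e in voiced
        if e > 0 and abs(1200.0 * math.log2(e / r)) <= tol_cents
    )
    return correct / len(voiced)
```

Frames that the reference marks as unvoiced are ignored, so this measure reflects pitch estimation only and says nothing about the quality of voicing detection.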
2015
http://hdl.handle.net/20.500.12678/0000002576
https://meral.edu.mm/records/2576