Abstract
Few singing datasets are adequately annotated. One available dataset, VocalSet, covers a variety of vocal techniques (n = 17) and several singers (m = 20) across a large number of WAV files (p = 3560). However, although the dataset is organized by several categories, including technique, singer, tempo, and loudness, the audio files are not annotated. Therefore, this study annotates VocalSet to make it a more powerful dataset for researchers. The annotations generated for the VocalSet audio files include the fundamental frequency (F0) contour, note onset, note offset, the transition between notes, note F0, note duration, MIDI pitch, and lyrics. This paper describes the generated dataset and explains our approaches to creating and testing the annotations. Moreover, four different methods of defining the onset/offset are compared.
FAQs
What type of singing techniques does the Annotated-VocalSet dataset cover?
The Annotated-VocalSet covers 17 singing techniques, including breathy voice, articulated forte, and more.
How were pitch contours for the dataset verified for accuracy?
An expert musician reviewed estimated pitch contours, correcting inaccuracies found during multiple checks and adjustments.
What is the significance of the 24.5% discarded files from the dataset?
Discarding 24.5% of files removed incorrect pitch contours, ensuring higher reliability for subsequent analyses.
What tools were used for annotation, especially onset and offset detection?
The onset/offset detection employed an algorithm by Faghih and Timoney, specifically designed for singing signals.
How do average and median F0 calculations impact note estimation?
The study found that using average or median methods does not significantly affect the estimated F0 of notes.
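To make the average-versus-median comparison concrete, the following minimal Python sketch reduces one note's F0 contour segment to a single note F0 and converts it to a MIDI pitch. It is an illustration under stated assumptions, not the procedure used to build the dataset: the function names `note_f0` and `hz_to_midi`, the convention that unvoiced frames are stored as 0 Hz, and the sample values are all hypothetical. The Hz-to-MIDI conversion uses the standard equal-temperament relation with A4 = 440 Hz mapped to MIDI note 69.

```python
import math
import statistics


def hz_to_midi(f0_hz: float) -> float:
    """Convert a frequency in Hz to a (fractional) MIDI note number.

    Standard equal-temperament mapping: A4 = 440 Hz = MIDI 69.
    """
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def note_f0(contour_hz, method="median"):
    """Summarize the F0 contour of one note as a single value in Hz.

    Assumption for this sketch: unvoiced frames are stored as 0 Hz
    and are ignored before averaging.
    """
    voiced = [f for f in contour_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in segment")
    if method == "mean":
        return statistics.fmean(voiced)
    if method == "median":
        return statistics.median(voiced)
    raise ValueError(f"unknown method: {method!r}")


# Hypothetical contour for a single sung note near A4, with an
# unvoiced frame (0 Hz) at each end of the segment.
segment = [0.0, 438.0, 441.0, 443.0, 440.0, 437.0, 442.0, 439.0, 0.0]

mean_f0 = note_f0(segment, "mean")
median_f0 = note_f0(segment, "median")

print("mean F0 (Hz):", mean_f0)
print("median F0 (Hz):", median_f0)
print("MIDI pitch from mean:", round(hz_to_midi(mean_f0)))
print("MIDI pitch from median:", round(hz_to_midi(median_f0)))
```

On a reasonably stable contour such as this one, both summaries land on the same MIDI pitch. The two can diverge when a segment contains outliers such as octave errors, which pull the mean further than the median.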
References (34)
- Choi, S.; Kim, W.; Park, S.; Yong, S.; Nam, J. Children's Song Dataset for Singing Voice Research. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Montréal, QC, Canada, 11-16 October 2020.
- Rosenzweig, S.; Cuesta, H.; Weiß, C.; Scherbaum, F.; Gómez, E.; Müller, M. Dagstuhl ChoirSet: A Multitrack Dataset for MIR Research on Choral Singing. Trans. Int. Soc. Music Inf. Retr. 2020, 3, 98-110. [CrossRef]
- Cuesta, H.; Gómez, E.; Martorell, A.; Loáiciga, F. Analysis of Intonation in Unison Choir Singing. In Proceedings of the 15th International Conference on Music Perception and Cognition (ICMPC), Graz, Austria, 23-28 July 2018.
- Bittner, R.M.; Pasalo, K.; Bosch, J.J.; Meseguer-Brocal, G.; Rubinstein, D. Vocadito: A Dataset of Solo Vocals with F0, Note, and Lyric Annotations. In Proceedings of the International Society for Music Information Retrieval, Online, 8-12 November 2021.
- Rosenzweig, S.; Scherbaum, F.; Shugliashvili, D.; Arifi-Müller, V.; Müller, M. Erkomaishvili Dataset: A Curated Corpus of Traditional Georgian Vocal Music for Computational Musicology. Trans. Int. Soc. Music Inf. Retr. 2020, 3, 31-41. [CrossRef]
- Wilkins, J.; Seetharaman, P.; Wahl, A.; Pardo, B. VocalSet: A Singing Voice Dataset. In Proceedings of the 19th ISMIR Conference, Paris, France, 23-27 September 2018; pp. 468-472. [CrossRef]
- Hsu, C.-L.; Jang, J.-S.R. On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset. IEEE Trans. Audio Speech Lang. Process. 2010, 18, 310-319. [CrossRef]
- COFLA (COmputational Analysis of FLAmenco Music) Team. TONAS: A Dataset of Flamenco a Cappella Sung Melodies with Corresponding Manual Transcriptions. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Curitiba, Brazil, 4-8 November 2013. [CrossRef]
- Mora, J.; Gómez, F.; Gómez, E.; Escobar-Borrego, F.; Díaz-Báñez, J.M. Characterization and Melodic Similarity of a Cappella Flamenco Cantes. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Utrecht, The Netherlands, 9-13 August 2010; pp. 351-356.
- Gómez, E.; Bonada, J. Towards Computer-Assisted Flamenco Transcription: An Experimental Comparison of Automatic Transcription Algorithms as Applied to A Cappella Singing. Comput. Music J. 2013, 37, 73-90. [CrossRef]
- Chang, S.; Lee, K. A Pairwise Approach to Simultaneous Onset/Offset Detection for Singing Voice Using Correntropy. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4-9 May 2014; pp. 629-633.
- Heo, H.; Sung, D.; Lee, K. Note Onset Detection Based on Harmonic Cepstrum Regularity. In Proceedings of the 2013 IEEE International Conference on Multimedia and Expo (ICME), San Jose, CA, USA, 15-19 July 2013; pp. 1-6.
- Molina, E.; Barbancho, A.M.; Tardón, L.J.; Barbancho, I. Evaluation Framework for Automatic Singing Transcription. In Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR 2014), Taipei, Taiwan, 27-31 October 2014; pp. 567-572.
- Chan, T.-S.; Yeh, T.-C.; Fan, Z.-C.; Chen, H.-W.; Su, L.; Yang, Y.-H.; Jang, R. Vocal Activity Informed Singing Voice Separation with the IKala Dataset. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia, 19-24 April 2015; pp. 718-722.
- Bittner, R.; Salamon, J.; Tierney, M.; Mauch, M.; Cannam, C.; Bello, J. MedleyDB: A Multitrack Dataset for Annotation-Intensive MIR Research. In Proceedings of the 15th International Society for Music Information Retrieval Conference, ISMIR 2014, Taipei, Taiwan, 27-31 October 2014; pp. 155-160.
- Bittner, R.M.; Wilkins, J.; Yip, H.; Bello, J.P. MedleyDB 2.0: New Data and a System for Sustainable Data Collection. In Proceedings of the International Conference on Music Information Retrieval (ISMIR-16), New York, NY, USA, 7-11 August 2016; pp. 2-4.
- Bozkurt, B.; Baysal, O.; Yüret, D. A Dataset and Baseline System for Singing Voice Assessment. In Proceedings of the International Symposium on Computer Music Multidisciplinary Research (CMMR), Matosinhos, Portugal, 25-28 September 2017; pp. 430-438.
- Dzhambazov, G.; Holzapfel, A.; Srinivasamurthy, A.; Serra, X. Metrical-Accent Aware Vocal Onset Detection in Polyphonic Audio. In Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017, Suzhou, China, 23-27 October 2017; pp. 702-708.
- Meseguer-Brocal, G.; Cohen-Hadria, A.; Peeters, G. DALI: A Large Dataset of Synchronized Audio, Lyrics and Notes, Automatically Created Using Teacher-Student Machine Learning Paradigm. In Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018, Paris, France, 23-27 September 2018; pp. 431-437.
- Cannam, C.; Landone, C.; Sandler, M. Sonic Visualiser. In Proceedings of the International Conference on Multimedia-MM'10, Firenze, Italy, 25-29 October 2010; ACM Press: New York, NY, USA, 2010; p. 1467.
- Sibelius. Available online: https://www.avid.com/de/sibelius (accessed on 3 August 2022).
- Mauch, M.; Dixon, S. PYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4-9 May 2014; pp. 659-663. [CrossRef]
- Villavicencio, F.; Bonada, J.; Yamagishi, J.; Pucher, M. Efficient Pitch Estimation on Natural Opera-Singing by a Spectral Correlation Based Strategy; Information Processing Society of Japan (IPSJ): Tokyo, Japan, 2015.
- Raffel, C.; Ellis, D.P.W. Intuitive Analysis, Creation and Manipulation of MIDI Data with Pretty_midi. In Proceedings of the 15th International Society for Music Information Retrieval Conference, Taipei, Taiwan, 27-31 October 2014.
- Salamon, J.; Gómez, E. Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics. IEEE Trans. Audio Speech Lang. Process. 2012, 20, 1759-1770. [CrossRef]
- Ewert, S.; Müller, M.; Grosche, P. High Resolution Audio Synchronization Using Chroma Onset Features. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19-24 April 2009; pp. 1869-1872.
- Müller, M.; Kurth, F.; Röder, T. Towards an Efficient Algorithm for Automatic Score-to-Audio Synchronization. In Proceedings of the ISMIR, Barcelona, Spain, 10-15 October 2004.
- Kim, J.W.; Salamon, J.; Li, P.; Bello, J.P. Crepe: A Convolutional Representation for Pitch Estimation. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15-20 April 2018; pp. 161-165. [CrossRef]
- Mauch, M.; Cannam, C.; Bittner, R.; Fazekas, G.; Salamon, J.; Dai, J.; Bello, J.; Dixon, S. Computer-Aided Melody Note Transcription Using the Tony Software: Accuracy and Efficiency. In Proceedings of the First International Conference on Technologies for Music Notation and Representation (TENOR 2015), Paris, France, 28-30 May 2015; Volume 8. [CrossRef]
- McFee, B.; Metsai, A.; McVicar, M.; Balke, S.; Thomé, C.; Raffel, C.; Zalkow, F.; Malek, A.; Dana; Lee, K.; et al. Librosa/Librosa: 0.9.1. 2022. Available online: https://librosa.org/doc/latest/index.html (accessed on 3 August 2022).
- Faghih, B.; Timoney, J. Real-Time Monophonic Singing Pitch Detection. Preprint 2022, 1-19. [CrossRef]
- Faghih, B.; Timoney, J. An Investigation into Several Pitch Detection Algorithms for Singing Phrases Analysis. In Proceedings of the 2019 30th Irish Signals and Systems Conference (ISSC), Maynooth, Ireland, 17-18 June 2019; pp. 1-5.
- Faghih, B.; Timoney, J. Smart-Median: A New Real-Time Algorithm for Smoothing Singing Pitch Contours. Appl. Sci. 2022, 12, 7026. [CrossRef]
- Faghih, B.; Chakraborty, S.; Yaseen, A.; Timoney, J. A New Method for Detecting Onset and Offset for Singing in Real-Time and Offline Environments. Appl. Sci. 2022, 12, 7391. [CrossRef]