Deep Learning-based F0 Synthesis for Speaker Anonymization
2023, arXiv (Cornell University)
https://doi.org/10.48550/ARXIV.2306.16860
Abstract
Voice conversion for speaker anonymization is an emerging concept for privacy protection. In a deep learning setting, this is achieved by extracting multiple features from speech, altering the speaker identity, and synthesizing a waveform. However, many existing systems do not modify fundamental frequency (F0) trajectories, which convey prosody information and can reveal speaker identity. Moreover, a mismatch between F0 and the other features can degrade speech quality and intelligibility. In this paper, we formally introduce a method that synthesizes F0 trajectories from other speech features and evaluate its reconstruction capabilities. We then test our approach within a speaker anonymization framework, comparing it to a baseline and to a state-of-the-art F0 modification that utilizes speaker information. The results show that our method improves both speaker anonymity, measured by the equal error rate (EER), and utility, measured by the word error rate (WER).
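The abstract judges anonymity by the equal error rate of a speaker verification attacker (a higher EER means the attacker links anonymized speech to the original speaker less reliably) and utility by the word error rate of a speech recognizer (a lower WER means the content stays intelligible). The sketch below shows how each number can be computed once verification scores and transcripts are available. It is a minimal illustration, not the official VoicePrivacy scoring tooling. The function names are hypothetical. The EER is approximated by sweeping thresholds over the observed scores, and transcripts are assumed to be tokenized on whitespace.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the EER by sweeping thresholds over the observed scores.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    A trial is accepted when its score is >= the threshold.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need both target and non-target trials")
    best_gap, eer = float("inf"), 1.0
    # Candidate thresholds: every observed score, plus +inf (reject everything).
    for t in sorted(set(scores)) + [float("inf")]:
        far = sum(s >= t for s in neg) / len(neg)  # false acceptance rate
        frr = sum(s < t for s in pos) / len(pos)   # false rejection rate
        # Keep the threshold where FAR and FRR are closest; report their mean.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # d[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words, giving 2/6. In an anonymization evaluation, the goal is to push the EER up while keeping the WER down.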