Kernel techniques for generalized audio crossfades

William A. Sethares; James A. Bucklew; Kok Lay Teo

doi:10.1080/23311835.2015.1102116

2015, Cogent Mathematics

22 pages

Abstract

This paper explores a variety of density and kernel-based techniques that can smoothly connect (crossfade or "morph" between) two functions. When the functions represent audio spectra, this provides a concrete way of adjusting the partials of a sound while smoothly interpolating between existing sounds. The approach can be applied to both interpolation-crossfades (where the crossfade connects two different sounds over a specified duration) and to repetitive-crossfades (where a series of sounds are generated, each containing progressively more features of one sound and fewer of the other). The interpolation surface can be thought of as the two dimensions (time and frequency) of a spectrogram, and the kernels can be chosen so as to constrain the surface in a number of desirable ways. When successful, the timbre of the sounds is changed dynamically in a plausible way. A series of sound examples demonstrate the strengths and weaknesses of the approach.

Figures (5)

Figure 1: Audio crossfades generate sounds that change smoothly between a source and a destination sound. In interpolation crossfades (a), the sound begins as A and over time smoothly becomes like B. The total duration of the output sound is independent of the duration of A and B, and the cross depends only on the sound in the starting and ending frames. The overall effect is one of stretching time under the constraint that the sound must emerge continuously from A and merge continuously into B. In repetitive crossfades (b), a series of intermediate sounds $M_i$ merge aspects of A and B, analogous to the intermediary photographs of an image morph that merges various aspects of the starting and ending photographs. The duration of each output sound $M_i$ is equal to the common duration of A and B. Thus interpolation crosses begin as one sound and end as another, while in a repetitive cross each $M_i$ contains features of both of the original sounds. For instance, an interpolation crossfade might start with the attack portion of a cymbal and end with the final moments of a lion's roar. The interpolation crossfade is the transition that occurs over a user-specified time. In contrast, each intermediate sound in a repetitive crossfade merges aspects of the complete lion sound (from start to end) with those of the complete cymbal (from attack through decay).
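The difference between the two crossfade types in Figure 1 is largely a difference in the shape of the data. The sketch below illustrates that shape only: it uses a naive linear blend as a stand-in for the kernel methods of the paper, and every name in it (`lerp_spectrum`, `interpolation_crossfade`, `repetitive_crossfade`) is hypothetical. Each frame is assumed to be a list of spectral magnitudes.

```python
def lerp_spectrum(a, b, lam):
    """Linear blend of two magnitude spectra; lam=0 gives a, lam=1 gives b."""
    return [(1.0 - lam) * p + lam * q for p, q in zip(a, b)]


def interpolation_crossfade(start_frame, end_frame, n_frames):
    """One output sound of n_frames frames that begins as start_frame and
    ends as end_frame. Only the two boundary frames are used, so the output
    duration is independent of the durations of the original sounds."""
    if n_frames < 2:
        raise ValueError("need at least two frames")
    return [lerp_spectrum(start_frame, end_frame, k / (n_frames - 1))
            for k in range(n_frames)]


def repetitive_crossfade(sound_a, sound_b, n_sounds):
    """A series of n_sounds intermediate sounds. Each one has the same number
    of frames as sound_a and sound_b and mixes them frame by frame."""
    if len(sound_a) != len(sound_b):
        raise ValueError("sounds must have a common duration")
    series = []
    for i in range(1, n_sounds + 1):
        lam = i / (n_sounds + 1)  # strictly between 0 and 1
        series.append([lerp_spectrum(fa, fb, lam)
                       for fa, fb in zip(sound_a, sound_b)])
    return series
```

An interpolation crossfade returns a single sound whose length is chosen by the caller, while a repetitive crossfade returns several sounds that all share the duration of the inputs.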

Figure 2: A crossfade surface can be defined by Laplace's equation $\nabla^2 u(x, y) = 0$ with boundary conditions given by the spectra of two sounds A and B. The x-axis (representing time) proceeds from time 0 to time t*, while the y-axis (representing frequency) covers the range from DC (at 0) to the Nyquist rate (at 1). The surface is formally analogous to a spectrogram and can be inverted back into the time domain using any of a variety of standard techniques. When a frame is dipped into a pool of soapy water and carefully retracted, a smooth sheet forms that is characterized as the surface that minimizes the surface energy, where the height of the sheet at each point is u(x, y). Mathematically, this can be stated as the PDE (1) with the specified boundary conditions. Reinterpreting the contour of the soap film (i.e., the field values) as sound provides the audio output, which can be heard to smoothly interpolate from the left-hand spectrum to the right-hand spectrum. This views the crossfade function as the solution to a boundary value problem over a two-dimensional domain defined by the spectrum of the sound in the y dimension and the duration of the crossfade in the x direction. The soapy film is, in essence, reinterpreted as a spectrogram.
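The boundary value problem in Figure 2 can be approximated numerically on a grid. The following is a minimal sketch, not the method used in the paper: it applies plain Jacobi relaxation to the five-point discretization of Laplace's equation. The function name `laplace_crossfade` and its parameters are hypothetical. The caption does not say how the edges at DC and Nyquist are treated, so the sketch assumes those two rows are held at a linear blend of the end values of the two spectra.

```python
def laplace_crossfade(spec_a, spec_b, n_time=16, n_iter=2000):
    """Return a grid u[freq_bin][time_step] approximating the solution of
    Laplace's equation with spec_a on the left edge and spec_b on the right."""
    n_freq = len(spec_a)
    if len(spec_b) != n_freq:
        raise ValueError("spectra must have the same number of bins")
    if n_time < 2 or n_freq < 2:
        raise ValueError("grid must be at least 2 x 2")

    # Initial guess: a straight linear crossfade between the two spectra.
    # Rows 0 and n_freq-1 (DC and Nyquist) keep these values (an assumption).
    u = [[spec_a[j] + (spec_b[j] - spec_a[j]) * i / (n_time - 1)
          for i in range(n_time)] for j in range(n_freq)]
    for j in range(n_freq):
        u[j][0] = float(spec_a[j])    # left boundary: spectrum of A
        u[j][-1] = float(spec_b[j])   # right boundary: spectrum of B

    # Jacobi relaxation: replace each interior point by the mean of its
    # four neighbours, which is the discrete form of Laplace's equation.
    for _ in range(n_iter):
        new = [row[:] for row in u]
        for j in range(1, n_freq - 1):
            for i in range(1, n_time - 1):
                new[j][i] = 0.25 * (u[j][i - 1] + u[j][i + 1]
                                    + u[j - 1][i] + u[j + 1][i])
        u = new
    return u
```

Each column of the returned grid can be read as one spectral frame of the crossfade. Jacobi iteration is chosen here only because it is short and easy to check; it converges slowly on large grids.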

Figure 3: Sinusoids of frequencies $\omega_{l_1} = 5$ and $\omega_{l_2} = 12$ are crossed with frequencies $\omega_{r_1} = 6$ and $\omega_{r_2} = 11$ using the Poisson kernel and three different $G(\cdot)$ functions (see text for details). Though the ridges connecting the nearby frequencies appear in all three figures, the drop in (a) is likely to be heard as a drop in volume over the course of the first half of the crossfade. This is plotted in Fig. 3(a). The boundaries at the left and right show the two sinusoids (as delta functions at their respective frequencies), while the surface gradually descends to the middle where they meet. Observe that there are two shapes that connect the nearby frequencies $\omega_{l_1}$ to $\omega_{r_1}$ and $\omega_{l_2}$ to $\omega_{r_2}$. These are local maxima (in the y direction) which form a connected set as x varies over its range; call these ridges. Observe that there is a significant loss of height in the ridges of Fig. 3(a). Since the magnitude of the surface corresponds to the amplitude of the spectral components, this may be perceptible as a drop in volume towards the middle of the crossfade region.
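The loss of ridge height described in Figure 3 can be made concrete with a small calculation. The sketch below assumes the standard Poisson kernel for the half-plane, $P_x(y) = x / (\pi (x^2 + y^2))$, which may differ from the exact kernel and $G(\cdot)$ functions used in the paper. The function names are hypothetical. Smoothing a delta function at frequency $\omega$ with this kernel gives a ridge whose peak is $1/(\pi x)$, so the ridge falls as the distance x from the boundary grows.

```python
import math


def poisson_kernel(x, y):
    """Half-plane Poisson kernel at distance x > 0 from the boundary,
    evaluated at frequency offset y."""
    if x <= 0:
        raise ValueError("x must be positive")
    return x / (math.pi * (x * x + y * y))


def smoothed_delta(x, omega, freqs):
    """Profile across freqs of a boundary delta at omega, seen at distance x."""
    return [poisson_kernel(x, f - omega) for f in freqs]


# Peak height of the ridge at a few distances from the boundary.
ridge_peaks = [poisson_kernel(x, 0.0) for x in (0.5, 1.0, 2.0, 4.0)]
```

The area under the kernel stays equal to one at every distance, so the ridge spreads out as it loses height rather than losing energy. This is one way to see why a surface built from such a kernel can sound quieter towards the middle of the crossfade unless it is compensated.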

Figure 4: The ridges in the crossfade surface on the left are equally wide irrespective of the absolute frequency. In some situations, it may be advantageous to allow the ridges to become wider at higher frequencies, as shown on the right. This can be accomplished by defining the kernels as suggested in Example 3.

