
Fig. 1: An overview of three kinds of structures used in this paper, and a GAN example.

In neural style transfer of audio, feature extraction is done automatically by the network. Our neural style transfer work in this paper is inspired by image style transfer networks, in which specific layers of the CNN are associated with content (objects) versus style (texture). Gatys et al. [6] demonstrated this by reconstructing an image, preserving its content but changing its texture to the style of Van Gogh's Starry Night. Johnson et al. [13] produced a faster version of Gatys et al.'s network, reducing the time for one image blend from hundreds of seconds to less than one second. Frigo et al. [5] proposed a new style transfer method based on Johnson et al.'s work, splitting the content and style images into small grids (adaptive quadtrees) and performing the style transfer operations on similar small parts of the content and style images. Our work is also inspired by image translation networks, which focus on translating just a specific portion of the image. For example, Isola et al. [12] used conditional GANs to translate street maps to satellite maps. Zhu et al. [24] transform the style of a portion of the image content using a cyclical generative adversarial network termed CycleGAN; an example of their work is the transformation of a horse in an image into a zebra without changing the background of the content image, as shown in Figure 1(d). Next, we provide a more detailed explanation of three image style transfer networks that represent a good coverage of the range of networks available and which will feature in our approach:
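The content/style separation attributed to Gatys et al. above rests on comparing Gram matrices of CNN feature maps: the Gram matrix captures channel correlations (texture statistics) while discarding spatial layout. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the function names and the normalization constant are our own choices for illustration.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a feature map of shape (channels, height, width).

    Correlations between channels summarize texture (style) while
    discarding where in the image each feature occurred.
    """
    c, h, w = features.shape
    f = features.reshape(c, h * w)       # flatten spatial dimensions
    return (f @ f.T) / (c * h * w)       # normalized channel correlations

def style_loss(gen_features, style_features):
    """Mean squared difference between the two Gram matrices."""
    g_gen = gram_matrix(gen_features)
    g_style = gram_matrix(style_features)
    return float(np.mean((g_gen - g_style) ** 2))
```

In the full method, this loss is summed over several CNN layers and minimized jointly with a content loss (a direct feature-map difference at a deeper layer) by gradient descent on the pixels of the generated image.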
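The CycleGAN of Zhu et al. mentioned above learns unpaired translation by requiring that mapping an image to the other domain and back reproduces the original. A minimal sketch of that cycle-consistency term follows, assuming `G` and `F` are placeholder generator functions (domain X to Y and Y to X respectively); in the actual network they are learned CNNs and this loss is combined with adversarial losses.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L1 cycle-consistency: F(G(x)) should recover x, and G(F(y)) should
    recover y. G maps domain X -> Y, F maps Y -> X."""
    forward_cycle = np.mean(np.abs(F(G(x)) - x))   # X -> Y -> X
    backward_cycle = np.mean(np.abs(G(F(y)) - y))  # Y -> X -> Y
    return float(forward_cycle + backward_cycle)
```

This constraint is what lets the network change only the translated region (the horse) while leaving the rest of the content image intact: any edit to the background would make the round trip harder to invert and raise the loss.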
