
Fig. 1: An overview of three kinds of structures used in this paper, and a GAN example.

In neural style transfer of audio, feature extraction is done automatically by the network. Our neural style transfer work in this paper is inspired by image style transfer networks, in which specific layers of the CNN are associated with content (objects) versus style (texture). Gatys et al. [6] demonstrated this by reconstructing an image, preserving its content but changing its texture to the style of Van Gogh's Starry Night. Johnson et al. [13] produced a faster version of Gatys et al.'s network, reducing the time for one image blend from hundreds of seconds to less than one second. Frigo et al. [5] proposed a new style transfer method based on Johnson et al.'s work, splitting the content and style images into small grids (adaptive quadtrees) and performing the style transfer operations on similar small parts of the content and style images. Our work is also inspired by image translation networks, which focus on translating just a specific portion of the image. For example, Isola et al. [12] used conditional GANs to translate street maps to satellite maps. Zhu et al. [24] transform the style of a portion of the image content using a cyclical generative adversarial network termed CycleGAN; an example of their work is the transformation of a horse in an image into a zebra without changing the background of the content image, as shown in Figure 1(d). Next, we provide a more detailed explanation of three image style transfer networks that represent a good coverage of the range of networks available and which will feature in our approach:
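The content/style separation attributed to Gatys et al. above rests on comparing Gram matrices of CNN feature maps: the Gram matrix captures channel correlations (texture statistics) while discarding spatial layout. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the function names and the normalization constant are our own choices for illustration.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a feature map of shape (channels, height, width).

    Correlations between channels summarize texture (style) while
    discarding where in the image each feature occurred.
    """
    c, h, w = features.shape
    f = features.reshape(c, h * w)       # flatten spatial dimensions
    return (f @ f.T) / (c * h * w)       # normalized channel correlations

def style_loss(gen_features, style_features):
    """Mean squared difference between the two Gram matrices."""
    g_gen = gram_matrix(gen_features)
    g_style = gram_matrix(style_features)
    return float(np.mean((g_gen - g_style) ** 2))
```

In the full method, this loss is summed over several CNN layers and minimized jointly with a content loss (a direct feature-map difference at a deeper layer) by gradient descent on the pixels of the generated image.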
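The CycleGAN of Zhu et al. mentioned above learns unpaired translation by requiring that mapping an image to the other domain and back reproduces the original. A minimal sketch of that cycle-consistency term follows, assuming `G` and `F` are placeholder generator functions (domain X to Y and Y to X respectively); in the actual network they are learned CNNs and this loss is combined with adversarial losses.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L1 cycle-consistency: F(G(x)) should recover x, and G(F(y)) should
    recover y. G maps domain X -> Y, F maps Y -> X."""
    forward_cycle = np.mean(np.abs(F(G(x)) - x))   # X -> Y -> X
    backward_cycle = np.mean(np.abs(G(F(y)) - y))  # Y -> X -> Y
    return float(forward_cycle + backward_cycle)
```

This constraint is what lets the network change only the translated region (the horse) while leaving the rest of the content image intact: any edit to the background would make the round trip harder to invert and raise the loss.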
