Neural Style Transfer: Past, present, future

Lefteris
6 min readOct 30, 2021
Neural Style Transfer: A photograph of a landscape ( Tübingen, Neckarfront) is transformed to look like it is painted by Wassily Kandinsky (using as style image the famous “Composition VII” painting)

Definition and Origins

Capture a landscape photograph of the view from your window. Now re-imagine the photograph as if instead of being captured by you, with your smartphone or camera, it was painted by Vincent van Gogh. Bring to mind the characteristic and distinctive style of Van Gogh’s “The Starry Night” and map it to the content of the ordinary — no offense — photograph you just took. Neural Style Transfer (NST) is the research area that is connected to the family of algorithms that are capable of performing this operation, not only on 2D images but also in other forms of visual data, such as videos and 3D models.

The operation is better illustrated below: The content image (left) is a photograph of a landscape in Sheffield, UK (image by Harrison Qi on Unsplash). The style image (center) is the famous “Starry night” by Vincet van Gogh. The resulting stylized result (right) preserves the contents of the photograph while it embodies the artistic influences (texture and color) of the painting.

Producing artistic stylizations on digital art — mostly images —, has been a long-standing research problem that initially concerned the field of Non-Photorealistic Rendering (NPR). Initial image stylization systems were focused on heuristic algorithms that attempt to simulate the placement of brush strokes on a digital canvas (see the Image Analogies paper). NPR emerged at the intersection of computer science and art and made possible the creation of synthetic imagery artworks from images, stylized computer games, and films such as Waking Life and Disney’s Tarzan.

The Deep Learning era

The relatively recent emergence of the field of NST can be largely attributed to the evolution of Deep Learning and the development of Convolutional Neural Networks (CNNs). Empowered by the human-level (and sometimes superhuman) performance of neural networks in object classification, the task of image stylization has now been advanced with algorithms that are capable of mimicking a broad spectrum of styles.

The seminal work produced by Gatys et al. was the one that introduced the field. Based on the VGG-19 network, and on the idea that as we go deeper into a CNN, the input image is transformed into feature maps that increasingly care about the content of the image rather than any detail about the texture or color of pixels, their algorithm takes as input a ‘content’ image and a ‘style’ image and constructs a synthesized image that embodies the artistic influences (texture, color) of the style image, while preserving the contents of the content image.

Why NST? Real-world applications

This is definitely a fascinating and creative task, but why bother? Why is the task important and why researchers should be concerned with it? In fact, after the initial algorithm by Gatys et al., a myriad of researches appeared that aimed to optimize the process, improve it and also extend it to different forms of media. Photorealistic style transfer for example attempts to transfer the style of one photograph onto another, a process applicable when one intends to transform a photograph into a picture that looks like it is captured at a different time of the day, or under different weather conditions or illumination. Yes, your daylight photographs can be transformed to look as if they have been taken at night.

Example of Photorealistic Style Transfer. The photograph on the left is stylized to look as it is captured at night (the photograph in the center serves as the style image). The stylized result (right) was generated using the WCT2 method.

The rest of this article is concerned with the applicability of NST to actual, tangible, real-life applications rather than demonstrating its capabilities. Though, since these are intertwined, below are listed some concrete examples of real-world applications where the capabilities of the NST algorithms can help to advance. The provision of such a list also serves as support to the statement that the research in the area is quite significant.

Filmmaking

In 2017, the life of Vincent van Gogh was depicted in an animated biographical drama film (Loving Vincent). Except for the success of the film in delivering the troubled artist’s final days, this work is recognized as outstanding and unique due to the impressive visual effect that resembles Van Gogh’s painting style. The success of the film, though, is heavily attributed to the monumental and labor-intensive work of more than 50 artists that painted each of the film’s 65,000 frames.

Neural Style Transfer algorithms that consider the problem of temporal incoherence can aid similar efforts in the film production industry. Cutting-edge technology is already being used extensively in filmmaking. Take as an example the revolutionary technology that is utilized for the recent Star Wars TV series The Mandalorian. NST systems can significantly enhance the production of the three-dimensional environments that are portrayed in the huge LED screens that surround the actors during filming.

Games

Wandering around in an open-world photorealistic game might get a bit boring sometimes. But what if the whole world was being transformed into an artistically painted canvas where you could still navigate in three dimensions? Unity recently announced a neural style transfer system that is applied in a game environment allowing the user to swap between different artistic styles that artistically paint the whole game’s scene in real-time.

Except for the user side of things — that is given the ability to choose the preferred stylization artwork —, game developers can also be benefitted. Novel experiences can be created with dynamic scenes that are stylized based on the mood of the player’s character, or based on the image manually created and uploaded by the user.

Augmented Reality (AR)/Virtual Reality (VR)

AR experiences embed virtual layers in the real world, while VR is composed of a completely computer-generated environment where the user does not have any sense of the real world. Benjamin Bardou’s art of matte painting on feature films like Megalopolis can be thought of as an example of how an AR experience could transform the real world in real-time. Style transfer can be a fascinating tool for the generation of novel real-time visualizations of real-world or virtual environments.

Creative and social communication tools

In a way, NST has already made it into our day-to-day lives, as it has been infused into various social and creative communication tools, such as Prisma. Creators, and also everyday social media users benefit from such applications in that they attribute an artistic and aesthetic effect to their photographs or digital creations.

Neuroaesthetics

The act of creating is itself intriguing, and in a sense, it can be proof or an evaluation for what we call ‘intelligence’. The field of neuroaesthetics is concerned with justifying the factors that cause the experience of beauty or more precisely the aesthetic experience. Synthesizing stylized images, videos or games is definitely a creative process. Understanding what makes a resulting stylization visually more appealing to another could be valuable in understanding the intelligence behind our own creative selves.

The potential outputs of the research in NST are not restricted only to the above list. On the other hand, the progress in the field can be of benefit not only to the thriving creative industries sector, including film, TV, photography, music, advertising, museums, galleries, and digital creative industries, but it can also serve as a tool (e.g., Deep Dream Generator) for our own artistic expressions.

--

--