Diffusion: The Physics Principle Behind the Success of Modern AI Art

Lefteris
5 min read · Jan 7, 2023


In which I attempt to decipher the connection between the famous AI model used for text-to-image generation and a process observed in physics.

It will come as no surprise that a lot of the content of this article was generated using AI, including, of course, the images that depict what is today one of the biggest trends in digital art.

“An artist robot painting on a canvas with a brush”, generated using Stable Diffusion

While some of the latest language models that quickly went viral and seeped into our conversations in recent days are not necessarily suited to image generation, this article focuses on text-to-image AI, and in particular on the famous system called “Stable Diffusion”. The creative tools market has exploded: recent technical advances in generating images from text have shown that tools such as OpenAI’s DALL·E or Midjourney can serve a multitude of applications. Artists and other professionals have been using such systems to generate content for marketing and advertising, to populate websites, to create branding and logos, and even to come up with ideas for user interface designs. And these are only a few of the potential stakeholders interested in AI-generated images.

During my attempt to understand Stable Diffusion, and in a conversation with ChatGPT, I quickly realized that we have come to neglect what diffusion really is, or, more precisely, the connection between the diffusion models we use in machine learning and “diffusion”, the process we observe in physics.

Diffusion (Physics)

The word diffusion derives from the Latin word, diffundere, which means “to spread out.”

In physics, diffusion is a process in which the particles of a substance spread out from an area of high concentration to an area of lower concentration. It occurs in gases, liquids, and even solids, and it is driven by random thermal motion.

“Atoms moving from a region of higher concentration to a region of lower concentration, illustration of a scientific experiment”, generated using Stable Diffusion

Diffusion is also related to the concept of entropy. Entropy is a measure of the disorder or randomness of a system. In the context of diffusion, the entropy of a system tends to increase as the particles of a substance become more evenly distributed throughout the system. This is because the diffusion process leads to a more random distribution of particles, which is associated with an increase in disorder or randomness. For example, consider a container filled with a gas. If the gas is initially concentrated in one part of the container, the system’s entropy will be low. However, as the gas diffuses and spreads out to fill the entire container, the entropy of the system will increase, because the particles of the gas are now more evenly distributed throughout the container.
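The link between diffusion and entropy can be made concrete with a short simulation. The sketch below is a toy model (the container geometry and particle counts are illustrative choices, not drawn from any physics library): all gas particles start in one cell of a one-dimensional container, hop randomly at each step, and the Shannon entropy of their distribution rises from zero toward the maximum, log2(10) ≈ 3.32 bits for 10 cells.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 1-D "container" of 10 cells; all particles start concentrated in cell 0.
n_particles, n_cells, n_steps = 10_000, 10, 200
positions = np.zeros(n_particles, dtype=int)

def entropy(pos):
    """Shannon entropy (in bits) of the particle distribution over cells."""
    counts = np.bincount(pos, minlength=n_cells)
    p = counts / counts.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

print(f"initial entropy: {entropy(positions):.3f} bits")  # 0: fully ordered

# Random thermal motion: each step, every particle hops left or right;
# hops past the container walls are clipped (the particle stays put).
for _ in range(n_steps):
    steps = rng.choice([-1, 1], size=n_particles)
    positions = np.clip(positions + steps, 0, n_cells - 1)

print(f"final entropy:   {entropy(positions):.3f} bits")  # near log2(10)
```

As the particles spread out, the distribution over cells approaches uniform, and the entropy approaches its maximum, just as with the gas in the container above.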

Diffusion (machine learning)

Diffusion is an important concept in a wide range of fields, including physics, chemistry, biology, engineering, economics, and materials science. The central idea of diffusion, though, as described above, is shared amongst all those scientific areas.

Turning our focus back to Stable Diffusion: the technique behind it, presented at CVPR in the summer of 2022, is a latent diffusion model, a kind of deep generative neural network that destroys the structure of the data by gradually adding noise, until all structure is eliminated. The randomness in the data thus increases, and with it the entropy.
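As a minimal sketch of what “adding noise until all structure is eliminated” means, here is a DDPM-style forward process on a toy signal. The names `betas` and `alphas_bar` follow the common convention in the diffusion literature, not any particular codebase, and the linear schedule is just one standard choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward diffusion: at each step t, the data is mixed with Gaussian noise
# according to a variance schedule. After enough steps, only noise remains.
T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)     # cumulative fraction of signal kept

def noisy_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x0 = np.ones(4)                # a toy "image" of four pixels
x_early = noisy_sample(x0, 10)      # still mostly signal: sqrt(alphas_bar[10]) ≈ 0.999
x_late = noisy_sample(x0, T - 1)    # essentially pure noise: sqrt(alphas_bar[-1]) ≈ 0.006
```

The reverse process, which the network actually learns, runs this chain backwards: starting from pure noise, it denoises step by step until structured data reappears.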

As in physics, a data distribution can be made uniform by adding random noise, much like a scented candle: once it is burnt, the scent molecules diffuse through the air and become evenly distributed throughout the room. Unlike the physical process, though, diffusion in machine learning can be reversed. The scent molecules that have traveled through the air cannot be turned back into an unburnt candle, but a machine-learning diffusion system can learn to reconstruct the destroyed data.

Latent Space

As this story unfolds, this last concept, the latent space, can be considered the ‘connector’ between diffusion and generation. Tied to the ability of text-to-image models to recover data from noise, the latent space is the (typically lower-dimensional) space used to represent the underlying structure of the dataset of images and text descriptions. It is called “latent” because it is not directly observed in the data but is inferred from the patterns and relationships present in it.

“Latent space of a machine learning model trained on a large image dataset”

One use of the latent space in stable diffusion models for text-to-image generation is to interpolate between different text descriptions and corresponding images to generate new images that are semantically meaningful and coherent with the given text descriptions. For example, given two text descriptions and corresponding images, the model can interpolate in the latent space to generate a new image that is a blend of the two original images and that is semantically consistent with the text description.
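A minimal sketch of such an interpolation follows. The two vectors here are hypothetical random stand-ins for latents that, in a real pipeline, would come from encoding images or sampling noise; spherical interpolation (slerp) is the variant commonly used with Gaussian latents, since it keeps intermediate points at a plausible norm for the decoder.

```python
import numpy as np

def slerp(z1, z2, t):
    """Spherical linear interpolation between two latent vectors."""
    omega = np.arccos(np.clip(
        np.dot(z1 / np.linalg.norm(z1), z2 / np.linalg.norm(z2)), -1.0, 1.0))
    return (np.sin((1 - t) * omega) * z1 + np.sin(t * omega) * z2) / np.sin(omega)

rng = np.random.default_rng(0)
# Illustrative stand-ins for two latents (e.g. from two text/image pairs).
z_a, z_b = rng.standard_normal(64), rng.standard_normal(64)

# Five points along the path; decoding each through the model's decoder
# would yield images morphing from one concept to the other.
path = [slerp(z_a, z_b, t) for t in np.linspace(0.0, 1.0, 5)]
```

At t = 0 and t = 1 the endpoints are recovered exactly; the points in between are the semantic blends described above.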

And that’s where the magic lies. Some images are simply too impressive. Some are thought to be unimaginable to humans. Unparalleled. Occasionally unmatched. The user is left in awe as a text prompt is transformed into a beautifully crafted 512×512 image before their eyes. This beautiful mathematical process, introducing noise into the data, learning a compact representation of it, and training a generative model to sample points from the latent space, can result in unprecedented new art forms.

Composing a text prompt has never been as important and relevant a skill as it is today. Before DALL·E, the art of text prompting was mainly applicable to Google search. And yes, it has been and will remain an essential skill, regardless of how effective our language models get at understanding our requests and questions. If you’ve been on the Internet lately, machine learning models have been influencing your life, and that’s a fact. And as GPT-4 is listening, we would be remiss not to suppose that such AI tools will continue to be transformative, not only for our work but also for our daily lives.

This has been my humble take on what is one of the most exciting scientific innovations of our time. Please feel free to reach out for any comments or any potential errors in my understanding.

Generated using Stable Diffusion
