GANs — An engine of lies
“Lying, the telling of beautiful untrue things, is the proper aim of Art.” — Oscar Wilde
The smartest thing a human brain can do is lie. It takes a lot of practice and a powerful mind to make up a lie that is actually believable. And since humans are smarter than machines, we will be able to tell when a machine lies…right?
This video was created entirely by artificial intelligence. If I showed you this video without context, you would think it was actually Barack Obama reciting those lines. We have all seen AI-generated art, poetry, and music. But how does AI create this?
Generating new content may look like magic at first sight; in this blog, I will show you how the spell is cast.
“Generative Adversarial Networks is the most interesting idea in the past decade of Machine Learning” — Yann LeCun, Director of Facebook AI.
Generative Adversarial Networks (GANs) are a beautiful class of algorithms that pit two AI models against each other, like a duel between two warriors. In each round only one prevails, and the loser has to adjust its parameters so that it does better the next time.
A GAN contains two models:
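a) A Generator: This system starts from random noise, and its job is to learn to produce patterns that resemble a collection of real samples. It never sees those samples directly; it learns only from the discriminator’s verdicts on its output.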
b) A Discriminator: The sole aim of this system is to tell apart real patterns and patterns generated by the generator.
And therein lies the competition. This results in a zero-sum game, where there is always a winner and a loser.
The discriminator’s verdict on each fake is fed back to the generator, so it can refine its efforts to produce better fakes, while the generator’s output is fed to the discriminator alongside real samples, so it can get better at detecting fakes.
They are like two adversaries who constantly make each other better. It is a battle of bots where you take a young generator and train it to become a master of forgery.
The process ends when the generator gets so good at generating fake samples that the discriminator can no longer tell them from real ones, and its guesses are no better than a coin flip. This means we now have a generator capable of fooling its adversary.
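The back-and-forth described above can be sketched in a few lines of plain Python. This is only a toy under stated assumptions, not how production GANs are built: the "real" data are numbers drawn from a bell curve centred on 4, the generator is the line x = a*z + b applied to random noise z, and the discriminator is a single sigmoid unit. The function name train_toy_gan, the parameter names, and the learning rate are all made up for illustration, and the gradients are written out by hand instead of using a deep learning library.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=500, batch=64, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on 1-D data.

    Hypothetical toy setup: real samples come from N(4, 0.5); the generator
    is x = a*z + b with z ~ N(0, 1); the discriminator is sigmoid(w*x + c).
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters
    w, c = 0.0, 0.0   # discriminator parameters

    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
        grad_w = grad_c = 0.0
        for x_r, x_f in zip(real, fake):
            d_r, d_f = sigmoid(w * x_r + c), sigmoid(w * x_f + c)
            grad_w += (d_r - 1.0) * x_r + d_f * x_f
            grad_c += (d_r - 1.0) + d_f
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # Generator step: push D(fake) toward 1 (non-saturating loss),
        # using the discriminator that was just updated.
        grad_a = grad_b = 0.0
        for z in noise:
            d_f = sigmoid(w * (a * z + b) + c)
            grad_a += (d_f - 1.0) * w * z
            grad_b += (d_f - 1.0) * w
        a -= lr * grad_a / batch
        b -= lr * grad_b / batch

    return {"a": a, "b": b, "w": w, "c": c}


if __name__ == "__main__":
    print(train_toy_gan())
```

Two caveats about this sketch. A single sigmoid unit can only draw one threshold on the number line, so it mostly nudges the generator’s average output toward the real average and says little about spread. And like real GANs, the two players can chase each other in circles instead of settling, so the final numbers should not be read as a converged result.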
Imagine an AI capable of such a high degree of deception that it can deceive extremely smart systems built to catch fakes. This AI can be used for creating fake images, fake texts, fake videos, etc., but there is a bigger picture. Imagine if it can create a new image with an input of a different but slightly related image. It would mean that if you feed it a frame from a video, it can be used to predict what happens next. Think Minority Report.
It could help us predict evolution cycles and encryption algorithms, and even generate a 3D model from a simple photograph. One widely used application of this technology is image restoration. If an image is blurry or has some vital pieces of information missing, a GAN can essentially recreate the image by filling in the blank spaces with plausible detail and generating a high-quality result.
How does it do this?
The magic lies in the math. A GAN essentially represents images in a latent space. A latent space can be thought of as a map where every image the generator can produce is a dot, and similar images sit close together. When the generator produces a picture of a car, it does not pick just any dot on the map; it picks one from the region where the other car pictures live. In effect, the generator has learned a probability distribution over what a car can look like.
This latent space is the direct result of our generator learning some patterns. If I showed you a picture of a car, you would easily guess that it is a car because you have identified the basic features that define one. A car, in its essence, has a particular shape, some identifying curves, and some major identifiable parts like wheels and headlights. Through training, our generator picks up on these same patterns and organizes its latent space around them.
So when the generator is asked to produce a picture of a car, our program first draws a “pseudo-random” vector, usually from a simple distribution such as a uniform or a standard normal one. The generator then maps that vector to an image, and because of what it learned in training, the result lands among things that look alike: in our case, cars.
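To make the idea of sampling from a latent space concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: toy_generator is a fixed random linear map standing in for a trained generator, the "image" is just six numbers, and the latent vectors have four entries drawn from a standard normal distribution. The point it shows is that sliding smoothly between two latent points changes the output smoothly too.

```python
import random


def sample_latent(dim, rng):
    """Draw one latent vector with independent standard-normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_generator(z, weights):
    """Hypothetical stand-in for a trained generator: a fixed linear map
    from a latent vector to a short list of 'pixel' values."""
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in weights]


def interpolate(z_start, z_end, t):
    """Point a fraction t of the way from z_start to z_end."""
    return [(1.0 - t) * s + t * e for s, e in zip(z_start, z_end)]


rng = random.Random(0)
latent_dim, n_pixels = 4, 6
weights = [[rng.uniform(-1.0, 1.0) for _ in range(latent_dim)]
           for _ in range(n_pixels)]

z_a = sample_latent(latent_dim, rng)
z_b = sample_latent(latent_dim, rng)

# Walking through latent space yields a gradual change in the output.
for t in (0.0, 0.5, 1.0):
    image = toy_generator(interpolate(z_a, z_b, t), weights)
    print(t, [round(p, 2) for p in image])
```

A real generator is a deep, nonlinear network, so its outputs do not blend this neatly, but the same walk between two latent points is a common way to inspect what a trained model has learned.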
The fact that an AI can produce a large number of samples of human faces, pictures of animals, pictures of houses, pieces of art, etc., indicates that the AI has formed a useful internal representation of what a human face, a cat, or a house is. This can have very interesting applications:
For example, suppose you identified the latent vector for an outdoor cat, subtracted the vector for the outdoors, and added the corresponding vectors for the color blue, a pair of sunglasses, and a dining table. Feeding the result to the generator would essentially produce an image of a blue cat wearing sunglasses and sitting on a dining table, even if the model had never seen such an image before.
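The arithmetic in that example is ordinary vector addition and subtraction. The sketch below uses a made-up five-axis latent space in which each axis conveniently stands for one concept; the dictionary concepts and its values are purely hypothetical. Real latent spaces have no labelled axes, and a concept direction is usually estimated, for instance by averaging the latent vectors of many examples that share the attribute.

```python
def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(u, v)]


def sub(u, v):
    """Element-wise difference of two equal-length vectors."""
    return [x - y for x, y in zip(u, v)]


# Hypothetical axes: [cat, outdoors, blue, sunglasses, dining table]
concepts = {
    "outdoor_cat":  [1.0, 1.0, 0.0, 0.0, 0.0],
    "outdoors":     [0.0, 1.0, 0.0, 0.0, 0.0],
    "blue":         [0.0, 0.0, 1.0, 0.0, 0.0],
    "sunglasses":   [0.0, 0.0, 0.0, 1.0, 0.0],
    "dining_table": [0.0, 0.0, 0.0, 0.0, 1.0],
}

# outdoor cat - outdoors + blue + sunglasses + dining table
result = sub(concepts["outdoor_cat"], concepts["outdoors"])
for name in ("blue", "sunglasses", "dining_table"):
    result = add(result, concepts[name])

print(result)  # [1.0, 0.0, 1.0, 1.0, 1.0]
```

In a trained GAN, the resulting vector would then be passed through the generator to render the image.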
The ability to deeply understand something to the point of being able to replicate it so realistically that humans cannot tell whether it is artificial or if it is indeed real-life is powerful. The success achieved by GANs over time is simply astonishing, as seen in the picture below.
This is the magic of GANs, a model capable of conjuring lifelike images, surreal artworks, and soul-touching poetry that fool its very creator while just in its infancy of development. There lies an exciting journey ahead exploring the possibilities of GANs and testing the boundaries of Adversarial Training.
Lying, another human invention mastered by another human invention.