26 Dec Generative adversarial networks: What GANs are and how they…
Perhaps you’ve read about AI capable of producing humanlike speech or generating images of people that are difficult to distinguish from real-life photographs. More often than not, these systems build upon generative adversarial networks (GANs), which are two-part AI models consisting of a generator that creates samples and a discriminator that attempts to differentiate between the generated samples and real-world samples. This unique arrangement enables GANs to achieve impressive feats of media synthesis, from composing melodies and swapping sheep for giraffes to hallucinating footage of ice skaters and soccer players. In fact, it’s because of this prowess that GANs have been used to produce problematic content like deepfakes, media in which a person in an existing image or video is replaced with someone else’s likeness.
The evolution of GANs — which Facebook AI research director Yann LeCun has called the most interesting idea of the decade — is somewhat long and winding, and very much continues to this day. They have their deficiencies, but GANs remain one of the most versatile neural network architectures in use today.
History of GANs
The idea of pitting two algorithms against each other originated with Arthur Samuel, a prominent researcher in the field of computer science who’s credited with popularizing the term “machine learning.” While at IBM, he devised a checkers game (the Samuel Checkers-playing Program) that was among the first to successfully self-learn, in part by estimating the chance of each side’s victory at a given position.
But if Samuel is the grandfather of GANs, Ian Goodfellow, former Google Brain research scientist and director of machine learning at Apple’s Special Projects Group, might be their father. In a seminal 2014 research paper simply titled “Generative Adversarial Nets,” Goodfellow and colleagues describe the first working implementation of a generative model based on adversarial networks.
Goodfellow has often stated that he was inspired by noise-contrastive estimation, a way of learning a data distribution by comparing it against a defined noise distribution (i.e., a mathematical function representing corrupted or distorted data). Noise-contrastive estimation uses the same loss functions as GANs — in other words, the same measure of performance with respect to a model’s ability to anticipate expected outcomes.
Of course, Goodfellow wasn’t the only one to pursue an adversarial AI model design. Dalle Molle Institute for Artificial Intelligence Research co-director Juergen Schmidhuber advocated predictability minimization, a technique that models distributions through an encoder that maximizes the objective function (the function that specifies the problem to be solved by the system) minimized by a predictor. It adopts what’s known as a minimax decision rule, where the possible loss for a worst case (maximum loss) scenario is minimized as much as possible.
And this is the paradigm upon which GANs are built.
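For reference, the minimax rule takes the following form in the 2014 “Generative Adversarial Nets” paper. The symbols are not defined elsewhere in this article: $G$ is the generator, $D$ is the discriminator, $D(x)$ is the discriminator’s estimated probability that $x$ is a real example, $p_{\text{data}}$ is the distribution of real examples, and $p_z$ is the noise distribution the generator samples from.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to push $V$ up by scoring real examples near 1 and generated examples near 0, while the generator tries to push $V$ down by making $D(G(z))$ large.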
Again, GANs consist of two parts: a generator and a discriminator. The generator produces synthetic examples (e.g., images) from random noise sampled from a distribution. These synthetic examples, along with real examples from a training data set, are fed to the discriminator, which attempts to distinguish between the two. Both models improve in their respective abilities until the discriminator is unable to tell the real examples from the synthesized examples with better than the 50% accuracy expected of chance.
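The 50% figure can be made concrete with a result from the 2014 paper. The notation here is introduced for illustration: $p_{\text{data}}$ is the distribution of real examples, $p_g$ is the distribution of the generator’s outputs, and $D^*$ is the best possible discriminator for a fixed generator.

```latex
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)},
\qquad
p_g = p_{\text{data}} \;\Longrightarrow\; D^*(x) = \frac{1}{2} \ \text{for all } x
```

In other words, once the generator’s distribution matches the real one, even an ideal discriminator can do no better than a coin flip.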
GANs train in an unsupervised fashion, meaning that they infer the patterns within data sets without reference to known, labeled, or annotated outcomes. Interestingly, the discriminator’s work informs that of the generator: the discriminator’s verdict on each synthesized sample is passed back to the generator as a training signal, telling it how to tweak its output so that it might be more realistic in the future.
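The alternating updates described above can be sketched with a deliberately tiny example. This is a toy under stated assumptions, not the setup from the 2014 paper: the real data is a one-dimensional Gaussian, the generator is a linear map `G(z) = a*z + b`, the discriminator is a single logistic unit `D(x) = sigmoid(w*x + c)`, the gradients are derived by hand, and the generator uses the common non-saturating objective (maximize `log D(G(z))`). The name `train_toy_gan` and all hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, batch=64, lr=0.05, real_mean=4.0, seed=0):
    """Alternate discriminator and generator updates on 1-D data.

    Illustrative toy only: both models are far simpler than real networks.
    Returns the final parameters (a, b, w, c).
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)

    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: gradient ascent on
        # log D(real) + log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w += (1.0 - d) * x   # d/dw of log D(x)
            grad_c += (1.0 - d)
        for x in fake:
            d = sigmoid(w * x + c)
            grad_w -= d * x           # d/dw of log(1 - D(x))
            grad_c -= d
        w += lr * grad_w / batch
        c += lr * grad_c / batch

        # Generator step: gradient ascent on log D(G(z)), using the
        # freshly updated discriminator as the feedback signal.
        grad_a = grad_b = 0.0
        for z, x in zip(noise, fake):
            d = sigmoid(w * x + c)
            dlog_dx = (1.0 - d) * w   # d/dx of log D(x)
            grad_a += dlog_dx * z     # chain rule: dx/da = z
            grad_b += dlog_dx         # chain rule: dx/db = 1
        a += lr * grad_a / batch
        b += lr * grad_b / batch

    return a, b, w, c


if __name__ == "__main__":
    print(train_toy_gan())
```

The generator never sees the real data directly. Its only training signal is the discriminator’s gradient, which is the sense in which the discriminator’s work informs the generator. With models this small, the parameters may oscillate rather than settle, so the printed values should not be read as a converged solution.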
In practice, GANs suffer from a number of shortcomings owing to their architecture. The simultaneous training of generator and discriminator models is inherently unstable. Sometimes the parameters — the configuration values internal to the models — oscillate or destabilize, which isn’t surprising given that after every parameter update, the nature of the optimization problem being solved changes. Alternatively, the generator collapses, and it begins to produce data samples that are largely homogeneous in appearance.
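One rough way to notice a collapsed generator is to compare the spread of generated samples against the spread of real ones. The sketch below is a hypothetical heuristic for scalar samples, not a standard metric from the literature: the function names and the idea of using mean pairwise distance as a diversity proxy are assumptions made for illustration.

```python
from itertools import combinations


def mean_pairwise_distance(samples):
    """Average absolute difference over all pairs of scalar samples."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


def diversity_ratio(generated, real):
    """Spread of generated samples relative to real samples.

    Hypothetical heuristic: values near 0 suggest the generator is
    producing nearly identical outputs.
    """
    real_spread = mean_pairwise_distance(real)
    if real_spread == 0.0:
        raise ValueError("real samples have no spread to compare against")
    return mean_pairwise_distance(generated) / real_spread


real = [0.0, 1.0, 2.0, 3.0]
collapsed = [1.5, 1.5, 1.5, 1.5]  # every generated sample is the same

print(diversity_ratio(collapsed, real))
print(diversity_ratio(real, real))
```

For images or audio, a comparable check would need a distance defined on those data types, which this sketch does not attempt.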
The generator and discriminator also run the risk of overpowering each other. If the generator becomes too accurate, it’ll exploit weaknesses in the discriminator that lead to undesirable results, whereas if the discriminator becomes too accurate, it’ll impede the generator’s progress toward convergence.
A lack of training data also threatens to impede GANs’ progress in the semantic realm, which in this context refers to the relationships among objects. Today’s best GANs struggle to reconcile the difference between palming and holding an object, for example — a differentiation most humans make in seconds.
But as Hanlin Tang, senior director of Intel’s AI laboratory, explained to VentureBeat in a phone interview, emerging techniques get around these limitations. One entails building multiple discriminators into a model and fine-tuning them on specific data. Another involves feeding discriminators dense embedding representations, or numerical representations of data, so that they have more information from which to draw.
“There [aren’t] that many well-curated data sets to start … applying GANs to,” Tang said. “GANs just follow where the data sets are going.”
On the subject of compute, Youssef Mroueh, a research staff member in the IBM multi-modal algorithms and engines group, is working with colleagues to develop lightweight models dubbed “small GANs” that reduce training time and memory usage. The bulk of their research is concentrated in the MIT-IBM Watson AI Lab, a joint AI research effort between the Massachusetts Institute of Technology and IBM.
“[It’s a] challenging business question: How can we change [the] modeling without all the computation and hassle?” Mroueh said. “That’s what we’re working toward.”
Image and video synthesis