THE OTHER

THE GAN

A Generative Adversarial Network (GAN) is an image generation algorithm. It is designed to make new photographs. Photographs that have no link to the real. As with any algorithm, at its core is a dataset: a collection of information, in this case of photographs. For the GAN, as for any algorithm, its dataset is its only knowledge and its absolute universe. A GAN sets two neural nets against each other to generate content, in this case photographs.

One net, the generator, is trained on a dataset. The generator attempts to make new images that might fit within that dataset. Its made image, along with images taken from the dataset, is passed to the second net, the discriminator. The discriminator then attempts to determine whether each image is ‘real’ or ‘fake’. The discriminator is rewarded for correctly identifying fakes, and the generator is rewarded for tricking the discriminator. As both nets ‘learn’, they make each other more proficient: the generator makes increasingly convincing images to get past the ever more discerning discriminator, until the images it generates are indistinguishable from ‘real’ images.
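The adversarial loop can be sketched in miniature. The sketch below is a toy, not any real GAN implementation: the ‘photographs’ are samples from a one-dimensional Gaussian, both nets are single linear units so the updates can be written out by hand, and every name and number in it is an illustrative assumption.

```python
import numpy as np

# Toy sketch of the adversarial loop: the 'dataset' is a 1-D Gaussian,
# the generator and discriminator are single linear units, and all
# numbers here are illustrative assumptions rather than a real GAN.

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

w_g, b_g = 1.0, 0.0   # generator: G(z) = w_g * z + b_g
w_d, b_d = 0.0, 0.0   # discriminator: D(x) = sigmoid(w_d * x + b_d)
REAL_MEAN, REAL_STD = 4.0, 0.5
lr, batch = 0.05, 64
bg_history = []

for step in range(3000):
    real = rng.normal(REAL_MEAN, REAL_STD, batch)   # images 'taken from the dataset'
    z = rng.normal(0.0, 1.0, batch)
    fake = w_g * z + b_g                            # the generator's made images

    # Discriminator step: rewarded for telling real from fake
    # (gradient of -[log D(real) + log(1 - D(fake))]).
    d_real = sigmoid(w_d * real + b_d)
    d_fake = sigmoid(w_d * fake + b_d)
    w_d -= lr * (np.mean((d_real - 1.0) * real) + np.mean(d_fake * fake))
    b_d -= lr * (np.mean(d_real - 1.0) + np.mean(d_fake))

    # Generator step: rewarded for tricking the discriminator
    # (gradient of the non-saturating loss -log D(fake)).
    d_fake = sigmoid(w_d * fake + b_d)
    w_g -= lr * np.mean((d_fake - 1.0) * w_d * z)
    b_g -= lr * np.mean((d_fake - 1.0) * w_d)
    bg_history.append(b_g)

# As both nets 'learn', the generator's output drifts toward the real data.
print(np.mean(bg_history[-1000:]))
```

Nothing in this toy survives contact with real image generation; it only makes visible the back-and-forth described above, in which each net's reward is the other's failure.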

One iteration of the GAN, the BigGAN, is trained on the ImageNet dataset, a notoriously leviathan image dataset scraped without permission from image-based social media sites like Flickr. ImageNet is responsible in large part for early and continued booms in computer vision and image generation. ImageNet, created in 2009 to ‘map out the entire world of objects’, is not merely a convenient example.(1) It is the ‘canonical’ dataset.(2) Within ImageNet, images are sorted into categories. Trevor Paglen and Kate Crawford explore in depth and with rare clarity the racist, sexist, classist, and transphobic biases of the ImageNet dataset, demonstrating clearly that ImageNet stands as an ‘object lesson, if you will, in what happens when people are categorized like objects’.(3) However, as with my approach to Abu Ghraib, to the smartphone, and to the GAN, reasserting the photograph specifically in these discussions and looking directly at the digital image underpins this text.

Ultimately, the project of ImageNet and other datasets is broken, broken even before the biases that arise from categorisation. Image datasets are broken because photographs are the bridge between us and machines. Before we even try to categorise them, the problem is photographs: the data for a functional dataset does not exist. We do not have another photography, some kind of visual representation of our experienced world with a firm link to neutrality and representation. It has not been invented and is probably ontologically impossible. As long as digital photographs and camera-produced imagery serve as the translation between experience and representation, we will never see for ourselves, never mind contrive vision for computers.

The BigGAN, however, trained on ImageNet, is still useful to us. Because of how it was collated, the ImageNet dataset is full of human-taken images and largely avoids the perspective of the disembodied image, such as the Google Street View car or the surveillance camera. The BigGAN’s knowledge, then, is based on human-centric imagery, which means the BigGAN doesn’t generate imagery so much as photographs specifically. With the human-image-centric ImageNet as its only knowledge, the BigGAN possesses a disordered photographic sight: it treats biased photographic imagery as objective fact.

In practice, for the BigGAN, lobsters exist on plates, as that is how people photograph them, and birds exist in the sky, photographed from below. In the BigGAN’s generations, an entire reality is mediated through the camera and the camera’s relationship to our body. The BigGAN performs the camera, generating lens flare in photographs of light, blur in macro images of golf balls, and so on. The GAN in this text is seen as a model for our interaction with images; it is not an imperfect model but rather a model made by a society with an imperfect relationship to images.(4)

On a software level, the GAN’s generations are a corruption of its internal desires. According to the creator of GANs, what the GAN really wants is to memorise, to consume photographs and indulge in them; to transfer their data to its own and live in photographic memory indistinguishable from the self.(5) This seems to be a very human way of interacting with images. We too are desperate to commit images to memory and to supplant memory with images. The GAN embodies us: it is a literal and often comical extension of a refusal to accept the biases of photographic representation. It is a reckoning in that it reflects us. The GAN is a broken vision machine, a model of a photograph consumed. The GAN in this text serves as the natural end point of this paper’s thinking around the digital image. It is a literal and metaphorical model of us and photographs.


Notes

(1) Fei-Fei Li, the creator of ImageNet, quoted in Dave Gershgorn, ‘ImageNet: The Data That Spawned the Current AI Boom’, Quartz, July 2017, accessed 29 January 2021, https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/.

(2) Trevor Paglen and Kate Crawford, ‘Excavating AI’, accessed 29 January 2021, https://excavating.ai/.

(3) Ibid.

(4) Already here a blurring between ‘us’ and ‘GAN’ is present in my writing. Throughout this text, whenever I talk about the GAN I frequently ascribe it personhood, asking what it thinks, how it looks, and so on. This is not because I see the GAN as a consciousness. Rather, this is a formal choice. I want my language when talking about the GAN to be free and open. If I am only able to refer to the GAN in terms of the machine then I will lose all of the humanity that it represents. I will only have access to the very small frame of imagination, meaning, and symbolic language that comes naturally when talking about a machine. The GAN is not human, but it is a very human thing; it is entirely informed by humans. As such, when interrogating the GAN it seems necessary to give it a body to answer to, to ascribe it a system of meaning so that it can be interrogated. To give the machine a body so we can question the honesty of its face.

Similarly, my analysis is intentionally light on technical detail. This is a conscious choice. If my stance on the GAN were based more in its minutiae, this text could be out of date in the few hours it takes for a website to go live. Rather, my writing on the GAN aims to question it on a macro level of concept and wider idea, to still interrogate these technologies but not be beholden to their morphing logic.

(5) ‘Generative models… all of them have this property that if they really did what we ask them to do they would do nothing but memorise the training data.’ Ian Goodfellow, in Lex Fridman, ‘Ian Goodfellow: Generative Adversarial Networks (GANs)’, Lex Fridman Podcast #19, 2019, https://www.youtube.com/watch?v=Z6rxFNMGdn0, 00:33:49.

Image List

(1) This is a video I generated with a StyleGAN trained on a dataset which I compiled, an amalgamation of images from Abu Ghraib and thousands of digital family photographs. To make this video I chose around twenty key images generated by the GAN, then asked the GAN to fill in the space between those images, to show the latent space. This is called a latent space walk, or an interpolation video.
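The walk itself is simple to sketch. Below is a minimal, hypothetical illustration: `generate` stands in for a trained StyleGAN generator and is not defined, and the 512-dimensional latent size is only a typical assumption. The point is the interpolation, straight lines drawn between the latent codes of chosen key images.

```python
import numpy as np

# A hedged sketch of a latent space walk. `generate` (a trained
# generator that renders a latent vector to an image) is hypothetical
# and not defined here; only the interpolation itself is shown.

def interpolate(z_a, z_b, n_frames):
    """Step linearly from latent code z_a to latent code z_b in n_frames steps."""
    ts = np.linspace(0.0, 1.0, n_frames)
    return [(1.0 - t) * z_a + t * z_b for t in ts]

rng = np.random.default_rng(0)
z_a = rng.normal(size=512)  # latent code of one key image (512-D is typical)
z_b = rng.normal(size=512)  # latent code of the next key image
frames = interpolate(z_a, z_b, 31)
# each frame would then be rendered in turn, e.g. image = generate(frame),
# and the rendered images strung together into the video
```

Chaining such walks through each of the twenty key images in turn produces the continuous drift between photographs that the video shows.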
