conversations – Interview by Anika Meier – 05.12.2024
MERZMENSCH: "WE CANNOT STOP TEACHING VISUAL LITERACY"
AI AND ART
For nearly a decade, Merzmensch has been exploring the creative side of generative AI. He has experimented with the most accessible models and approaches, from Google's Deep Dream in 2016 to RunwayML in 2024. As a creative member of the OpenAI Community Ambassador Group since 2020, Merzmensch's goal has always been to unleash the creativity of AI by pushing these models to their limits. His 2021 series, EVE OF DIFFUSION (MONA LISA DRINKING WINE WITH DA VINCI), tested the cultural and theoretical knowledge of DALL-E 1, to which he gained early access.
Merzmensch has been inspired by the community of AI innovators he surrounds himself with—most notably the work of Ross Goodwin, Holly Herndon, and Mat Dryhurst. His book, KI-KUNST (AI ART), in the series DIGITALE BILDKULTUREN, explores the creative possibilities of generative AI and its crucial role in the context of art history. In both his art and his writing, Merzmensch considers not only the creative potential of AI but also how this rapidly evolving technology reflects past philosophies and discussions about the future of society.
In conversation with Anika Meier, Merzmensch discusses the early days of AI diffusion models, the inspiration for his series EVE OF DIFFUSION, and what it means to live in a "post-truth" society.
Anika Meier: Merzmensch, I just finished reading your essay on AI art. The text was published as part of the series Digital Image Cultures by Annekathrin Kohout and Wolfgang Ullrich. You describe how you came to artificial intelligence and how you delved deeper into the subject—both in theory and in practice.
You became aware of AI through the artist Ross Goodwin, who was a ghostwriter in the White House during Obama's presidency. In this election campaign, you had to look at some images multiple times and still weren't sure if they were AI-generated or not.
How do you explain AI today, almost two years after the release of your book?
Merzmensch: Indeed, it may sound like a cliché, but Ross Goodwin’s 2016 essay series ADVENTURES IN NARRATED REALITY was an eye-opener—an epiphany. His explorations of creative human-machine collaboration were groundbreaking and fascinating. The first AI film, SUNSPRING (2016), and the first AI book, 1 THE ROAD (2018), are just some of his footprints on the then-fledgling ground of Generative AI. His conclusion that AI is an inspiring augmentation of human creativity, but not a replacement for humans, is also crucial to our understanding of the paradigm shift we're currently experiencing.
At the same time, I began to experiment with Google Deep Dream locally on my computer. The moment when a machine kept interpreting images and changing them in its own unexpected way was magical. So I dove deep into the unfolding world of Generative AI—an endless rabbit hole, highly relevant to our global culture, perception, and understanding of creativity.
I just had to write a book that could combine technology and cultural studies—an anatomy of generative models in the context of contemporary art. These are indeed high-speed developments—a year in the GenAI timeline is like a century, and the time dilation is increasing continuously. I was aware from the start that writing a book about Generative AI was a Sisyphean task, and it made me a happy man. In the context of art history, DALL-E, Midjourney, GEN, etc., are just small ripples causing the Hokusai-esque wave of cultural transformation (I love this ambivalent ukiyo-e by the Japanese master, with the tsunami wave spreading its fingers for good or evil—it’s up to us whether we go under the water or surf the wave).
We have to zoom out and look at this development from a meta-perspective. This was the main task of my book, and I would say that even now—after RunwayML’s wonderful ML Labs Platform has unreasonably been sunsetted, after DALL-E 2 is over as well, and after we have so many new models, names, and approaches that we could never have dreamed of a few years ago—the task of the book is still legitimate. We surf the Hokusai wave.
AM: Surfing is a skill. How can we learn to master the art of living in a world where we move closer together with machines?
Merzmensch: We should engage with machines and their creations on the same level, at the same frequency. By this, I do not mean that we should anthropomorphize them or undertake a transhumanist shift towards machines. Instead, we should consider the works that machines produce as cultural heritage, aesthetic treasures, and narrative discoveries, rather than mere results of calculations and algorithms. Machines have a lot to tell us—about themselves, about us, and about our ever-changing society. Only by considering them on equal terms of creativity can we maintain balance. Otherwise, we will either succumb to anthropocentric narcissism or fetishize technology.
AM: What cultural transformations can we expect in the age of AI?
Merzmensch: I see three trends in global culture that we will be living with in the future. First, the "human purists" will represent what we know as "traditional art"—and by that, I mean even digital art—without any involvement of AI. This is intentional, as this group opposes AI and its role in human creative activities.
Next, we will see artists—also, but not exclusively, from the traditional field—using AI-driven tools and incorporating machines into their artistic workflows, discovering their transformative power in the context of their work.
Finally, as a third trend, we will see artists who embrace AI as an autonomous art entity in its own right. They will allow AI to create art without human involvement, taking on the roles of curator, moderator, and mediator between the machine and the human world, connecting them into a Gesamtkunstwerk. For me, this third trend is the most exciting because machines have incredible imagination. We just need to unleash it.
AM: You regularly give lectures on AI and conduct workshops. Have the reactions changed over time?
Merzmensch: My first workshop was in 2020 at the Museum für Kommunikation Frankfurt, and it was for children (from primary school to middle school). The kids were fascinated by the possibilities of AI. Interestingly, at the beginning, I had them compare real photos with AI-generated images, and they struggled to distinguish between them. However, after the workshop, where they worked independently with Artbreeder, StyleGAN2 Colab notebooks, GPT-2, and GPT-3, they developed a sense of recognition and could easily detect an AI-generated image. They already had an intuition about generated media. We cannot stop teaching visual literacy.
Reactions are changing, that's for sure. In 2020, it was still an underground nerd topic; now that AI permeates all media, people are more aware. However, they either demonize it or evangelize it; often, the reactions are superficial, even superstitious. When I gave a talk on Generative AI at the Berliner Ensemble in February 2024, I noticed two groups of people in the audience: the AI doomers (mostly 30- to 40-year-olds), who were focused solely on the harmful, unethical, and antisocial visions of AI, and a very inquisitive segment of the audience: kids, students, and seniors. I suppose they did not harbor such existential fears; they approached the topic playfully, with a creative hunger to discover more. Especially in the art scene, I see more and more traditional artists becoming curious about new possibilities, embracing them, and discovering new ways of self-expression, even if they have already established their own during decades of artistic practice. It’s important to address the issues of generative AI and find new approaches, as Holly Herndon and Mat Dryhurst do by developing an IP-friendly training database for generative models (Spawning and Source.Plus).
AM: Compared to 2020, it’s even harder these days to determine whether an image was created with AI or not. What should one pay attention to in order to recognize the difference?
Merzmensch: I don't think that the difference will be so crucial in the future. On the contrary, we need to focus on the message. I am reminded of Vilém Flusser, who said that there will be no "true" or "false" images in the future, only probabilities and their approximations. I must thank the rise of artificial intelligence for leading us to the uncomfortable discovery that we can never fully trust images. Even a photograph is a very subjective matter—it doesn't represent the world; it represents a photographer's point of view. We have simply believed in visuals all the time for convenience. Now, we must say goodbye to this comfortable epoch of self-deception. We need to stop being superficial (it's hard, I know, and we probably won't even be able to ensure a profoundly thinking society). Instead of focusing on the image's origin, we should pay attention to the intention with which it is distributed and by whom. I wouldn't call it a "post-truth" age because there was never a "truth" age before, however much we might like to believe in it.
From an artistic point of view, instead of trusting images, we should play with the universes they orchestrate. Dogma is a constraint on creativity. Yet, I see many people becoming uncertain about the future because it represents an unfamiliar state of reality.
AM: What are some of the existential fears you just mentioned? How do people react when you address these fears with fact-based responses?
Merzmensch: It depends. Some people are already allergic to the term "AI." They cling to stereotypes from the 1990s, where AI is portrayed as being under the control of big corporations that are after our money, our data, and our lives. Meanwhile, we're on the verge of overcoming this. There are many open-source models that can be installed on your computer, freeing you completely from proprietary solutions. Yet, people believe in this ominous, Skynet-like AGI that will enslave and dominate humanity—a neo-fairy tale spread by Effective Altruists and even Turing Award winner Geoffrey Hinton, the Godfather of Deep Learning. I wonder why he propagates such narratives; he should know better. We shouldn't be afraid of AI—we should be afraid of humans misusing AI, a real danger from which the "AI is evil" myth distracts us.
AM: With the emergence of DALL-E, Stable Diffusion, and Midjourney, AI is now more accessible than ever. How did you work with AI in the past, and how do you use it today?
Merzmensch: I was thinking about Arthur C. Clarke’s Third Law: "Any sufficiently advanced technology is indistinguishable from magic." What now seems the most obvious thing was, back in 2021, something impossible, something sublime: you could control the generation of images by typing in your text. Meanwhile, people take this pars pro toto and believe that prompting is everything in generative AI, which is completely wrong. However, if we remember Vilém Flusser, who wrote in the 1980s about future societies that would be able to synthesize images, and who saw this as the next cultural revolution, I can only agree with his vision.
When I was sitting in front of DALL-E 1, I understood that this was a new beginning. DALL-E 1, to be more precise, had a promising UI with social media elements—you could comment on other people’s generations and even collaborate. I suppose that with millions of users, it wouldn’t have been possible for OpenAI to moderate all the conversations, so these interactive features were missing in DALL-E 2.
AM: Your book features an image of the Mona Lisa with a wine glass. Before we started this interview, you mentioned that you could talk for hours about this topic. So, it’s quite possible that this will be my last question.
Merzmensch: There are no last questions; they lead to the next conversations. Indeed, this image of the Mona Lisa is my favorite from my DALL-E 1 phase. Back then, I began to use the prompt "Mona Lisa is drinking wine with da Vinci," which I still use in all prompt-driven models to gauge their creativity.
In the case of DALL-E 1, I didn't expect the result to be so striking: a refined hand holding a glass of wine. In the glass's reflection appeared the enigmatic smile of the Mona Lisa. But where was da Vinci? He was the one holding the glass. According to Lillian Schwartz's fascinating theory, the Mona Lisa might be a self-portrait of the maestro himself. DALL-E 1 had dreamed up a self-portrait of da Vinci, not simply as a trick of random algorithms but as a true exploration of meaning. DALL-E 1, built on GPT-3 and guided by CLIP, could find the link between form and meaning, connecting semantics to aesthetics—a semiotic virtuoso. This model understood the how and the what, embodying a vast collective of human knowledge.
Oh, I cannot stop talking about my experiences. That’s true; in the last eight years, I’ve witnessed and taken part in so many developments, discourses, and narratives that describing them alone would consume hours of speaking—time we could better use to creatively explore the ever-changing landscape of generative AI.
AM: And how has this prompt evolved over the past few years?
Merzmensch: This prompt remains the same—the models evolve. From abstract images of the Big Sleep Model to photorealistic visions with "Red Panda," it becomes increasingly figurative—an ambivalent development, as I don't appreciate perfectionism in visual dreams.
Yet, my favorites remain the DALL-E 1 image and the vision I created using xhairymutantx (a model by Holly Herndon and Mat Dryhurst).
AM: Speaking of narratives, before generative AI, the big topic for those interested in art and technology was generative art. We’ve seen a lot of colorful patterns over the last few years. In most cases, AI art is figurative. How would you explain the developments in generative art and generative AI over the past few years?
Merzmensch: Generative art has been abstract from the beginning, a play with concepts and ideas beyond semiotic explorations. With AI, developers and researchers are pursuing more realism, following their own perfectionism and the mainstream Zeitgeist. Meanwhile, it's quite tricky to create an abstract image using modern diffusion models—a challenge I am exploring with a broad audience in my current workshops at the Frankfurt City Library. However, we are still in the experimental phase, and I am sure that, with time, we will mature in our search for new aesthetics.
AM: What do you consider to be good AI art?
Merzmensch: For me, good AI art is a very human matter: it's where you forget about the digital provenance—the fact that it was generated with the help of AI. A good AI artwork should resonate within my heart and mind, inspire me to create stories, make me see our reality from new perspectives, and help me rediscover our everyday lives—much like the Dadaists did with their mixed media art.
Finally, I agree with the idea that Refik Anadol expressed when KUNSTFORUM asked him about who owns an AI artwork: "Ownership arises at the moment of perception by the viewer—the artwork becomes such when humans contemplate it—and that's the moment of the art for me." For me, too, a good AI artwork expands our imagination and opens doors to realms within us that we never knew were there.
AM: Is there a difference in how you define good art versus good AI art?
Merzmensch: Good art allows us to question our reality, provides new perspectives, and opens our eyes, minds, and hearts. It is more than decorative; good art can also be uncomfortable, provocative, and disruptive. However, I appreciate mind games more than mere provocation and épatage. Good art leaves a lasting impact on my soul, moving me days, weeks, and years after experiencing it—much like Dadaist and Surrealist art. It is even better if good art is combined with a compelling story, but that isn’t necessary, as our brains tell us stories even without any artist's specifications.
AM: What are your predictions for the future of AI and AI art? Are there reasons to be concerned?
Merzmensch: I don't see any reason to worry about art. Let me put it this way: art is something I worry about less than other aspects of our world.
AM: Thank you, Merzmensch!