Are AI images art? We're asking the wrong question
The better question is: can AI users develop their own styles?
There is something unique, we tend to think, about the human relation to art. Art, vague a concept as it may be, seems to be fundamentally tied to the decidedly human capacity for subjective expression: a waterfall is not art, but a painting of a waterfall is art because it expresses the human artist’s interpretation of the waterfall. Consequently, art, by popular conception, is fundamentally an interpretive act, not just a representational one.
This is where we start to see the problem as to whether AI images can be considered art. If art is wholly dependent on the human capacity to interpret and express an interpretation, then can a machine-learning model that lacks a mind and the (supposedly) inherently human capacity for interpretation ever produce art? Or can it only produce what we might call “mere representations” — depictions of events that bear no expressive or interpretive power, like a diagram in an instruction manual?
To take a crack at this question, it’s important to clarify what we mean by “interpretation”. As Ted Chiang put it in his recent New Yorker piece on AI, “art is something that results from making a lot of choices”. Although Chiang readily admits this is a generalization, if we substitute “interpretation” for “art” in that sentence, we arrive at a solid, common-sense conception of what interpretation is: the interpretive capacity is the ability to make choices about an object in the world that speak to something about how the artist approaches and views that object.
For example, when a cello player interprets a Bach suite, she approaches an object in the world (the piece of music) and makes choices regarding how to play it. On the broadest level, she may decide to play at an unusual tempo or to play forte at a section that normally is played piano. But these are only examples of the highest-level choices — each second of the performance includes countless micro-adjustments of the bow, tempo, dynamics, vibrato, and intonation that are ultimately “choices”, conscious or otherwise. Even those choices that are unconscious, such as how hard to press the bow, are the result of choices made during practice sessions that have now become habit. In sum, all these thousands of choices make up a unique performance that we consider an interpretation of Bach’s music — in other words, a piece of performance art.
But while it’s obvious that the ability to make choices is vital to art, what’s less obvious is how many choices are truly necessary for something to be considered interpretive. Simply put: there can always be more or fewer choices, and it’s not clear where we should set the bar for art. In fact, it’s not even clear whether a minimum number of choices should be a requirement at all: setting a threshold in terms of the pure number of choices would lead us to some pretty bizarre conclusions, which suggests that trying to classify art by this metric is a move in the wrong direction.
For example, leaning on a classic joke among musicians about whether using samples counts as “cheating”, we could say that since the cellist didn’t build her cello herself, she is forgoing all the important choices a luthier makes, like choosing which woods and glues to use, and therefore isn’t truly making art when she plays Bach’s music. After all, we consider Stradivarius violins an apex of musical achievement in themselves, and we would very much admire a cellist who is so selective about her sonic world that she refuses to use a cello she didn’t handcraft herself. If we can accept that building one’s own instruments can augment one’s art, then we should be able to accept that doing so can be a part of the art, and there is therefore no immediate reason to rule out the claim that building one’s own instruments is necessary for a musical performance or recording to be considered art. In other words, if there were a musician who built all their own instruments, and it was generally accepted that this was an essential part of their music, then we should accept that instrument building can be an essential part of some music, and therefore it’s possible that instrument building is an essential part of all music. And yet, saying that Martha Argerich is not an artist because she doesn’t also build her own pianos seems fundamentally wrong.
As it turns out, Ted Chiang’s view, that art depends on choices and that AI can’t be used to make art because it removes too many choices from the “artist”, is not a new idea. Heidegger said as much about the typewriter. In his 1942-43 lecture series on Parmenides, he stated that “the typewriter tears writing from the essential realm of the hand, i.e., the realm of the word…Mechanical writing deprives the hand of its rank in the realm of the written word and degrades the word to a means of communication. In addition, mechanical writing provides this ‘advantage,’ that it conceals the handwriting and thereby the character. The typewriter makes everyone look the same…”.
Interpreted within the context of Heidegger’s broader philosophy, his charge is essentially the same as Chiang’s: The hand is a distinctively human attribute, and when we write by hand, we make countless micro-choices that lead to our own unique handwriting. These choices are so many that no two people share the same handwriting. Consequently, it is these choices that lead to authenticity or authentic expression — very much what we’re looking for in art. By using the typewriter, we rid ourselves of the opportunity to make those choices, thus destroying our authenticity. Johannes Trithemius made a similar claim in his 1492 anti-printing-press manuscript, “In Praise of Scribes” (De Laude Scriptorum), saying that “printed books will never be the equivalent of handwritten codices…The simple reason is that copying by hand involves more diligence and industry.”
But we can see from today’s vantage point that a book written with a typewriter or printed on a printing press is no less art than one written by hand. Were it otherwise, every handwritten shopping list would be a more authentic expression of the human condition than a printed copy of War and Peace (perhaps there is some truth to this). The issue at hand is not that typewriters and printing presses do not require enough handiwork to qualify as art, but rather that handiwork is one of the most basic ways of ensuring there are enough choices being made to allow for interpretation to arise. Handiwork is an inclusive measure, but not an exclusive one: if something requires a lot of handiwork, the scales tip in favor of it being art, but if it doesn’t need handiwork, we can’t exclude it from being art.
If we take handiwork as a direct proxy for the number of choices made, we can clearly see that the bare number of choices is not a reasonable criterion for art. To use it as one would be to mistake labor, taken at face value, for that which it represents: the opportunity for interpretation. Without at least some labor, there is no opportunity for interpretation. Even entering a prompt into a generative AI model represents a small degree of labor, and thus serves as an opportunity for interpretation to arise. A handmade painting, with its thousands of individual brush strokes, obviously requires more labor, and thus gives more opportunity for interpretation to arise. But the amount of opportunity does not in itself say anything about whether interpretation arises or not; it only speaks to the chance that it will arise. There is very little chance of winning the lottery when buying a ticket, but some people do win. Clearly, we can’t say that because it is unlikely that someone will win the lottery, past lottery winners didn’t actually win. What we can reasonably say is that it is less likely that AI images that only require a short prompt are art, but we can’t rule out the possibility entirely.
Tools and Mediums
If we can’t determine whether something is interpretive on the basis of the labor (which serves as a proxy for the number of choices) required to make it, then how can we distinguish between the supposedly rare cases in which an AI produces art and those in which it doesn’t?
Let’s return to the typewriter and the printing press for a moment. These are both examples of tools: the typewriter is a tool that a writer uses to improve their writing speed and make their work standardized and easily readable. We can divide tools that are used to create art into two categories: those that are interpretive and those that are corrective. What is important to notice here is that no tool is inherently in one category or the other. The categorization depends on how the tool relates to the artistic medium. For a calligrapher, the typewriter is an interpretive tool: it does the totality of the work of interpretation for them, leaving no room for their own calligraphic interpretation. For a novelist, a typewriter is merely corrective: it renders whatever shape the writer’s handwritten “a” might have taken as a standard “a”, but in most cases, a difference in lettering won’t fundamentally alter the writer’s intention.
But why not? The answer is that the novelist’s medium of interpretation is not the same as the novelist’s medium of communication. The novelist’s job is to interpret what we could broadly call the linguistic-narrative space — an abstract conceptual field of language and story. This is their medium of interpretation. The novelist then communicates their interpretation via text, but text itself is not the thing being interpreted — the text is simply the novelist’s medium of communication. That’s why a novel can be transmitted via audiobook without necessarily being seen as a destruction of the author’s interpretation (the voice acting will add a new sonic interpretation, but it will leave the underlying narrative interpretation intact). Arguably, even language itself is not the novelist’s medium, as novels can be translated. Rather, they work within some broad linguistic space, hence why a film, which also operates in narrative space, is not the same as a book.
A calligrapher, on the other hand, does interpret text — their medium of interpretation is text itself, and their medium of communication is also text. For a calligrapher, a typewriter is not just corrective, but it is interpretive: It robs the calligrapher of all opportunity to interpret their medium of interpretation (let’s set aside “statement pieces” for now, where a calligrapher might say that their typewriter is their calligraphy as a sort of conceptual statement).
We can see from these examples that the medium of communication and the medium of interpretation can either be the same or different, but that we can’t simply assume that the medium of communication is the medium of interpretation. We can see further that some mediums of interpretation are inherently tied to certain mediums of communication, while others are not. Calligraphy is inherently tied to the written word, but a novelist could theoretically forego the written word entirely and only focus on audiobooks without necessarily being robbed of the title of novelist.
Out of the three major fields of art (visual, literary, and performance), we are used to the latter two being separable from any particular instantiation. A composer can write a piece of music, and it can be played by various ensembles without taking away from the composer’s work, because the work exists prior to any specific instantiation in a medium of communication. We can even go so far as to say that a composer can have composed a piece before even writing it down or having played it (this is often the case with songwriters: Paul McCartney wrote the music to “Yesterday” before ever sitting down at the piano and instantiating it for the first time). Similarly, as we saw with the novelist, their work also does not depend on any particular instantiation. We can say the same for choreography and various other performance arts. But within performance art, we can see that there are some aspects that are tied to instantiation and others that aren’t: the art of playing the cello is tied to a specific instantiation in the form of a cello performance, but a composition is not.
We can here abstract away a general concept of composition: a form of art that is separate from any instantiation. Composition should not be seen as referring simply to musical composition, but to any medium of interpretation that doesn’t immediately require instantiation, such as a novel (because of its various possible translation, font, and audio instantiations), a screenplay, or a dance routine.
We may notice that while literary and performance arts readily accept the composition-instantiation divide, visual arts do not. While we readily speak about a painting’s composition, there are no equivalents to composers or novelists in the world of visual art: no one is renowned for creating “visual compositions” that painters then instantiate. The closest we come to this is the director, but even their work can’t be separated from an instantiation. Hitchcock’s artistic choices rely on the specific instantiations being presented to him, and another director can’t remake a Hitchcock film and then call it a Hitchcock film in the same way that two conductors could each conduct a Sibelius piece and both performances would still be called Sibelius. The reason for this special quality of the visual arts is unclear, and I won’t attempt to tease it out, but perhaps it is because sight is so primary for humans that visual art becomes irreducible.
If generative AI imagery is able to become a medium of interpretation in its own right, it’s possible that it could fill in this gap of “visual composition” — the creation of prompts or workflows that are separate from any particular instantiation, but can be instantiated in many different ways and still be recognizable as a singular work. What this might actually look like and whether it’s possible in practice is unclear, but it is not impossible in principle.
But of course, this theoretical new art form isn’t what most people are talking about when debating whether AI art is really art. What we need to do is understand what type of tool AI is: is it an interpretive or corrective tool? And in which contexts is it one or the other?
But let’s make this question even clearer within the framework we have established: assuming that the medium of interpretation of AI art is simply standard visual art and the medium of communication is also standard visual art, do the tools that we use to make AI art, i.e. generative models, serve as interpretive or corrective tools?
The immediate response is that the question is absurd — it’s like asking if a painter who doesn’t paint is a painter. But closer examination reveals a more complicated picture. Suppose that a painter develops severe Parkinson’s disease and is no longer able to wield a paintbrush precisely enough to paint. To compensate, they give very specific directions to a friend of theirs who has absolutely no artistic skill of their own. The disabled painter gives instructions like “soft brush stroke with blue — no, higher, no, no, a little lower. Yes, there, perfect. Now, 1 inch diagonally upwards at a 45 degree angle.” Through a series of meticulous commands like this, the disabled painter constructs the image. Can we really say that they didn’t paint this painting simply by virtue of not physically lifting a paintbrush? Saying no seems almost cruel — we would view their friend as a corrective tool, not an interpretive one, as there is still quite a bit of room for the disabled painter’s interpretation to shine through. Technically, the painter was working with language, but they communicated via paint, and in the same way that we view someone who communicates via language as a writer and not a typist even though they work with typing, it makes sense that we would view this disabled artist as a painter and not an “order maker”.
Can we extend this to AI? What if the same person were to command a generative AI with the same level of detail instead? Should there be any difference simply because the painter used a machine instead of a person, even though the person was being used as if they were nothing more than a machine? In this case, there seem to be no grounds for a distinction.
One might argue that if generative AI required such careful and precise instruction, it would count as art, but that this is just a constructed thought experiment that isn’t based in how generative AI works. However, as we have seen, labor is not in and of itself determinative of interpretive value, even though it serves as a probability indicator. We could theoretically imagine a four-word prompt that is so incredibly creative that it creates an image so unique that it’s undeniably a work of art. After all, very little labor is required for the famous six-word story often attributed to Hemingway, “For sale: baby shoes, never worn”, but it is considered a work of art nonetheless. Chuck Person’s Eccojams Vol. 1 consists largely of slowed-down, looped snippets of other songs and is highly regarded as a work of art for its creative use of sampling. It is not out of the realm of possibility that someone could prompt AI with something similarly creative to create a great piece of art.
Flipping the Script
What we need is another metric. It may be more useful to look at whether an AI user is an artist than whether an AI generated image is art. If we can confirm that an AI user is an artist in their medium of interpretation of choice (AI), then we can confirm that their work in that medium is also art.
When we consider what makes an artist “great”, we often say that one important quality is having a recognizable voice or style. Styles are essentially the consistent patterns of choices that an artist makes across their body of work, which amount to a type of interpretation of the artist’s medium of interpretation. If an artist has a style, and style depends on choices, then having a recognizable style implies that whatever they are doing affords them enough choices to develop said style. I believe that Ted Chiang has it backwards — it’s not the choices themselves that are important but the style that they enable. The question we really need to ask is: does generative AI give enough creative wiggle room for the development of a style? If yes, then we can be assured that generative AI can count as art, and if no, we can write off generative AI images as likely not being art.
One of the fundamental misunderstandings here is that most people believe that AI art is nothing more than typing a text prompt into a website and getting an image back. But this is hardly the case. In fact, most people who simply enter prompts likely won’t refer to themselves as artists. This simple prompt2img generation process is the most widely available, but it doesn’t show the full picture. The people who are more likely to consider themselves AI artists are engaging in an entirely different workflow: they combine various models, train their own models, create unique workflows, and fine-tune parameters to get precisely the results they want. For added control, some may use “in-painting”, wherein they specify a section of an image using the cursor and then generate objects only within the selected area of the canvas, which gives them nearly complete control over the composition. Some may even draw out sketches and then use img2img to have the AI fill them out to varying degrees, or combine this with in-painting. With img2img generation of this sort, they would, in fact, have complete compositional control.
A more typical workflow used by someone who might call themselves an AI artist usually looks more like what you’ll see in the images in this ComfyUI v0.3.0 release post. Each box in those images represents an AI module, and the wires are the ways those different modules are linked together. Here we can see that there are far more choices than we would have thought: the user needs to write a prompt, select various modules, decide how to connect them, configure their parameters, and then generate dozens of images until they find one that they like. If we add img2img or in-painting to the mix, the number of choices that need to be made expands rapidly. Theoretically, in-painting could provide a situation comparable to that of the disabled painter, as a user can decide to in-paint areas as small as a single brush stroke.
While this clearly isn’t the traditional way of making art, it doesn’t necessarily detract from the artistic value of the creations. We can see this sort of artistry in modern guitar playing. Many famous and respected guitarists are known for their tone, which is largely a result of their equipment choices. A fundamental part of Jimi Hendrix’s style, for example, is simply playing a Fender Strat with the neck pickup selected through a Fender amp on the clean channel or a Marshall amp on the overdrive channel — if a guitarist makes these choices, they will frequently be recognized as playing something Hendrix-y. Deciding whether to use one amplifier or another is a creative choice in itself, and the decision to use one AI model over another is similar. However, it should be noted that amplifiers do standardize the incoming guitar signal by raising the volume, adding compression (which reduces dynamic range), and sculpting the sound in various other ways. In that sense, amplifiers are corrective tools in that they “correct” these factors, but these variations are not essential to the art, and so the amplifiers can still be used creatively for the purpose of interpretation.
Similarly, mixers are widely considered artists, but they don’t write the music themselves. Instead, they carefully choose pieces of audio processing equipment based on the qualities they bring to the audio signal, and then configure them to achieve a desired effect. Mixers are renowned for their discerning ears that pick up on sonic subtleties that most listeners aren’t attuned to — mixers will often speak of the “color” of one compressor compared to another, or the way a specific EQ distorts the signal. It is not hard to imagine that, over time, AI users will pick up on similar subtleties between models, employing them when needed in creative ways. In that case, the starting prompt may become almost irrelevant, as the true art may be found in the way it is processed. Interestingly, the mixer is an artist themselves, but they are also being wielded as a corrective tool by the performer. However, we don’t consider the performer as not truly being an artist because they offload the mixing to a third party that “corrects” the recording — we just accept that the nitty-gritty of the mixing is beyond the domain of what the artist deems essential to their art.
Whether these more complex AI workflows allow for enough freedom of choice for artists to develop their own style remains to be seen, but there is nothing that is in principle contradictory to AI art truly being art. Within the above workflow, the generative AI itself seems to be a corrective tool being wielded by an artist to further their interpretation. The artist has a vision which they “correct” using AI so that it is understandable to their audience, just like a typewriter “corrects” poor handwriting so that it is legible. If AI images are truly art, artists will be able to develop unique and recognizable styles despite these corrections. Or, artists will use these corrective tools as part of their interpretation, in the same way that T-Pain uses autotune as a creative device.
Of course, this doesn’t mean that if some AI creations are art, then all AI creations are art. It’s clear that even if AI is corrective in some instances, in other instances it will be wholly interpretive. An AI that writes an email for you is not aiming at art, and most AI image generators will likely turn out to be fully interpretive tools as well. But like the typewriter, the classification of AI as a tool will depend on the context it is deployed in.
It is important to be mindful of the fact that, in Heideggerian terms, we are “thrown” into a world that already has a wide variety of art mediums. On the basis of that thrownness, we overlook the fact that even though these art forms were simply “given” or “passed down” to us, they weren’t originally simply there, but were discovered. Even painting, one of the oldest mediums, needed pioneers to uncover its possibilities. At first, there were just cave walls and pigmented rocks, but the pioneers of art were able to see the cave walls as canvases and the pigmented rocks as paints. From that, we brought cave paintings into existence, and painting entered the human consciousness as something simply given to us. Throughout human history, the masters have continued to discover more about their mediums via the development of more advanced techniques and materials. We see that tradition continue with composers like John Cage, whose controversial piece “4′33″” asks us to rethink what a piece of music can be, and with painters like Jackson Pollock, whose work beckons us to reconsider the role of the painter in the painting. More recently, Maurizio Cattelan’s “Comedian”, which is nothing more than a banana duct-taped to a wall, sold for $6.24 million, despite the fact that the work itself needs to be “remade” each time the banana rots, meaning that the purchaser paid millions of dollars to “purchase” an intangible idea. Debate about the validity of “Comedian” aside, we can see the process of medium discovery at work here: the artist is not just interpreting a set medium, but discovering what that medium of interpretation is.
From this, we can see that it is partly the job of the artist to take materials and view them in different and creative ways — they are not just interpreting an object or idea within a medium, but they are interpreting the medium itself. To say that generative AI is incapable of being art is similar to saying that a cave wall and a pigmented rock are incapable of being art — at one point, the average caveman would have scoffed at the possibility, until someone picked up a rock and showed the world what was possible. Generative AI is simply a material, complex and abstract as it may be, and it is for today’s artists to discover and develop this technology that bears the promise of becoming a new medium, just as the cavemen did with caves and rocks thousands of years ago.