In a lawsuit decided in 1993, a company called MAI Systems accused Peak Computer, a computer repair firm, of copyright infringement. Peak’s technicians had turned on MAI clients’ computers to diagnose them, which loaded MAI’s proprietary operating system into RAM. MAI argued that loading its software into memory (even briefly, even just to make sure the computer was working, even though there was no way to start the computer without loading the software) constituted an unauthorized copy. Because the Peak technician was not licensed to use that software, MAI said, Peak had committed copyright infringement.
They won.
To be fair to MAI, Peak was being more than a little shady. Peak possessed unlicensed copies of MAI’s software, and even loaned out computers containing this software to its own customers when their machines needed repair.
MAI, for its part, was being blatantly anti-competitive. MAI’s customers were dropping their maintenance contracts and having Peak repair their computers at lower cost. The copyright suit was a transparent attempt by MAI to protect its revenue stream.
Regardless, the ruling became a foundational principle of software copyright law: the RAM copy doctrine. Even a transient, functional copy of a program in a computer’s working memory is legally a copy under the Copyright Act.1
How temporary, or how trivial, does a copy have to be before it stops counting? In the normal course of its operation, any computer constantly makes copies of small chunks of information, from network to buffer to disk and back, and some of that information is bound to be copyrighted.
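For a feel of how routine this is, here is a minimal Python sketch of the most ordinary operation there is: moving data from one place to another through a fixed-size memory buffer. It is an illustration, not anything from the case law. The function name `copy_in_chunks` and the 4 KB chunk size are arbitrary choices, and in-memory `io.BytesIO` objects stand in for a network socket and a file on disk. Every byte passes through `buffer`, a transient copy in RAM, before the next chunk overwrites it.

```python
import io

# Illustrative sketch: io.BytesIO objects stand in for a network socket and a disk file;
# copy_in_chunks and the 4096-byte chunk size are hypothetical, arbitrary choices.
def copy_in_chunks(src: io.BufferedIOBase, dst: io.BufferedIOBase, chunk_size: int = 4096) -> int:
    """Copy src to dst through one reusable in-memory buffer; return the number of chunks."""
    buffer = bytearray(chunk_size)  # the transient "RAM copy", reused for every chunk
    view = memoryview(buffer)
    chunks = 0
    while True:
        n = src.readinto(buffer)  # fill the buffer from the source; 0 means end of data
        if not n:
            break
        dst.write(view[:n])  # write out only the bytes actually read this round
        chunks += 1
    return chunks

network = io.BytesIO(b"x" * 10_000)  # pretend network stream of 10,000 bytes
disk = io.BytesIO()                  # pretend file on disk
print(copy_in_chunks(network, disk))  # 3 chunks: 4096 + 4096 + 1808 bytes
```

Once the function returns, the buffer is simply discarded. Read literally, the RAM copy doctrine would still count each pass through it as a copy.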
Is your computer committing thousands of acts of copyright infringement just by browsing the web? (This article is copyrighted!) Technically yes, but in practice no, because it’s necessary for the web to work at all—publishing something on the web implies permission for the required copying.
Is training copying?
When an LLM is trained on a book, what actually happens? The text is read from storage, tokenized, fed through the network in batches, and used to nudge billions of numerical weights in tiny increments. But the book’s words aren’t stored anywhere in the final model. What’s stored is a vast, diffuse statistical residue of having processed those words—spread across parameters in a way that is, at present, essentially unreadable. You can’t open a trained model and find a chapter of Stephen King inside it, any more than you can find one in the brain of a human King fan.2
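To make that description concrete, here is a deliberately tiny Python sketch. It is a toy bigram model (each word predicts only the next word), not a transformer, and the names `tokenize`, `train_bigram`, and `predict_next`, the whitespace tokenizer, the learning rate, and the epoch count are all illustrative assumptions. The shape of the process matches the description above, though: text is tokenized, fed through the model, and each gradient step nudges the weights slightly. What comes out is a grid of floating-point numbers, not the text.

```python
import math
import random

# Illustrative toy, not how production LLMs are built; all names and hyperparameters are hypothetical.
def tokenize(text: str) -> list[str]:
    # Toy whitespace tokenizer; real LLMs use subword tokenizers instead.
    return text.lower().split()

def train_bigram(text: str, epochs: int = 50, lr: float = 0.1, seed: int = 0):
    """Train a toy next-word model with plain stochastic gradient descent."""
    tokens = tokenize(text)
    vocab = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(vocab)}
    size = len(vocab)
    rng = random.Random(seed)
    # weights[i][j] is a score (logit) for "word j follows word i"; it starts as tiny noise.
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(size)] for _ in range(size)]
    pairs = [(index[a], index[b]) for a, b in zip(tokens, tokens[1:])]
    for _ in range(epochs):
        for prev, nxt in pairs:
            row = weights[prev]
            top = max(row)  # subtract the max for a numerically stable softmax
            exps = [math.exp(w - top) for w in row]
            total = sum(exps)
            for j in range(size):
                # Gradient of cross-entropy loss w.r.t. each logit: predicted prob minus target.
                grad = exps[j] / total - (1.0 if j == nxt else 0.0)
                row[j] -= lr * grad  # one small nudge per weight
    # Only the vocabulary and the weight grid are returned; the token sequence is dropped.
    return vocab, weights

def predict_next(vocab: list[str], weights: list[list[float]], word: str) -> str:
    """Return the highest-scoring next word according to the trained weights."""
    row = weights[vocab.index(word)]
    return vocab[max(range(len(row)), key=row.__getitem__)]

vocab, weights = train_bigram("It was the best of times it was the worst of times")
print(predict_next(vocab, weights, "was"))  # the
```

One caveat: with a single sentence of training data, even this toy can partly reproduce its input by following the highest-scoring word at each step, a miniature version of a problem real models can also show.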
Under the RAM doctrine, technically LLM training is copying. At the moment of training, the text passes through memory. But that feels like a formalism that misses the point entirely. The copy doesn’t persist. The weights that result aren’t a reproduction of the work in any meaningful sense. They’re more like the impression a book leaves on the mind of someone who read it.
Which raises an interesting question…
Is reading copying?
You read a novel. It changes you: your vocabulary, your sense of narrative structure, your feel for how dialogue sounds. If you later write a novel yourself, some of that influence will be in there, invisibly, unattributably. You didn’t copy the book—you learned from it. It influenced the weights in your language model, so to speak. And nobody has ever suggested you owe the author any payment for that. In fact, most authors feel flattered to have influenced another writer’s work in exactly this way.
This analogy isn’t just a rhetorical flourish. Courts have actually started using it. In a landmark June 2025 ruling in Bartz v. Anthropic, Judge William Alsup held that using copyrighted materials to train LLMs was fair use, describing the technology as “among the most transformative many of us will see in our lifetimes” and explicitly comparing model training to a student reading to become a better writer.
Two days later, in Kadrey v. Meta Platforms, Judge Vince Chhabria3 reached the same conclusion, finding that Meta’s use of books to train its Llama models was “highly transformative” because the purpose—extracting statistical patterns to power a text generator—was fundamentally different from the purpose of the original work, which was to be read. Two similar decisions, in the same week, both centered on the legal doctrine of fair use.
Fair use and its limits
Fair use is a defense to a claim of copyright infringement that allows copyrighted works to be copied in specific circumstances. It does not deny that a work has been copied, but rather asserts that the copying served a purpose generally considered fair, one that broadly benefits society. News reporting, criticism, and education are common examples of fair use.
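Once a court declares a particular kind of copying fair use, that decision becomes part of case law, and substantially similar copying is likely to be treated the same way. (Fair use is still decided case by case, and a district court ruling does not bind other courts.)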
Courts weigh a fair use defense using four factors set out in the Copyright Act:
- The purpose and character of the use. In the case of LLM training, this factor is strongly in favor of fair use due to the transformative nature of the copying.
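- The nature of the original work. Creative works such as novels sit at the core of copyright protection, which tilts this factor against fair use; courts rarely give it much weight, though, so for AI model training it is close to neutral.
- The amount copied. On its face this cuts strongly against LLM fair use claims, since entire works were copied. Even so, both June 2025 rulings accepted copying whole books as reasonable given the transformative purpose of training.
- Impact on the market for the original work. One book is not a direct competitor for another, and a set of trained model weights is even less of one. This factor favors fair use. Judge Alsup even remarked that copyright law was never meant to “protect authors against competition.”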
On the whole, that looks good for LLM developers. What sank (part of) Anthropic’s defense was provenance. Anthropic had used pirated copies of books, obtained from “shadow libraries,” as part of its training corpus. Judge Alsup held that however transformative the use, piracy is “inherently, irredeemably infringing,” and that fair use could not excuse copies acquired that way.
That’s a reasonable line in the sand. Some companies have already addressed this issue: Adobe’s Firefly generative image model was trained mainly on licensed images from Adobe Stock, the company’s own stock image library, along with public-domain and openly licensed images. This protects both Adobe and its customers, who might otherwise worry about the legal exposure of using Firefly if the legal wrangling doesn’t shake out favorably.
The open question
Humans are strictly limited in how much of a book they can reproduce from memory. You might recall the plot, character names, a handful of memorable phrases, maybe a passage that moved you or the opening line (“It was the best of times…”). You cannot, barring the mythical case of photographic memory, recall the full text.
LLMs, it turns out, sometimes can, at least for texts that appeared frequently enough in their training data. Ask certain models to complete the opening of A Tale of Two Cities and they’ll do it, verbatim, for pages and pages. In a rare case of apt AI anthropomorphism, this is called memorization.
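Testing for this kind of memorization usually means prompting a model with the start of a known text and checking how far its continuation matches the real one, word for word. Here is a minimal Python sketch of that check. `generate` stands for any function that takes a prompt and returns the model’s continuation; it is hypothetical here, and no particular model API is assumed. The 50-word threshold in `is_memorized` is likewise an arbitrary illustrative choice, not a legal or research standard.

```python
from typing import Callable

# `generate` is a hypothetical stand-in for a model call; the 50-word threshold is arbitrary.
def verbatim_overlap(generated: str, reference: str) -> int:
    """Count the leading words on which the generated text matches the reference exactly."""
    count = 0
    for g, r in zip(generated.split(), reference.split()):
        if g != r:
            break
        count += 1
    return count

def is_memorized(generate: Callable[[str], str], prefix: str, true_continuation: str,
                 min_words: int = 50) -> bool:
    """Prompt with a prefix; flag memorization if the output reproduces enough words verbatim."""
    continuation = generate(prefix)
    return verbatim_overlap(continuation, true_continuation) >= min_words

# The real continuation of "It was the best of times," (public domain).
reference = "it was the worst of times, it was the age of wisdom, it was the age of foolishness"
print(verbatim_overlap("it was the worst of times, it was the age of reason", reference))  # 11
```

Comparing whitespace-separated words is a simplification; a real evaluation would more likely compare the model’s own tokens and would need to decide how to treat differences in punctuation and line breaks.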
That’s where the training-as-learning analogy breaks down. It’s almost certainly where future litigation will focus, and where courts’ reasoning will get harder. Transformativeness is a strong defense when the output is a collection of model weights. It becomes much weaker when those weights allow that very model to reproduce copyrighted works.
Where we find ourselves
The legal situation, as of mid-2026, is roughly this: training LLMs on copyrighted works is likely fair use, provided the works were acquired lawfully and the model doesn’t reproduce them verbatim on demand. That’s a narrower safe harbor than the AI industry would like, and broader than authors and publishers want. More cases are wending through the courts as you read this, and the related case law will keep evolving.
But the deeper question—what does it mean to copy?—is one that copyright law was never intended to answer. The law assumes that we know copying when we see it and are just arguing about when you can and can’t do it. The RAM copy doctrine was a legal hack to handle software licensing disputes. The fair use doctrine was designed for human creators sampling and transforming each other’s work. Neither maps cleanly onto a process where a text is consumed by a mathematical function, dissolved into weights, and reconstituted as something new.
And beyond copyright, there remains the matter of moral rights: whether creators have inherent non-economic rights over their creations that might prevent their works from being used to train AI at all. Copyright law has nothing to say about that.
We’re just getting started.