I think there are actually good legal arguments why LLMs should be allowed to train on copyrighted material, and why LLM output is not copyrightable. (So it’s not just that the courts have been bought.)
Furthermore, even though the law could be changed to make it illegal to train on copyrighted material (and hence make it possible to copyright the output of LLMs), I think this would be extremely bad for us. (And good for the companies that control the large AI models.)
The legal argument in short is that mathematically processing copyrighted information is allowed. For example, scraping the net to create a search-engine index is allowed. Further, publishing such an index would also be allowed. There is no legal basis that I’m aware of that can distinguish this process from the process of training and publishing an LLM.
The social argument is that:
- Granting copyright holders permission to prohibit LLM training doesn’t get us workers anything in the long term, because the middleman business owners will just force us to “sell” them that right.
- Granting copyright holders permission to prohibit LLM training will torpedo future projects to create lightweight, useful free-software LLMs. I think those light LLMs hold a lot of potential to do good in our society, if in no other way than by reducing the power of OpenAI and Anthropic.
- As long as it is impossible to get copyright on something that is almost entirely produced by AI, investors will have to continue paying workers.
- If it becomes possible to get copyright on LLM output, soon literally everything you could ever create will be copyrighted before you create it, because if there is one thing LLMs are good at, it is flooding the zone.
As for those small scripts that are output by LLMs quite literally[1], I would argue that those are below the boundary of what should be copyrightable. Copyright has gone too far. By the letter of the law, all of us are guilty of breaking copyright law daily. But including a function of 30 lines in your software that someone else wrote first shouldn’t get you in trouble with copyright law, any more than including a sequence of 10 notes in a song should.
If you want to read the words of someone who is able to phrase things more elegantly and more completely than me, I do rather like this blog: Pluralistic: Supreme Court saves artists from AI (03 Mar 2026) – Pluralistic: Daily links from Cory Doctorow
Not sure if this is the right place to be discussing LLM copyright theory, but here we are ^^
[1] I did read the OP.