This is an interesting comment which avoids the issue: when a user uses an LLM to violate copyright, who is liable, and how would you justify your answer?
Not OP, but I would say the answer is the same as it would be if you substitute the LLM with a live human person who has memorized a section of a book and can recall it perfectly when asked.
It depends on where the money changes hands, IMO (which is basically what I think you're getting at). If you pay someone to perfectly recite a copyrighted work (as you pay ChatGPT to do), then it would definitely be a violation.
The situation is similar with image generation. An artist can draw a picture of Mickey Mouse without any issue. But if you pay an artist to draw you the same picture, that would also be a violation.
With generative tools, the users are not themselves writers or artists using tools - they are effectively commissioners, commissioning custom artwork from an LLM and paying the operator.
If someone built a machine that you put a quarter in, cranked a handle, and it then printed out pictures of the Disney character you chose, then Disney would be right to demand that they stop (or, more likely, to force a license deal). Whatever technology drives the machine, whether an AI model or an image database or a Mechanical Turk, is largely immaterial.
> An artist can draw a picture of Mickey Mouse without any issue. But if you pay an artist to draw you the same picture, that would also be a violation.
I don't believe that's correct. The issue is not money changing hands but rather the reproduction itself. Even if I give it away for free I'm still violating IP law.
There's also a fundamental issue with your argument - LLMs aren't recognized as having legal agency. If I pay an artist to violate IP law then the artist, being a human, is presumably at fault in addition to myself. Same for a company (owned by people).
But tools are different. If I vandalize someone's car with a hammer the hardware store isn't at fault for selling it to me. I'm at fault for how I chose to use the tool that I purchased (or rented access to in the case of a hosted LLM).
> If someone built a machine that you put a quarter in
This is a flawed example because the machine was designed with the specific intention of reproducing a copyrighted work. That is different from a general purpose tool which can potentially be misused by the wielder.
But without money changing hands, it becomes a whole lot less interesting. At the end of it, the boy who lived isn't a story about a broke, washed-up nobody with an uncouth uncle, standing in line for the soup kitchen; if money goes away, if it turns out this whole capitalism was for a tada, what then?
The company that trained the LLM. They're the ones who used the copyrighted material in their training set. Claiming they were unaware is not an excuse.
It's an interesting conundrum. If I take an extremely large panoramic photograph and then fail to censor out small copyrighted sections of it, am I violating copyright law?
It's not a perfect analogy by any means but it does serve to illustrate the difference in intent between distributing a particular work versus creating something that happens to incorporate copyrighted material verbatim but doesn't have any inherent need to or purpose in doing so.
> If I take an extremely large panoramic photograph and then fail to censor out small copyrighted sections of it, am I violating copyright law?
It depends. Is it just under copyright, or is the featured location trademarked too? Is the photograph for commercial purposes? Is the featured location generally accepted as being part of a cityscape / landscape?