The training set for Stable Diffusion mixes a huge number of copyrighted images together into an output, which makes the plagiarism non-obvious. But let's reduce the set by 1 image. That shouldn't affect the output too much, right? Maybe some prompt will produce a slightly different image.
Now let's reduce it by another image. Again, fewer options for what to display, fewer images to take pixels from, but still a lot of options, so the output may still seem copyrightable.
Now let's do that N-1 times. What output do we get when the model was trained on a single image, say an image labeled 'dog'? If your prompt is "an image of a dog", you will get that image: the only image in the training set. When going from latent space to image space, the model takes pixels from that image for the output. Despite it being done in convoluted ways, is that not obvious copyright infringement? I think it is. There's a cloud of mumbo jumbo about latent space, but after the dust settles and it needs to generate pixels in the output image, Stable Diffusion has a step that is essentially copying pixels from the source image into the output. When there's only 1 image, it will reproduce large portions of that image, necessarily infringing on copyright.
So then, adding back images one by one into the training set, each one being used as a source for the pixels being copied, what makes that model OK? Just because the output is 50% image A and 50% image B, or 0.1% image A, 0.1% image B and 99.8% image C, doesn't suddenly make it OK.
Once there are millions of images, you end up with just tiny blobs of pixels being copied from many different images. That still infringes on the copyright of all those images, because it's essentially a map-reduce process that maps pixels from copyrighted images and reduces them into a single image.
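The map-reduce analogy above could be sketched roughly like this, purely as an illustration of the claim being made (function names are hypothetical, and the replies below dispute that diffusion models actually work this way):

```python
# Hypothetical sketch of the "map-reduce" analogy in the comment above.
# This is NOT how diffusion models work; it only illustrates the claim:
# map pixels out of many source images, then reduce a tiny contribution
# from each into a single output image.

def map_pixels(images):
    # "map" step: stream candidate pixels from every source image
    for image in images:
        for pixel in image:
            yield pixel

def reduce_pixels(pixels, width, height):
    # "reduce" step: assemble the mapped contributions into one output
    stream = iter(pixels)
    return [next(stream) for _ in range(width * height)]

# toy "images" as flat lists of pixel values
images = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
result = reduce_pixels(map_pixels(images), 2, 2)
print(result)  # prints [1, 2, 3, 4]
```

Under this (disputed) framing, every output pixel is traceable to some source image, which is the whole basis of the infringement argument.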
This viewpoint is about as coherent as "every image file is copyright infringing because every pixel in it exists somewhere in some other image".
Derivative works, when substantially changed, are not infringing. If I take an image of the Mona Lisa and rearrange all its pixels so it looks like a picture of a cat, that's not infringement.
If I sample lines and curves and colors and styles from several images and make something new, that's not infringement.
The actual problem with image models is that they can sometimes be coaxed into outputting images that are quite similar to an image they were trained on. That constitutes infringement.
You're not a computer program and your viewpoint is about as valid as "cars don't need speed limits because most humans can't run faster than 10mph and that speed is safe".
If 1 in 100 humans could run up to 100mph you bet your ass there'd be laws against doing so around other people; it's a safety concern. Hell, even now running in most indoor or crowded areas is, if not illegal, at least considered bad behavior and may get you reprimanded or thrown out.
Some people claim to have a photographic memory. Supposing this is true, is it illegal for these people to look at copyrighted material because they may reproduce it later from the copy in their head? Of course not, it's the actual act of producing that copy that isn't allowed.
Of course, we're not talking about a computer program that stores a copy of an image and reproduces it later (that's called an "image encoder"); what we're talking about is statistical software that identifies common patterns in images and associations between those patterns and human-language descriptions of the images containing them. It doesn't store or make a copy of the images it learns from, and it should only be able to reproduce images or elements of images that are overrepresented in its training data. Like any other software tool, if someone manages to use it to make an unauthorized copy of someone else's work, whether it was present in the training data or otherwise, then the user has infringed the other person's copyright. The only real argument you could make is that distributing a trained model constitutes distributing a tool aimed at assisting users in unlawful copying, but IMO that would apply more easily to wget than to StableDiffusion.
Copyright laws were made to encourage and promote the creation and practice of useful arts. Applying them to stop the creation and adoption of a tool that would make humans far more efficient in the creation of art is backwards.
Let's run the same hypothetical you brought up about using other people's art, but instead our model takes just 1 single pixel from each of 1 million images.
Taking 1 single pixel from a million images, or the first letter from every book, and putting it into a new work is transformative fair use.
Transformative fair use is legal.
> Just because the output is 50% image A and 50% image B, or 0.1% image A and 0.1% image B and 99.8% image C, doesn't suddenly make it OK.
It quite literally does! Using 0.1% of an image is legal.
The amount of work that you take from someone else is one of the 4 factors of fair use.
Yes, the specific example you gave falls under what the courts literally use right now as one of the factors!
> Once there are millions of images, you end up with just tiny blobs of pixels being copied from many different images.
This is not how these neural nets work. They don't copy pixels from anywhere. They learn features.
The features represented internally are generally not easy for humans to interpret, but for the sake of illustration, there could be an artificial neuron that fires when a subject should have blue eyes. Having a lot of blue eyes in the training data would help this neuron learn better when to fire (based on the values of other neurons, which may in turn represent other features). For example, it may learn to place more importance on an input that represents pale skin or Nordic origin.
It can learn concepts like cars have wheels, and wheels are round, etc. And then when you ask it to draw a car, it composes one from the concepts it learned. Some parts of the network will deal with the fine details that more directly influence pixels, but these aren't copying pixels from any image either. They're weighing a bunch of factors (eg is this pixel part of the iris and did the network decide to make a person with blue eyes?) and choosing pixel colors based on those factors.
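The "blue eyes" neuron described above could be sketched minimally like this. All names, weights, and inputs are made up for illustration; real learned features are high-dimensional and rarely this interpretable:

```python
import math

def sigmoid(x):
    # squashes any activation into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def blue_eye_neuron(pale_skin, nordic_origin, brown_hair):
    # Hypothetical learned weights: pale skin and Nordic origin push
    # the neuron to fire; brown hair pushes against it.
    weights = {"pale_skin": 2.0, "nordic_origin": 1.5, "brown_hair": -1.0}
    bias = -1.0
    activation = (weights["pale_skin"] * pale_skin
                  + weights["nordic_origin"] * nordic_origin
                  + weights["brown_hair"] * brown_hair
                  + bias)
    return sigmoid(activation)

# strong evidence for blue eyes -> neuron fires close to 1
print(round(blue_eye_neuron(1.0, 1.0, 0.0), 2))  # prints 0.92
# weak evidence -> neuron stays close to 0
print(round(blue_eye_neuron(0.0, 0.0, 1.0), 2))  # prints 0.12
```

Note there is no pixel storage anywhere in this computation: the neuron's output is just a weighted combination of other features, which is the point being made about features versus copied pixels.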
Thank you for the explanation. Let me explain my position in similar terms.
I'm not replicating an image, I'm "using my brain to build a network of neurons that map electrical impulses from the optical nerve excited by wavelengths projected onto my retina in order to send other electrical signals to actuator tissues".
The complexity of the process is irrelevant imo. We can treat it as a black box and look at the inputs and outputs.
If the images in the database didn't exist, it wouldn't know what to draw, and those images are copyrighted.
Everyone's welcome to take a camera, run around the world and label every object for the neural net to learn, like a human does, but model authors didn't do that because using copyrighted images for free is much easier.
You're right that if you're replicating an existing copyrighted image, the process doesn't matter. Legally, if you lived in a cave your whole life and never saw any art and by amazing coincidence you just happened to paint and sell the exact same painting as some other artist, you'd be violating their copyright. Independent creation doesn't protect you.
On the other hand, under current copyright law, if Stable Diffusion generates an original image that doesn't look like a copy of any existing image, it's clear the new image doesn't violate any artist's copyright.
The debate is whether you can use copyrighted images/text to train an AI.
Stable Diffusion is of course trained on millions of photos of the real world, in addition to images made by artists. Of course, human artists also see and digest both the real world and images by other artists, and both influence their output. That's why you get trends like impressionism.
You are describing transformative use, which is permitted. Otherwise I could create a picture with every possible RGB pixel and then claim all other artists are infringing on 0.1% of my work.
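As an aside, that "every possible RGB pixel" picture is easy to quantify: with 8-bit channels there are only 256³ distinct colors, so one pixel of each fits exactly in a single 4096×4096 image.

```python
# How big would a picture containing every possible RGB pixel be?
# 8-bit RGB has 256 values per channel, three channels.
colors = 256 ** 3
print(colors)  # prints 16777216

# 16,777,216 is 2**24, so its square root is exact: 2**12 = 4096
side = int(colors ** 0.5)
print(side)    # prints 4096
```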
It is impossible to 1:1 replicate the input as an output because the images are not stored. It isn't a database. It's basically aggregating summaries/abstractions/generalizations of a bunch of tags.
I personally feel that, given the model was built through fair use of mostly copyrighted images, it's fairly self-evident that anything produced from it should not be copyrightable, UNLESS the origin of the art used to train the model is 100% owned by the "artist". That could be either via licensing for that purpose or via owning the copyright outright, with that right not extendable to corporations, as a corporation can't be an artist. It doesn't matter how complex the prompt or series of prompts is; the key here is that the "artist" either owns the training material or licensed it through the proper chain of licensors.
I'm baffled at how someone in their right mind still argues as if using any other tool is free of copyright infringement of some kind.
It also boggles my mind how, in our line of work (which is often artistic in its own right), a lot of people hold preconceptions about how art is made, often reducing it to nothing but transformative generation. Such takes are deeply narcissistic, and downright wrong. At this point I'm led to believe they're AI generated.