I'm not confident in this stance - sharing it to have a conversation. Hopefully some folks can help me think through this!

The value of copyleft licenses, for me, was that we were fighting back against the notion of copyright. That you couldn't sell me a product that I wasn't allowed to modify and share my modifications back with others. The right to modify and redistribute transitively through the software license gave a "virality" to software freedom.

If training a NN on GPL-licensed code "launders" away the copyleft license, isn't that a good thing for software freedom? If you can launder away a copyleft license, why couldn't you launder away a proprietary license? If training a NN is fair use, couldn't we bring proprietary software into the commons using this?

It seems like the end goal of copyleft was to fight back against copyright, not to have copyleft. Tools like Copilot seem to be exceptionally powerful (perhaps more powerful than the GPL) for liberating software.

What am I missing?



Nobody is laundering away proprietary licenses, because that code is not open source and not in public GitHub repos. And OSS capabilities are now present in Copilot, which is neither free nor open. Furthermore, these contributions are making their way into proprietary code, and the OSS licensing becomes even further watered down. This is the epitome of what copyleft is against!


Indeed, the ability to 'launder away' proprietary licenses when source is available means that companies in the future (that would otherwise provide source under a non-permissive license) will shift in favour of not providing source code at all.


Code published on GitHub is not necessarily open source. There is a lot of code there that has no particular license attached, which means that all rights are reserved except for those covered in the GitHub TOS, which I believe just covers viewing the code on GitHub.


Copilot includes all public repos on GitHub, so this includes source-available and proprietary code too.


I'm not sure this is true. Proprietary source code gets leaked and that can be used to train a NN. I find it likely that Copilot was trained against at least one non-OSS code base hosted on GitHub.

Second, if copyright is being laundered away we can get increasingly clever with how we liberate proprietary software. Today, decompiling and reverse engineering are labor-intensive processes. That's the whole point of "open source" - that working in source is easier than working in bytecode. Given the hockey-stick of innovation happening in AI right now, I'd be surprised if we don't see AI-assisted disassembly happening in the next decade. If you can go from bytecode to source code, that unlocks a lot. Even more so if you can go from bytecode to source code and feed that into a NN to liberate the code from its original license.


I follow your explanation but not your end statement.

What I think GP is getting at, in my understanding, is that all this OSS/licensing stuff was a cautious attempt to assert a radical idea into an atmosphere of extreme secrecy: That information wants to be free.

Now we have a fat corporation making a public statement of putting the value of advancing humanity over the value of honoring weird old Victorian ideas of "intellectual property" - which is what we have always tried to do, no?

Not that there is nothing to criticize, but I think that's a good thing on the whole.


> [OSS] was a cautious attempt to assert a radical idea into an atmosphere of extreme secrecy: That information wants to be free.

Information may want to be free, but users of free information often want to enrich their private endeavors by shackling the information that was given to them freely.

The (A|L)GPL acknowledges the fact that some people and corporations like to use free-and-gratis work in their products and not reciprocate the courtesy shown to them by the authors of that work. (I choose the (L)GPL whenever I can so that folks who derive from my work are required to either make it available as I have, or pay me enough so that I don't mind them shackling my work.)

The BSD license acknowledges the fact that some people and many corporations like to use gratis work in their closed-source products and never even do so much as bother to credit the authors of work that they used.

For as long as powerful folks continue to use and improve upon gratis information and software without contributing back the products that used that information and/or the improvements, the 'weird old Victorian ideas of "intellectual property"' are going to have to continue to be dealt with. Remember... you likely cannot reasonably afford an army of lawyers to ensure that pretty much no one uses your work without paying you, but big companies like Microsoft, RedHat, IBM, Oracle, etc, etc, and wealthy individuals can.

For as long as those wealthy entities can lock up and force you to pay for their work and ideas, but make it ruinously expensive for us little people to -individually- do the same to them, we'll need "weird old Victorian" things like licenses to help correct this imbalance of power.


It looks like you're missing the entire purpose of copyleft vs public domain.

The point is that copyleft source code cannot be used to improve proprietary software. That limitation is enforced with copyright.

Proprietary software is closed source. You can't train your NN on it, because you can't read it in the first place.

If someone takes your open source code and incorporates it into their proprietary software, then they are effectively using your work for their private gain. The entire purpose of copyleft is to compel that person to "pay it forward", by publishing their code as copyleft. This is why Stallman is a proponent of copyright law. Without copyright, there is no copyleft.


Copyleft wouldn’t need to exist without copyright because there would be no proprietary software to fight against.

Sure, there would be software with code not published, but if it was ever leaked which it often is, you could do whatever you want with it.

But in a world where copyright does exist, copyleft is a tool to fight back.


Yes, but we aren't here talking about whether copyright should exist. We're talking about whether Copilot violates it.


I'm replying to the comment that RMS supports copyright. I don't believe he does, I believe he would rather it not exist at all but since it does, you have to make use of it.


That's the full context of what I was saying. Copyleft is a hack for copyright. In a world where copyright is enforced, RMS doesn't consider the neutral ground of permissive licenses (like the popular MIT and BSD licenses) good enough. They do nothing to solve the problem of proprietary software.

The GPL is entirely dependent on copyright. Rather than pretend copyright doesn't exist, the GPL turns it in the other direction. By violating the GPL, Copilot is still violating copyright.


> If someone takes your open source code and incorporates it into their proprietary software, then they are effectively using your work for their private gain.

And then if we can close that loop by taking their proprietary software and feeding it into a NN to re-liberate it isn't that a net win for software freedom?

Today crossing the sourcecode->bytecode veil effectively obfuscates the implementation beyond most humans' ability to modify the software. Humans work best in sourcecode. Nothing says our AI overlords won't be able to work well in bytecode or take it in the other direction.

I guess what I'm saying is, today a compiler is a one-way door for software freedom. Once it goes through the compiler, we lose a lot of freedom without a massive human investment or the original source code. Maybe that door is about to become a two way door with copyright law supporting moving back and forth through that door?


> And then if we can close that loop by taking their proprietary software

From where? They aren't publishing it. That's literally the meaning of proprietary.


I can’t tell if you disagree with the rest of my comment or didn’t bother to read it…

That’s literally not the definition of proprietary.

You download proprietary software when you navigate to (nearly) every webpage. Just because a website like HN sends you (possibly unobfuscated) HTML, CSS, and JS over the wire in plain-text does not mean those files are not proprietary. Those files are covered by copyright in the U.S.

Access to the source code is not sufficient for that source code to be FOSS.

You also failed to acknowledge leaked source code and bytecode decompilation, which were a substantial portion of my comment.


That's not the meaning of proprietary, but otherwise you're right.


The definition gets a bit blurry around software, just like the definition of "ownership" does.

Colloquially, "proprietary software" means closed-source. You can definitely put it in context where it means "copyright without license"; but outside that context, the colloquial meaning is enough.


I think (1) you're mainly missing that copyleft vs non-copyleft is actually irrelevant for the copilot case. You also (2) may be missing the legal footing of copyleft licenses.

(1) The problem with Copilot is that when it blurts out code X that is arguably not under fair use (given how large and non-transformed the code segment is), Copilot users have no idea who owns copyright on X, and thus they are in a legal minefield because they have no idea what the terms of licensing X are.

Copilot creates legal risk regardless of whether the licensing terms of X are copyleft or not. Many permissive licenses (MIT, BSD, etc) still require attribution (identifying who owns copyright on X), and Copilot screws you out of doing that too.

(2) Whatever legal power copyleft licenses have, it is ultimately derived from copyright law, and people who take FOSS seriously know that. The point of "copyleft" licenses is to use the power of copyright law to implement "share and share alike" in an enforceable way. When your WiFi router includes info about the GPL code it uses, that's the legal power of copyright at work. The point of copyleft licenses is not to create a free-for-all by "liberating" code.


Copying even a single code segment literally will not automatically have you lose a fair use trial. Look at Google taking 10k+ lines verbatim and winning the case.


You can "launder" away the license of any source code you have copied simply by deleting it! No snazzy neural network needed. The litigants' argument is that this is what GitHub Copilot does. It allows others to publish derivative works of copyrighted works with the license deleted. Given that it apparently is trivial to get Copilot to spit out nearly verbatim copies of the code that it was trained on, I don't think it satisfies the "transformative" requisite of the (American) fair use doctrine.


Is Stable Diffusion any different when including a famous artwork or artist in the prompt? The images produced are eerily similar to training data.


Probably not, and it is likely open to similar lawsuits - this is not really a bad thing.


It seems like the ideal way to proceed is to make the AI output unique and creative. Perhaps that requires AGI because currently the model has no understanding of art.


Maybe more importantly, the AI needs the ability to judge when its output amounts to plagiarism, like humans generally are able to. The AI needs to feel bad about ripping off someone else’s work. ;)


Farmers plant their crops out in the open too. Should Boston Dynamics be allowed to have their robots rob those fields empty and sell the produce without having to at least pay the farmer? They'd be walking and plucking just like any human would be.

Some source code might be published but not open source licensed. At least some such code has been taken with complete disregard of their licenses and/or other legal protections, and it's impossible to find and properly map out any similar violations for the purposes of a legal response.


This is literally the "you wouldn't steal a car" meme.

To spell it out: No, this analogy does not hold. "Stealing" data does not deprive the owner of anything, so it should not be treated remotely the same as physical stealing (usually not even of potential revenue, as piracy studies show).


While there might not be damages in the literal sense to the owners of the scraped repos, MS is making money from Copilot subscriptions, so what they're doing is closer to selling bootleg copies of a film than giving away pirated copies.


I partially have to concede on that point. The analogy could have been better, and I should have put greater emphasis on the massive scale and the lack of recourse for those affected.

Nevertheless, stealing remains illegal so at the very least they have deprived the source code owners of their rights.


> It seems like the end goal of copyleft was to fight back against copyright, not to have copyleft.

Whether this was the original motivation depends on whom you are asking.

You may disagree, but the "Free Software" movement (RMS and the people who agree with him) essentially wants everything to be copyleft. The "Open Source" movement is probably more aligned with your views.


The problem is you can't launder copyrighted code with this because you don't see the copyrighted code in the first place.


The only thing you're missing is that some people lost the plot and think it is all about copyleft.



