A written work is the words and symbols in it, and the order in which they appear. Alaska for Looking is clearly a derivative work of Looking for Alaska; clearly, only one of these is necessary to preserve that identity.
In other words: GitHub, stop laundering copyright. If you trained Copilot on GPL code, you are obliged to release that derived product as GPL. If you trained it on MPL code, you are in violation of that license.
@tindall don't worry, they tested it and it only gives back verbatim code that someone else wrote in 1 in every 1000 autocompletes
which is apparently a rare thing according to Microsoft
and also they had to specifically force it to not spit out the GPL if you autocompleted in an empty file because it had consumed so much GPL'd code
@tindall i'm waiting for someone to patiently explain to me why this isn't true in a way that sounds credible, but i think the answer boils down to: because copyright exists for the benefit of the megacorps, not in order to meaningfully restrain them.
courtroom, fantasy; (thread missing CW) software licensing, corporations
prosecutor: What is the project's licensing model?
Google employee: It's open-source.
prosecutor: Where is the source code located?
Google employee: The code is in a git repository at android.googlesource.com. There are instructions for building it on the website at source.android.com.
prosecutor: Very well. Please download the source code onto this laptop and build the project.
Google employee: *sweating bullets*
@tindall no they are not. Whether they need to comply depends on them creating a work that is considered derived from a work to which they only had a license under the GPL, and on them having no other right to use the software for this purpose. But I don’t see where training your model is different from, e.g., a line count program and distributing your findings. Also, it is questionable whether the short snippets "reproduced" even constitute protectable work.
@tindall This is based on German copyright law, but should be similar enough to transfer to US law because copyright law is very similar around the world thanks to the revised Berne Convention on copyright. I study law and was often surprised how differently lawyers see the world. Computer people (hackers) tend to apply their technical knowledge to law problems and it’s very often very wrong.
@aurorus if you don't see how training a model that can reproduce code snippets or even whole files a non-negligible percentage of the time is different from running a word count, we are living in different realities.
@tindall I think they're bypassing that by the fact that they're not distributing the model itself, but instead running it on their servers with access through an api.
Now, of course, the training data probably does include *AGPL* code too.
@ari @tindall for sure - to get to the point where the suggestions this thing generates are legally usable, it really needs to tell you (or let you filter suggestions by) the license - and give you all attributions required by the license of the derived snippet.
And given that most models of this sort are "throw in a bunch of training data and see what comes out", getting that sort of structure isn't really possible as far as I know?
@tindall my fear at the outset of Copilot is the result of the Oracle vs Google case.
I believe Microsoft will point to that, draw enough gray areas to confuse the courts and say something like "the code we trained our models on were APIs. the resultant generated code conforms to a similar API, but it is not the same API. technically, we're just doing what google did to oracle and you just said that was okay."
and if that's "not okay" then it means they could use AI to generate any conceivable API that doesn't exist and then copyright it?
idk. that's just my fear on how this shakes out, despite it being clearly derivative of the open-source works it was trained on.
@tindall @cwebber the entire area of AI-created anything is fuzzy. Usually, the best guess is that the person who programs/trains the AI has ownership. I know there's some effort in the EU going on to sort out these legalities, but since they'll have an impact on international copyright law as well, I expect this to not resolve very quickly.
@tychi yeah, i mean, it's possible. i don't think most courts are quite that easy to deceive, even for microsoft.