So when are neural nets trained on images or text going to be confronted with the same copyright concerns? Now that GitHub has forced the issue into the spotlight with Copilot, I feel it's only a matter of time before this reaches the courts. Nobody seemed to care about copyright when people were having fun creating AI dream collages or nonexistent anime girls from a model trained on the Danbooru image set. In the latter case it's not clear that all of the original Pixiv and Twitter creators consented to having their work rehosted on a different site in the first place, much less used in ML experiments. That data was from 2018.

I'm almost tempted to believe that the people at GitHub knew this was going to blow up as much as it did, and meant it as some kind of challenge to the status quo of copyright and licensing, if only so that everyone would start talking about the issue. Why did the GitHub representative plainly state that Copilot was trained on all of GitHub's codebase, seemingly without caring about the pushback on Twitter and HN that was bound to follow?