https://arxiv.org/abs/2306.08997 “arXiv will not consider removal for reasons su...

anonymouskimmer · on June 25, 2023

DMCA takedown, then? Or not applicable because the data is not part of the publication? Tables 17 and 18 in the Appendix could probably be removed as they seem to verbatim copy course descriptions, as well as, maybe, Figure 4.

> Iddo did not have permission from all the instructors to collect the assignment and exam questions that made up the dataset that was the subject of the paper.

skissane · on June 25, 2023

> DMCA takedown, then? Or not applicable because the data is not part of the publication?

Legally, you are allowed to refuse to comply with DMCA takedown notices. If you do, you are increasing your risk of being sued for copyright infringement, and increasing the potential damages if you lose – but, if you decide (in any individual case) that is a risk worth taking, you are free to take that risk. If MIT tried to issue a DMCA takedown to arXiv over this, arXiv might decide that defending their own policies is worth the risk of being sued by MIT.

> Tables 17 and 18 in the Appendix could probably be removed as they seem to verbatim copy course descriptions, as well as, maybe, Figure 4.

Probably a sufficiently small extract from the source material, that it would fall under fair use? (Lack of acknowledgement of the specific source may be an issue; but that can be remedied by adding an acknowledgement, rather than removal.)

anonymouskimmer · on June 25, 2023

IANAL.

For the tables there's very little transformation, and a huge chunk of verbatim text. I don't see how there is any gain versus just publishing the course numbers and titles.

For figure 4 this might fall under "unpublished material" protections, which are: https://www2.archivists.org/publications/brochures/copyright...

> Generally, material is considered unpublished if it was not intended for public distribution or if only a few copies were created and distribution was limited.

> The law distinguishes between published and unpublished material and the courts often afford more copyright protection to unpublished material when an asserted fair use is challenged.

> Rather, courts evaluate fair use cases based on four factors, no one of which is determinative in and of itself:

2) > Courts give more protection to works that are “closer to the core of copyright protection,” such as unpublished

4) > The effect of the use upon the potential market for, or value of, the copyrighted work: This factor assesses how, and to what extent, the use damages the existing and potential market for the original.

Publication of the (possibly) previously unpublished copyrighted work in figure 4 fully and completely destroys its value. I don't know if a fair use claim can overcome such an impact, though that is up to a court to determine.

skissane · on June 25, 2023

> For the tables there's very little transformation, and a huge chunk of verbatim text.

IANAL either–but how is the copyright owner (MIT presumably) harmed by the reproduction of these course descriptions? It isn't like they harm the commercial value of the courses in any way; the course is the actual product here, the description is just sales and marketing collateral, and has minimal value apart from the product it is selling.

Furthermore, given the fact the paper was coauthored by MIT employees – arXiv could argue that MIT (through its employees acting as its agents) had granted them an implied license to reproduce it. Which is the other issue – even if this isn't fair use, MIT may have agreed to license it through its agents. You can still be bound by the actions of your employees, even if those actions violated your own internal policies–especially in dealings with third parties who had no reason to suspect there was any such violation.

> I don't see how there is any gain versus just publishing the course numbers and titles.

"Algebra I" and "Algebra II" don't mean much – what topics do they actually cover? A one-sentence or one-paragraph course description adds a lot, because it tells you what topics are actually covered. Yes, someone could probably look it up on the MIT website – but it saves the reader a lot of effort doing that. Especially if someone is reading this 20 years from now, by which time the content of MIT courses may have changed a lot (despite having the same title), and finding what their content was 20 years ago may require a lot of research effort (if the reader even thinks to do that).

> Publication of the (possibly) previously unpublished copyrighted work in figure 4 fully and completely destroys its value

Figure 4 is likely not the "work", rather a small quote from a much larger work. How does a small quote from a work (even if allegedly unpublished) "fully and completely destroys its value"?

anonymouskimmer · on June 25, 2023

Re: the course descriptions. Yes, I can see a judge buying that defense. And yes, we don't know what license to use the material exists for MIT faculty. I could also see a judge buying that the research article here doesn't need to publish the course descriptions in order to make its point at all.

> IANAL either, but figure 4 is likely not the "work", rather a small quote from a much larger work. How does a small quote from a work (even if allegedly unpublished) "fully and completely destroys its value"?

Exams are often composites of multiple independent works, and they are recomposited periodically (e.g. using a database of questions to create an exam). The argument here is that the individual question is itself a complete work (equivalent to an independent chapter in a book of works on a topic). And here it is not just on its lonesome, but with its answer, too.

skissane · on June 25, 2023

> Exams are often composites of multiple independent works.

That assumes figure 4 came from an exam. For all we know, figure 4 actually came from course notes, assignments, etc. Whether or not issuing those to students counts as "publication", they are easily available to future students in a way that past exam questions are often not, hence their publication does far less damage to their value.

Also, MIT says that "Iddo did not have permission from all the instructors" – for all we know, figure 4 is from one of the instructors from whom he did have that permission.

anonymouskimmer · on June 25, 2023

Yep, those sorts of possibilities are what the "(possibly)" in my earlier post was for.

Based on a quick search it seems the figure 4 question and answer have to do with Markov decision processes (https://en.wikipedia.org/wiki/Markov_decision_process), which seem to be used in computer science. Iddo Drori is an associate professor of CS, so it seems quite likely it's his own question.

behnamoh · on June 25, 2023

That’s what I don’t like about arXiv. The person posting the paper must be able to take it down as well.