• 0 Posts
  • 44 Comments
Joined 2 years ago
Cake day: June 1st, 2023

  • In the movie industry, everyone usually signs a work-for-hire contract that specifies who will hold the rights to the completed film.

    However, in a case decided by the Second Circuit (16 Casa Duse, LLC v. Merkin), the director, Alex Merkin, had not signed such a contract and then tried to claim copyright afterwards. The court held that a director’s contribution to a film is not, on its own, a copyrightable work:

    We answer that question in the negative on the facts of the present case, finding that the Copyright Act’s terms, structure, and history support the conclusion that Merkin’s contributions to the film do not themselves constitute a “work of authorship” amenable to copyright protection. … As a general rule, the author is the party who actually creates the work, that is, the person who translates an idea into a fixed, tangible expression entitled to copyright protection. … But a director’s contribution to an integrated “work of authorship” such as a film is not itself a “work of authorship” subject to its own copyright protection.



  • Simple question:

    If you are a college student learning to write professionally, is it fair use to download copyrighted books from Z-Library in order to become a better writer? If you are a musician, is it fair use to download MP3s from The Pirate Bay in order to learn about musical styles? How about film students: can they torrent Disney movies as part of their education?

    I’m certain that every court in the US would rule that this is not fair use. It’s not fair use even if pirated content ultimately teaches a student how to create original, groundbreaking works of writing, music, and film.

    Simply being a student does not give someone a free pass to pirate content. The same is true of training an AI, and there are already reports that pirated material is in OpenAI’s training set.

    If OpenAI could claim fair use, then almost by definition The Pirate Bay could claim fair use too.


  • Again, it’s not a question of reproducing books in an LLM. The allegation is that the OpenAI developers downloaded books illegally to train their AI.

    You need to pay for your copy of a book. That’s true if you are a student teaching yourself to write, and it’s also true if you are an AI developer training an AI to write. In the latter case, you might also need to pay for a special license.

    Is it possible that the OpenAI developers can bring the receipts showing they paid for each and every book and/or license they needed to train their AI? Sure, it’s possible. If so, the lawyers who brought the suit would look pretty silly for not even bothering to check.

    But OpenAI used a whole lot of books, which cost a whole lot of money. So I wouldn’t hold my breath.


  • “The purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use upon the potential market.”

    Yes, and I named three of those factors:

    the key questions are often whether the use of the work (a) is commercial, or (b) may substitute for the original work. Furthermore, the amount of the work copied is also considered.

    And while you don’t need to meet all the criteria, the odds are pretty long when you fail three of the four (commercial nature, copying the complete work rather than a portion, and a negative effect on the market for the original).

    Think of it this way: if it were legal to download books in order to train an AI, then it would also be legal to download books in order to train a human student. After all, why would a human have fewer rights than an AI?

    Do you really think courts are going to decide that it’s ok to download books from The Pirate Bay or Z-Library, provided they are being read by the next generation of writers?


  • If a musician doesn’t have the right to their own work, it’s because someone offered to pay them for the rights and they accepted.

    Is that in their favor? I think so, considering the alternative is to not get paid and not have rights to their work.

    And not to go too far off topic, but publicly funded research is generally not aimed at drug development, it is aimed at discovering the basic science behind how the body works (human body or otherwise).

    If you want a clinical trial that proves a particular drug can actually help patients, you will need to find a company to pay for it. The government almost never pays for clinical trials (I think the COVID vaccine might have been an exception). Clinical trials are far more expensive than basic science, and patents are the carrot to get the private sector to pay for them.




  • I know the model doesn’t contain a copy of the training data, but it doesn’t matter.

    If the copyrighted data is illegally downloaded at any point during training, that’s an IP violation, even if it is deleted immediately after being processed by the model.

    As an analogy, if you illegally download a Disney movie, watch it, write a movie review, and then delete the file, you have still violated copyright. The movie review doesn’t contain the Disney movie and your computer no longer has a copy of the Disney movie. But at one point it did, and that’s all that matters.


  • If they bought physical books, the lawsuit might still happen, but it would be much harder to win.

    If they bought e-books, then it might not have helped the AI developers. When you buy an e-book you are just buying a license, and the license might restrict what you can do with the text. If an e-book license prohibits AI training (and they will in the future, if they don’t already) then buying the e-book makes no difference.

    Anyway, I expect that in the future publishers will make sets of curated data available for AI developers who are willing to pay. Authors who want to participate will get royalties, and developers will have a clear license to use the data they paid for.


  • When determining whether something is fair use, the key questions are often whether the use of the work (a) is commercial, or (b) may substitute for the original work. Furthermore, the amount of the work copied is also considered.

    Search engine scrapers are fair use, because they only show a snippet of a work and a search result cannot substitute for the work itself. The same goes for copying an excerpt of a movie in order to critique it, because consumers don’t watch reviews as a substitute for watching movies.

    On the other hand, OpenAI is accused of copying entire works, and its product is explicitly intended as a replacement for hiring actual writers. I think it is unlikely to be considered fair use.

    And in practice, fair use is not easy to establish.


  • The question “what is sufficient” basically amounts to convincing an official that the final work reflects some form of your creative expression.

    So for instance, if you are hired to take AI-generated output and crop it to a 29:10 image, that probably won’t be eligible for copyright. You aren’t expressing your creativity, you are doing something anyone else could do.

    On the other hand, if you take AI-generated output and edit it in Photoshop to the point that everyone says “Hey, that looks like a ThunderingJerboa image”, then you would almost certainly be eligible for copyright.

    Everyone else falls in between, trying to convince someone that they are more like the latter case. Which is good, because it means actual artists will be rewarded.