Research Guides: Artificial Intelligence: AI and Copyright

Do I Need to Consider Copyright When Using AI?

There are two ways that copyright is related to AI:

Training data used to train AI models
Copyrightability of output created by AI models

As with many aspects of the rapidly advancing field of AI, the impact of copyright law on how AI models are created and how their output can be used is still unsettled.

Confused about copyright? LNDL's Copyright Information Center can help. Schedule a copyright consultation or attend a copyright workshop.

Does AI Infringe Copyright?

The development of generative AI models, particularly large language models (LLMs) like ChatGPT and image creation and recognition AI tools such as DALL-E, Midjourney, and Stable Diffusion, relies on using copyrighted works as training data. Most AI creators have ingested this data from online collections containing copyrighted works without permission from or compensation to copyright holders. Whether this use violates copyright law is currently undetermined. AI creators claim their use is a fair use under copyright law, while copyright holders claim it infringes their rights and have sued the AI creators. It may take years for these court cases to resolve and provide a more definitive answer.

Arguments that use of copyrighted materials in training AI is a fair use:

The way that generative AI models work means that they are not creating derivative works and thus are not infringing on the copyright holder's right to control derivatives based on their works.
In the Authors Guild v. Google case, Google's use of numerous copyrighted works to develop its Google Books search engine was found to be a fair use because it was transformative and used the original works in a new way that was not intended by their creators. Training of generative AI is a similar use.

Arguments that use of copyrighted materials in training AI is not a fair use:

AI is not human and thus does not qualify as an "author" under copyright law. Therefore, the goals of copyright law and fair use to incentivize the creation of "authorship" do not apply.
The difference between generative AI training and the Google Books case is that the output created by Google Books did not serve as a direct substitute for the copyrighted work and actually promoted use of the work. Generative AI, on the other hand, competes with the copyrighted works it uses as training data.

Who Owns the Copyright in AI-Created Works?

According to the United States Copyright Office, a work must have been created by a human to be eligible for copyright. Creating prompts to input into generative AI models does not constitute enough human involvement to qualify the output for copyright, as the person inputting the prompt has very little control over what is actually created. However, a human may use or modify AI-created works in a way that qualifies the final creation for copyright.

Examples

Stephen Thaler submitted a copyright registration for an AI-created artwork titled "A Recent Entrance to Paradise". The United States Copyright Office rejected his registration due to the lack of human authorship.
Kristina Kashtanova created a graphic novel titled Zarya of the Dawn. All artwork included in the work was created by Midjourney's AI. The United States Copyright Office determined that the artwork itself was not eligible for copyright due to the lack of a human author, meaning that anyone else can use that same artwork to create their own work. However, Kristina Kashtanova is eligible to register copyright in the parts of the work that she created, including the creative compilation of the work as a whole and the text included in the work.

Beyond the rights granted by the United States Copyright Office, the terms of service of the AI tools may dictate what rights someone using their model has over the output. Users should read the terms of service of the particular tool they are using to determine what they are able to do with the output.

Is My Work Being Used to Train AI?

Most AI developers are not sharing what data they have used to train their models, so it is difficult to know for sure if your works are being used to train any AI models. However, there are several likely scenarios in which your creations may wind up in AI training data. You may or may not be able to control the use at this point.

If you have work that is public-facing on the internet, especially if it is included in any large libraries of online works, there is a good chance that it has been scraped and ingested to train one or more AI models.
Many companies are also beginning to update their terms of service to allow works you create using their products to be used in training AI. If you are particularly concerned about what you have created being used as training data, you should pay close attention to the terms of service for the web-based tools you use. There may or may not be options to opt out of your data being used.
The terms of service for many AI models also require you to agree that any information you enter in their tools may be used as training data to help improve future iterations of the product. As such, you may want to avoid using the AI tools for anything that contains confidential or sensitive information.
Academic publishers are now beginning to sign deals with AI companies for use of their data, which means your publications may be used without your knowledge or consent. Ithaka S+R is tracking agreements between publishers and AI companies.
The Authors Guild has created a model clause for authors to include in publishing agreements to prohibit the use of the material in training AI without permission.
