AI Copying AI: How The Rise of New AI Models Poses Difficult Intellectual Property Considerations

Published 1 month ago · 5 minute read

By Priya Jain

Photo Credit: Anthropic

The rapid growth and development of Artificial Intelligence (AI) has posed significant intellectual property questions for AI companies. Most recently, OpenAI, best known for AI products like ChatGPT, accused DeepSeek, a Chinese AI startup, of illegally copying its technology to develop a competing product called R1.[1] OpenAI states in its Terms of Use that users cannot use its “Output to develop models that compete with OpenAI,” and that users cannot “[a]ttempt to or assist anyone to reverse engineer, decompile or discover the source code or underlying components of [their] Services, including [their] models, algorithms, or systems.”[2] OpenAI’s accusations specifically allege that DeepSeek violated these Terms of Use by harvesting large amounts of data from OpenAI’s products and then using this data to train its own AI models, like R1.[3]

The process of using data generated by a strong AI model to train another model is called distillation.[4] This technique is specifically meant to improve the training of smaller AI models so that they can produce a wide variety of outputs and compete with other influential AI models in the market.[5] Therefore, if DeepSeek relied on distillation, it would have created R1 through cost-efficient training and with a fraction of the resources that OpenAI used.[6]
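For readers unfamiliar with the mechanics, the following is a minimal sketch of the textbook form of distillation, in which a small "student" model is scored on how closely its predictions match those of a larger "teacher" model. The function names and the toy numbers are illustrative assumptions; nothing here is drawn from OpenAI's or DeepSeek's actual systems. When only a model's text responses are available, as is typical for a public product, a student would more likely be trained on those responses as ordinary training examples, but the underlying idea of learning from another model's outputs is the same.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature softens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's and the student's softened predictions.

    Training the student means adjusting it to make this number small.
    """
    p = softmax(teacher_logits, temperature)  # teacher's "soft labels"
    q = softmax(student_logits, temperature)  # student's current predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores over three candidate answers.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # already imitates the teacher well
far_student = [0.5, 1.0, 4.0]    # disagrees with the teacher

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The student that already mimics the teacher receives the smaller loss. This is why distillation is inexpensive relative to training from scratch: the teacher's outputs supply the training signal that would otherwise have to come from large volumes of independently collected data.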

Since the development of AI models is a novel and uncertain domain, there are still few universal standards on the procedures and guidelines AI companies should follow in developing their models.[7] DeepSeek’s use of OpenAI’s data raises legal and ethical intellectual property questions, one of the most significant being whether there has been a copyright violation.

Copyright protection is afforded to “original works of authorship,”[8] and its purpose is to provide incentives for authors to create and disseminate their original works.[9] Considering this purpose, it is necessary to determine whether permitting companies like DeepSeek to train their models using the data of other, larger AI companies would saturate the market with copycat models and limit technological growth, or whether distillation would instead advance AI development.[10] Additionally, companies fighting distillation may not have a strong argument that their data qualifies as “original works of authorship” because they similarly train their own models using other copyrighted works.[11]

In a recent copyright infringement lawsuit, The New York Times sued OpenAI for using millions of its journalists’ copyrighted works to train ChatGPT.[12] OpenAI argues that its use of these copyrighted works is permissible under fair use principles.[13] The fair use doctrine allows the reproduction of copyrighted work for purposes including “teaching [], scholarship, or research,” so OpenAI’s argument is essentially that training an AI model using copyrighted works is akin to using copyrighted works in a traditional teaching sense.[14] Courts have treated the copying of software code in the course of creating another program as fair use, finding no copyright infringement where the end product of the copying “serves a functional purpose and provides social utility.”[15] Thus, the copying of data produced by AI models to develop more models could similarly be viewed as serving the functional purpose of growing and developing AI while also providing social utility through AI enhancement.[16] However, even if DeepSeek is able to defend its distillation of OpenAI’s data as fair use, lawmakers should ultimately consider whether broadly permitting the copying of technological data is truly ethical in our copyright protection landscape.

Overall, as AI technology progresses, it will be interesting to see how the law evolves to balance promoting the development of AI against ensuring that intellectual property principles are not applied so broadly as to allow uncontrolled infringement.

Priya Jain is a 2L at Vanderbilt Law School and is originally from Dallas, Texas. After graduation she plans on moving to Washington, D.C. to practice litigation.

[1] Lea Frermann & Shaanan Cohney, OpenAI says DeepSeek ‘inappropriately’ copied ChatGPT – but it’s facing copyright claims too, The Conversation (Feb. 4, 2025), https://theconversation.com/openai-says-deepseek-inappropriately-copied-chatgpt-but-its-facing-copyright-claims-too-248863.

[2] Terms of Use, OpenAI (Dec. 11, 2024), https://openai.com/policies/row-terms-of-use/.

[3] Cade Metz, OpenAI Says DeepSeek May Have Improperly Harvested Its Data, The New York Times (Jan. 29, 2025), https://www.nytimes.com/2025/01/29/technology/openai-deepseek-data-harvest.html.

[4] Mary Bennett & Rob Robinson, OpenAI Accuses DeepSeek of Unlawful Use of AI Models, Raising Ethical and Legal Concerns, JD Supra (Feb. 3, 2025), https://www.jdsupra.com/legalnews/openai-accuses-deepseek-of-unlawful-use-2896277/.

[5] Id.

[6] João da Silva & Graham Fraser, OpenAI says Chinese rivals using its work for their AI apps, BBC (Jan. 29, 2025), https://www.bbc.com/news/articles/c9vm1m8wpr9o.

[7] Ambuj Tewari, Unpacking DeepSeek: Distillation, ethics and national security, Michigan News (Jan. 31, 2025), https://news.umich.edu/unpacking-deepseek-distillation-ethics-and-national-security/.

[8] 17 U.S.C.A. § 102.

[9] Copyright Basics, United States Patent and Trademark Office (last visited Feb. 9, 2025), https://www.uspto.gov/ip-policy/copyright-policy/copyright-basics#:~:text=The%20primary%20purpose%20behind%20copyright,them%20available%20in%20the%20marketplace.

[10] See id.

[11] Lea Frermann & Shaanan Cohney, OpenAI says DeepSeek ‘inappropriately’ copied ChatGPT – but it’s facing copyright claims too, The Conversation (Feb. 4, 2025), https://theconversation.com/openai-says-deepseek-inappropriately-copied-chatgpt-but-its-facing-copyright-claims-too-248863.

[12] Bobby Allyn, ‘The New York Times’ takes OpenAI to court. ChatGPT’s future could be on the line, NPR (Jan. 14, 2025), https://www.npr.org/2025/01/14/nx-s1-5258952/new-york-times-openai-microsoft.

[13] Id.

[14] See 17 U.S.C.A. § 107.

[15] Jenny Quang, Does Training AI Violate Copyright Law?, 36 Berkeley Tech. L.J. 1407, 1415–17 (2021); see Sega Enterprises Ltd. v. Accolade, Inc., 977 F.2d 1510, 1527–28 (9th Cir. 1992) (“We conclude that where disassembly is the only way to gain access to the ideas and functional elements embodied in a copyrighted computer program and where there is a legitimate reason for seeking such access, disassembly is a fair use of the copyrighted work, as a matter of law.”).

[16] See Quang, supra note 15, at 1415–17.

Origin: Vanderbilt University