AI Training Practices Face Fair Use Criticism from Former OpenAI Staffer

According to the New York Times, former OpenAI researcher Suchir Balaji recently spoke out against the company’s practice of using copyrighted data to train AI models, claiming it does not qualify as “fair use.” Balaji, who left OpenAI in protest, also expressed concern about the technology’s potential harm to society.

In a recent article, Balaji explained that earlier versions of OpenAI’s technology were developed without much concern for licensing or permissions because they were considered research projects. With the commercial release of ChatGPT, however, he asserts that OpenAI failed to properly address fair use principles. According to Balaji, ChatGPT’s output too closely resembles the copyrighted inputs used during its training, making it a direct competitor to those original works.

Balaji’s main argument is that generative AI models like ChatGPT rely on making copies of copyrighted data during their training process. Even though the AI may produce different outputs, the underlying training relies on this data, which he believes constitutes infringement if unauthorized.

Another significant concern Balaji highlighted is AI “hallucinations,” cases in which the model generates false or misleading information. He believes that AI technologies, as they replace existing internet services, contribute to the spread of misinformation and further degrade the quality of content online.

Balaji isn’t the only one raising these concerns. In recent months, media companies have begun signing licensing agreements with OpenAI, suggesting that using copyrighted data without a license could cause market harm. Licensing experts agree, pointing out that such deals provide a clear path to monetizing content while ensuring proper use.

Meanwhile, some publishers, like Cambridge University Press, are taking proactive steps by asking authors for permission before licensing their works for AI training. This approach is seen as a responsible way to integrate high-quality information into AI models while respecting authors’ rights.

The Authors Guild has also weighed in, praising the move by Penguin Random House UK (PRH UK) to reserve rights for AI training in copyright agreements. However, the Guild calls for stronger protections in publishing contracts, emphasizing that no publisher should license works for AI training without explicit consent from the authors.

As debates about fair use in AI training continue, this issue underscores the importance of clear licensing agreements and the need for transparency in how AI models are developed and used.
