NVIDIA trained AI with millions of videos extracted from YouTube and Netflix, says website

According to a report by 404 Media, NVIDIA has engaged in questionable practices while training a new generative AI model. Company employees reportedly downloaded videos in bulk from platforms such as YouTube and Netflix, potentially infringing copyright, to feed the model the equivalent of 80 years of video footage per day.

The information about NVIDIA’s activities comes from leaked internal material, including Slack messages, corporate emails, and statements from former employees. The data was allegedly used to train an as-yet-undisclosed AI model as part of a project codenamed “Cosmos.”

The report indicates that NVIDIA employees used an open-source tool to download YouTube videos in bulk, running 20 to 30 virtual machines hosted on Amazon Web Services. In just one month, this produced a collection of 30 million video URLs.

Notably, some of the video datasets used in the project explicitly state that they may be accessed only for academic purposes and are not licensed for commercial use. Although NVIDIA has its own research staff, internal messages suggest that the extracted content was intended for commercial use in developing the new AI model. NVIDIA’s vice president of research, Ming-Yu Liu, was reportedly one of the key figures involved in the search for this data.

These revelations raise serious concerns about NVIDIA’s data-gathering practices and the potential disregard for intellectual property rights in its efforts to advance its AI technology. As the company continues its work on the Cosmos project, the ethical and legal implications of its actions will likely come under increased scrutiny.

What the companies involved say

NVIDIA has addressed concerns about the use of copyrighted content to train its AI models. The company said it respects “the rights of all content creators” and is confident that its research efforts adhere to both the letter and spirit of copyright law.

The company justifies its practices by invoking the principle of “fair use,” particularly highlighting that transformative purposes, such as training models, are legally permissible. NVIDIA also emphasizes that “anyone is free to learn facts, ideas, data, and information from another source and use them to create their expressions.”

This statement comes amid growing scrutiny of how AI models are trained, especially the use of copyrighted material without explicit permission. The issue has sparked debate over the transparency and legality of such practices, particularly within large tech companies.

In response to inquiries about the case, YouTube, which is owned by Google, pointed to CEO Neal Mohan’s comments from April 2024, in which he stated that using the platform’s videos to train AI models, such as OpenAI’s Sora, would be a “clear violation of YouTube’s terms of use.” This stance underscores YouTube’s protective approach toward the content on its platform.

Similarly, a Netflix spokesperson confirmed that the streaming giant has no agreement with NVIDIA for content transfer. The spokesperson added that Netflix’s terms of service prohibit the mass extraction of data, signaling a firm stance against unauthorized use of its content for AI training.

These developments highlight ongoing concerns about the transparency of AI training practices among major tech companies, especially regarding the use of copyrighted materials. NVIDIA, along with other industry leaders like Apple and Salesforce, has previously faced accusations of using YouTube videos for AI training. The practice has repeatedly raised questions about the balance between innovation and intellectual property rights.

Source: 404 Media
