Thousands of YouTube video transcripts may have been used to train AI models without their creators' consent. The dataset included transcripts from popular channels and shows, as well as from well-known YouTubers. It was created by EleutherAI, a non-profit organization that aims to accelerate AI development by providing open access to data. According to the AI companies involved, using this dataset is protected under fair use. However, the licensing and legality of AI tools and AI-generated content remain ambiguous.
Keywords
YouTube, video transcripts, AI models, consent, dataset, EleutherAI, fair use, licensing, legality
Takeaways
- Thousands of YouTube video transcripts were used to train AI models without consent.
- The dataset included transcripts from popular channels, shows, and well-known YouTubers.
- The dataset was created by EleutherAI, a non-profit organization.
- According to the AI companies involved, using this dataset is protected under fair use.
- Ambiguity remains around the licensing and legality of AI tools and AI-generated content.
Links:
https://www.theatlantic.com/technology/archive/2023/08/books3-ai-meta-llama-pirated-books/675063/