Non-profit AI research group EleutherAI scraped YouTube subtitles to build a dataset in violation of YouTube's terms of service, Proof News reported on July 16. The dataset, called the Pile, allegedly includes subtitles from 173,536 YouTube videos across more than 48,000 channels, including about 12,000 videos that have since been deleted. Several top tech and AI firms, [...]