Reddit Limits Access to Wayback Machine Amid AI Scraping Concerns
Reddit has announced that it will block the Internet Archive’s Wayback Machine from indexing most of its platform after learning that AI companies have been scraping Reddit data from the archive. As a result, the Wayback Machine will be able to index only the Reddit.com homepage and will no longer archive post detail pages, comments, or user profiles, 24brussels reports.
According to Reddit spokesperson Tim Rathschmidt, “Internet Archive provides a service to the open web, but we’ve been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine.” He emphasized that Reddit aims to protect its users’ privacy and content integrity by limiting the Internet Archive’s access.
The restriction will begin “ramping up” today, and Reddit says it informed the Internet Archive of these limits in advance. Rathschmidt noted that Reddit has consistently raised concerns about unauthorized scraping of its content, particularly by AI firms.
In recent months, Reddit has stepped up its measures against data scraping as AI companies have increasingly relied on its content. While it remains open to sharing data under paid agreements, Reddit has restricted access for scrapers that do not comply with its terms. The company previously negotiated data-sharing agreements with Google for AI training purposes and has blocked search engines from accessing its content unless they pay.
In 2023, Reddit revised its API policies, a change that led several third-party apps to shut down and sparked protests across the platform. It has since struck a deal with OpenAI but has also taken legal action against Anthropic, accusing it of continuing to scrape Reddit data despite assurances that it would stop.
The Internet Archive has not yet responded to requests for comment regarding Reddit’s actions.