New Licensing Standard Unveiled to Regulate AI Scraping of Online Content
A new licensing standard aims to let web publishers define the terms under which AI developers use their content. Major brands including Reddit, Yahoo, Medium, Quora, and People Inc. have expressed their support for the Really Simple Licensing (RSL) standard, an initiative designed to establish clear payment structures for AI systems that scrape data from websites, reports 24brussels.
The RSL Standard builds on the existing robots.txt protocol, which has traditionally allowed publishers to set guidelines for web crawlers. The new framework goes further, enabling site owners to incorporate licensing and royalty terms directly into their robots.txt files. Publishers can also embed these terms in the digital works for which they seek compensation, including online articles and training datasets.
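To illustrate how a crawler might discover licensing terms published through robots.txt, here is a minimal Python sketch. The article does not describe the actual syntax, so the `License` directive name, the example URL, and the `find_license_urls` helper are all assumptions made for illustration, not the standard's confirmed format.

```python
from typing import List

# Hypothetical robots.txt content. The "License" directive name and the URL
# are illustrative assumptions; the article does not give the real syntax.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /
License: https://example.com/license.xml
"""


def find_license_urls(robots_txt: str, directive: str = "license") -> List[str]:
    """Return the values of every line whose key matches `directive`."""
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so the URL's own colon is preserved.
        key, value = line.split(":", 1)
        if key.strip().lower() == directive:
            urls.append(value.strip())
    return urls


print(find_license_urls(SAMPLE_ROBOTS_TXT))
```

A crawler following this pattern would fetch the referenced terms before requesting any content, which matches the idea that bots learn the terms of access before engaging with a site.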
At the forefront of this initiative is the RSL Collective, overseen by Eckart Walther—co-creator of the RSS standard—and Doug Leeds, former CEO of IAC Publishing. “The goal is to create a new, scalable business model for the web,” Walther stated. “RSL leverages early RSS concepts to introduce a systematic approach for defining licensing and compensation rights across the internet.”
The RSL Standard accommodates various licensing mechanisms, including free options. Site owners may require AI companies to pay either a subscription fee or a pay-per-crawl charge each time an AI bot accesses their content. The standard also provides for a pay-per-inference model, under which publishers are compensated whenever an AI model references their material. Bots engaged in archival or standard search engine activities will continue to operate under existing protocols.
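The difference between these payment models can be made concrete with a short Python sketch. The model names mirror the ones described above, but the `Usage` type, the `fee_owed_cents` function, and every rate are hypothetical; the article gives no actual prices or billing rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Activity by one AI company during one billing period (hypothetical)."""
    crawls: int       # times an AI bot accessed the publisher's content
    inferences: int   # times an AI model referenced the publisher's material


def fee_owed_cents(model: str, usage: Usage, rate_cents: int) -> int:
    """Return the amount owed in cents under the given licensing model."""
    if model == "free":
        return 0
    if model == "subscription":
        # Flat fee per period, independent of activity.
        return rate_cents
    if model == "pay-per-crawl":
        return rate_cents * usage.crawls
    if model == "pay-per-inference":
        return rate_cents * usage.inferences
    raise ValueError(f"unknown licensing model: {model}")


usage = Usage(crawls=1_000, inferences=20_000)
for model, rate in [("free", 0), ("subscription", 50_000),
                    ("pay-per-crawl", 10), ("pay-per-inference", 1)]:
    print(model, fee_owed_cents(model, usage, rate))
```

Amounts are kept as integer cents to avoid floating-point rounding; a real billing arrangement would also need to define how crawls and inferences are counted and audited.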
Multiple media organizations, such as Vox Media, News Corp, and The New York Times, have individually negotiated licensing agreements with AI firms like OpenAI and Amazon. The RSL Collective seeks to streamline this process, offering a unified means for publishers to receive payment for their work without the need for separate negotiations.
The success of the RSL Standard hinges on buy-in from dominant industry players, particularly AI companies. AI developers have historically faced scrutiny for disregarding robots.txt directives, which makes any fee structure difficult to implement without their cooperation. By aligning major web publishers, the RSL Collective aims to make the standard more appealing. “Our mission is to mobilize a large coalition of stakeholders, making it legally and logistically efficient for companies to comply,” Leeds emphasized.
Notably, the RSL Standard cannot itself block bots from accessing a website, unlike certain systems offered by Cloudflare. The Collective is collaborating with Fastly, a content delivery network, to manage AI bot access based on licensing agreements. “Fastly acts as the gatekeeper, ensuring only compliant bots can enter,” Leeds explained. Publishers using other networks can still negotiate licensing but cannot restrict AI crawlers without additional infrastructure.
Leeds said the RSL Collective is well positioned to enforce licensing agreements through group participation, which shares the legal burden of potential infringements among its members. He likened the initiative to existing digital rights organizations and said it establishes essential protections for digital content. However, the legal context surrounding unauthorized data scraping remains murky, with major AI companies currently embroiled in legal disputes over similar issues.
“The RSL framework fundamentally alters the landscape by mandating that bots are informed of the terms of access prior to engaging with websites,” both Leeds and Walther noted in a joint statement.
Ultimately, Leeds envisions the RSL Standard as a means to facilitate intuitive licensing for AI training use—a system not meant to reinvent existing structures but to adapt them for a new digital environment. “RSL is crucial for laying down the infrastructure necessary for implementing standards that have been successful in other sectors,” he concluded.
The RSL Collective is open to publishers and creators at no cost, with several prominent organizations, including O’Reilly and wikiHow, also endorsing the initiative.