Reddit Cuts Off Wayback Machine To Thwart AI Scrapers

If you have ever relied on the Wayback Machine to dig up an old Reddit post or see a thread that has since vanished, that window is about to close.

Reddit has announced that it is cutting off most of its site from the Internet Archive’s Wayback Machine, claiming that some artificial intelligence (AI) companies have been quietly using the archive to get around Reddit’s data restrictions.

What is the Internet Archive?

The Internet Archive is a non-profit organization dedicated to preserving as much of the internet’s history as possible, from old websites to books and cultural artifacts. Its Wayback Machine lets anyone see how a webpage looked at a specific point in time, even if it has since been deleted or changed. However, Reddit says the archive has also been keeping posts that users have removed, a practice it argues raises privacy concerns.

“Internet Archive provides a service to the open web, but we’ve been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine,” Reddit spokesperson Tim Rathschmidt said in a statement to The Verge. “Until they’re able to defend their site and comply with platform policies (e.g., respecting user privacy, re: deleting removed content) we’re limiting some of their access to Reddit data to protect redditors.”

The new restrictions took effect yesterday, and Reddit says it notified the Internet Archive in advance.

The change means the Wayback Machine will no longer be able to save Reddit posts, comments, or profiles; it will be able to save only Reddit’s homepage. For years, the archive has been a go-to for journalists, researchers, and curious users, preserving snapshots of Reddit’s sprawling conversations. Now, its Reddit coverage will amount to a snapshot of daily trending headlines rather than a full historical record.

This move is part of a larger trend: Reddit has spent years tightening control over its data as AI companies scramble for content to train their models. Deals with Google and OpenAI have reportedly brought in millions for the platform, and Reddit has made clear that AI firms that want access will have to pay. Earlier this year, the company even sued AI start-up Anthropic, accusing it of scraping the site without permission.

“We have a longstanding relationship with Reddit and continue to have ongoing discussions about this matter,” Mark Graham, Director of the Wayback Machine, said in a statement to The Verge.

While Reddit says the move is about safeguarding user privacy and upholding its rules, critics worry it risks erasing pieces of the internet’s historical record. Once a post vanishes from Reddit and can’t be archived, it’s gone for good — taking with it a piece of online culture that might otherwise have been preserved.

Kavita Iyer https://www.techworm.net

An individual, optimist, homemaker, foodie, a die hard cricket fan and most importantly one who believes in Being Human!!!
