
Cloudflare flexes its monopolistic muscle – delists Perplexity as a verified bot, accusing them of not playing nice in the sandbox.


Cloudflare, as a major internet infrastructure provider, wields disproportionate power over how companies access and interact with large parts of the web. They control access to millions of websites and can unilaterally decide which crawlers are blocked or allowed. Today, they wielded that control against one of the world’s most popular AI agents.

Cloudflare has publicly accused Perplexity of using deceptive and aggressive tactics to bypass restrictions intended to block its AI bots from crawling and scraping websites. According to Cloudflare, after multiple customer complaints and internal investigations, it found that Perplexity circumvented standard anti-bot protections by ignoring robots.txt directives and rotating IP addresses to avoid being blocked. Cloudflare alleges that when Perplexity’s declared bots (with user agents like “PerplexityBot”) were blocked, Perplexity disguised its crawler as a legitimate human visitor: it used a spoofed user agent impersonating Google Chrome on macOS, and sent requests from undeclared, rotating IP addresses spread across different Autonomous System Numbers (ASNs).
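For context, robots.txt is the plain-text file at the center of these allegations: a site that wants to opt out of Perplexity’s declared crawler publishes directives like the following. This is a generic illustration, not taken from any specific site; only “PerplexityBot” is named in Cloudflare’s report.

```
# robots.txt — served from the site root (e.g. https://example.com/robots.txt)

# Disallow Perplexity's declared crawler from the entire site:
User-agent: PerplexityBot
Disallow: /

# All other crawlers may proceed normally:
User-agent: *
Allow: /
```

Compliance with these directives is voluntary, which is exactly why Cloudflare treats ignoring them as grounds for delisting.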

As a result, Cloudflare has delisted Perplexity as a verified bot, meaning Perplexity is now blocked by Cloudflare’s managed rules alongside other untrusted and unverified crawling activity. This move is significant because Cloudflare’s “Verified Bots” program typically grants compliant search engines and crawlers access to customer sites, but requires adherence to rules such as respecting robots.txt files, using published user agent strings, and crawling from known IP ranges.
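The verification model described above can be sketched in a few lines: a crawler is trusted only if it both declares a published user agent string and connects from an IP range its operator has registered. The bot name and IP range below are placeholders (the range is an RFC 5737 documentation block), not Cloudflare’s or Perplexity’s actual data.

```python
# Minimal sketch of a "verified bot" check, assuming a registry that maps
# declared bot names to their published IP ranges. A request spoofing a
# browser user agent never matches a registry entry, so it falls through
# to ordinary (unverified) traffic handling.
import ipaddress

# Hypothetical registry; real verified-bot programs maintain this per operator.
VERIFIED_BOTS = {
    "PerplexityBot": [ipaddress.ip_network("198.51.100.0/24")],
}

def is_verified(user_agent: str, client_ip: str) -> bool:
    """True only when the UA declares a known bot AND the source IP
    falls inside that bot's published ranges."""
    for bot_name, networks in VERIFIED_BOTS.items():
        if bot_name in user_agent:
            ip = ipaddress.ip_address(client_ip)
            return any(ip in net for net in networks)
    return False  # undeclared user agents are never verified
```

Under this scheme, a declared bot crawling from an unregistered ASN fails verification just as surely as one hiding behind a Chrome user agent, which is the behavior Cloudflare says it observed.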

Cloudflare’s CEO has compared these tactics to those used by sophisticated cyber actors, highlighting the seriousness of these allegations. The incident underscores increasing tension between publishers, web infrastructure providers, and AI companies over how online content is accessed, scraped, and used to train AI systems.

Perplexity’s response disputes the claims, describing Cloudflare’s findings as a mischaracterization or “publicity stunt,” and maintaining that the bots described do not belong to Perplexity. However, Cloudflare reports the observed activity was detected across “tens of thousands of domains and millions of requests per day”.

Modern AI Assistants vs. Web Crawlers

Perplexity went on to argue that its technology is not a “crawler,” as Cloudflare claimed, but a modern AI assistant, and that its methods are no different from Google’s.

“Modern AI assistants work fundamentally differently from traditional web crawling. When you ask Perplexity a question that requires current information—say, “What are the latest reviews for that new restaurant?”—the AI doesn’t already have that information sitting in a database somewhere. Instead, it goes to the relevant websites, reads the content, and brings back a summary tailored to your specific question.

This is fundamentally different from traditional web crawling, in which crawlers systematically visit millions of pages to build massive databases, whether anyone asked for that specific information or not. User-driven agents, by contrast, only fetch content when a real person requests something specific, and they use that content immediately to answer the user’s question. Perplexity’s user-driven agents do not store the information or train with it.”
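The distinction Perplexity draws can be reduced to a toy sketch: a crawler visits pages proactively and stores everything, while a user-driven fetcher touches one page only because a person asked, then discards it. Here `fetch()` is a stub standing in for an HTTP request; all URLs are illustrative.

```python
# Toy contrast between crawling and user-driven fetching (no real network I/O).

def fetch(url: str) -> str:
    """Stub for an HTTP GET; returns a fake page body."""
    return f"<html>content of {url}</html>"

def crawl(seed_urls):
    """Crawler pattern: visit every page proactively and store it all,
    whether or not anyone asked for that specific information."""
    index = {}
    for url in seed_urls:
        index[url] = fetch(url)  # retained in a database
    return index

def answer_question(question: str, relevant_url: str) -> str:
    """User-driven pattern: fetch a single page only because a real person
    asked something, summarize it, and keep nothing afterwards."""
    page = fetch(relevant_url)
    return f"Summary for {question!r} based on {len(page)} bytes"
```

Whether robots.txt should bind the second pattern as strictly as the first is precisely the point of disagreement between the two companies.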

The difference between automated crawling and user-driven fetching isn’t just technical—it’s about who gets to access information on the open web. When Google’s search engine crawls to build its index, that’s different from when it fetches a webpage because you asked for a preview. Google’s “user-triggered fetchers” prioritize your experience over robots.txt restrictions because these requests happen on your behalf.

Perplexity then went further, essentially describing Cloudflare as ignorant fools:

When companies like Cloudflare mischaracterize user-driven AI assistants as malicious bots, they’re arguing that any automated tool serving users should be suspect—a position that would criminalize email clients and web browsers, or any other service a would-be gatekeeper decided they don’t like.

This controversy reveals that Cloudflare’s systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats. If you can’t tell a helpful digital assistant from a malicious scraper, then you probably shouldn’t be making decisions about what constitutes legitimate web traffic.

