Mirror Brief
    Technology

    Perplexity is allegedly scraping websites it’s not supposed to, again

    By Emma Reynolds, August 5, 2025

    Web crawlers deployed by Perplexity to scrape websites are allegedly skirting restrictions, according to a new report from Cloudflare. Specifically, the report claims that Perplexity’s bots appear to be “stealth crawling” sites, disguising their identity to get around robots.txt files and firewalls.

    Robots.txt is a simple file that websites host to tell web crawlers whether they may scrape the site’s content. Perplexity’s official web crawling bots are “PerplexityBot” and “Perplexity-User.” In Cloudflare’s tests, Perplexity was still able to display the content of a new, unindexed website even when those specific bots were blocked by robots.txt. The behavior also extended to websites with specific Web Application Firewall (WAF) rules restricting web crawlers.
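To make the mechanism concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt contents, the example URL, and the Chrome user-agent string are all hypothetical illustrations, not taken from Cloudflare's report. The sketch shows that robots.txt rules are matched against whatever user agent a client declares: a client that identifies itself as an ordinary browser falls through to the catch-all rule. The file only states what the site asks for; compliance is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks Perplexity's two declared crawlers,
# allows every other user agent.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

# Illustrative generic-browser user agent (an assumption, not from the report).
CHROME_ON_MACOS = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    url = "https://example.com/new-page"  # hypothetical page
    for agent in ("PerplexityBot", "Perplexity-User", CHROME_ON_MACOS):
        # can_fetch only reports what the rules say; nothing forces a client to ask.
        print(f"{agent[:40]!r}: allowed={parser.can_fetch(agent, url)}")
```

The two declared crawler names are refused, while the browser-style string is allowed, because only the declared names appear in the file.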

    A flowchart created by Cloudflare illustrating the different ways Perplexity’s web crawlers try to access a website’s content. (Image: Cloudflare)

    Cloudflare believes that Perplexity gets around those obstacles by using “a generic browser intended to impersonate Google Chrome on macOS” when robots.txt prohibits its official bots. In Cloudflare’s tests, Perplexity’s undeclared crawler could also rotate through IP addresses not listed in Perplexity’s official IP range to get through firewalls. Cloudflare says that Perplexity appears to be doing the same thing with autonomous system numbers (ASNs), the identifiers assigned to networks of IP addresses operated by a single organization, writing that it spotted the crawler switching ASNs “across tens of thousands of domains and millions of requests per day.”
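A common defense against impersonation is to trust a crawler's self-declared user agent only when the request also comes from an IP range the operator publishes. The sketch below illustrates that general idea with Python's standard `ipaddress` module. It is not Cloudflare's detection method. The "published" ranges are hypothetical RFC 5737 documentation prefixes, not Perplexity's real ranges, and the category labels are invented for illustration.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical "official" ranges for the declared crawlers. These are RFC 5737
# documentation prefixes, NOT Perplexity's real published ranges.
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]

# Declared crawler names, as given in the article.
DECLARED_AGENTS = ("PerplexityBot", "Perplexity-User")


@dataclass
class Request:
    user_agent: str
    source_ip: str


def classify(req: Request) -> str:
    """Return a hypothetical label combining the claimed identity and the source IP."""
    addr = ipaddress.ip_address(req.source_ip)
    in_published_range = any(addr in net for net in PUBLISHED_RANGES)
    declares_bot = any(name in req.user_agent for name in DECLARED_AGENTS)

    if declares_bot and in_published_range:
        return "verified-bot"  # claimed identity matches the published IPs
    if declares_bot:
        return "spoofed-bot"  # claims to be the crawler, but from an unlisted IP
    if in_published_range:
        return "undeclared-from-published-range"  # browser-like UA from a crawler IP
    return "unverified"  # indistinguishable from ordinary traffic by this check alone


if __name__ == "__main__":
    samples = [
        Request("PerplexityBot/1.0", "192.0.2.10"),
        Request("PerplexityBot/1.0", "203.0.113.5"),
        Request("Mozilla/5.0 (Macintosh) Chrome/124.0", "198.51.100.20"),
        Request("Mozilla/5.0 (Macintosh) Chrome/124.0", "198.51.100.200"),
    ]
    for req in samples:
        print(req.source_ip, classify(req))
```

A crawler that pairs a browser-style user agent with IP addresses and ASNs outside the published ranges, as Cloudflare alleges, would land in the last category, where a check like this cannot tell it apart from a real visitor.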

    Engadget has reached out to Perplexity for comment on Cloudflare’s report. We’ll update this article if we hear back.

    Up-to-date information from websites is vital to companies training AI models, especially as services like Perplexity are used as replacements for search engines. Perplexity has also been caught circumventing the rules in the past to keep its information current. Multiple websites reported in 2024 that Perplexity was still accessing their content even though they had forbidden it in robots.txt, something the company blamed on the third-party web crawlers it was using at the time. Perplexity later partnered with multiple publishers to share revenue earned from ads displayed alongside their content, seemingly as a make-good for its past behavior.

    Stopping companies from scraping content from the web will likely remain a game of whack-a-mole. In the meantime, Cloudflare has removed Perplexity’s bots from its list of verified bots and implemented a way to identify and block Perplexity’s stealth crawler from accessing its customers’ content.

    Emma Reynolds
    Emma Reynolds is a senior journalist at Mirror Brief, covering world affairs, politics, and cultural trends for over eight years. She is passionate about unbiased reporting and delivering in-depth stories that matter.