Amazon Web Services has launched an investigation to determine whether Perplexity AI is violating its rules, according to Wired. To be precise, the company’s cloud division is reportedly investigating allegations that the service is using a crawler, hosted on its servers, that ignores the Robots Exclusion Protocol. This protocol is a web standard in which developers place a robots.txt file on a domain, containing instructions on whether or not bots are allowed to access a particular page. Complying with those instructions is voluntary, but crawlers from reputable companies have generally respected them since web developers began implementing the standard in the 1990s.
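For readers unfamiliar with how a well-behaved crawler consults robots.txt, here is a minimal sketch in Python using the standard library's urllib.robotparser module. It illustrates the protocol in general and is not a description of how Perplexity's or any other company's crawler works. The bot name ExampleBot, the example.com URLs and the sample rules are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks "ExampleBot" everywhere and keeps
# every other bot out of /private/ only.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/private/notes"),
    ]:
        print(agent, url, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

Nothing in the file itself enforces these rules. A crawler that never runs a check like this, or that ignores the result, can still request the page, which is why compliance is described as voluntary.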
In a previous article, Wired reported that it had discovered a virtual machine that was bypassing its website's robots.txt instructions. That machine was hosted on an Amazon Web Services server using the IP address 44.221.181.252, which is “likely operated by Perplexity.” It also reportedly visited other Condé Nast properties hundreds of times over the past three months to scrape their content. The Guardian, Forbes and The New York Times had also detected it visiting their publications multiple times, Wired said. To confirm whether Perplexity was actually scraping its content, Wired entered headlines or short descriptions of its articles into the company's chatbot. The tool responded with results that closely paraphrased its articles “with minimal attribution.”
A recent Reuters report claimed that Perplexity is not the only AI company that is bypassing robots.txt files to collect content used to train large language models. However, it appears that Wired only provided Amazon with information about Perplexity AI's crawler. “AWS terms of service prohibit abusive and illegal activities, and our customers are responsible for complying with those terms,” Amazon Web Services told us in a statement. “We routinely receive reports of alleged abuse from a variety of sources and engage our customers to understand those reports.” The spokesperson also added that the company's cloud division told Wired it was investigating the information the publication provided, as it does with all reports of potential violations.
Perplexity spokeswoman Sara Platnick told Wired that the company has already responded to Amazon's inquiries and denied that its crawlers are bypassing the Robots Exclusion Protocol. “Our PerplexityBot, which runs on AWS, respects robots.txt, and we confirmed that Perplexity-controlled services are not crawling in any way that violates AWS Terms of Service,” she said. Platnick told us that Amazon looked into Wired's media inquiry only as part of a standard protocol for investigating reports of abuse of its resources. The company had apparently not heard from Amazon about any type of investigation before Wired contacted the company. Platnick admitted to Wired, however, that PerplexityBot will ignore robots.txt when a user includes a specific URL in their chatbot query.
Aravind Srinivas, CEO of Perplexity, also previously denied that his company is “ignoring the Robot Exclusion Protocol and then lying about it.” Srinivas admitted to Fast Company that Perplexity uses third-party web crawlers in addition to its own, and that the bot Wired identified was one of them.
Update, June 28, 2024, 2:20 p.m. ET: We've updated this post to add Perplexity's statement to Engadget.
Update, June 28, 2024, 8:27 p.m. ET: We have updated this post with a statement from Amazon Web Services.