The founders of TollBit, a six-month-old New York startup, believe we live in the “Napster era” of AI. Just as people of a certain generation downloaded digital music, companies are stealing vast amounts of the internet without paying the rights holders. They want TollBit to be the iTunes of the AI world.
“We’re kind of living in the Wild West,” Olivia Joslin, the company’s co-founder and COO, told Engadget in an interview. “We want to make it easier for AI companies to pay for the data they need.” Her idea is simple: create a marketplace that connects AI companies that need access to fresh, high-quality data with publishers who actually spend money to create it.
In fact, AI companies have only recently started paying for (some of) the data they need from news publishers. OpenAI kicked off the AI arms race in late 2022, but it was only a year ago that the company signed the first of its many licensing deals, with the Associated Press. Later that year, OpenAI announced a partnership with German publisher Axel Springer, which operates Business Insider and Politico in the US. Several publishers, including Vox, the Financial Times, News Corp and TIME, have since signed agreements with OpenAI and Google.
But that leaves many other publishers and creators out of the picture, without the option to make this Faustian pact even if they wanted to. This is the “long tail” of publishers that TollBit wants to target.
“Powerful AI models already exist and have already been trained,” Toshit Panigrahi, co-founder and CEO of TollBit, told Engadget. “And right now, there are thousands of apps that are just pulling these existing models off the shelves. What they need is new content. But right now, there is no infrastructure, either for them to buy it or for content creators to sell it in a seamless way.”
Neither Joslin nor Panigrahi had special knowledge of the media industry, but they both knew how online marketplaces and platforms worked: They were colleagues at Toast, a platform that lets restaurants manage billing and reservations. Panigrahi watched as deals (and lawsuits) piled up in the AI sector, then called Joslin.
Their first conversations were about RAG, which stands for Retrieval Augmented Generation in the AI world. With RAG, AI models first look up information in specific databases (such as the scrapeable parts of the internet) and use that information to synthesize an answer rather than relying on training data alone. Services like ChatGPT don’t know current home prices or the latest news. Instead, they fetch that data, usually by consulting websites. That lack of up-to-date data is why AI chatbots are often stumped by breaking news queries: if they don’t retrieve the latest data, they simply can’t keep up.
“We thought that using content for RAG was fundamentally different than using it for training,” Panigrahi said.
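The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is only an illustrative toy, not how ChatGPT, Perplexity or TollBit work internally: the `Document` class, the word-overlap scoring and the example.com URLs are all assumptions made for the example, and the final step of sending the prompt to a model is left out.

```python
from dataclasses import dataclass


@dataclass
class Document:
    url: str   # hypothetical source address
    text: str  # fresh content fetched from that source


def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace and strip simple punctuation."""
    return {word.strip(".,?!:;\"'").lower() for word in text.split()} - {""}


def retrieve(query: str, corpus: list[Document], top_k: int = 2) -> list[Document]:
    """Rank documents by shared words with the query (a toy relevance score)."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc.text)), doc) for doc in corpus]
    # Keep only documents with at least one shared word, best match first.
    matches = [pair for pair in scored if pair[0] > 0]
    ranked = sorted(matches, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


def build_prompt(query: str, sources: list[Document]) -> str:
    """Combine the retrieved text and the question into one prompt for a model."""
    context = "\n".join(f"[{doc.url}] {doc.text}" for doc in sources)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"


corpus = [
    Document("https://example.com/housing", "Median home prices rose again this month."),
    Document("https://example.com/sports", "The local team won the championship game."),
]
question = "What are home prices this month?"
sources = retrieve(question, corpus)
prompt = build_prompt(question, sources)
```

The point of the sketch is the ordering: the answer is grounded in whatever the retrieval step returns, so the quality and freshness of the publisher content directly shapes the response.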
By some estimates, RAG is the future of search engines. More and more people are asking questions on the internet and expecting comprehensive answers in return, rather than a list of blue links. In just over a year, startups like Perplexity, backed by Jeff Bezos and NVIDIA, among others, have burst onto the scene with ambitions to compete with Google. Even OpenAI has plans to one day let ChatGPT become a search engine. In response, Google has sprung into action: it now selects relevant information from search results and presents it as a coherent answer at the top of the results page, a feature it calls AI Overviews (it doesn’t always work well, but it seems to be here to stay).
The rise of RAG-based search engines has publishers shuddering. After all, who would make money if AI read the internet for us? After Google released AI Overviews earlier this year, at least one report estimated that publishers would lose more than $2 billion in advertising revenue because fewer people would have a reason to visit their websites. “AI companies also need continuous access to high-quality content and data,” Joslin said, “but if there is no economic model for this, no one will have an incentive to create content and that will be the end of AI applications as well.”
Rather than cutting one-time checks, TollBit’s model aims to compensate publishers on an ongoing basis. Hypothetically, if someone’s content was used in a thousand AI-generated responses, they would be paid a thousand times over at a price they would set themselves and could change on the fly.
Each time an AI company accesses a publisher’s new data through TollBit, it can pay a small fee set by the publisher that Panigrahi and Joslin say should be roughly equivalent to what the publisher would have received for a traditional pageview. The platform can also block AI companies that haven’t signed up from accessing publishers’ data.
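The pay-per-access idea in the two paragraphs above can be illustrated with a small ledger. This is a hedged sketch under stated assumptions, not TollBit’s actual system or API: the `AccessLedger` class, its method names, the client and publisher names, and the 0.002 fee are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from decimal import Decimal


class AccessLedger:
    """Toy metering: publishers set a fee, registered clients are billed per access."""

    def __init__(self) -> None:
        self.prices: dict[str, Decimal] = {}   # publisher -> current fee per access
        self.registered: set[str] = set()      # AI clients that have signed up
        # (client, publisher) -> running total owed; missing keys start at Decimal("0")
        self.owed: dict[tuple[str, str], Decimal] = defaultdict(Decimal)

    def set_price(self, publisher: str, fee: str) -> None:
        """A publisher can change its fee at any time; it applies to later accesses."""
        self.prices[publisher] = Decimal(fee)

    def register(self, client: str) -> None:
        """Mark an AI client as signed up and therefore allowed to pay for access."""
        self.registered.add(client)

    def access(self, client: str, publisher: str) -> bool:
        """Bill one access. Returns False (blocked) if the client has not signed up."""
        if client not in self.registered or publisher not in self.prices:
            return False
        self.owed[(client, publisher)] += self.prices[publisher]
        return True


ledger = AccessLedger()
ledger.set_price("example-news", "0.002")  # hypothetical fee, in dollars per access
ledger.register("answer-bot")

# Content used in a thousand responses is paid for a thousand times.
for _ in range(1000):
    ledger.access("answer-bot", "example-news")

blocked = ledger.access("unregistered-bot", "example-news")
total = ledger.owed[("answer-bot", "example-news")]
```

`Decimal` is used instead of floating point so that many tiny fees add up exactly; whether a real marketplace settles per access or in batches is not something the article specifies.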
The founders say they have onboarded 100 publishers and are running pilot projects with three AI companies since TollBit launched in February. They declined to disclose which publishers or AI companies had joined, citing confidentiality clauses, but did not deny having spoken with OpenAI, Anthropic, Google and Meta. So far, they say, no money has changed hands between AI companies and publishers on their platform.
Until that happens, its model remains a big hypothesis, though one into which investors have so far put $7 million. TollBit's investors include Sunflower Capital, Lerer Hippeau, Operator Collective, AIX and Liquid 2 Ventures, with more investors currently “knocking on its door,” Joslin said. In April, TollBit also brought on Campbell Brown as a senior adviser. Brown is a former television anchor who previously served as Meta's head of news partnerships for the better part of a decade.
Despite some high-profile lawsuits, AI companies are still scraping data from the internet for free and are largely getting away with it. Why would they have any incentive to pay publishers for this data? There are three big reasons, the founders say: since generative AI became mainstream, more and more websites have taken steps to prevent their content from being scraped, which makes scraping the web increasingly difficult and more expensive; no one wants to deal with ongoing copyright lawsuits; and, crucially, being able to easily pay for content as needed allows AI companies to tap into smaller, niche publications, because it is not possible to strike individual licensing deals with every website. Joslin also noted that several TollBit investors have also invested in AI companies that are concerned they could face litigation for using content without permission.
Getting AI companies to pay for content could provide a recurring revenue stream not just for big publishers, but potentially for anyone who publishes anything online. Last month, Perplexity, which was accused of illegally scraping content from Forbes, Wired and Condé Nast, launched a Publisher Program under which it plans to share a portion of the revenue it earns with publishers if it uses their content to generate AI-powered responses. However, the success of the program depends on how much money Perplexity makes when it introduces ads to the app later this year. Like TollBit, it is still very much a hypothesis.
“Our thesis with TollBit is that if you lose a pageview today, you should be compensated for it immediately rather than a few years down the road when a tech company figures out their ad program,” Panigrahi said of Perplexity’s initiative.
Despite all the licensing deals and technical advances out there, AI-powered chatbots are still terrible sources of news. They still make up facts and confidently fabricate complete links to stories that don’t actually exist. But tech companies are now shoving AI-powered chatbots into every gap they can, meaning that many people will still be getting their news from one of these products in the not-too-distant future.
A more cynical interpretation of TollBit’s premise is that the startup is offering hush money to publishers whose work is most likely to be turned into disinformation. Its founders, naturally, disagree with that characterization. “We are careful about the AI partners we bring on board,” Panigrahi said. “These companies are very conscious of the quality of the input material and the accuracy of the responses. We are seeing that paying for content, even nominal amounts, creates an incentive to respect raw input in their systems rather than treating it as a free, replaceable commodity.”