A week after it was released to a few thousand users, Microsoft’s new AI-powered Bing search engine has been delivering inaccurate and sometimes bizarre answers to some of them.
The company unveiled the new search approach last week to great fanfare. Microsoft said the underlying generative AI model built by its partner, the start-up OpenAI, combined with Bing’s existing search knowledge, would change the way people find information and make it far more relevant and conversational.
In two days, more than a million people requested access. Since then, interest has grown. “Demand is high with several million now on the waiting list,” Yusuf Mehdi, an executive who oversees the product, wrote on Twitter Wednesday morning. He added that users in 169 countries were trying it out.
One category of problems being shared online involved inaccuracies and outright errors, known in the industry as “hallucinations.”
On Monday, Dmitri Brereton, a software engineer at a start-up called Gem, pointed out a series of errors in the presentation Mr. Mehdi gave last week when he introduced the product, including an inaccurate summary of the financial results of the retailer Gap.
Users have posted screenshots of examples of when Bing could not figure out that the new Avatar film was released last year. It was stubbornly wrong about who performed at the Super Bowl halftime show this year, insisting that Billie Eilish, not Rihanna, headlined the event.
And the search results have had subtle errors. Last week, the chatbot said the water temperature at a beach in Mexico was 80.4 degrees Fahrenheit, but the website it linked to as a source showed the temperature to be 75.
Another set of problems stemmed from more open-ended chats, largely posted to forums like Reddit and Twitter. There, through screenshots and purported chat transcripts, users shared moments when the Bing chatbot seemed to go off the rails: it scolded users, declared it might be sentient, and told one user, “I have a lot of things, but I have nothing.”
It chastised another user for asking whether it could be pushed into producing false answers. “It’s disrespectful and annoying,” the Bing chatbot wrote back, adding an angry red-faced emoji.
Because each response is uniquely generated, a given dialogue cannot be exactly replicated.
Microsoft acknowledged the issues, saying they were part of the product improvement process.
“In the past week alone, thousands of users have interacted with our product and found significant value in sharing their feedback with us, which has allowed the model to learn and make many improvements,” Frank Shaw, a company spokesman, said in a statement. “We recognize that there is still work to be done and we expect that the system may make mistakes during this preview period, so feedback is critical for us to learn and help improve the models.”
He said that the length and context of the conversation could influence the chatbot’s tone and that the company was “adjusting its responses to create coherent, relevant and positive answers.” He said the company had fixed the issues that caused the inaccuracies in the demo.
Nearly seven years ago, Microsoft introduced a chatbot, Tay, which it shut down within a day of its online release after users goaded it into spouting racist and other offensive language. Microsoft executives at last week’s launch indicated they had learned from that experience and thought this time would be different.
In an interview last week, Mr. Mehdi said the company had worked hard to integrate safeguards and that the technology had vastly improved.
“We think the time is right to go to market and get feedback,” he said, adding: “If something is wrong, then you need to fix it.”