When Erik Duhaime PhD ’19 was working on his thesis at MIT’s Center for Collective Intelligence, he noticed that his wife, then a medical student, spent hours studying on apps that offered flashcards and quizzes. His research had shown that, as a group, medical students could classify skin lesions more accurately than professional dermatologists. The trick was to continually measure each student’s performance on cases with known answers, discard the opinions of people who were bad at the task, and intelligently combine the opinions of people who were good.
Combining his wife’s study habits with his research, Duhaime founded Centaur Labs, a company that created a mobile app called DiagnosUs to collect opinions from medical experts on real-world scientific and biomedical data. Through the app, users review anything from images of potentially cancerous skin lesions to audio clips of heart and lung sounds that could indicate a problem. If users are accurate, Centaur uses their feedback and awards them small cash prizes. Those opinions, in turn, help medical AI companies train and improve their algorithms.
The approach combines the desire of medical experts to hone their skills with the desperate need for well-labeled medical data among companies using AI for biotechnology, pharmaceutical product development, or medical device commercialization.
“I realized that my wife’s studies could be productive work for AI developers,” Duhaime recalls. “Today we have tens of thousands of people using our app, and about half are medical students who are amazed to be earning money while they study. So, we have this gamified platform where people compete against each other to train data and earn money if they are good and improve their skills at the same time, and in doing so, label data for teams that build life-saving AI.”
Gamification of medical labeling
Duhaime completed his PhD with Thomas Malone, the Patrick J. McGovern Professor of Management and founding director of the Center for Collective Intelligence.
“What interested me was the wisdom of crowds phenomenon,” Duhaime says. “Ask a group of people how many jelly beans are in a jar and the average of everyone’s answers will be pretty close. I was interested in how that problem is handled in a task that requires skill or expertise. Obviously, you don’t want to ask a bunch of random people if they have cancer, but at the same time, we know that second opinions in healthcare can be extremely valuable. You can think of our platform as a powerful way to get a second opinion.”
Duhaime began exploring ways to harness collective intelligence to improve medical diagnoses. In one experiment, he trained groups of laypeople and medical students whom he describes as “semi-experts” to classify skin conditions, and found that by combining the opinions of top performers he could outperform professional dermatologists. He also discovered that by combining algorithms trained to detect skin cancer with expert opinions, he could outperform either method alone.
“The central idea was that you do two things,” Duhaime explains. “The first thing is to measure people’s performance, which seems obvious, but it isn’t done much even in the medical field. If you ask dermatologists whether they are any good, they will tell you, ‘Yes, of course, I am a dermatologist.’ They don’t necessarily know how good they are at specific tasks. The second thing is that when you get multiple opinions, you need to identify the complementarities between different people. You need to recognize that expertise is multidimensional, so it’s more like putting together the optimal trivia team than putting together five people who are the best at the same thing. For example, one dermatologist might be better at identifying melanoma, while another might be better at classifying the severity of psoriasis.”
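The two steps Duhaime describes can be sketched in a few lines of Python. This is only an illustration, not Centaur’s actual method: the function names, the 0.7 accuracy cutoff, the log-odds vote weights, and the sample readers are all assumptions made for the example. It also treats skill as a single number, so it ignores the per-task specialization he mentions.

```python
import math
from collections import defaultdict


def reader_accuracy(gold_answers, responses):
    """Step 1: score each reader on cases whose answers are already known."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for reader, case_id, label in responses:
        if case_id in gold_answers:
            seen[reader] += 1
            correct[reader] += int(label == gold_answers[case_id])
    return {reader: correct[reader] / seen[reader] for reader in seen}


def pooled_label(case_id, responses, accuracy, min_accuracy=0.7):
    """Step 2: drop weak readers, then take a weighted vote among the rest."""
    votes = defaultdict(float)
    for reader, cid, label in responses:
        acc = accuracy.get(reader, 0.0)
        if cid != case_id or acc < min_accuracy:
            continue
        acc = min(acc, 0.99)  # keep the weight finite for perfect scorers
        votes[label] += math.log(acc / (1 - acc))  # assumed log-odds weight
    if not votes:
        return None  # nobody qualified, or the case has no opinions
    return max(votes, key=votes.get)


# Hypothetical data: four readers scored on four known-answer cases.
gold = {"g1": "melanoma", "g2": "benign", "g3": "melanoma", "g4": "benign"}
graded = [
    ("ana", "g1", "melanoma"), ("ana", "g2", "benign"),
    ("ana", "g3", "melanoma"), ("ana", "g4", "benign"),
    ("ben", "g1", "melanoma"), ("ben", "g2", "benign"),
    ("ben", "g3", "melanoma"), ("ben", "g4", "melanoma"),
    ("cal", "g1", "benign"), ("cal", "g2", "melanoma"),
    ("cal", "g3", "melanoma"), ("cal", "g4", "melanoma"),
    ("dee", "g1", "benign"), ("dee", "g2", "benign"),
    ("dee", "g3", "benign"), ("dee", "g4", "melanoma"),
]
# Opinions on a new, unlabeled case "n1".
new_opinions = [
    ("ana", "n1", "melanoma"), ("ben", "n1", "benign"),
    ("cal", "n1", "benign"), ("dee", "n1", "benign"),
]

accuracy = reader_accuracy(gold, graded)
result = pooled_label("n1", new_opinions, accuracy)
print(accuracy)
print(result)
```

In this made-up data a simple majority vote would label the new case benign, three opinions to one. Once the two readers with poor track records are excluded and the rest are weighted by measured accuracy, the pooled answer follows the strongest reader instead.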
While still pursuing his PhD, Duhaime founded Centaur and began using MIT’s entrepreneurial ecosystem to further develop the idea. He received funding from the MIT Sandbox Innovation Fund in 2017 and participated in the delta v startup accelerator led by the Martin Trust Center for MIT Entrepreneurship during the summer of 2018. The experience helped him enter the prestigious Y Combinator accelerator later that year.
The DiagnosUs app, which Duhaime developed with Centaur co-founders Zach Rausnitz and Tom Gellatly, is designed to help users test and improve their skills. Duhaime says about half of the users are medical students and the other half are mostly doctors, nurses and other medical professionals.
“It’s better than studying for exams, where you may have multiple choice questions,” Duhaime says. “They can see real cases and practice.”
Centaur collects millions of opinions from tens of thousands of people around the world every week. Duhaime says most people earn coffee money, although the person who earns the most from the platform is a doctor from Eastern Europe who makes around $10,000.
“People can do it on the couch or on the T,” Duhaime says. “It doesn’t feel like work, it’s fun.”
The approach stands in stark contrast to traditional data labeling and AI content moderation, which are typically outsourced to low-resource countries.
Centaur’s approach also produces accurate results. In a paper with researchers from Brigham and Women’s Hospital, Massachusetts General Hospital (MGH), and Eindhoven University of Technology, Centaur showed that its crowdsourced opinions labeled lung ultrasounds as reliably as the experts did. Another study with researchers at Memorial Sloan Kettering showed that crowdsourced labeling of dermoscopic images was more accurate than that of highly experienced dermatologists. Beyond images, Centaur’s platform also works with video, audio, text from sources such as research articles or anonymized conversations between doctors and patients, and waveforms from electroencephalograms (EEGs) and electrocardiograms (ECGs).
Find the experts
Centaur has discovered that the best performers come from surprising places. In 2021, to gather expert opinions on EEG patterns, researchers held a contest through the DiagnosUs app at a conference involving about 50 epileptologists, each with more than 10 years of experience. Organizers made a custom T-shirt to give to the contest winner, who they assumed would be attending the conference.
But when the results came in, a pair of Ghanaian medical students, Jeffery Danquah and Andrews Gyabaah, had outperformed everyone present. The highest-ranked conference attendee placed ninth.
“I started doing it for money, but I realized that it actually started to help me a lot,” Gyabaah told the Centaur team later. “There were times in clinic when I realized I was doing better than others because of what I learned on the DiagnosUs app.”
As AI continues to change the nature of work, Duhaime believes Centaur Labs will be used as an ongoing check on AI models.
“Right now, we’re primarily helping people train algorithms, but I think more and more we’ll be used to monitor algorithms and in conjunction with algorithms, basically serving as humans in the loop for a variety of tasks,” Duhaime says. “You could think of us less as a way to train AI and more as part of the full life cycle, where we provide feedback on the results of the models or monitor the model.”
Duhaime sees the work of humans and AI algorithms becoming more integrated and believes Centaur Labs has an important role to play in that future.
“It’s not just a matter of training algorithms and then deploying them,” Duhaime says. “Instead, there will be digital assembly lines across the economy, and expert human judgment will be needed on demand, infused in different places along the value chain.”