Researchers at the University of Pennsylvania recently studied the impact of GPT-4-powered tutors on nearly a thousand high school students. The results suggest that GPT-4 tutors may need further refinement before they actually help students learn.
Students who had access to an AI tutor during practice exams performed better on those practice exams than students who did not. However, on a subsequent exam, when none of the students had access to an AI tutor, the students who had worked with one performed worse than the other students.
“Generative AI Can Harm Learning,” the paper summarizing these findings, was recently published by the Wharton School of the University of Pennsylvania.
“The one-sentence summary of the paper is: ‘We found that generative AI could harm learning because students potentially use it as an answer machine, rather than a tool that enhances learning,’” says Alp Sungu, one of the paper’s co-authors and a professor at the Wharton School.
However, Sungu and his co-authors emphasize that they are not opposed to AI tutors and believe that such tutors can ultimately be useful in certain contexts if designed correctly.
GPT AI Tutors: How the Study Was Designed
To study the impact of AI tutors on math students, Sungu and his colleagues used separate prompts to create two different GPT-4-powered tutors. They called one “GPT Base.” It worked much like standard versions of ChatGPT: when presented with a math problem, it would reveal the answer while helping students. The second GPT-4 tutor was created using more advanced prompts that told the AI not to reveal the answer while working with students and instead to help them find the answer on their own. The researchers called it “GPT Tutor.”
The researchers worked with nearly 1,000 students in Turkey who were in ninth, tenth, and eleventh grades in the 2023–24 school year. The preregistered randomized controlled trial placed students into three groups: a group with no AI tutor, a group that used the GPT Base tutor, and a group that used the more advanced GPT Tutor. After a lesson taught by a teacher, all students in the study took a practice test. Those with access to GPT Base scored 48% higher than students without access to an AI tutor, while those with GPT Tutor scored 127% higher on the practice test.
However, on the actual exam, taken without AI assistance, students who had used GPT Base performed 17% worse than the control group, and the group that had used GPT Tutor performed, on average, the same as the control group. On the plus side, GPT Tutor seemed to mitigate the negative impact of an AI tutor on students, although it did not improve their exam results either.
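To make the percentages above concrete, here is a minimal Python sketch of how a "percent higher (or lower) than the control group" figure is computed. The function name `relative_change_pct` is illustrative, and the scores are a hypothetical index in which the control group is set to 100; they are not the study's raw data.

```python
def relative_change_pct(treatment_mean: float, control_mean: float) -> float:
    """Percent difference of a treatment group's mean score from the control mean."""
    if control_mean == 0:
        raise ValueError("control mean must be non-zero")
    return (treatment_mean - control_mean) * 100 / control_mean


# Hypothetical index values with the control group normalized to 100.
# They are chosen only to reproduce the percentages reported in the article.
CONTROL = 100
practice_scores = {"GPT Base": 148, "GPT Tutor": 227}
exam_scores = {"GPT Base": 83, "GPT Tutor": 100}

for group, score in practice_scores.items():
    print(f"Practice test, {group}: {relative_change_pct(score, CONTROL):+.0f}%")

for group, score in exam_scores.items():
    print(f"Exam, {group}: {relative_change_pct(score, CONTROL):+.0f}%")
```

Read this way, "17% worse" means the GPT Base group's average exam score was 17% below the control group's average, not 17 percentage points below it.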
“I thought GPT Tutor would be better than the control group,” says Sungu. “It wasn’t.” However, with better support and improvements to the tutor in the future, he believes it could ultimately help students’ learning.
What lessons should teachers learn?
The study highlights some of the differences between the use of AI in educational and professional settings.
Programmers and others who use AI professionally tend to achieve more with its help. “If you give them a task, they will do it more efficiently, more effectively, they will become more productive,” says Sungu. “There is already a lot of literature that shows this.”
In education, however, teachers are interested not only in the immediate result but also in what the student has actually learned. For example, students may write better papers if they receive more help from GPT, but that does not mean they will learn more, or anything at all, about writing.
That’s why Sungu believes that for an AI tutor to be more effective, it will need to focus on learning rather than productivity. That requires more research specifically into AI in education. Otherwise, Sungu believes that while AI may advance further, it won’t change the inherent challenges surrounding AI and learning. “The technology is already good at providing answers,” Sungu says. “We need to think about the design of assessment and educational delivery.”
Some might argue that since AI is available to everyone, assessing student performance without the use of AI might no longer be important. Sungu understands that line of reasoning and points out that certain technologies, such as the calculator, do make some skills obsolete. “I don’t really care if you can’t multiply 7 and 8 by heart, because we rely on calculators,” he says.
On the other hand, there are other skills, such as critical thinking and problem solving, that are vital for students to develop. “The example we give in the article is that the Federal Aviation Agency prohibits young pilots from relying entirely on autopilot,” says Sungu. “When the autopilot is inactive, we still need people to think for themselves.”