A perennial question as technology improves is to what extent it will change, or replace, work traditionally done by humans. From self-checkout at the grocery store to AI's ability to detect serious illnesses in medical exams, workers in all fields find themselves working alongside tools that can do part of their job. With the availability of AI tools in classrooms accelerated by the pandemic and showing no signs of slowing down, teaching has become yet another field in which professional work is shared with AI.
We wondered about the role of AI in a specific teaching practice: assessing student learning. Grading and giving feedback takes so much time that it discourages many writing teachers from assigning longer writing assignments, and most students wait a long time to get grades and feedback back, so there is significant time-saving and learning potential in having AI help rate student work. Then again, we wondered, could an AI scoring and feedback system really help students as much as teachers do?
"Teachers have the ability to say, 'What were you trying to tell me? Because I don't understand.' The AI is trying to fix the writing mechanics and formatting, fix what's already there, not trying to understand what they meant to say."
We recently completed an evaluation of an AI-powered platform through which high school students could write, submit, and revise argumentative essays in response to preselected writing prompts. Each time students clicked "submit," they received scores (from 1 to 4) on four writing dimensions (Claim and Focus, Support and Evidence, Organization, and Language and Style) along with dimension-aligned feedback and suggestions for improvement, all generated by the AI instantly upon submission.
To compare the AI's scores and feedback to those provided by real teachers, we hosted an in-person meeting of 16 middle school writing teachers who had used the platform with their students during the 2021-22 school year. After calibrating on the project rubric together to ensure a reliable understanding and application of the scores and prompts, we assigned each teacher 10 randomized essays (not from their own students) to grade and give feedback on. This yielded a total of 160 teacher-rated essays, which we could compare directly with the AI's scores and feedback on those same essays.
How were the teachers’ scores similar to or different from the scores given by the AI?
On average, we found that teachers rated the essays lower than the AI did, with significant differences on all dimensions except Claim and Focus. In terms of the overall score across all four dimensions (minimum 4, maximum 16), the average teacher score across these 160 essays was 7.6, while the average AI score on the same set of papers was 8.8. Looking at particular dimensions, Figure 1 shows that on Claim and Focus and Support and Evidence, teachers and the AI tended to agree on high-scoring (4) and low-scoring (1) essays but disagreed in the middle, with teachers more likely to rate an essay a 2 where the AI rated it a 3. On the Organization and Language and Style dimensions, by contrast, teachers were much more likely to rate essays a 1 or 2, while the AI's scores were distributed from 1 to 4, with many more essays at 3 or even 4.
How was the teachers’ written feedback similar to or different from that provided by the AI?
During our meeting with the 16 teachers, we gave them the opportunity to discuss the scores and feedback they had given on their 10 essays. Before even reflecting on their specific essays, a common observation we heard was that when they were using the program in their own classrooms the year before, they needed to help the majority of their students read and interpret the feedback the AI had given. For example, in many cases, they reported that students read a comment but weren’t sure what it was asking them to do to improve their writing. Therefore, one immediate difference that emerged, according to the teachers, was their ability to put their feedback into developmentally appropriate language that matched the needs and abilities of their students.
"On reflection, we discussed how nice the AI was, even in the comments/feedback. Kids coming up now are used to more direct and honest feedback. It's not always about ego stroking, it's about fixing a problem. You don't always need two stars for a wish. Sometimes we just have to get right to the point."
Another difference that emerged was the teachers' attention to the essay as a whole: the flow, the voice, whether it was just a summary or a constructed argument, whether the evidence fit the argument, and whether it all made sense together. The teachers' tendency to score essays a 2 in the argument-focused dimensions of Claim and Focus and Support and Evidence, they reasoned, came from their ability to see the entire essay, something the AI cannot really do since many AI systems are trained at the sentence level rather than on the essay as a whole.
The teachers' harsher scoring on Organization similarly stemmed from their ability, unlike the AI's, to follow the sequence and flow of the entire essay. Teachers shared, for example, that the AI could detect transition words, guide students to use more of them, and treat their presence as evidence of good organization, while they, as teachers, could see whether the transitions actually flowed or simply connected an incoherent set of sentences. In the Language and Style dimension, teachers again pointed to ways the AI was easier to fool, for example with a string of seemingly sophisticated vocabulary that would impress the AI but that the teacher could see did not add up to a coherent sentence or idea.
Can AI help teachers with grading?
Assessing student work well is a time-consuming but important component of teaching, especially when students are learning to write. Students need regular practice with quick feedback to become confident, strong writers, but most teachers lack the planning and grading time, and teach too many students, to assign long or frequent writing while maintaining any semblance of work-life balance or sustainability in their careers.
The promise of AI to ease some of this burden is potentially quite significant. While our initial findings in this study show that teachers and AI approach assessment in somewhat different ways, we believe that if AI systems could be trained to view essays more holistically, as teachers do, and to deliver feedback in more developmentally and contextually appropriate language that students can process independently, there is real potential for AI to help teachers with grading. We believe that improving AI in these areas is a worthwhile pursuit, both to reduce the grading load on teachers and, as a result, to ensure students have more frequent opportunities to write with immediate, helpful feedback that supports their growth as writers.