As a professor at the Princeton School of Public and International Affairs, where I teach econometrics and research methods, I spend a lot of time thinking about the intersection of data, education, and social justice, and about how generative AI will reshape the way we gather, analyze, and use data for change.
My students are working toward a master’s degree in public affairs and many of them are interested in pursuing careers in national and international public policy. The graduate-level econometrics course I teach is required and is designed to foster analytical and critical thinking skills in causal research methods. Throughout the course, students are tasked with writing four memoranda on designated policy issues. Typically, we examine publicly available data sets related to societal concerns, such as determining optimal criteria for loan forgiveness or evaluating the effectiveness of police stop-and-frisk policies.
To better understand how my students can use generative AI effectively, and to prepare them to apply these tools in the data-related work they will encounter in their careers after graduate school, I knew I had to try it myself. So I set up an experiment: I would take one of the tasks I assign my students and complete it using generative AI.
My objective was twofold. I wanted to experience what it feels like to use the tools my students have access to. And, since I assume many of my students are now using AI for these assignments, I wanted to develop a more evidence-based stance on whether I should change my grading practices.
I pride myself on assigning practical but intellectually challenging tasks, and to be honest, I didn’t have much faith that any AI tool could consistently perform statistical analysis and make the connections necessary to provide relevant policy recommendations based on its results.
Experiments with Code Interpreter
For my experiment, I replicated an assignment from last semester that asked students to imagine how they would create a grant program for healthcare providers to deliver perinatal services (before and after childbirth) to women, with the goal of promoting infant health and mitigating low birth weight. Students were provided with a publicly available data set and asked to develop eligibility criteria by building a statistical model to predict low birth weight. They needed to support their selections with references from existing literature, interpret the results, provide relevant policy recommendations, and produce a positioning statement.
As for the tool, I decided to try ChatGPT’s new Code Interpreter, which lets users upload data (in any format) and use conversational language to execute code. I gave ChatGPT the same guidelines I had given my students and loaded the data set into Code Interpreter.
First, Code Interpreter broke down each task. It then asked me if I would like to continue the analysis after choosing variables (or criteria for the perinatal program) for the statistical model. (See task analysis and variables below.)
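For readers curious what that modeling step looks like under the hood, here is a minimal sketch of the kind of analysis Code Interpreter can run from a conversational prompt: a logistic regression predicting low birth weight from a few candidate eligibility criteria. The column names and synthetic data below are hypothetical stand-ins, not the actual assignment data set or the exact variables the tool chose.

```python
# Hypothetical sketch: logistic regression for predicting low birth weight.
# Column names and data are invented for illustration only.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "mother_age": rng.integers(18, 45, n),
    "prenatal_visits": rng.integers(0, 15, n),
    "smoker": rng.integers(0, 2, n),
})

# Synthetic outcome: low birth weight made more likely by smoking
# and by fewer prenatal visits, so the model has a signal to find.
logit = -1.5 - 0.2 * df["prenatal_visits"] + 1.0 * df["smoker"]
df["low_birth_weight"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = df[["mother_age", "prenatal_visits", "smoker"]]
y = df["low_birth_weight"]
model = LogisticRegression(max_iter=1000).fit(X, y)

# The sign and size of each coefficient suggest which criteria
# matter most when setting program eligibility.
for name, coef in zip(X.columns, model.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```

In the real assignment, students (and Code Interpreter) would of course fit the model to the provided data set rather than simulated data, and justify each variable with references to the literature.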
After running the statistics and analyzing and interpreting the data, Code Interpreter created a memo with four policy recommendations. While the recommendations were strong, the tool did not provide any references to prior literature or draw a direct connection to the results. It also failed to create a positioning statement. That part depended on students reflecting on their own backgrounds and experiences to consider any bias they might bring, something the tool couldn’t do.
Another flaw was that each part of the task was delivered separately, so I found myself repeatedly returning to the tool to ask for missed items or clarification of results. It quickly became apparent that it was easier to join the disparate elements together manually.
Without any human touch, the memo would not have received a passing grade because it was too high-level and did not provide a properly cited literature review. With all the pieces put together, however, the work could have earned a solid B.
While Code Interpreter was not able to independently produce passing work, it is imperative to recognize the tool’s current capabilities. It skillfully performed statistical analyses using conversational language and, in offering actionable policy recommendations, demonstrated the type of critical thinking I hope to see in my students. As the field of generative AI continues to advance, it is simply a matter of time before these tools consistently deliver “A-caliber” work.
How I am using the lessons learned
My students have access to generative AI tools like the one I experimented with, so I will assume they are using them for my course assignments. In light of this impending reality, it is important for educators to adapt their teaching methods to incorporate these tools into the learning process, especially since it is difficult, if not impossible, given the current limitations of AI detectors, to distinguish AI-produced content from human-produced content. That’s why I’m committed to incorporating the exploration of generative AI tools into my courses, while maintaining my emphasis on critical thinking and problem-solving skills, which I believe will continue to be key to thriving in the workforce.
In considering how to incorporate these tools into my curriculum, two paths have emerged. I can help students use AI to generate initial content and teach them how to review and improve it with human input. This can be especially beneficial when students encounter writer’s block, but it can inadvertently stifle creativity. Alternatively, I can help students create their original work first and leverage AI to improve it later.
While I am more drawn to the second approach, I recognize that both require students to develop essential skills in writing, critical thinking, and computational thinking in order to collaborate effectively with computers, skills that are critical to the future of education and the workforce.
As an educator, I have a duty to stay informed about the latest advances in generative AI, not only to ensure that learning occurs, but also to stay aware of the tools that exist, their benefits and limitations, and, most importantly, how students could be using them.
However, it is also important to recognize that the quality of work students produce now warrants higher expectations and possible adjustments to grading practices. The baseline is no longer zero; it is the AI. And the upper limit of what humans can achieve with these new capabilities remains an unknown frontier.