Last week, results were released from a six-week study of AI tutors in Nigeria. Below I summarize that study and four other recent studies of AI tutoring (dates reflect when each study was conducted, not when the papers were published):
Summer 2024 — 15–16-year-olds in Nigeria
The study enrolled 800 students. The treatment group studied English with the GPT-based Microsoft Copilot twice weekly for six weeks. Students were given only an initial prompt to start chatting—teachers had a minimal “orchestra conductor” role—yet they achieved “the equivalent of two years of typical learning in just six weeks.”

Spring 2024 — K-12 Title I schools in the South
This study enrolled 1,800 K-12 students in a low-income school district and gave human tutors to both the treatment and control groups, though in the treatment group the tutors also had access to a “Tutor CoPilot” button designed by the researchers to provide hints, similar problems, worked examples, and so on. Tutors used the button in only 29% of treatment sessions. An “exit ticket” problem was solved by 66% of treatment students versus 62% of control students.

Spring 2024 — 16–18-year-olds in Italy
Researchers split 76 students (85% girls) from an Italian technical institute (a high school not aimed at university preparation) into two groups for their ESL class: the treatment group’s weekly homework assignments were supported by an interactive tutoring session using GPT-4. The raw scores aren’t printed, but the effect sizes, reported as Cohen’s d, are small and not statistically significant.

Fall 2023 — Harvard undergrads
Researchers split 200 intro-physics students into two groups: the first attended 75-minute classes involving group work with instructor feedback, while the second studied at home using an AI tutor. The following week, the groups swapped methods.
The AI tutor was based on GPT-4 with a system prompt instructing it to give only incremental hints and prompts for handling each question; writing that prompt took “several months.”
Learning gains were measured by subtracting pre-lesson quiz scores from post-lesson scores. The AI groups improved from 2.75 to 4.5 out of 5, twice the improvement of the control group, and 83% of students rated the AI tutor’s explanations as good as or better than those of human instructors.

February–August 2023 — 8–14-year-olds in Ghana
An educational network called Rising Academies tested Rori, their WhatsApp-based AI math tutor, with 637 students in Ghana; students in the treatment group used the AI tutor during study hall. After eight months, 25% of subjects had dropped out of the study due to inconsistent school attendance. Among the remainder, the treatment group increased their scores on a 35-question assessment by 5.13 points versus 2.12 points for the control group, a difference “approximately equivalent to an extra year of learning” for the treatment group.
The two African studies both report large effects using an “equivalent years of schooling” metric that seems to be based on this World Bank report, which estimates that in low- and middle-income countries, each school year increases students’ literacy ability by 0.15 to 0.21 standard deviations. By this metric, the report finds that the median structured-pedagogy intervention increases learning by 0.6 to 0.9 equivalent years of schooling.
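Under that assumption, the conversion is simple arithmetic: divide a standardized effect size by the per-year gain. A minimal sketch (the 0.3 SD figure below is illustrative, not taken from any of the studies):

```python
def equivalent_years_of_schooling(effect_size_sd, sd_per_year=(0.15, 0.21)):
    """Convert a standardized effect size (in SDs) to a range of
    'equivalent years of schooling', using the World Bank's estimate
    that one school year moves literacy by 0.15 to 0.21 SD."""
    low, high = sd_per_year
    # A larger per-year gain implies fewer equivalent years, so the
    # range runs from effect/high up to effect/low.
    return effect_size_sd / high, effect_size_sd / low

# Illustrative: a 0.3 SD improvement works out to roughly
# 1.4 to 2.0 equivalent years of schooling.
print(equivalent_years_of_schooling(0.3))
```

This also shows why the metric flatters interventions in low-income settings: the smaller the per-year baseline gain, the more “years” any fixed SD improvement is worth.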
Replications of Bloom’s “2-Sigma Effect” find, on average, only a “0.5-Sigma Effect” (i.e. moving a median student from the 50th to roughly the 70th percentile), but tutoring remains the best-known instructional intervention. Even basic prompt engineering produces a useful AI tutor, without question banks, additional scaffolding, or long-term performance data. At this point it seems inevitable that we’re going to see huge advances in student learning due to AI.
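The percentile framing assumes normally distributed scores: a median student who gains d standard deviations ends up at the standard normal CDF of d. A quick check of both figures, using only the standard library:

```python
import math

def percentile_after_gain(sigmas):
    """Percentile reached by a median student who gains `sigmas`
    standard deviations, assuming normally distributed scores.
    Uses the standard normal CDF, Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(sigmas / math.sqrt(2.0)))

print(percentile_after_gain(0.5))  # ~0.69: the "0.5-sigma" replication average
print(percentile_after_gain(2.0))  # ~0.98: Bloom's original 2-sigma claim
```

So a 0.5-sigma gain lands at about the 69th percentile, consistent with the “50th to 70th percentile” description, while Bloom’s original 2-sigma claim would land at about the 98th.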