The Promises and Pitfalls of AI in Education

The Thousand-Year Lecture

Walk into almost any university lecture hall today, and you will witness a pedagogical model that has remained essentially unchanged since the University of Bologna formalized it nearly a thousand years ago. A scholar stands at the front of the room, students sit in rows, and knowledge flows in a single direction. While classroom technology has evolved, the underlying structure of instruction is stubbornly stagnant. For all the breathtaking progress we have made in medicine, engineering, and communication, the way we teach still relies on gathering large groups of students into a single room, delivering content at a uniform pace, and hoping that most of them absorb most of it. The resilience of this method, however, is due not to pedagogical excellence but to economic necessity (Goffe & Kauper, 2014). We teach in groups because, until now, we have never been able to afford a better alternative.

In 1984, the educational psychologist Benjamin Bloom published one of the most striking findings in education research. He demonstrated that students who received one-on-one tutoring performed two standard deviations better than students in a traditional classroom (Bloom, 1984). To put that in concrete terms, the average tutored student outperformed 98 percent of students in a conventional lecture. Bloom called this the “2 Sigma Problem,” not because the finding was in doubt, but because the challenge was so daunting. How could we possibly deliver the benefits of individual tutoring to every student when we clearly could not afford a personal tutor for each one? For forty years, that question hung in the air, unanswered. The promise of one-on-one instruction was evident, but no one could find an economically viable path to delivering it.
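
The 98 percent figure follows directly from the normal distribution: a student lifted two standard deviations above the mean of a roughly normal score distribution lands at about the 97.7th percentile.

```latex
% Percentile reached by a two-standard-deviation (2-sigma) improvement,
% assuming approximately normally distributed scores:
\Phi(2) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{2} e^{-z^{2}/2}\, dz \approx 0.977
```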

Fifty Cents an Hour

Artificial intelligence has not just reopened Bloom’s question; in many ways, it has answered it. Not perfectly, and not without risk, but with a clarity that should capture every educator’s attention. For the first time in history, we can deliver adaptive, one-on-one tutoring at a marginal monetary cost that rounds to zero.

I say this not as an outside observer but as someone who has built and deployed such a system. Over the past year, my team and I developed an AI tutor called Sidekick for an MBA course at the Ivey Business School, with the broader vision of scaling it into a standardized tool for other instructors at Western University to adopt. Sidekick is not a chatbot that hands students answers. It follows a Socratic, question-first pedagogy, pushing learners to reason through problems rather than passively consume solutions. Every factual claim it makes is grounded exclusively in instructor-provided course materials, and it is designed to ask better questions rather than deliver easier answers.
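
Sidekick’s internal implementation is not the point here, but the general pattern is worth seeing. The sketch below is not Sidekick’s actual code; it is a minimal illustration of one common way to build this kind of tutor, in which the model receives only instructor-provided excerpts along with a system instruction that forbids direct answers. All names and prompt wording are hypothetical.

```python
# A minimal sketch of a grounded, Socratic tutoring prompt. This is NOT
# Sidekick's actual implementation; it illustrates one common pattern
# (instructor-provided context plus a question-first system instruction).

SOCRATIC_SYSTEM_PROMPT = """You are a course tutor.
- Use ONLY the course excerpts provided below; if they do not cover the
  question, say so rather than guessing.
- Never give the final answer outright. Ask one guiding question at a time
  that pushes the student to reason toward the answer themselves."""

def build_tutor_prompt(student_question: str, course_excerpts: list[str]) -> str:
    """Assemble the grounded prompt that would be sent to a language model."""
    context = "\n\n".join(course_excerpts)  # instructor-provided material only
    return (
        f"{SOCRATIC_SYSTEM_PROMPT}\n\n"
        f"Course excerpts:\n{context}\n\n"
        f"Student: {student_question}\n"
        f"Tutor:"
    )

# Example usage with placeholder course material:
print(build_tutor_prompt(
    "Why does the equilibrium price rise when supply falls?",
    ["Excerpt 3.2: In a competitive market, equilibrium occurs where the "
     "quantity demanded equals the quantity supplied..."],
))
```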

The economics of this AI tutor are startling. Based on our internal testing, Sidekick delivers real-time, adaptive tutoring for roughly fifty cents per hour of active use. Compare that to the cost of a human tutor and the implications are staggering: for roughly the price of a single in-person tutoring session, an AI tutor can serve a student for an entire semester.
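
A back-of-envelope comparison makes the gap concrete. The fifty-cent figure comes from our testing; the human-tutor rate below is an illustrative assumption, not a measured benchmark.

```latex
% Back-of-envelope comparison. The \$50/hour human rate is an
% illustrative assumption, not a measured figure:
\frac{\$50 \text{ per human-tutor hour}}{\$0.50 \text{ per AI-tutor hour}} = 100
% One human session therefore buys about 100 hours of AI tutoring,
% roughly seven hours per week across a 14-week semester.
```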

But the cost story, impressive as it is, is not the most compelling part. What matters is what students actually did with the tool. In its first deployment, eighty percent of MBA students adopted Sidekick voluntarily, with no mandate or grade incentive. The average session lasted 20.3 minutes, a duration that suggests substantive engagement with short but complex learning objectives. We also found that approximately half of all Sidekick usage occurred during hours when traditional academic support was unavailable (between 6:00 PM and 2:00 AM).

And the pedagogical returns were real. Our preliminary data from Sidekick showed that each tutoring session was associated with an average increase of 0.55 points on the final exam, and every hundred minutes of cumulative usage corresponded to a 2.77-point increase. These are not revolutionary numbers in isolation, but they suggest a dose-response relationship: the more students engaged with the tutor, the more they learned, at least as measured on the exam. These results align with recent research demonstrating the magnitude of what is possible. One randomized controlled trial published in Scientific Reports found that a well-designed AI tutor produced significantly greater learning gains than an in-class active learning group (Kestin et al., 2025). In a separate experiment run by Google, supervised AI models achieved a 95.4 percent success rate in resolving student misconceptions, effectively matching the 94.9 percent rate of expert human tutors (LearnLM Team Google, 2025).
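
The two estimates are also internally consistent, which is a useful sanity check on the preliminary data:

```latex
% Per-session effect converted to a per-100-minutes effect,
% using the 20.3-minute average session length reported above:
\frac{0.55 \text{ points/session}}{20.3 \text{ min/session}} \times 100 \text{ min} \approx 2.71 \text{ points}
% -- close to the independently estimated 2.77 points per 100 minutes.
```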

The Illusion of Competence

If the story ended there, this would be a triumphant essay about technology advancing education. Alas, it does not. Because for every student who uses AI as a Socratic sparring partner, there is another who uses it as an intellectual crutch, and the difference between those two modes of engagement is the difference between deep learning and a dangerous illusion.

The risk has a name: vaporized learning (Syed, 2025; Delikoura et al., 2025). When a student asks an AI to generate a polished answer and then submits it as their own work, they have not learned anything. They have performed a transaction. The cognitive effort required to actually encode knowledge, the struggle and confusion, has been entirely bypassed. Research confirms the cost. In one MIT study, researchers found that students who had previously relied on AI for writing tasks exhibited significantly lower knowledge retention when forced to write unassisted, compared to their peers who transitioned from unassisted writing to using AI (Kosmyna et al., 2025). The performance looks identical in the short term, but the learning evaporates the moment the tool is removed.

The cognitive science behind these results suggests what might be happening. AI’s fluency and clarity can trigger what psychologists call an “illusion of competence.” The clean, confident output of a language model engages our System 1 thinking (fast, automatic, effortless) while bypassing the System 2 processing (slow, deliberate, effortful) that is actually required to form durable learning. A student reads a beautifully articulated AI explanation and genuinely feels they understand the material, even when they do not. They have merely experienced the sensation of understanding, which is a very different thing.

Then there is the hallucination problem. Large language models are, at their core, pattern-completion engines. They predict the next most plausible word, not the next most truthful one. This means they will, with absolute confidence, generate statements that are factually wrong. For an expert, a hallucination is an annoyance that can be easily spotted and discarded. But for a novice learner, it can be a trap. A first-year student does not yet have a reliable framework for distinguishing a correct AI output from a fabricated one, and the authoritative tone of the response actively discourages skepticism. We are, in effect, handing students a tool that lies with the confidence of a tenured professor.
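
A toy example makes the mechanism concrete. The probabilities below are invented purely for illustration (real models score tens of thousands of candidates, with the remaining probability mass omitted here), but the failure mode is the same: decoding optimizes for plausibility, with no check on truth.

```python
# Toy sketch of why language models hallucinate: greedy decoding picks the
# most *plausible* continuation, never the most *truthful* one.
# All probabilities below are invented for illustration.

candidate_continuations = {
    "(Smith, 2019)": 0.48,     # fluent, citation-shaped -- may not exist
    "(Jones, 2021)": 0.31,     # equally fluent, equally unverified
    "I am not certain": 0.05,  # the honest answer is rarely the most probable
}

prompt = "This result was first demonstrated by"
# Greedy decoding: take the highest-probability continuation.
most_plausible = max(candidate_continuations, key=candidate_continuations.get)
print(prompt, most_plausible)  # confidently prints a possibly fake citation
```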

The Great Unbundling

The pedagogical risks of AI are serious, but they are addressable with the right institutional design. The economic risks, on the other hand, may be existential. The modern university operates as a bundled institution. It packages together content delivery, social development, research training, career networking, and, perhaps most importantly, credentialing. For many students, the four-year degree is not just an education but also an economic signal. Economists have long understood this through the lens of signaling theory: a diploma from a reputable university tells an employer that this person is intelligent, hardworking, and capable of sustained effort (Weiss, 1995). The content of the degree can matter less than the fact that the degree was conferred. Companies have relied on universities to do part of the expensive, time-consuming work of filtering candidates, and they have been willing to pay a premium to graduates for that signal.

It is possible AI will break this bundle. If a large language model can deliver personalized instruction at fifty cents an hour, then content delivery and knowledge acquisition (among the core functions of the university) become commodities. AI-first entrants are already positioning to offer highly targeted, skill-based curricula at a fraction of university tuition. They do not need lecture halls, tenured faculty, or sprawling campuses. They need algorithms, content libraries, and a user interface.

But the deeper threat is to the credential itself. If AI-driven systems can precisely map, verify, and document a learner’s exact skill set, not through a four-year proxy, but through continuous, granular assessment, then the university’s expensive “stamp of approval” loses its market power among students. Why would a student pay four years of tuition to receive a signal through a degree program when an AI-powered assessment can demonstrate to employers, with far greater precision, exactly what they know and can do?

This is why the most aggressive disruption in the not-too-distant future could come from the demand side. Forward-thinking firms will increasingly bypass traditional recruiting entirely, offering guaranteed, paid learning tracks directly, an option that will be particularly appealing to young people. Why take on tens of thousands of dollars in student debt for an uncertain job market when a major employer offers you a structured, paid apprenticeship with a clear career trajectory? Simultaneously, new market entrants will emerge to specialize entirely in rapid, low-cost, AI-verified certification, similar to how the SAT provided a standardized assessment for college admissions, but vastly more precise and continuously updated. While I am not suggesting the university is likely to disappear, I am suggesting its monopoly on the pathway to professional life might already be cracking.

Defending the Moat

If the economic logic of unbundling is this clear, then the question for universities becomes urgent and specific: what, exactly, is your moat? What can you offer that an AI-first competitor cannot replicate at a tenth or one-hundredth of the cost?

The answer is not content. It is not lectures. It is not even expertise, narrowly defined. The answer is people, in a room, together, under pressure, navigating ambiguity and building on each other’s insights. The physical campus must become a dedicated space for the things that artificial intelligence cannot simulate: the spontaneous spark of shared discovery, the rapid, dynamic iteration of ideas, and the friction of real human collaboration on complex problems. These are not soft skills in any dismissive sense, but rather some of the most durable, value-generating capabilities our students can develop in a labor market that AI is rapidly reshaping.

Right now, most universities are not meeting the AI challenge. The dominant institutional response I’ve observed has been a combination of panic and passivity: banning ChatGPT from exams, investing in plagiarism-detection software, or simply leaving students to figure out the technology on their own. This is the equivalent of banning calculators in 1985. It is not a strategy; it is a refusal to have one. By maintaining this holding pattern, we are neither using the technology to its fullest pedagogical potential nor preparing students for a world in which AI fluency is as fundamental as literacy. Instructors must be empowered and resourced to build AI deliberately into their curricula, using it as a pedagogical instrument rather than treating it as a threat to be policed.

The New Classroom

Let me be precise about what I am not arguing. I am not arguing that we should teach less rigorously, or that we should outsource the hard conceptual foundations of any discipline to AI. Students still need to understand supply and demand, the structure of an argument, the mechanics of a cell, and the logic of a proof. That foundational purpose of education remains valuable and non-negotiable. What must change, permanently and urgently, is how we engage our students.

If a student has on-demand access to a world-class AI tutor in their bedroom, they will not pay university tuition to sit in a lecture hall and receive a weaker version of the same service. The lecture, as a primary mode of instruction, needs to end. What should replace it is an active learning environment where students work with AI together, in real time, guided by an expert instructor who engineers the productive frictions of dynamic problem-solving. The professor’s role shifts from broadcaster to conductor: orchestrating debate, challenging assumptions, and forcing students to defend their AI-assisted conclusions against rigorous scrutiny.

This demands a corresponding revolution in assessment. We must move beyond fragile assessments, such as the take-home essay or the unsupervised problem set, that can be trivially outsourced to a language model. These instruments no longer measure learning; they measure access to the technology. Replacing them requires assessments designed around the capabilities that AI cannot fake: oral defense, live problem-solving under observation, and collaborative case analysis where the process matters as much as the final output.

And we must teach a new core competency: algorithmic interrogation. This is not “prompt engineering” in the superficial sense of crafting clever inputs. It is the disciplined ability to evaluate, stress-test, and strategically apply AI outputs: to know when the model is reasoning well and when it is confabulating, to cross-reference its claims against verified frameworks, and to treat every AI output as a hypothesis rather than an answer. We must pivot from testing whether students can execute tasks from scratch to testing whether they can evaluate and deploy solutions with judgment.


The Challenge

Pick a topic from one of your courses and work through three questions. In the following, I include a working example using supply, demand, and market equilibrium in competitive markets to show how this might work in practice, with a small worked computation after the three steps.

Draw the line on AI. List everything a student might be asked to do with this topic and ask: which of these would I be okay with an AI doing on a student’s behalf? Where you draw the line is your call, but draw it explicitly.

Example: Organizing and labelling raw market data into a readable table or chart, or tutoring a student on the definitions of supply, demand, and equilibrium outcomes.

Identify the skills to develop. On the other side of that line are the capabilities your students should be able to demonstrate that the AI cannot reliably demonstrate for them. Be specific about what these performances look like.

Example: Recognizing that while a particular market outcome may be allocatively efficient, it does not necessarily account for the welfare of all affected parties, and being able to construct an argument for why that distinction matters in a given case.

Design the assessment. Working backwards from those skills, what would it look like for a student to demonstrate them in a way that cannot be delegated to AI?

Example: A live case analysis where students are handed a market scenario they have not prepared for, asked to evaluate its efficiency, and required to defend their conclusions, including identifying whose welfare the standard model leaves out.
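
For concreteness, here is the kind of routine computation that falls on the AI side of the line in this example. The numbers are invented for illustration:

```latex
% Illustrative linear market (all numbers invented for this example):
Q_d = 100 - 2P, \qquad Q_s = 20 + 2P
% Equilibrium: set Q_d = Q_s and solve.
100 - 2P = 20 + 2P \;\Rightarrow\; P^{*} = 20, \quad Q^{*} = 60
```

An AI can produce this solution in seconds; the human skill worth assessing is the argument about whose welfare this equilibrium does and does not capture.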

References

  • Bloom, B. S. (1984). The 2 Sigma Problem: The Search for Methods of Group Instruction as Effective as One-to-One Tutoring. Educational Researcher, 13(6), 4–16. https://doi.org/10.3102/0013189X013006004
  • Delikoura, I., Fung, Y. R., & Hui, P. (2025). From Superficial Outputs to Superficial Learning: Risks of Large Language Models in Education. arXiv preprint arXiv:2509.21972.
  • Goffe, W. L., & Kauper, D. (2014). A Survey of Principles Instructors: Why Lecture Prevails. The Journal of Economic Education, 45(4), 360–375. https://doi.org/10.1080/00220485.2014.946547
  • Kestin, G., Miller, K., Klales, A., Milbourne, T., & Ponti, G. (2025). AI Tutoring Outperforms In-Class Active Learning: An RCT Introducing a Novel Research-Based Design in an Authentic Educational Setting. Scientific Reports, 15. https://doi.org/10.1038/s41598-025-97652-6
  • Kosmyna, N., et al. (2025). Your Brain on ChatGPT: Accumulation of Cognitive Debt When Using an AI Assistant for Essay Writing Task. arXiv preprint arXiv:2506.08872.
  • LearnLM Team, Google, et al. (2025). AI Tutoring Can Safely and Effectively Support Students: An Exploratory RCT in UK Classrooms. arXiv preprint arXiv:2512.23633.
  • Syed, A. (2025). TEAS: Trusted Educational AI Standard: A Framework for Verifiable, Stable, Auditable, and Pedagogically Sound Learning Systems. arXiv preprint arXiv:2601.06066.
  • Weiss, A. (1995). Human Capital vs. Signalling Explanations of Wages. Journal of Economic Perspectives, 9(4), 133–154.

Disclosure

What was outsourced to AI: articulation.

What was not outsourced to AI: arguments, insight, logic.

Your Challenger: Joshua Foster