WHAT YOU'LL LEARN
- ✓ What a language model actually is, and what it is not
- ✓ Why AI sounds so confident even when it is completely wrong
- ✓ What hallucination is and why it will never fully go away
- ✓ The one thing most teachers get wrong about AI from the start
The biggest mistake teachers make with AI
Most people treat AI like a very fast librarian: you ask a question, it looks up the answer, and it tells you. That mental model is wrong, and it causes real problems in classrooms.
AI does not know things the way you know things. It does not retrieve stored facts. It generates text by predicting what words should come next, one word at a time, every single time, based entirely on patterns in the data it was trained on.
Tom Hammond, Professor of Education at Lehigh University, puts it plainly: "AI doesn't actually know anything. Its outputs are the result of calculated probabilities." That distinction changes how you evaluate every response AI produces.
Tokens: language broken into pieces
AI does not read words the way you do. Before it processes anything you type, it breaks your text into small chunks called tokens. A token can be a full word, part of a word, or punctuation.
The word "understanding" might be split into tokens such as "under," "stand," and "ing," though the exact boundaries vary by model. A rough rule: one token is about three quarters of an English word, so a typical page of roughly 500 words runs to about 667 tokens.
This matters for two practical reasons. First, AI tools charge based on token volume, so longer conversations cost more. Second, every AI system has a maximum number of tokens it can hold in memory at once. When a conversation gets long enough, the AI genuinely loses track of what you said earlier. It is not being careless. It literally cannot see it anymore.
ANALOGY
Think of tokens like LEGO bricks. Before building anything, a LEGO kit is broken into individual pieces. The AI does the same thing to language: it breaks every sentence into its smallest useful units, then works with those pieces rather than with whole words or ideas.
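If you want to see tokenization happen, the short sketch below uses OpenAI's open-source tiktoken library. The example sentence and the "cl100k_base" encoding are illustrative choices only; token boundaries and counts differ from model to model.

```python
# Illustrative sketch: counting tokens with OpenAI's open-source tiktoken library.
# Token boundaries and counts vary by model; "cl100k_base" is one common encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "AI does not read words the way you do."
tokens = enc.encode(text)

print(f"Token count: {len(tokens)}")
# Decode each token individually to see the chunks the model actually works with.
print([enc.decode([t]) for t in tokens])
```

Running this on a few of your own prompts is a quick way to see why long conversations add up, both in cost and in memory.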
Sophisticated autocomplete
At its core, a large language model is a next-token prediction machine. When you type a prompt, the model converts it to tokens, calculates probability scores for every possible next token, selects one, adds it to the sequence, then repeats, one token at a time, until it decides to stop.
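As a rough illustration, the loop below sketches that process in simplified form. The model object and its next_token_probabilities method are hypothetical stand-ins for a real language model, not any particular product's API.

```python
import random

# A toy sketch of next-token generation. "model" and its
# next_token_probabilities() method are hypothetical stand-ins for a real LLM.
def generate(model, prompt_tokens, max_new_tokens=50, stop_token="<end>"):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # The model scores every candidate token given everything generated so far.
        probs = model.next_token_probabilities(tokens)  # e.g. {"the": 0.12, ...}
        # Pick one token according to those scores. Real systems add sampling
        # controls such as temperature and top-p, but the principle is the same.
        next_token = random.choices(list(probs), weights=list(probs.values()))[0]
        if next_token == stop_token:
            break
        tokens.append(next_token)
    return tokens
```

Notice that nothing in that loop checks whether the emerging sentence is true. It only checks what is statistically likely to come next.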
Gary Marcus, Professor Emeritus at NYU, calls them "very large autocomplete machines." That is not an insult. It captures something real. The system is not reasoning through a problem the way you would. It is generating what statistically follows from what has come before.
This is why AI can write a grammatically perfect, confident-sounding paragraph about a completely fictitious study. The words flow naturally. The structure is right. The citations look real. It is all generated token by token, shaped by patterns, not by knowledge of what is actually true.
Knowing this changes how you read AI output. The fluency is not evidence of accuracy. A confident response and a correct response are two different things.
Hallucination: the student who didn't study
Hallucination is when an AI generates information that sounds plausible but is false, fabricated, or misleading. It happens because the model bridges gaps in its training data by generating what statistically fits, not what is actually true.
OpenAI's own research describes it this way: "Spelling and punctuation follow consistent patterns, so errors there disappear with scale. But arbitrary low-frequency facts cannot be predicted from patterns alone, and hence lead to hallucinations." Even the best models in 2026 have hallucination rates of around 3 to 5%, varying by task complexity.
The classroom parallel: imagine a student who did not read the assignment but writes an essay that sounds authoritative. They string together plausible sentences based on general knowledge. Sometimes they invent citations that look real but do not exist. The essay has a confident tone, clear structure, and no obvious red flags. But it cannot be trusted without checking.
This is not a bug that will be fixed in the next version. It is a structural property of how these systems work. Hallucination rates will continue to fall, but they will not reach zero. Every AI output is a draft that requires human verification, especially on anything factual, medical, legal, or historical.
REAL-WORLD EXAMPLE
In 2023, a New York attorney submitted a legal brief containing six fabricated case precedents generated by ChatGPT, complete with convincing citations that did not exist. The attorney had not verified the sources. The cases were entirely invented. The incident led to court sanctions and became one of the most widely cited examples of AI hallucination in professional practice.
Context windows: the AI's desk
A context window is the maximum amount of text an AI can process in a single interaction, covering both what you type and what it generates. Think of it as a desk. The AI can only work with what is on the desk. When the desk gets full, older material slides off the edge and disappears from view.
Current context windows are enormous. Some models can hold the equivalent of 750,000 words at once. In practice, most classroom conversations will never approach those limits. But longer sessions, large document uploads, or complex multi-step tasks can start to cause problems. The AI does not flag when it has lost track of earlier context. It just proceeds as if everything is fine.
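To make the desk concrete, here is a simplified sketch of how a chat system might decide what still fits in the window. Real systems count tokens rather than words and use more sophisticated strategies, but the effect is the same: the oldest material drops first, silently.

```python
# Simplified sketch: keeping a conversation inside a fixed context window.
# Real systems count tokens, not words, but the oldest turns still drop first.
def fit_to_window(messages, max_words=1000):
    kept = []
    used = 0
    # Walk backwards from the newest message, keeping as much as fits.
    for message in reversed(messages):
        words = len(message.split())
        if used + words > max_words:
            break  # everything older than this silently slides off the desk
        kept.append(message)
        used += words
    return list(reversed(kept))
```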
The practical rule: for any task where earlier details matter, restate the key information periodically. Do not assume the AI remembers what you told it twenty exchanges ago.
Training data: where the patterns come from
A large language model learns by reading enormous amounts of text: books, websites, academic papers, code, and forum discussions. It finds statistical patterns across all of it. GPT-3 was trained on a dataset of roughly 500 billion tokens, which is hundreds of billions of words. A child encounters around 100 million words by age ten. These are different scales entirely.
The model learns from whatever is in that data, including the biases, errors, outdated information, and ideological slants present in the source material. A 2021 analysis found that 80% of AI systems in education showed some form of measurable bias when audited. That figure reflects training data more than any design choice.
Training data also has a cutoff date. Most widely used AI systems have a knowledge cutoff somewhere between 2023 and early 2026. They do not know what happened after that date unless they have access to web search tools. Asking a model about recent events without web access is like asking someone who has been offline for two years what is in the news today.
The implication for classroom use: AI is most reliable on well-documented, frequently-discussed topics with stable information. It is least reliable on recent events, niche subjects, non-English topics, and anything requiring local or personal knowledge.
What AI genuinely cannot do
Understanding the architecture explains the limitations. AI cannot reason through a novel problem the way a person does. It cannot verify its own outputs. It cannot know when it does not know something. It generates plausible text regardless of whether the information exists. It cannot build real relationships with students, read a room, or respond to what a student is carrying into class from outside the door.
It also cannot replace the function of a teacher in learning. The research on this is unambiguous, and Section 2 goes into it in detail. For now, the important point is this: knowing what AI cannot do is as useful as knowing what it can. Both are part of AI literacy.
INTERACTIVE
AI Myth or AI Fact?
Test your understanding. Select the best answer for each statement.