Posted by sethbannon 7 days ago

Fighting Fire with Fire: Scalable Oral Exams (www.behind-the-enemy-lines.com)
220 points | 279 comments | page 5
amelius 6 days ago|
What makes me so sad about LLMs is that I used to get questions about math and physics all the time from cousins, nephews, etc., but that seems to be a thing of the past :(
globalnode 6 days ago||
online exams are one of the reasons i've lost interest in uni -- i don't mind old school invigilated ones where you go to a building and do the exam but i'm not gonna let them install anything on my computer and basically have 1 or more people i can't see looking through my webcam. don't need the qual that bad. but i feel bad for people that do. and this idea of oral exams wouldn't work for me either lol
aboardRat4 6 days ago||
If your school doesn't have in-person oral exams with high-quality professors, it's a garbage school.
nottorp 6 days ago||
So what is the correlation between the student not being a natural actor who speaks clearly and the exam score?
ildon 6 days ago||
As a University professor, what I really don't get about this "experiment" is the timings. They report:

> 36 students examined over 9 days

> 25 minutes average (range: 9–64)

It appears that they examined only 4hrs each day, one student at a time. This is incredibly inefficient.

In my experience, the greatest benefit of doing something like this would be to be able to run these exams in parallel, while retaining a somewhat impartial grading system.
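
As a rough check on the quoted timings, here is a minimal sketch. It assumes the reported 25-minute average applies to all 36 sessions, ignores setup time between students, and treats "fully parallel" as every session starting at the same moment; the variable names are illustrative only.

```python
# Figures quoted from the article
students = 36
days = 9
avg_minutes = 25       # reported average session length
longest_minutes = 64   # upper end of the reported range

# Sequential schedule: one student at a time
total_minutes = students * avg_minutes    # total examining time
total_hours = total_minutes / 60
minutes_per_day = total_minutes / days    # average examining time per day
students_per_day = students / days

# Fully parallel schedule (assumption: all sessions start together),
# so wall-clock time is bounded by the longest single session
parallel_wall_clock_minutes = longest_minutes

print(f"sequential: {total_hours:.1f} h total, "
      f"{minutes_per_day:.0f} min/day, {students_per_day:.0f} students/day")
print(f"parallel:   about {parallel_wall_clock_minutes} min wall-clock")
```

Under these assumptions the gap between the sequential and the parallel schedule is what the parallelism argument rests on.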

gaborcselle 7 days ago||
Curious why the setup had 3 different LLMs?
jimbokun 6 days ago|
To compare the grades across them and see if they agree within some range. If not, flag for human review.
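
A minimal sketch of that kind of agreement check, not the article's actual pipeline: the function names, the 1.0-point tolerance, and the use of the median as the consensus grade are all assumptions made for illustration.

```python
from statistics import median


def needs_human_review(grades, max_spread=1.0):
    """Return True when the graders' scores differ by more than max_spread.

    max_spread is a hypothetical tolerance, in grade points.
    """
    return max(grades) - min(grades) > max_spread


def consensus_grade(grades):
    """Use the median so a single outlying grader cannot drag the result."""
    return median(grades)


# Hypothetical scores from three independent LLM graders for one student
scores = [16.0, 15.5, 12.0]

if needs_human_review(scores):
    print("Graders disagree; flag for human review")
else:
    print(f"Consensus grade: {consensus_grade(scores)}")
```

With an odd number of graders the median is always one of the actual scores, which is one reason to use three rather than two.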
cryptonector 7 days ago||
Is there an evaluation of how good the questioning was? Did TFA review the transcripts for that? Did I miss it?

> The grading was stricter than my own default. That's not a bug. Students will be evaluated outside the university, and the world is not known for grade inflation.

Good!

> 83% of students found the oral exam framework more stressful than a written exam.

That's alright -- that's how life goes. This reminds me of a history teacher I had in middle school who told us how oral exams were done at the university he had studied in: in class, each student would come up to the front, pick three topics at random from a lottery-ball-picker type setup, and then they'd have a few minutes in which to explain how all three are related. I would think that would be stressful except to those who enjoy the topic (in this case: history) and mastered the material.

> Accessibility defaults. Offer practice runs, allow extra time, and provide alternatives when voice interaction creates unnecessary barriers.

Yes, obviously this won't work for deaf students. But why must it be an oral examination anyways? In the real world (see above example) you can't cheat at an oral examination because you're physically present, with no cheat sheets, just you, and you have to answer in real time. But these are "take-at-home" oral exams, so they had to add a requirement of audio/video recording to restore the value of the "physically present" part of old-school oral exams -- if you could do something like that for written exams, surely you would?

Clearly a take-home written exam would be prone to cheating even with a real-time AI examiner, but the real-time requirement might be good enough in many cases, and probably always for in-class exams.

Oh, that brings me to: TFA does not explicitly say it, but it strongly implies that these oral exams were take-at-home exams! This is a very important detail. Obviously the students couldn't do concurrent oral exams in class, not unless they were all wearing high quality headsets (and even then). The exams could have been in school facilities with one student present at a time, but that would have taken a lot of time and would not have required that the student provide webcam+audio recordings -- the school would have performed those recordings themselves.

My bottom-line take: you can have a per-student AI examiner, and this is more important than the exam being oral, as long as you can prevent cheating where the exam is not oral.

PS: A sample of FakeFoster would have been nice. I found videos online of Foster Provost speaking, but it's hard to tell from those how intimidating FakeFoster might have been.

owenbrown 6 days ago||
A regular paper and pencil exam would be a better experience for the students.
EdNutting 6 days ago||
I wrote a related thought piece recently on the return of oral vivas. But damn, I didn’t anticipate someone doing them using voice apps and LLMs. That’s completely fucked up.

https://ednutting.com/2025/11/25/return-of-the-viva.html

bccdee 6 days ago|
Oh my god, this sounds awful. After the first few paragraphs, I was ready to be impressed, but then they started dropping all these insane details:

---

> Only 13% preferred the AI oral format. 57% wanted traditional written exams. 83% found it more stressful.

> Here is an email from a student: "Just got done with my oral exam. [...] I honestly didn't feel comfortable with it at all. The voice you picked was so condescending that it actually dropped my confidence. [...] I don't know why but the agent was shouting at me."

> Student: "Can you repeat the question?" Agent: paraphrases the question in a subtly different way.

> Students would pause to think, and the agent would jump in with follow-up probes or worse: interpret the silence as confusion and move on.

---

Based on these highlights, you'd think the experiment was a wash. The author disagrees!

> But here's the thing: 70% agreed it tested their actual understanding: the highest-rated item.

Man, you could shoot me with a gun, then make me write an essay, & I'd be forced to agree that you had tested my "actual understanding." That doesn't mean my performance wouldn't suffer. Also, 70% is not very high. That's barely two thirds.

Even the grading was done by LLMs (rather than having a TA grade a transcript), and the results were lower. The author defends this by saying, "Students will be evaluated outside the university, and the world is not known for grade inflation," but the world isn't "known for grade inflation" because it doesn't grade you at all. That's not even an excuse, it's just nonsense. It'll toughen you up, or whatever. Was this post written by an LLM too?

> Take-home exams are dead. Reverting to pen-and-paper exams in the classroom feels like a regression.

"Regression"? I mostly wrote pen & paper exams, and I only graduated a few years ago. If students want more flexibility, team up with other courses to supervise multiple exam sessions. Leaked questions aren't going to be any more of a problem than they were for take-home exams, especially since students can't take the booklets with them when they go.

It sounds like these students had a terrible time, and for what? Written exams work fine. These guys just wanted to play with LLMs.
