Posted by sethbannon 7 days ago
> 36 students examined over 9 days
> 25 minutes average (range: 9–64)
It appears they examined students for only about four hours each day, one student at a time. This is incredibly inefficient.
In my experience, the greatest benefit of doing something like this would be the ability to run these exams in parallel while retaining a somewhat impartial grading system.
> The grading was stricter than my own default. That's not a bug. Students will be evaluated outside the university, and the world is not known for grade inflation.
Good!
> 83% of students found the oral exam framework more stressful than a written exam.
That's alright -- that's how life goes. This reminds me of a history teacher I had in middle school who told us how oral exams were done at the university he had studied at: in class, each student would come up to the front, pick three topics at random from a lottery-ball-picker type setup, and then have a few minutes to explain how all three are related. I would think that would be stressful for everyone except those who enjoy the topic (in this case: history) and have mastered the material.
> Accessibility defaults. Offer practice runs, allow extra time, and provide alternatives when voice interaction creates unnecessary barriers.
Yes, obviously this won't work for deaf students. But why must it be an oral examination anyway? In the real world (see the example above) you can't cheat at an oral examination because you're physically present, with no cheat sheets, just you, and you have to answer in real time. But these are "take-at-home" oral exams, so they had to add a requirement of audio/video recording to restore the value of the "physically present" part of old-school oral exams -- if you could do something like that for written exams, surely you would?
Clearly a take-home written exam would be prone to cheating even with a real-time AI examiner, but the real-time requirement might be good enough in many cases, and probably always for in-class exams.
Oh, that brings me to: TFA does not explicitly say it, but it strongly implies that these oral exams were take-at-home exams! This is a very important detail. Obviously the students couldn't do concurrent oral exams in class, not unless they were all wearing high-quality headsets (and even then). The exams could have been held in school facilities with one student present at a time, but that would have taken a lot of time and would not have required the students to provide webcam+audio recordings -- the school would have made those recordings itself.
My bottom-line take: the per-student AI examiner is the valuable part, more so than the exam being oral, as long as you can prevent cheating when the exam is not oral.
PS: A sample of FakeFoster would have been nice. I found videos online of Foster Provost speaking, but it's hard to tell from those how intimidating FakeFoster might have been.
---
> Only 13% preferred the AI oral format. 57% wanted traditional written exams. 83% found it more stressful.
> Here is an email from a student: "Just got done with my oral exam. [...] I honestly didn't feel comfortable with it at all. The voice you picked was so condescending that it actually dropped my confidence. [...] I don't know why but the agent was shouting at me."
> Student: "Can you repeat the question?" Agent: paraphrases the question in a subtly different way.
> Students would pause to think, and the agent would jump in with follow-up probes or worse: interpret the silence as confusion and move on.
---
Based on these highlights, you'd think the experiment was a wash. The author disagrees!
> But here's the thing: 70% agreed it tested their actual understanding: the highest-rated item.
Man, you could shoot me with a gun, then make me write an essay, & I'd be forced to agree that you had tested my "actual understanding." That doesn't mean my performance wouldn't suffer. Also, 70% is not very high. That's barely more than two thirds.
Even the grading was done by LLMs (rather than having a TA grade a transcript), and the resulting grades were lower. The author defends this by saying, "Students will be evaluated outside the university, and the world is not known for grade inflation," but the world isn't "known for grade inflation" because it doesn't grade you at all. That's not even an excuse, it's just nonsense. It'll toughen you up, or whatever. Was this post written by an LLM too?
> Take-home exams are dead. Reverting to pen-and-paper exams in the classroom feels like a regression.
"Regression"? I mostly wrote pen & paper exams, and I only graduated a few years ago. If students want more flexibility, team up with other courses to supervise multiple exam sessions. Leaked questions aren't going to be any more of a problem than it was for take-home exams, especially since they can't take the booklets with them when they go.
It sounds like these students had a terrible time, and for what? Written exams work fine. These guys just wanted to play with LLMs.