45 CFR 46.104 (Exempt Research):
46.104.d.1 "Research, conducted in established or commonly accepted educational settings, that specifically involves normal educational practices that are not likely to adversely impact students' opportunity to learn required educational content or the assessment of educators who provide instruction. This includes most research on regular and special education instructional strategies, and research on the effectiveness of or the comparison among instructional techniques, curricula, or classroom management methods."
https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-...
So while this may have been a dick move by the instructors, it was probably legal.
> Exempt human subjects research is a specific sub-set of “research involving human subjects” that does not require ongoing IRB oversight. Research can qualify for an exemption if it is no more than minimal risk and all of the research procedures fit within one or more of the exemption categories in the federal IRB regulations. *Studies that qualify for exemption must be submitted to the IRB for review before starting the research. Pursuant to NU policy, investigators do not make their own determination as to whether a research study qualifies for an exemption — the IRB issues exemption determinations.* There is not a separate IRB application form for studies that could qualify for exemption – the appropriate protocol template for human subjects research should be filled out and submitted to the IRB in the eIRB+ system.
Most of my research is in CS Education, and I have often been able to get my studies under Exempt status. This makes my life easier, but it's still a long, arduous paperwork process. Often there are a few rounds to get the protocol right. I usually have to plan studies a whole semester in advance. The IRB does NOT like it when you decide, "Hey, I just realized I collected a bunch of data, I wonder what I can do with it?" They want you to have a plan going in.
[1] https://irb.northwestern.edu/submitting-to-the-irb/types-of-...
Imagine otherwise: a teacher who wants to change their final exam from a 50-item Scantron using A-D choices to a 50-item Scantron using A-E choices, because they think having 5 choices per item is better than 4, would need to ask for IRB approval. That's not feasible, and is not what happens in the real world of academia.
It is true that local IRBs may try to add additional rules, but the NU policy you quote talks about "studies". Most IRBs would disagree that "professor playing around with grading procedures and policies" constitutes a "study".
It would be presumed exempt.
Are you a teacher or a student? If you are a teacher, you have wide latitude that a student researcher does not.
Also, if you are a teacher doing "research about your teaching style", that's exempt.
By contrast, if you are a student, or a teacher "doing research" more generally, that's probably not exempt and must go through the IRB.
Reminder: this professor's school costs $90k a year, over $200k total for an MBA. If that tuition isn't going down when the professor cuts corners by running an oral exam for ~35 students at literally less than a dollar each, then this is nothing more than a professor valuing getting to slack off over your education.
>And here is the delicious part: you can give the whole setup to the students and let them prepare for the exam by practicing it multiple times. Unlike traditional exams, where leaked questions are a disaster, here the questions are generated fresh each time. The more you practice, the better you get. That is... actually how learning is supposed to work.
No, students are supposed to learn the material and have an exam that fairly evaluates this. Anyone who has spent time on those old, terrible online physics coursework sites like Mastering Physics understands that grinding away at practice exams doesn't improve your understanding of the material; it just improves your ability to pass the arbitrary evaluation criteria. It's the same with practicing leetcode before interviews. Doing yet another dynamic programming practice problem doesn't really make you a better SWE.
Minmaxing grades and other external rewards is how we got to the place we're at now. Please stop enshittifying education further.
Absolutely the easiest solution would have been a written exam on the cases and concepts that we discussed in class. It would have taken a few hours to create and grade.
But at a university you should experiment and learn. What better class to experiment and learn in than “AI Product Management”? The students themselves were actually intrigued by the idea.
The key goal: we wanted to ensure that the projects students submitted were actually their own work, not “outsourced” (in a general sense) to teammates or to an LLM.
Gemini 3 and NotebookLM with slide generation were released in the middle of the class, and we realized it is now feasible for a student to give a flawless presentation in front of the class without deeply understanding what they are presenting.
We could schedule oral exams during finals week, which would be a major disruption for the students, or schedule them during the break, violating university rules and ruining students’ vacations.
But as I said, we learned that AI-driven interviews are more structured and better than human-driven ones, because humans get tired, and they have biases based on who they are interviewing. That’s why we decided to experiment with voice AI for running the oral exam.
Also, with all the progress in video gen, what does recording the webcam really do?
This is to be expected. The big commercial LLMs generally respond with text that agrees with the user.
> But here's what's interesting: the disagreement wasn't random. Problem Framing and Metrics had 100% agreement within 1 point. Experimentation? Only 57%.
> Why? When students give clear, specific answers, graders agree. When students give vague hand-wavy answers, graders (human or AI) disagree on how much partial credit to give. The low agreement on experimentation reflects genuine ambiguity in student responses, not grader noise.
The disagreement between the LLMs is interesting. I would hesitate to conclude that "low agreement on experimentation reflects genuine ambiguity in student responses." It could be that it reflects genuine ambiguity on the part of the graders/LLMs as to how a response should be graded.
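For anyone unfamiliar with the metric being argued about here: "agreement within 1 point" just counts how often two graders land within one rubric point of each other on the same student. A minimal sketch in Python, with hypothetical scores and a hypothetical 0-5 scale (the actual rubric and data aren't in the thread):

    # Sketch of the "agreement within 1 point" metric discussed above.
    # Scores and the 0-5 scale are hypothetical, not from the actual exam.
    def within_one_agreement(scores_a, scores_b, tolerance=1):
        """Fraction of students whose two scores differ by <= tolerance."""
        assert len(scores_a) == len(scores_b)
        close = sum(abs(a - b) <= tolerance for a, b in zip(scores_a, scores_b))
        return close / len(scores_a)

    human = [4, 5, 3, 2, 5, 4, 3]  # hypothetical human grader scores
    llm   = [4, 4, 3, 4, 5, 2, 3]  # hypothetical LLM grader scores
    print(f"Agreement within 1 point: {within_one_agreement(human, llm):.0%}")

Worth noting: with only ~36 students per category, a handful of borderline answers can swing this percentage a lot, which is consistent with either reading above.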
I could accept this for a 300-student class, but 36? When I got my degree, ALL exams had an oral component, usually more than 30 minutes long. The prof and one or two TAs would take a couple of days and just do it. For 36 students it’s more than doable. If I were a student being examined by an LLM, I would feel like the professor didn’t care enough to do the work.
This sounds as though it was written by an LLM too.