Posted by PretzelFisch 3 hours ago
How about going back to the old system where, apart from experimental lab work, nothing is graded until the end of the term?
All weekly assignments should just be considered prep for one exam at the end of the term where the student has an opportunity to demonstrate mastery of the course's subject matter. They can prepare as they wish, use AI, and even cheat on the homework, but there will be a revelation at the end of the term.
That final test can be proctored, monitored, audited to ensure that whatever words are used are indeed the student's own words. The resulting grade depends on that, and that alone.
The approach of continuous assessment, which to me always seemed suspect and ripe for abuse, was completely broken by the AI tools that are now available.
A better approach is to rethink what we assess and how we assess it. Research shows that the design of assessments plays an important role in academic integrity. Assignments that require original thinking and regular engagement can reduce incentives to cheat and improve learning outcomes.
https://www.sciencedirect.com/science/article/abs/pii/S22119...
If the only remedy is monitored end-of-term exams, so be it.
Sure, it requires more resources, but it shouldn't require much more:
- We've had multiple exams before AI, and I don't see how AI makes it any harder. Obviously these are closed-book
- Schools should already be banning phones in class (and colleges have insane tuitions, they can afford more exams)
- The students who go out of their way to cheat - as long as they're a minority, let them. Why not? Either they'll fail later in life, or they didn't need to learn the material because they're pathological fakers (even if you won and forced them to learn the material, they'd probably still fake their way out of using it). Then, I doubt you need much proctoring to ensure that most students don't cheat, because most of the smart students are generally smart enough to know that actually learning the material is probably important (or if the material is probably not important, it doesn't matter if the students all cheat...)
Meanwhile, downsides of one exam:
- Disadvantages students who get overly stressed about unrecoverable exams, or have a particularly bad day on the exam
- Many students will blow off the (ungraded) assignments and put off actually learning until the end
- Less graded content (especially if the exam isn't overly long, which would disadvantage some students)
If someone knows 80% of the topics on an exam like the back of their hand and doesn't know the other 20%, they shouldn't get a B; they should pass the subjects they know and be asked to retake and relearn the ones they don't.
When people know a mistake is not a perpetual black mark on their record (any grade below an A) but a chance to improve and demonstrate that improvement, perhaps they'd be more willing to admit and understand mistakes instead of cheating.
At-home coding projects, writing essays, etc. also exercise different skills than you can test for in a 2-hour written exam. It's unfortunate that, due to rampant AI cheating, we can no longer reward the students who put in the work and develop these skills.
There are things you learn from spending several days structuring a 20-page argument that you will not learn (and cannot assess) from oral examination or a 5-paragraph essay written in a blue book.
That knowledge will show up in the blue book vis-à-vis the other exam candidates.
I would prefer not to be confrontational here, but I am having a hard time imagining that you've deeply considered the pedagogy of how to teach and evaluate students on squishy skills like this.
Knowing a bunch of facts about something is a world apart from structuring a compelling in-depth argument about it.
For example, the life-changingly-well-designed newswriting course I took in college assigned every single student a different story to spend several weeks reporting out so that we wouldn't all be out harassing the same poor people for interviews.
Of course, creating good exams is difficult, but you have to do that either way.
The OP was about students dumbing down their own work to avoid AI detectors ratting them out. That seems like a big loss.
A grade, on a single proctored test, is a crude metric, but at least it would be a brutally fair one.
Do you only learn when you’re being graded?
As a general rule, when changing complex systems you sacrifice what you aren’t trying to optimize. If you make a random change to a car without consideration for gas mileage, it’s very likely to reduce gas mileage.
(The other side of that contract is, kids are not merely attending schools to learn, but to earn a degree that carries some degree of prestige)
Personally, I felt that the drive to automate the parts of the professors' workloads that mattered (i.e. teaching and grading and evaluation and research), only so that they could be given work that matters less the more they do it (i.e. publishing slightly different flavors of the same paper to meet KPIs), was oddly perverse.
The multiple-choice test and the puzzle-solving test and really any standardized test can be exploited by any group that is sufficiently organized. This is also true in corporate interviewing where corporations think (or pretend) that they are interviewing an individual, whereas they are actually interviewing a _network_ of candidates who share details about the interviewers and the questions. I know people who got rejected in spite of getting all the interview questions correct (the theory is that nobody can do that well, so they must have had help from previously rejected/accepted candidates).
The word "trust" shares a root with the word "tree" and "truth" and "druid". Most exams and interviews are trying to speed-run trust-building (note that "verification" is from the latin word that means "true"). If trust and truth are analogous to "tree", then we are trying to speed-run the growth of a tree -- much like the orange tree, in the film, _The Illusionist_. And like the orange tree, it is a near-complete illusion, a ritual meant to keep the legal department and HR department happy.
The LLMs have simply made the corruption of academia accessible to _all_ students with an internet connection (EDIT: and instantaneous and cheap, unlike a human writer).
There has never been a shortcut to building trust. One cannot LLM their way into being a (metaphorical) druid.
I do not look forward to the Voight-Kampff tests that will come to dominate all aspects of online and asynchronous human interaction.
Note that, short of homework/classwork that _can't_ be gamed by an LLM (for some fundamental reason), even the high-quality honest students will be forced to cheat, so as to not be eclipsed by the actual low-quality cheating students[0].
I imagine that we may end up wrapping around to live in-person dialectics, as were standard in the time of Socrates and Parmenides[1]. If so, this should be fun.
[0]: If left unaddressed, we may see a bimodal distribution of great and terrible students graduating college, with those in between dropping out. If college is an attempt to categorize and rank a population, this would be a major fault in that mechanism.
[1]: Not to the exclusion of the other kinds of tests; writing is still important, critical even. But dialectics would serve as a kind of verification step, informing how much the academic community should trust the writing (I can imagine that all the writers here are experiencing stage fright as they read these words).
I love this idea. And if a student is having a really bad day, or their dog just died, or they have bad cramps, or they have a hard time dealing with the intense stress of your entire grade being decided in one exam... well, those loser students can just fuck right off.
Accommodations are part of the fabric already. It doesn't seem inconceivable that we could deal with exceptional circumstances here in a similar way to how it's done today.
Accommodations are real and necessary, but applied at the end.
(Experimental sciences are an exception)
... well then, why not use those same protections (proctoring, monitoring, auditing) in continuous examination?
If your school uses software to detect AI writing, that's a problem with the quality of your school. The people choosing that software are too stupid to be running a school. The software isn't going to get any better.
The problem isn't that AI detection doesn't work. State of the art in this field is pretty solid. The only issue is that it's probabilistic, so it sometimes fails, and when it does, we have nothing else in situations where you actually want to know if someone put in the work.
So what are you proposing, exactly? That we run a large-scale experiment of "let's see what happens if children don't actually need to learn to do thinking and writing on their own"? The reality is that without some form of compulsion, most kids would rather play video games / scroll through TikTok all day. Or that we move to a vastly more resource-intensive model where every kid is given personalized instruction and watched 1:1?
Different people. I for one have always claimed that fMRI is too coarse-grained for detailed thought detection.
If AI detection "sometimes fails", it doesn't "work". It works well enough to convict someone with other evidence, but when there's no other evidence nor an attempt to get any, it has no good use.
Some proposals I'd be fine with:
- Make every student take a couple of their exams in a heavily proctored environment (chosen randomly per student and not giving them prior warning). If the student does significantly worse in their proctored exams (not counting exams that other students did significantly worse in), they're either cheating or unlucky. If you think this isn't strong enough evidence, the student's "punishment" can just be more proctored exams.
- Make it so every student is "flagged" at a certain frequency (e.g. one exam per semester). When a student is "flagged", they must demonstrate they weren't cheating via extra work (taking a similar exam, explaining their answers, etc.). If a student is suspected of cheating on a particular exam, they get flagged for it. Students who aren't suspected early get randomly flagged later, to ensure those who are suspected don't suffer unfairly.
- Combine both of the above. A student can be "flagged", but there's not necessarily a limit on how many times, nor on random flaggings. When a student is flagged, they aren't penalized, but their next exam is heavily proctored. (A toy simulation of this combined scheme is sketched below.)
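To illustrate, here's a minimal sketch of the combined scheme in Python. The flagging probability, the score boost a cheater gets, and the scoring model are all made-up assumptions for illustration, not numbers from anywhere in this thread:

    import random

    FLAG_PROB = 0.25     # assumed chance any given exam triggers a flag
    CHEAT_BOOST = 20     # assumed points a cheater gains when unproctored

    def run_student(ability, cheats, n_exams, rng):
        results, flagged = [], False
        for _ in range(n_exams):
            score = ability + rng.gauss(0, 5)
            if cheats and not flagged:      # cheating only helps when unproctored
                score += CHEAT_BOOST
            results.append((min(100.0, score), flagged))
            flagged = rng.random() < FLAG_PROB  # decides if the NEXT exam is proctored
        return results

    def proctoring_gap(results):
        # Mean unproctored score minus mean proctored score.
        proctored = [s for s, p in results if p]
        unproctored = [s for s, p in results if not p]
        if not proctored or not unproctored:
            return 0.0
        return sum(unproctored) / len(unproctored) - sum(proctored) / len(proctored)

    rng = random.Random(0)
    print(proctoring_gap(run_student(70, cheats=False, n_exams=20, rng=rng)))  # near 0
    print(proctoring_gap(run_student(70, cheats=True, n_exams=20, rng=rng)))   # near CHEAT_BOOST

An honest student's gap hovers near zero; a cheater's gap approaches whatever boost they were getting, which is exactly the signal the proposal relies on.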
Or, better, proctor every student and grade only closed-book exams.
That's what fortunetellers do. The problem isn't guessing correctly about AI content in writing. The problem is false positives. That's what puts it in the same category as predictive policing scam software. And fortunetelling.
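To make the false-positive point concrete, here's the standard base-rate arithmetic. Every number below (cheating rate, detector sensitivity, false-positive rate) is an illustrative assumption, not a measured property of any real detector:

    # Toy Bayes calculation: what fraction of flagged essays are actually AI?
    cheat_rate = 0.10           # assumed fraction of essays that are AI-written
    sensitivity = 0.95          # assumed P(flag | AI-written)
    false_positive_rate = 0.05  # assumed P(flag | human-written)

    p_flag = sensitivity * cheat_rate + false_positive_rate * (1 - cheat_rate)
    p_ai_given_flag = (sensitivity * cheat_rate) / p_flag
    print(f"P(actually AI | flagged) = {p_ai_given_flag:.2f}")  # ~0.68

Even with a detector that's right 95% of the time in both directions, roughly one in three flagged students is innocent, and it gets worse as the actual cheating rate drops.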
That's not like detecting thoughts via fMRI; it's like detecting tomorrow's malware with yesterday's malware signatures. Or like researchers making a vaccine against the common cold.
And the obvious proposal to fix that has been made multiple times in this thread: don't make take-home tasks part of the grade. Instead of trying to punish what you can't reliably detect, take away the incentive to do it in the first place.
I don't understand your argument. The vendors for these detection tools can acquire recent samples from all frontier models just as easily as you can use them to write essays. There's nothing that requires a one-year delay.
Do AI vendors specifically train models to circumvent AI detectors? Why would they?
I think you're basing this off a fundamental misunderstanding of what these detectors look for. LLMs generate human-like text, but they also generate roughly the same style and content every time for a given prompt, modulo some small amount of nondeterminism. In essence, they are a very predictable human. Ask Gemini or ChatGPT ten times in a row to write an essay about why AI is awesome, and it will probably strike about the same tone every single time, with similar syntax, idioms, etc.
This is what these tools detect: the default output of "hey ChatGPT, write me a school essay about X". This can be evaded with clever prompting to assume a different writing personality, but there's only so much evasion you can do without making the text weird in other ways.
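As a crude sketch of that intuition (emphatically not how any commercial detector works): resample the same prompt several times and measure how tightly the outputs cluster. The `generations` list is hypothetical stand-in data for real model calls, and word-set Jaccard overlap is a deliberately simple proxy for the statistical signals real tools use:

    from itertools import combinations

    def jaccard(a: str, b: str) -> float:
        # Overlap of the two texts' word sets (0 = disjoint, 1 = identical).
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb)

    def mean_pairwise_similarity(texts):
        pairs = list(combinations(texts, 2))
        return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

    # Hypothetical resamples of "write an essay about why AI is awesome".
    generations = [
        "AI is awesome because it boosts productivity and sparks creativity.",
        "AI is awesome because it enhances productivity and fuels creativity.",
        "AI is awesome since it improves productivity and inspires creativity.",
    ]
    print(f"{mean_pairwise_similarity(generations):.2f}")  # high: a "default voice"

A diverse set of human essays on the same prompt would cluster far less tightly.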
(This is roughly the same problem as evaluating software that only does an approximation of what it claims to do.)
(Aside: AI-based variations on this theme are in the early stages of proliferating across our society. They're being developed by many people using this forum and being sold to our schools, businesses, governments, and other organizations with little regard to whether they actually do what they claim.)
I've noticed I write a lot differently because of combative online arguments. I have a problem.
So much of my communication is directed to people who don't want to hear me or understand me. So I've become very punchy and repetitive, trying to hammer home ideas that people are either unable or unwilling to understand.
I need to find ways to talk to people who want to hear and understand me.
It's hard to find other people who actually want to hear and understand though. People have different interests, and even when people appear to be working towards the same goal, they often aren't; like a boss who just won't understand the bad news, because it's easier to ignore the problem.
The result is that a ton of web forum/social-media posting would, in any other context, be fairly poor writing (even if it otherwise has no problems) simply because of the extra crap and contortions required to minimize garbage posts by poor readers who are, themselves, allowed to post to the same medium.
This is in addition to, though not wholly separate from, the tendency toward combativeness in online posting.
Ask more questions. It takes work when dealing with smart people who think beyond the question you asked, add their own context, and then reply with a different question. But those are the people who are willing to engage with you. Statements without questions can be ignored, and people who answer a different question than the one you asked can be safely ignored as people who don't want to engage.
The cure to a purely adversarial conversation is educated curiosity. The educated part is being able to differentiate the threads that will lead down a tribalistic path from those that will lead down an exploratory one.
More important than all of the above is knowing when to walk away. It's barely a majority, but that bare majority "wants" to waste your time. Ignore their DoS attempt, and save your time for people who want to engage, fairly. The fairly part being the most important.
I find that a little faith goes a long way here: assume that you have a wider audience and speak to them accordingly.
Don't let the loud ones confuse you: normal, reasonable people (with normal, reasonable thoughts, just like yours) might not always reply, but they also read you.
For example, I abhor talking about modern politics. If it’s election season and I’m being asked to cast a vote or take some other specific civic action, then I understand it’s my civic duty to understand the situation and make a decision accordingly and I do.
But if it’s March and there’s really nothing specific I can do as a result of this particular conversation, I would probably also be in your camp of the “unwilling”. I would much rather chat about something else, or nothing at all.
I'm also assuming you're referring to in-person communication. If it's online communication, all bets are off. It's unlikely you're having a linear conversation and these days you're probably not even talking to a person.
If they don't want to listen, why waste the time?
> So I've become very punchy and repetitive, trying to hammer home ideas that people are either unable or unwilling to understand.
If they don't want it, why stuff it down their throats? Aren't they allowed to have their own ideas?
Is your life easier if you don't waste time on them? I guess. But obviously you're going to put yourself in a similar bubble, and to whatever extent the issue is important, it's now become undiscussed. As you've hinted at, they could be right and you wrong, but the difference is (at least in the premise) that only one side is willing to talk and listen, so only one side has the potential to change, and not based on the merit of the argument, because of course no conversation took place.
How hard does one try to encourage someone else to listen? Or rather, how long does one pursue a conversation that's being denied? That's the tension. I don't know, other than it seems like the side unwilling to listen wins a little bit each time they successfully evade it, and wins a little more when the other side decides to let it go. I don't just mean they've won a proverbial argument; I mean the issue or decision in question tilts toward their side.
I'm told blogging works for some. I don't really know how you build an audience, though, and it's hard to keep going (first-hand experience) without one.
It turns out to be built into the training data. The diffusion model just doesn't have many examples of naked people outside of porn tropes, so it autocompletes porn.
Online moderation of generated images has the same weird incentive. Since real people seldom film themselves having sex, a naked person not having sex is a red flag for a possible real person, and gets moderated more strongly.
So in the new world, well written sentences are a handicap and nudity is generally accompanied by an exchange of fluids.
Honestly, I lean towards shaming educators who do that. If you can't detect the whiff of LLM with your own senses, then it has been used properly and shouldn't be faulted. If that premise invalidates your assignment, change the assignment. It's not as if you're assigning this work to test the basic mechanics of writing (grammar, sentence/paragraph structure, parallelism, whatever) — I mean, how much of that did you consciously try to teach? My recollection is, not an awful lot; and I can only imagine it's gotten worse since I was in K-12 (and I went to pretty darn good K-12).
But wouldn't this apply to any cheating method? I don't think educators would be able to tell the difference between using a calculator, getting answers from previous tests, resubmitting assignments, etc.
> using a calculator
Students who are at a level where they'd be learning to do the computations a calculator does, shouldn't have graded homework. And even at that level, real mathematics is more than just computation.
> getting answers from previous tests
Decades ago, my teachers and professors knew advanced tricks for this, like "not just reusing the test questions from last year". Sometimes they even changed the constants in math questions between sections of the class.
Reading previous tests (including correct answers) was never considered cheating, or even slightly unethical, in my education. In fact, one of our professors had this party trick of working through all the answers for a past-year exam (perhaps multiple of them; I can't recall the details, but certainly much faster than students were expected to work things out under exam conditions) in the space of a single lecture, near the end of the course. Students were meant to see this and learn from it (as well as be impressed).
> resubmitting assignments
Why would you ever not notice this?
>Students who are at a level where they'd be learning to do the computations a calculator does, shouldn't have graded homework. And even at that level, real mathematics is more than just computation.
So, any math level below real analysis shouldn't have graded homework?
>Decades ago, my teachers and professors knew advanced tricks for this, like "not just reusing the test questions from last year".
Math is not the only subject. For an English class, what constant would you change so that students get a comparable exam (especially if you are going to do this between sections in the same cohort)?
>resubmitting assignments
Students are not stupid, and obviously would not resubmit an assignment to the same teacher. However, there is significant overlap between classes, so an assignment from one class can be retooled and submitted in another.
The discourse around "cheating" with these products has always been a mistake. We should have characterized them less as "cheating machines" and more as "expediency machines." Because once you're invested in describing students as having academic dishonesty issues rather than skill issues, you've made it an administrative problem. You never come back from that.
For mine, we lost the issue long ago when accountability culture won. We should never have bothered with the idea that "mechanics, grammar, and proofreading" should be part of a "rubric" that "assessed outcomes" for "good writing." We should have just said "we don't care if you don't think this is worthwhile, because your time is worth nothing." The last two years of student labor certainly suggests this.
I think your point stands for upper level work; however, at medium to lower levels, your counterfactual starts to weaken. The ideas have always been there, but the ability to express them--well enough to notice their presence--has not.
I think grading in general can be stymying for students' motivation and creative drives.
I wish brevity and linguistic precision were taught more, as well. Miscommunication due to ambiguity is one of the biggest causes I see for confusion or heated arguments.
In my experience, educators no longer use AI detectors, given the risk of false positives. But some work is obviously lazy AI content. When that happens, educators talk to the student to see if they understand what they wrote.
Teachers cope with more in-person writing, oral presentations, and defense of what's been written.
If you think about it, the pre-AI computing generation is itself anomalous for having ubiquitous access to efficient human-only writing tools. We probably wrote more than previous generations. Early Internet / blogging culture bears this out.
But the article's focus on writing "worse" for AI detectors misses what is important. Trying to distinguish humans from machines does not develop student capability. In fact, it's a fleeting technique, because AI writing styles will vary and improve over time.
These kinds of things are novel to us and deserving of skepticism, but to them they just become the world they live in.
If teachers now abdicate this judgment to software, students should be allowed to abdicate their duties to a computer as well.