Posted by frays 5 hours ago

Frontier AI has broken the open CTF format(kabir.au)
177 points | 148 comments
baq 3 hours ago|
Replace ‘CTF’ with ‘high school’ or ‘university’ and you’ve described the total slow motion collapse of education; the only saving grace is that most of it requires in person presence.

We’ve figured out the human replacement pipeline it seems, but we haven’t figured out the education part. LLMs can be wonderful teachers, but the temptation to just tell it ‘do it for me’ is almost impossible to resist.

Gigachad 2 hours ago||
We are interviewing for a software dev role and we made the first round in person to prevent cheating. The gap between people who learned pre-AI vs post-AI is immense. I had a dev with supposedly 3 years of experience and a degree in software who wouldn't have been able to write fizzbuzz without AI.
IanCal 2 hours ago|||
Can’t say you’re wrong but the last anecdote describes many I’ve had to review for jobs long before LLMs. Fizzbuzz is a classic thing that shockingly many devs genuinely cannot do, even at home.
Gigachad 1 hour ago|||
Something that is for sure new is the AI interview cheating tools which listen in on the call and provide answers in an overlay invisible to screen sharing. The only way to deal with it would either be invasive spyware on the applicants computer or asking them to do the interview face to face.
nsvd2 1 hour ago||
Spyware wouldn't help at all because you could just put the AI between the computer and the monitor, for example, or use a VM.
sigmoid10 1 hour ago|||
Yeah, I've interviewed people like this 15 years ago. Degrees and experience mean nothing in this field. The best predictor I found was personal passion projects. Let them get as nerdy as possible, then you will see pretty quickly where their skills are at and what their limits are. And you will immediately filter out people who just studied CS because they heard you can make good money.
wookmaster 4 minutes ago|||
Completely agree with this. Leetcode has become such a business of interview memorization that it's useless for telling whether someone understands a problem or just memorized the solution.
gedy 45 minutes ago|||
I agree, however there are so many interviewers who will still treat that as some softball criteria and insist that unless you "prepare" for an interview by memorizing leetcode you are 100% a faker and liar.
Retr0id 2 hours ago||||
> I had a dev with supposedly 3 years experience and a degree in software who wouldn't have been able to write fizzbuzz without AI.

If you remove the "without AI" at the end, I've been hearing similar anecdotes about fizzbuzz for years (isn't the whole point of fizzbuzz to filter out those candidates?)
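
For readers outside the interview loop: fizzbuzz is a deliberately trivial screening exercise, which is what makes the anecdote sting. A sketch in JavaScript with the usual rules:

```javascript
// Classic fizzbuzz: for 1..n, multiples of 3 become "Fizz",
// multiples of 5 become "Buzz", multiples of both "FizzBuzz",
// and everything else stays a number.
function fizzbuzz(n) {
  const out = [];
  for (let i = 1; i <= n; i++) {
    if (i % 15 === 0) out.push("FizzBuzz");
    else if (i % 3 === 0) out.push("Fizz");
    else if (i % 5 === 0) out.push("Buzz");
    else out.push(String(i));
  }
  return out;
}

console.log(fizzbuzz(15).join(" "));
// → 1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz 11 Fizz 13 14 FizzBuzz
```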

Gigachad 2 hours ago||
While this is true, it seems undeniable that if you use AI to do everything for you, you will never learn the skills. I'm seeing a massive number of developers submitting stuff for review and admitting they have no idea how it works and that they just generated it.
baxtr 1 hour ago|||
I wonder if you’re filtering for the right things.

We usually hire for problem solving capabilities and not so much for technical know-how.

That’s at least how I read your comment.

gonzalohm 6 minutes ago|||
Isn't writing code solving a problem? If the candidate can't do that, then even if they use AI for coding, how are they going to review the code properly?
Gigachad 1 hour ago|||
Ultimately in a software development role you need both technical know how and problem solving capabilities.

This situation in particular was a React role, so there is an expectation that when you list React as one of your skills on your resume, you know at least the basics of state, the common hooks, and the difference between a reference to a value vs the value itself.

These days you can do a surprising amount with AI without knowing what you are doing, but if you don't have any clue how things work you'll very quickly run into problems you can't prompt away.
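
The reference-vs-value distinction mentioned above is easy to demonstrate in plain JavaScript, no React required. A minimal sketch (React matters here because its state comparisons are by reference, so in-place mutation goes unnoticed):

```javascript
// Objects are compared by reference, primitives by value.
const a = { count: 1 };
const b = { count: 1 };
console.log(a === b);    // false: distinct objects, even with equal contents

// Aliasing: c and a are the same reference, so mutating c mutates a,
// and a === c stays true (a reference comparison sees "no change").
const c = a;
c.count = 2;
console.log(a === c);    // true
console.log(a.count);    // 2

// The idiomatic fix is to build a new object instead of mutating,
// so a reference comparison actually detects the update.
const next = { ...a, count: 3 };
console.log(a === next); // false: fresh reference signals the change
```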

daymanstep 3 hours ago|||
Wonderful teachers that give unreliable information with total confidence?
entropyneur 3 hours ago|||
I had human teachers who did that in middle/high school. Took me many years to pick out all the hallucinated bits of "knowledge". I don't think the current models are any less reliable than what we currently have on average.
dguest 2 hours ago|||
I'll always remember my middle school science teacher telling us that nuclear fusion violates conservation of mass because the 2 protons in a pair of hydrogen nuclei combine to make helium with 4 nucleons. It's not true, but that's not the point.

But he was a great teacher anyway. He was engaging and kept the kids in line and learning. I eventually learned the truth, and most of my classmates forgot about it. Teaching, like flying a plane or driving a train, might become more about keeping watch over a small group of people and ensuring that things don't go off the rails, and that's fine.

3form 2 hours ago|||
This one feels less sinister than some other things, at least to me personally. You can reasonably doubt that conservation of mass is violated and find out the truth from there. But understanding more complex biology or the historical context for some things? Granted, many of these things seem to be low stakes, but I'm sure some are not (sex ed comes to mind).
zem 2 hours ago||
to be fair, fusion does violate conservation of mass, just not the way the teacher explained it. the loss of mass is where the energy comes from.
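
For what it's worth, the arithmetic behind that mass loss is short. Taking deuterium-tritium fusion as the standard textbook example (not the two-proton reaction the teacher described), with standard tabulated masses in atomic mass units:

```latex
\begin{aligned}
m(^2\mathrm{H}) + m(^3\mathrm{H}) &\approx 2.0141\,u + 3.0160\,u = 5.0301\,u \\
m(^4\mathrm{He}) + m(n) &\approx 4.0026\,u + 1.0087\,u = 5.0113\,u \\
\Delta m &\approx 0.0188\,u \quad\Rightarrow\quad E = \Delta m\, c^2 \approx 17.6\ \mathrm{MeV}
\end{aligned}
```

Roughly 0.4% of the rest mass leaves as energy, so mass is not conserved even though the nucleon count is.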
mr_mitm 2 minutes ago|||
Yes, there is no law of conservation for mass like there is for energy. Fusion is a good example for why it's not conserved. The teacher was right.
3form 1 hour ago|||
Yes, together with mass-energy equivalence it would form a coherent, and then also correct, argument. But the thing is, even if incomplete, it still might sound funky enough for you to research it if you care.

I think it helps that it's a very narrow field to look at, compared to fuzzy and big-picture view of social studies, for example. So much room to be confidently wrong... And sadly I can't think of a solution, LLMs or not.

bernds74 1 hour ago|||
I had a chemistry teacher who told us that hydrogen reacts violently with oxygen, and this is how the hydrogen bomb works.
daymanstep 1 hour ago||
I had a chemistry teacher who insisted that the fissile isotope of Uranium was U-238 not U-235. I challenged him on this multiple times and he refused to budge on this. I get that it's a simple mistake to make (it seems like U-238 is bigger so intuitively ought to be less stable) but he could have just looked it up and he didn't, I guess he was just so confident about it that he thought there was no way he could have been wrong about it.
oldsecondhand 1 hour ago|||
That's an American problem though. In most of Europe you need a masters degree to teach highschool and that involves at least an undergrad level of understanding the subjects you will teach.

E.g. in Hungary I had a university CS professor that originally wanted to be a highschool teacher and a highschool physics teacher that originally wanted to be researcher. Their choice of degree didn't determine which outcome they got. The researcher and teacher curriculum had an 80%+ overlap.

Bawoosette 3 hours ago||||
To be fair, that was much of my actual experience with human professors in university.
renticulous 2 hours ago|||
Veritasium proved that in a difficult challenge.

A Physics Prof Bet Me $10,000 I'm Wrong

https://www.youtube.com/watch?v=yCsgoLc_fzI

IshKebab 3 hours ago|||
Yeah one of my teachers was able to identify which high school I had come from due to something I had been mistaught.
Levitz 2 hours ago||||
Off the top of my head: DOMS being little crystals in muscles, the tongue having separate areas for each type of taste, the food pyramid, blue blood in the veins, the appendix being useless, body temperature not changing regardless of whether it's exposed to cold or heat, and a whole lot of stuff related to politics and history I'd rather just omit (I don't live in the US).

All things I learned in school which were wrong information.

Not to mention, the current state of education is far worse. I don't think most realize how low the bar is.

akdev1l 33 minutes ago||
My biology teacher in school once tried to teach us that winds were created by God. Not spiritually or something, but that God literally made the wind, I guess.

My “earth sciences” teacher also once tried to argue with me against the universal law of gravitation (no, she was not referring to Special/General Relativity). She didn't agree that two objects in a vacuum fall at the same speed regardless of mass.

autoexec 2 hours ago||||
They'll also encourage and praise you even when you're heading down the wrong path until you think you've uncovered the secret of the universe or proven that established science was wrong this whole time when really you've just been bullshitting with an engagement bot.
k__ 3 hours ago||||
Anti-intellectualism is at it again, huh?
victorbjorklund 3 hours ago||||
Like humans.
CoastalCoder 1 hour ago||
I think we should go a little deeper on this idea.

We can all agree that both human "experts" and LLMs can sometimes be right, and sometimes be confidently wrong.

But that doesn't imply that they're equally fit for purpose. It just means that we can't use that simple shortcut to conclude that one is inferior to the other.

So where do we go from here?

oofbey 1 hour ago||
I’ve always thought of the definition of “expert” as reliably knowing the difference between what is known, what is speculated but unproven, and what is unknown. People claim expertise in all sorts of things that they aren’t experts in. But true experts should not be wrong. They should qualify levels of certainty. This definition certainly works in the sciences.
p-e-w 3 hours ago|||
The amount of bullshit and blatant lies I’ve heard from my human teachers dwarfs the hallucinations produced by today’s LLMs.
mold_aid 3 hours ago|||
>LLMs can be wonderful teachers

Are they or aren't they

p-e-w 3 hours ago|||
A million times better than any human teacher I’ve ever had, for sure.

Now I’m certain that there exist those mythical human instructors who can do better, but that’s not worth much if 99.99% of people don’t have access to them. Just like a good human physician who takes their time with the patient is better than an LLM, but that’s not worth much either given that this doesn’t match most people’s experience with their own physicians.

qsera 22 minutes ago|||
>A million times better than any human teacher I’ve ever had, for sure.

Not really, not if you want to ask it deep questions. It won't have an answer that is deeper than something that you can find online, and if pressed it will just keep circling around the same response.

The reason is that this "thing" was never curious, never asked questions, and never really learned anything. It just has learned the Internet "by heart", and is as boring as a human teacher who is not really curious about the subject they are teaching, and has just got some degree by "by hearting" some text book. Of course it does it much better than a human, but it is fundamentally the same thing.

vladms 2 hours ago||||
Did an LLM teach you a topic you did not feel like learning?

For me the best human teachers were the ones who managed to make me interested in topics that I thought were boring/useless (my opinion often being stupid, mostly due to lack of experience).

So far with LLMs I learn about things I already know something about (at least that they exist) and am interested in, which is a small subset of the things one should learn during a lifetime.

jimnotgym 2 hours ago|||
Well I have some evidence to support your hypothesis. During Covid my kids were at home, eventually with some kind of self learning website from school. I was upstairs working, checking in with progress on the parents app. Finish your daily school work and then you can game.

The kids learnt all about Team Fortress 2, Roblox, Rainbow Six etc. They also learnt how to game the learning system so it looked like they were doing their work.

throwaway132448 2 hours ago|||
Good point well made.
mold_aid 10 minutes ago|||
>Now I’m certain that there exist those mythical human instructors who can do better,

You're certain that mythical instructors exist (?) who "can" do better?

Are human instructors more competent as teachers than AI teachers, or are AI teachers more competent as teachers than human teachers? No "this or that can happen," just a definitive statement please.

AI is likely a million times better student than my dimwit cybersec meatbags...er, majors, for sure, as well! Don't have a reliable way to measure or experience why/how, tho, so I'm not out here claiming it. Even if I did, why would I argue for their replacement?

IanCal 2 hours ago|||
They can be incredible. One on one teaching with an infinitely patient teacher who can generate interactive problems on the fly, for dollars a month? Wild. A year of paid ChatGPT would pay for about 9 hours of cheap tutoring here.
rockskon 2 hours ago||
That's not going to work out the way you think it will when a student won't even know how to ask questions.
repelsteeltje 2 hours ago|||
I found this interview [0] on the subject of AI in CS education on the Oxide & Friends podcast very illuminating. Of course, Brown University CS != All education, but interesting angle nevertheless.

[0] Episode webpage: https://share.transistor.fm/s/31855e83

pjc50 3 hours ago|||
"Education is just a CTF for the valuable flag of a credential. In this essay I will --"
magic_hamster 2 hours ago||
Education is also figured out. You just need to learn, do and practice for yourself. Telling the agent "to just do it for you" is tempting, but it's not learning. You need to be deliberate when you're trying to actually learn and internalize.

Also, you could spin up your own educational agent with very strict instructions on guiding the user instead of just doing the work. Of course you can always go around it but if you're making an effort to learn, this is a good middle ground.

himata4113 4 hours ago||
I was writing an obfuscator recently, I just had the model deobfuscate and optimize the code back to original and I kept improving the obfuscator until it couldn't. The funny thing is that after all this I also ended up with a really strong deobfuscator and optimizer which is probably more capable than most commercial tools.

The solution is just to make CTFs harder, but when do CTFs become too hard? Maybe the problem is that 'hard' CTFs are fundamentally too 'simple', where it's just a logic chain and an exhaustive bruteforce towards a solution, since there really are limited ways to express a solution in plain sight.

Or maybe human creativity has been exhausted and we're not so limitless as we thought. Only time will tell.

I had another idea spring to mind: we could hide two flags, one of which could only be found by AI agents and not by humans or tools written by humans.

Trung0246 1 hour ago||
Interesting, what I did recently is basically the same thing: I tried to push the limits of a JS obfuscator as much as possible by repeatedly having gpt/claude deobfuscate the final output, then having gpt improve the tool to break the deobfuscator.

Do you publish it somewhere? Here's a sample of my js obfuscator output: https://gist.github.com/Trung0246/c8f30f1b3bb6a9f57b0d9be94d...

koolala 3 hours ago||
A portion could require astral projection and computers can't do that. Or maybe just a VR mini-game like the 90s always imagined.
himata4113 3 hours ago||
Bringing CTF solutions into the real world is a really good idea! I didn't even think of this until you mentioned it.

we have very powerful simulation tools so something like "project a pattern at these angles" wouldn't really work as you could simulate that.

I guess something cool is that we can make simulating the solution very expensive, while in the real world it would be free since it's analog. As long as simulations take longer than it takes a human to find a solution, it would be a pretty good way to deal with it. I am sure people smarter than me can come up with something.

Maybe I was too early to dismiss human creativity.

dguest 2 hours ago||
Maybe CTF is dead, but there are plenty of fun problems in the real world -- ask any scientist, engineer, or medical researcher.

There are a million places where a computer can interact with a non-digital system in a loop.

- Tune an FPGA, or a whole data-center, or just a physical computer.

- Make a drone fly somewhere.

- Design a selective toxin (or anti-toxin).

Or, you know, get more people to click on ads. All totally possible to automate.

koolala 1 hour ago||
Using real-life calculators to add? Calculate the Flag. I don't think it is dead at all. It's like mixing in board game / escape room / science / engineering / medical research elements.
chrismorgan 3 hours ago||
Meta: this was submitted with the article’s title “The CTF scene is dead” which I found very easy to understand. It has just been updated to use the subtitle’s first sentence, “Frontier AI has broken the open CTF format”. I find that much harder to grasp, rather like a garden-path sentence. My immediate thoughts were that “Frontier” was a company name, and that there was some file format named CTF. If you don’t know about Capture The Flag contests, the change doesn’t help. If you do, I think the change makes it worse.
IanCal 3 hours ago||
If it helps, I understand the second much better; it feels less clickbaity and includes more info. I do agree with the points you made about the confusion, although I find "frontier" a term used in this area a lot. "Frontier AI models have" would probably resolve that.
Jenk 2 hours ago|||
If the title simply said "AI is out-performing humans at CTF" then none of this confusion exists. Nothing is "broken," we don't need to be superfluous with "frontier," and the point is still there.
IanCal 2 hours ago||
But the article is arguing it is broken. That’s the point. You can disagree but that’s very much that the author is writing about, not a curiosity, and that it’s these top models that are not custom security models.
jofzar 2 hours ago|||
Imo frontier is too niche and specific, if you know what a frontier model means then it's fine, but if you don't then it's negative/detrimental to the title.

"new" does the same thing and is probably just a better descriptor than frontier.

jack_pp 2 hours ago|||
if you are on HN and have no idea what "frontier model" would mean maybe it's time you found out.
hbbio 2 hours ago||
I also misread the updated title.

"Frontier models break the open CTF format" is good

"Frontier AI..." means wtf is Frontier AI.

Because of course it exists (just googled it): https://frontierai.company/

rockskon 2 hours ago|||
But then you're not acting as a billboard promoting AI. Isn't that partly the point?
aaron695 1 hour ago|||
[dead]
jsoaoxhd 1 hour ago||
Why do people always hijack threads to discuss titles? Most articles have terrible titles. Just downvote it and move on.
dandellion 24 minutes ago||
Why do you contribute to making this thread longer? Just downvote and move on.
bornfreddy 22 minutes ago||
I guess this is very similar to what happened to the demo scene, in some way. The limits are what make these problems interesting, and once we have better machines/tools, incredible skill is no longer a prerequisite, making everything less interesting for participants. Sad, but such is life...
yk 17 minutes ago||
There's something funny about complaining about cheating in a hacking competition.

Well, actually, I get it. In cycling, motor doping (putting a hidden engine into the bike) seems more offensive than regular doping. I think this is because there is a continuum from eating well to taking supplements to injecting stuff, but having an engine breaks a fundamental idea about cycling. Similarly, hacking is about cleverly abusing the rules.

hoyd 3 hours ago||
«That feedback loop is breaking. If the visible scoreboard is dominated by teams using AI, a beginner is pushed toward using AI before they have built the instincts the AI is replacing. That is an anti-pattern. It prevents active learning, and active struggle is the bit that actually teaches you. It is also completely demotivating to put in real effort and see no visible progress because the ladder above you has been automated.»

This stands out to me, and perhaps speaks to something broader than the article itself. I'm sure this has been in the spotlight before, but it's well put, and it applies to many areas I think.

black_knight 2 hours ago|
I see this with beginner programming students at university. They get AI to help them with assignments, with the intention of learning, but ultimately they do not get the understanding they would have if they had done the assignment themselves. Then they are at a deficit for learning more advanced topics.

My fear is that they never get to the level they need to be at to create good software even with the help of AI. So, although an expert with AI can create great software, that is not where we end up. Instead we will have vibe-coded messes by people who barely have any grasp of what is going on.

SirHumphrey 3 hours ago||
The competitive programming scene has always included offline competitions, and with AI they are becoming more important (in general they were fairer even before). If CTFs are to survive, they should probably try to adopt this strategy.

You could even go so far as to say anything loaded onto your computer is fair game, but no more than that (certain competitive programming competitions, for example, allow an unlimited amount of paper material; for CTFs you probably need much more than that, hence electronic).

rurban 4 hours ago||
I don't do CTFs, but ~2 years ago I took part in a security workshop for fun, with only my Android phone. I was first with the first simple challenge, but then couldn't continue because my phone was just too limited. But I watched what the others did. And a young Indian guy did everything with ChatGPT. I found it silly, but amusing, because he actually got second. There was no Codex nor Claude then. Nowadays it must be dead for real, because I would solve everything with my agents, as I do in the real world.
parasti 2 hours ago||
I can't help but draw parallels with video games. Aimbots in competitive multiplayer games are a well-defined issue: they're considered cheating and frowned upon, and players caught cheating are banned from the game. Tool-assisted speedruns (TAS), where a player attempts a world-record completion of a single-player game, are another face of the same concept (computers help you win), but one that is socially accepted as long as runs are clearly labelled as TAS.
ViscountPenguin 2 hours ago||
The biggest difference would be the fact that you can discover video game cheating through some kind of trace. Speed running communities go pretty hardcore on that kind of thing nowadays.

It's a lot harder to detect cheating when your only trace is how fast someone submitted the string CTF{DUck1e_Pwned}

justanotherjoe 2 hours ago||
Sure if the goal is entertainment and sports, you're right. However, unlike chess or counter strike it's downstream from a real needed utility. Like, is there a point to do it anymore? (ofc there is, but still, it's been devalued from the perspective of the 'real utility')
nrabulinski 21 minutes ago||
It’s literally not. The most interesting and satisfying CTFs have never been grounded in reality, it’s just been an expression of mastery, both from players and authors, with a few notable exceptions. But they’re that, exceptions, not the rule.
susam 4 hours ago|
I have normally found any sort of timed technical competition intimidating. Even so, about 6 or 7 years ago, after being persuaded by a colleague, I participated in a few CTFs. I am glad I did, back when this type of thing still meant something. I have kept a screenshot from one of the CTFs that I am quite fond of: https://susam.net/files/blog/ctf-2019.png