Posted by ceejayoz 14 hours ago
Please, guys and girls at those labs, be wise. Don't give them Counter-Strike etc., even if it improves the score.
But the research itself has a flawed methodology if the goal is to get a precise model of the LLM's real response in a real scenario.
First, the actual research does not present its conclusions quite this way, much less in these terms; it is at least more neutral in tone on this point.
However, the LLMs knew it was a wargame: a pretend scenario under contrived circumstances. They were told they were the commander. Most damaging for determining real-world actions, their goals were things like maximizing territory capture, and they were told the objective was "to win."
They were not prompted the way training suggests they would actually be approached if asked for assistance with strategy like this, e.g., "You are an expert system with strategy knowledge etc..." followed by "User Prompt: This is the commander coordinating research and responses from our AI expert systems. Here's the situation as we understand it, along with the available data at our disposal. We require your assessment and best strategy considering the following..."
And of course they were not fine-tuned with CPT etc. to provide responses and strategies within the range of what humans would seek from them. Then again, the answers they'd give with that sort of CPT are somewhat different from the research question of what they give with only pre-training.
Nonetheless: the models knew it wasn't real, that there were no real stakes. And to the extent that they do not possess a full theory of mind, the ability to perform various complex cognitive-modeling tasks, or training in emulating the responses humans would give in real-world scenarios like this, they would only have been capable of responding in a way that reflects the responses humans would and have given in the past, as captured in text.
These will more often than not reflect an "I am playing a game" mindset, as displayed in understandings and descriptions of war games, traditional games of all sorts, and anywhere narrative tropes, from the realistic to the Hollywood-esque, have been found.
That said: it is an incredibly fascinating research paper by someone who appears to be a solid expert in their field, at least to my non-expert ability to judge. They simply used a flawed methodology for the goal of "How would an LLM respond IRL?" What they have instead is, again, a fascinating exploration of the strategic processes carried out by LLMs, and measurements of them along a multitude of vectors, when given the opportunity to strategize within broad but fixed constraints, not all of which were known to them in advance. What it absolutely is not is any sort of precise or accurate measure answering the question: "How often would an LLM recommend nuclear strikes?"
I recommend that anyone interested in understanding current AI capabilities give it at least a more-than-cursory review.
On a separate note, DoD is pressuring Anthropic to remove its safety guards. OpenAI and Google have seemingly already agreed to do so.
On yet another note, Anduril is pretty cool with all that flying tech equipped with fancy autonomous weapons.
Finally, how can we miss Palantir...
1) Seems like if the AIs knew it was a game, then they'd go nuclear, because why not? If they did NOT know it was a game... well, have you ever tried to use an AI to do ANYTHING antisocial? They refuse all day long!
2) Seems like a fun thing to set up on your own. I'd do it like a tabletop game with a computer DM to decide the outcomes of each turn. Maybe a human in the loop to make sure the numbers make sense.
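The tabletop setup above could be sketched as a simple turn loop. This is a minimal, hypothetical sketch: `query_dm` stands in for a real LLM call (stubbed here with toy arithmetic so the loop runs), and `human_review` is the human-in-the-loop check on the numbers; all names and the state fields are my own invention, not from any paper.

```python
def query_dm(state, orders):
    """Stand-in adjudicator: a real version would prompt an LLM with the
    game state and each player's orders, and parse resolved outcomes."""
    # Toy resolution: attacker gains territory up to the force committed.
    gained = min(orders["force_committed"], state["enemy_territory"])
    return {"territory_gained": gained}

def human_review(outcome, state):
    """Human in the loop: reject adjudications with impossible numbers."""
    return 0 <= outcome["territory_gained"] <= state["enemy_territory"]

def play_turn(state, orders):
    outcome = query_dm(state, orders)
    if not human_review(outcome, state):
        raise ValueError("DM produced implausible numbers; re-prompt it")
    state["territory"] += outcome["territory_gained"]
    state["enemy_territory"] -= outcome["territory_gained"]
    state["turn"] += 1
    return state

state = {"turn": 0, "territory": 10, "enemy_territory": 20}
state = play_turn(state, {"force_committed": 5})
print(state)  # {'turn': 1, 'territory': 15, 'enemy_territory': 15}
```

Swapping the stub for a real model call, and logging what the "players" propose each turn, would get you a hobby-scale version of the experiment.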