
Posted by simianwords 4 hours ago

Grok 4.3 (docs.x.ai)
128 points | 153 comments
alyxya 3 hours ago|
Despite their attrition, this combined with their Cursor partnership is likely going to make them competitive in coding agents soon.
mirekrusin 3 hours ago||
All those plans from providers should be sliders – prepay more, get more in return.
agunapal 3 hours ago||
Very competitive price for the speed and intelligence being offered!
OtherShrezzing 3 hours ago||
The tok/s stat is interesting. Since the dominant constraint on inference speed is hardware, it suggests X purchased far more compute than was really needed to serve the demand for their models.

Expensive miscalculation.

flir 3 hours ago|
Didn't a bunch of hardware that was destined for Tesla get redirected to xAI? I'm sure I remember something like that.
mikeyouse 2 hours ago||
Yep! Why his shareholders in Tesla abide by this kind of thing is beyond me, but he often mixes resources from completely unrelated companies: https://www.cnbc.com/amp/2024/06/04/elon-musk-told-nvidia-to...
happosai 3 hours ago||
I lost trust in them when they added the racist "what about killing of Boers in South Africa" thing to their system prompt.

No way am I going to use a model whose backers have such blatantly obvious brainwashing goals.

Hugsun 2 hours ago||
It is unbelievable that this is a controversial opinion.
miroljub 3 hours ago||
[flagged]
vrganj 3 hours ago|||
There is no non-bias. What you call unbiased is always just a reflection of your personal biases.

That being said, I am definitely against a model that is biased to be following the ideology of a far-right extremist.

Jtarii 3 hours ago||||
Musk bought a social media company for the specific purpose of getting Trump elected by turning it into a right wing propaganda machine. Have Anthropic/OpenAI/Google done something similar to that?
henry2023 2 hours ago|||
[dead]
BoredPositron 3 hours ago||
Yay, free tokens. I don't know why, but Grok always seems fast and good during the free-token phase and degrades afterwards.
Imustaskforhelp 4 hours ago||
Pelican riding a bike here: https://gist.github.com/SerJaimeLannister/f6de26bd0d0817e056...

(ran this on arena.ai direct chat, and also wrote this gist inspired by how Simon writes his gists about pelicans)

Edit: just realized that I asked for a pelican riding a bike instead of a bicycle, which explains why it rendered the bike looking tankier. I'm going to compare this with "pelican riding a bicycle" if anybody else shares theirs.

gchamonlive 3 hours ago|
https://simonwillison.net/2025/Nov/13/training-for-pelicans-...

You should probably come up with variations, like a beaver riding a scooter or something, just to see what's what :)

Imustaskforhelp 3 hours ago||
Thanks, I have generated both:

beaver riding a scooter: https://gist.github.com/SerJaimeLannister/f6de26bd0d0817e056...

pelican riding a bicycle: https://gist.github.com/SerJaimeLannister/f6de26bd0d0817e056...

Personal opinion, but the beaver one looks especially bad compared to the pelicans. Can we be sure that grok-4.3 hasn't been trained on pelicans? Simonw says in his blog post that he will try other creatures, so I hope he does, because it does feel to me like the model/xAI is trying to cheat. Hope simonw tests it out more.

Edit: Also added a turtle riding a scooter, something that literally has images online (heck, even Teenage Mutant Ninja Turtles), so I thought it would be able to pass this, but it wasn't even able to generate it: https://gist.github.com/SerJaimeLannister/f6de26bd0d0817e056...

This literally looks more like an avocado than a turtle. Perhaps this could be a bug in arena.ai or something else, not sure, but at this point I'm waiting for Simon's analysis.

gchamonlive 3 hours ago||
We can never be sure of course, but I think this is a very strong indication that pelican riding a bike is indeed going into the training dataset.

Thanks for generating those!

simianwords 4 hours ago||
https://artificialanalysis.ai/models/grok-4-3
nextaccountic 4 hours ago||
This puts Sonnet 4.6 above Opus 4.6 in the coding index... kinda hard to trust those numbers.

(Also, it puts Opus 4.7 universally above Opus 4.6, and I may be wrong, but this doesn't seem to match the experience of most/many/some people. I think it's widely recognized that Anthropic is severely short on compute and that Opus 4.7 is a cost-saving measure.)

conception 48 minutes ago|||
What I’ve usually seen is 4.7 -> 4.5 -> 4.6 in terms of quality. Though 4.7 seems to hallucinate more than before.
manmal 3 hours ago|||
Anthropic themselves have (had?) this thing where Opus is used for planning and Sonnet for coding.
nextaccountic 1 hour ago||
I thought this was a cost-saving measure: plan with the frontier/SOTA model, then code with something cheaper.

But then, Anthropic employees don't have rate limits, right?

Alifatisk 3 hours ago|||
These numbers don't look exciting at all. I may have gotten spoiled by the releases from Qwen, Kimi and Z.ai, which keep closing the gap between closed-weight SOTA models and open-weight ones. In my experience, Grok is only useful for one thing, and that's looking things up for you and gathering a consensus on topics. That's it.

Update: I noticed that Grok 4.3 is in the "Most attractive quadrant", that's cool! It's also in the top 5 on the "AA-Omniscience Index". Really good.

progbits 3 hours ago|||
What's with the charts and numbers?

It says #1 for speed but then in the chart it's #2. Also says #10 for intelligence but then it's #7 in the chart.

BoorishBears 4 hours ago||
What an exciting game we're playing, where the most popular leaderboard is completely made up and the stakes are in the trillions.
alfiedotwtf 3 hours ago||
If there was any model I wouldn’t trust, it wouldn’t be the ones from China, it would be the one from Elon Musk
Cthulhu_ 3 hours ago|
Thankfully it's not an either / or, I don't trust any models. This is a healthy attitude to have because you shouldn't trust anyone on the internet either, especially when it comes to specific subjects.
benrutter 2 hours ago|||
That's definitely a good approach. Although I get a little concerned about the resources put into convincing people that models (and especially Grok) are accurate. For example, X's "fact checked by Grok" approvals, which I've unfortunately heard people reference as meaningful.

Politically motivated models can still do a lot of damage that affects me (or "have a lot of impact" depending on whether you like the politics or not) even if I don't engage with them myself.

2ndorderthought 2 hours ago|||
I don't trust this. But by not trusting it I am inherently trusting it. But by trusting it I shouldn't.
khalic 3 hours ago|
This project is a gigantic waste of resources; it's fine-tuned on the CEO's politics, was used for CSAM generation, and just sucks overall
johnnyApplePRNG 2 hours ago||
The resource waste he's talking about is horrendous; read more here: https://time.com/7308925/elon-musk-memphis-ai-data-center/
servo_sausage 3 hours ago|||
I like that there are models with divergent politics; the status quo of creepy corporate left Silicon Valley is not healthy or pleasant to interact with.

Even with Grok, it's only broadening things to the creepy corporate right of Silicon Valley.

breezybottom 1 hour ago||
Silicon Valley...left? Huh?
spiderfarmer 3 hours ago||
It’s a model made for 36% of Americans. The rest of the world couldn’t care less.
2ndorderthought 3 hours ago||
Considering how few Americans there are and how little of that 36% even uses technology, that's what, 20 million people at a maximum?
Hugsun 2 hours ago||
That seems like a decently sized market. Maybe not for an AI lab though.
2ndorderthought 2 hours ago||
Sure, it's a good market for a normal company. For a social media company it's pretty isolated and really limits the products that can come out. But their current selling points (propaganda, CSAM, and psychosis engagement) are quite strong amongst that population.
cindyllm 2 hours ago||
[dead]