Posted by tosh 9 hours ago
The beauty of this benchmark is that it takes all of two seconds to come up with your own unique one. A seahorse on a unicycle. A platypus flying a glider. A man’o’war piloting a Portuguese man of war. Whatever you want.
Edit: someone needs to explain why this comment is getting downvoted, because I don't understand. Did someone's ego get hurt, or what?
It was sort of humorous for maybe the first 2 iterations, now it's tacky, cheesy, and just relentless self-promotion.
Again, like I said before, it's also a terrible benchmark.
I was expecting something more realistic... the true test of what you are doing is how representative the thing is in relation to the real world. E.g. does the pelican look like a pelican as it exists in reality? This cartoon stuff is cute but doesn't pass muster in my view.
If it doesn't relate to the real world, then it most likely will have no real effect on the real economy. Pure and simple.
In contrast, the only "realistic" SVGs I've seen are created using tools like potrace, and look terrible.
I also think the prompt itself, of a pelican on a bicycle, is unrealistic and cartoonish; so making a cartoon is a good way to solve the task.
And I wonder how Gemini Deep Think will fare. My guess is that it will get half the way on some problems. But we will have to take an absence as a failure, because nobody wants to publish a negative result, even though it's so important for scientific research.
https://hn.algolia.com/?q=1stproof
This is exactly the kind of challenge I would want to judge AI systems based on. It required ten bleeding-edge-research mathematicians to publish a problem they've solved but hold back the answer. I appreciate the huge amount of social capital and coordination that must have taken.
I'm really glad they did it.
If agents get good enough, they're not going to build some profitable startup for you (or whatever people think they're doing with the LLM slot machines) because that implies that anyone else with access to that agent can just copy you, it's what they're designed to do... launder IP/copyright. It's weird to see people get excited for this technology.
None of this is good. We are simply going to have our workforces replaced by assets owned by Google, Anthropic and OpenAI. We'll all be fighting for the same barista jobs, or miserable factory jobs. Take note of how all these CEOs are trying to make it sound cool to "go to trade school" or how we need "strong American workers to work in factories".
The computer industry (including SW) has been in the business of replacing jobs for decades - since the 70's. It's only fitting that SW engineers finally become the target.
I don't think that's going to make society very pleasant if everyone's fighting over the few remaining ways to make a livelihood. People need to work to eat. I certainly don't see the capitalist class giving everyone UBI and letting us garden or paint for the rest of our lives. I worry we're likely going to end up in trenches or purged through some other means.
Put another way, I’m on the capital side of the conversation.
The good news for labor that has experience and creativity is that it just started costing 1/100,000 of what it used to cost to get on that side of the equation.
I am one of the “haves” and am not looking forward to the instability this may bring. Literally no one should.
these people always forget capitalism is permitted to exist by consent of the people
if there's 40% unemployment it won't continue to exist, regardless of what the TV/tiktok/chatgpt says
This is truly the dumbest statement I've ever seen on this site for too many reasons to list.
You people sound like NFT people in 2021 telling people that they're creating and redefining art.
Oh look peter@capital6.com is a "web3" guy. It's all the same grifters from the NFT days behaving the same way.
but forgot there's likely someone above them making exactly the same one about them
I imagine LLM job automation will make people so poor that they beg to fight in wars, and instead of turning that energy against the people who created the problem they'll be met with hours of psyops that direct that energy to Chinese people or whatever.
We will see.
Not interested enough to pay $250 to try it out though.