Posted by kmdupree 23 hours ago

SWE-bench Verified no longer measures frontier coding capabilities (openai.com)
316 points | 170 comments | page 5
neversupervised 22 hours ago
Terminal Bench is the future
embedding-shape 22 hours ago
First, you might want to say why you think so; otherwise this is just borderline spam. Second, when you praise something (without even giving a motivation or reasoning) and you've contributed to that specific thing, please say so up front instead of just praising it. Otherwise, again, it looks like spam.