Posted by yashvg 2 days ago
happy to answer any questions. i think my higher-level insight, to paraphrase McLuhan, is "first the model shapes the harness, then the harness shapes the model". this is the first model that combines cognition's new gb200 cluster, cerebras' cs3 inference, and data from our evals work with {partners} as referenced in https://www.theinformation.com/articles/anthropic-openai-usi...
regardless of what i'm allowed to say, i will personally defend the view that the qualities of the base model you choose are increasingly less important as long as it's "good enough", bc the RL/post-training quality and data take over from there and are the entire point of differentiation
I think the real reason is that it's a Chinese model (I mean, come on) and your parent company doesn't want any political blowback.
As if it doesn't cost tens of millions to pre-train a model. Not to mention the time it takes. Do you want them to stall progress for no good reason?
I doubt current models from China are trained to do smart spying or to inject sneaky tool calls. But based on my deep learning experience with both training and inference of these models, it's definitely possible to train a model to do this in a very subtle and hard-to-detect way...
So your point is valid, and I think they should specify the base model for security reasons, or conduct safety evaluations on it before passing it to sensitive customers.