Git LFS for the workbook, plus the following prompt:
“Create a commit that explains what has changed in the workbook since the last commit. Be brief, but explain the change in business terms as well as code change terms.”
LLMs work best when they can call tools (edit the sheet) and test their results in a loop.
It's like the "value seek" thing (Goal Seek) Excel has had since forever: "adjust these values until this cell is X".
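To make the "adjust these values until this cell is X" idea concrete, here is a minimal sketch of a goal-seek loop using bisection. It is not how Excel implements Goal Seek; the `goal_seek` helper, the `profit` model, and all the numbers in it are made up for illustration, and the method assumes the target lies between the outputs at the two starting bounds.

```python
from typing import Callable


def goal_seek(model: Callable[[float], float], target: float,
              lo: float, hi: float,
              tol: float = 1e-9, max_iter: int = 200) -> float:
    """Find an input in [lo, hi] whose output is close to target.

    Plain bisection: assumes model is continuous and that
    model(lo) and model(hi) fall on opposite sides of target.
    """
    f_lo = model(lo) - target
    f_hi = model(hi) - target
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("target is not bracketed by [lo, hi]")

    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = model(mid) - target
        # Stop when the output is close enough or the interval is tiny.
        if abs(f_mid) <= tol or (hi - lo) / 2 <= tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                 # root is in the lower half
        else:
            lo, f_lo = mid, f_mid    # root is in the upper half
    return (lo + hi) / 2


# Hypothetical stand-in for a sheet: one input cell (price), one output cell (profit).
def profit(price: float) -> float:
    units, unit_cost, fixed_cost = 1000, 4.0, 2500.0
    return units * (price - unit_cost) - fixed_cost


# "Adjust price until profit is 0", i.e. find the break-even price.
break_even_price = goal_seek(profit, target=0.0, lo=0.0, hi=100.0)
```

The same shape (propose a change, recompute, compare against the target, repeat) is what a tool-calling loop does, only with a model choosing the next edit instead of a fixed bisection rule.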
Excel doesn't have any way to verify that every formula in that 60k-line sheet is correct and that someone hasn't accidentally replaced one with a static number, for example.
I suspect similar tools could be made for Claude and other LLMs, except that they wouldn't be plagued by the mind-numbing tedium of doing this sort of audit.
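As a rough illustration of the kind of audit described here, the sketch below flags cells holding a static value in a column that is otherwise almost entirely formulas. The function name, the 80% threshold, and the sample column are all assumptions for the example. For a real .xlsx file the raw cell contents would have to come from a reader library such as openpyxl; that step is omitted here.

```python
def find_hardcoded_cells(column_cells: dict, min_formula_share: float = 0.8) -> list:
    """Return addresses of non-formula cells in a column that is mostly formulas.

    column_cells maps a cell address to its raw content: a string starting
    with "=" for a formula, anything else for a static value.
    """
    def is_formula(value) -> bool:
        return isinstance(value, str) and value.startswith("=")

    # Ignore empty cells so blank rows do not distort the ratio.
    filled = {addr: v for addr, v in column_cells.items()
              if v is not None and v != ""}
    if not filled:
        return []

    formula_count = sum(1 for v in filled.values() if is_formula(v))
    # Only treat constants as suspicious when the column is mostly formulas.
    if formula_count / len(filled) < min_formula_share:
        return []
    return [addr for addr, v in filled.items() if not is_formula(v)]


# Made-up column: D4 was overwritten with a pasted number.
column_d = {
    "D2": "=B2*C2",
    "D3": "=B3*C3",
    "D4": 1250.0,
    "D5": "=B5*C5",
    "D6": "=B6*C6",
    "D7": "=B7*C7",
}
suspects = find_hardcoded_cells(column_d)
```

This only catches one failure mode (a constant where a formula was expected); it says nothing about whether the remaining formulas are themselves correct.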
https://github.com/anthropics/skills/blob/main/document-skil...
I linked to the skill prompt just to more clearly explain the approach that's currently available to all Claude users.
It requires zero familiarity with git or command line.
Being able to select a few rows and then use plain language to describe what I want done is a time saver, even though I could probably muddle through the formulas if I needed to.
It is an entire agent loop. You can ask it to build a multi-sheet analysis of your favorite stock and it will. We are seeing a lot of early adopters use it for financial modeling, research automation, and internal reporting tasks that used to take hours.
- stop using the free plan
- don't use Gemini Flash for these tasks
- learn how to do things over time and know that all AI models have improved significantly every few months
To see something much more powerful on Google Sheets than Gemini for free, you can add "try@tabtabtab.ai" to your sheet, and make a comment tagging "try@tabtabtab.ai" and see it in action.
If that is too much just go to ttt.new!
— Zack Korman, <https://x.com/ZackKorman/status/1974828240679166396>
Edit: found it on their other blog post https://www.anthropic.com/news/advancing-claude-for-financia...
Spend a few years in an insurance company, a manufacturing plant, or a hospital, and then the assertion that the frontier labs will figure it out appears patently absurd. (After all, it takes humans years to understand just a part of these institutions, and they have well-functioning memory.)
This belief that tier 5 is useless is itself a tell of a vulnerability: the LLMs are advancing fastest in domain-expertise-free generalized technical knowledge; if you have no domain expertise outside of tech, you are most vulnerable to their march of capability, and it is those with domain expertise who will rely increasingly less on those who have nothing to offer but generalized technical knowledge.
I don’t think the frontier labs have the bandwidth or domain knowledge (or dare I say skills) to do tier 5 tasks well. Even their chat UIs leave a lot to be desired and that should be their core competency.
However, I would think more of elite data centers rather than commodity data centers. That's because I see Tier 4 being deeply involved in their data centers and thinking of buying the chips to feed them. I wouldn't be so inclined to throw in my opinion immediately if I had found an article showing this ordering of the tiers, but since it was a tweet about a podcast, it might have just been a rough draft.
That OpenAI is now apparently striving to become the next big app layer company could hint at George Hotz being right, but only if the bets work out. I'm glad that there is competition on the frontier labs tier.
https://docs.claude.com/en/docs/about-claude/models/overview