GitHub Agentic Workflows

Posted by mooreds 9 hours ago

GitHub Agentic Workflows (github.github.io)

185 points | 105 comments

onionisafruit 4 hours ago

I noticed this unusual line in go.mod and got curious why it is using replace for this (typically you would `go get github.com/Masterminds/semver/v3@v3.4.0` instead).

  replace github.com/Masterminds/semver/v3 => github.com/Masterminds/semver/v3 v3.4.0

I found this very questionable PR[0]. It appears to have been triggered by dependabot creating an issue for a version upgrade -- which is probably unnecessary to begin with. The copilot agent then implemented that by adding a replace statement, which is not how you are supposed to do this. It also included some seemingly-unrelated changes. The copilot reviewer called out the unrelated changes, but the human maintainer apparently didn't notice and merged anyway.

There is just so much going wrong here.

[0] https://github.com/github/gh-aw/pull/4469
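For reference, this is roughly the difference in go.mod (a sketch showing only the semver lines). `go get github.com/Masterminds/semver/v3@v3.4.0` updates the `require` directive, while the PR pinned the module to a version of itself with a `replace`. A `replace` is meant for substituting a module's contents from somewhere else (a fork or a local path), and it only applies in the main module's go.mod; it is ignored when the module is used as someone else's dependency.

```
// What `go get github.com/Masterminds/semver/v3@v3.4.0` writes:
require github.com/Masterminds/semver/v3 v3.4.0

// What the PR added instead:
replace github.com/Masterminds/semver/v3 => github.com/Masterminds/semver/v3 v3.4.0
```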

spankalee 3 hours ago

This happens with all agents I've used and package.json files for npm. Instead of using `npm i foo`, the agent string-edits package.json and hallucinates some version to install. Usually it's a more or less OK version, but it's not how I would like this to work.

It's worse with renaming things in code. I've yet to see an agent be able to use refactoring tools (if they even exist in VS Code) instead of brute-forcing renames with string replacement or sed. Agents use edit -> build -> read errors -> repeat, instead of using a reliable tool, and it burns a lot more GPU...

embedding-shape 2 hours ago

> This happens with all agents I've used and package.json files for npm. Instead of using `npm i foo` the agent string-edits package.json and hallucinates some version to install.

When using Codex, I usually include an instruction like "Never add 3rd party libraries unless explicitly requested. When adding new libraries, use `cargo add $crate` without specifying the version, so we get the latest version." and it seems to make this issue not appear at all.

teaearlgraycold 53 minutes ago

Eventually this specific issue will be RLHF’d out of existence. For now that should mostly solve the problem, but these models aren’t perfect at following instructions. Especially when you’re deep into the context window.

girvo 36 minutes ago

> Especially when you’re deep into the context window.

Though that is, at least to me, a bit of an anti-pattern for exactly that reason. I've found it far more successful to blow away the context and restart with a new prompt built from the old context instead of having a very long-running back-and-forth.

It's better than it was with the latest models, and I can have them stick around longer, but it's still a useful pattern even with 4.6/5.3.

teaearlgraycold 27 minutes ago

Opus has also clearly been trained to clear the context fairly often through the plan/code/plan cycle.

root_axis 48 minutes ago

> brute-forcing renames with string replacement

That's their strategy for everything the training data can't solve. This is the main reason the autonomous agent swarm approach doesn't work for me: 20 bucks in tokens just obliterated by 5 agents exchanging hallucinations with each other. It's way too easy for them to amplify each other's mistakes without a human to intervene.

threecheese 1 hour ago

For the first, I think maintaining package-add instructions is table stakes; we need to be opinionated here. Agents are typically good at following them, and if not, you can fall back to a Makefile that does everything.

For the second, I totally agree. I continue to hope that agents will get better at refactoring, and I think using LSPs effectively would make this happen. Claude took dozens of minutes to perform a rename that JetBrains would have executed perfectly in like five seconds. Its approach was to make a change, run the tests, and do it again. Nuts.

richardw 3 hours ago

Totally. Surely IDEs like Antigravity are meant to give the LLM more tools to use for e.g. refactoring or dependency management? I haven’t used it, but it seems a quick win to move from token generation to deterministic tool use.

port11 2 hours ago

As if. I’ve had Gemini stuck on AG because it couldn’t figure out how to use only one version of React. I managed to detect that the build failed because 2 versions of React were being used, but it kept saying “I’ll remove React version N” and then proceeded to add a new dependency on the latest version. Loops and loops of this. On a similar note, AG really wants to parse code with weird grep commands that don’t make any sense given the directory context.

bakibab 3 hours ago

They tried to fix it using this comment but cancelled midway. Not sure why.

https://github.com/github/gh-aw/pull/14548

onionisafruit 2 hours ago

Ha, they used my comment in the prompt. I love it.

resquawk 25 minutes ago

Thanks! We fixed this in another PR. Appreciate the feedback

Lucasoato 1 hour ago

It is so important to use specific prompts for package upgrading.

Think about what a developer would do:
- check the latest version online;
- look at the changelog;
- evaluate whether it’s worth upgrading, or whether an intermediate version would be enough in case code updates are necessary.

Of course, I'd keep these operations among the human ones, but if you really want to automate this part (and you are ready to pay the consequences), you need to mimic the same workflow. I use Gemini and Codex to look up package version information online; it checks the changelogs from the version I'm on to the one I’d like to upgrade to, and I spawn a Claude Opus subagent to check whether anything in the code needs to be updated. In case of major releases, I git clone the two packages and other subagents check whether the interfaces I use have changed. Finally, I run all my tests and verify everything’s alright.

Yes, it might still not be perfect, but neither am I.
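The "is this a major release?" decision above is the one purely mechanical step. A minimal Go sketch, assuming plain `vMAJOR.MINOR.PATCH` tags (pre-release suffixes are rejected and semver's 0.x special case is ignored); `classifyUpgrade` is a hypothetical helper, not part of any tool mentioned in the thread:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is the numeric core of a semantic version such as "v3.4.0".
type version struct{ major, minor, patch int }

// parseVersion accepts "vMAJOR.MINOR.PATCH" or "MAJOR.MINOR.PATCH".
// Pre-release and build suffixes ("-rc.1", "+meta") are rejected.
func parseVersion(s string) (version, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("not MAJOR.MINOR.PATCH: %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil || v < 0 {
			return version{}, fmt.Errorf("bad component %q in %q", p, s)
		}
		n[i] = v
	}
	return version{n[0], n[1], n[2]}, nil
}

// classifyUpgrade reports which component changes between from and to:
// "major", "minor", "patch", or "none". Semver only allows breaking API
// changes in a major bump (the 0.x special case is ignored here).
func classifyUpgrade(from, to string) (string, error) {
	a, err := parseVersion(from)
	if err != nil {
		return "", err
	}
	b, err := parseVersion(to)
	if err != nil {
		return "", err
	}
	switch {
	case a.major != b.major:
		return "major", nil
	case a.minor != b.minor:
		return "minor", nil
	case a.patch != b.patch:
		return "patch", nil
	}
	return "none", nil
}

func main() {
	for _, p := range [][2]string{{"v3.3.1", "v3.4.0"}, {"v2.9.0", "v3.0.0"}} {
		kind, err := classifyUpgrade(p[0], p[1])
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s: %s\n", p[0], p[1], kind)
	}
}
```

Only a `major` result, where semver permits breaking changes, would then justify the expensive step of diffing the interfaces you actually use.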

awesome_dude 51 minutes ago

This is more evidence of my core complaint with AI (and why it's not AGI at this point)

The AI hasn't understood what's going on, instead it has pattern matched strings and used those patterns to create new strings that /look/ right, but fail upon inspection.

(The human involved is also failing my Turing test... )

huevosabio 5 hours ago

GitHub should focus on getting their core offerings in shape first.

I stopped using GH actions when I ran into this issue: https://github.com/orgs/community/discussions/151956#discuss...

That was almost a year ago and to this date I still get updates of people falling into the same issue.

SkyPuncher 5 hours ago

Ah, the critical problem dilemma. Some percentage of free users become paid users, but the free users take up an unreasonable amount of your time/energy/support.

The solution seems simple. Buy their product.

huevosabio 4 hours ago

I don't follow, we pay them for the actions and everything and still ran into this issue.

That's why it's an issue.

antonvs 2 hours ago

What's the issue, as you see it?

I've quoted the response on that ticket below. Is there something you disagree with? The "issue" is that usage exceeds the amount that's been paid. The solution sounds pretty simple: pay for your usage. Is your experience different somehow?

> If usage is exceeded, you need to add a payment method and set a spending limit (you can even set it to $0 if you don’t want to allow extra charges).

> If you don’t want to add billing, you’ll need to wait until your monthly quota resets (on the first day of the next month).

Edit: also, one of the other comments says this:

> If you’re experiencing this issue, there are two primary potential causes:

> Your billing information is incorrect. Please update your payment method and ensure your billing address is correct.

> You have a budget set for Actions that is preventing additional spend. Refer to Billing & Licensing > Budgets.

wasmainiac 3 hours ago

> The solution seems simple. Buy their product.

Buying half-baked software would probably encourage this. Quarter-baked software!

nozzlegear 24 minutes ago

I've been a paying GitHub user for years now, and as an open source maintainer who uses GitHub Actions, I'm annoyed that my money has been funding AI bullshit instead of fixes and improvements for their core offering.

antonvs 2 hours ago

"In shape" in what sense? This is just hitting the limits of a free account, and the message clearly states that.

> people falling into the same issue.

Every SaaS provider with a free tier has this issue. How do you suggest it should be addressed?

pydry 5 hours ago

Well, this behavior makes sense. They're a blue chip trying to maintain the illusion that they're a growth stock juuuust a little bit longer.

lloydatkinson 5 hours ago

This reminds me slightly of some Copilot nonsense I get. I don’t use Copilot. Every few days when I’m on the GitHub homepage, the Copilot chat input (which I don’t want on my homepage anyway) tells me it’s disabled because I’ve used up my monthly Copilot limit.

I literally do not use it, and no my account isn’t compromised. Trying to trick people into paying? Seems cartoonishly stupid but…

amluto 1 hour ago

> GitHub Agentic Workflows deliver this: repository automation, running the coding agents you know and love, in GitHub Actions, with strong guardrails and security-first design principles.

GitHub Actions is the last organization I would trust to recognize a security-first design principle.

lemonlime227 7 hours ago

Alternative, less phishy link: https://github.com/github/gh-aw

This is on GitHub's official account. For some reason GitHub is deploying this on GitHub pages without a different domain?

dcchuck 4 hours ago

This is a GitHub Pages feature. Given an account with the name "example", they can publish static pages to example.github.io

So this being on github.github.io implies it's published by the "github" account on GitHub.

eddythompson80 7 hours ago

Why would that be phishy? They own the GitHub org on GitHub, hence github.github.io. I always thought it was a neat recursive/dogfood type thing even if not really that deep. Like when Reddit had /r/reddit.com or twitter having @twitter

embedding-shape 7 hours ago

When they launched github.io, they said it was for user-generated content, and official stuff would be on github.com. Seemingly that's changed or they forgot, but users seem to have remembered. Microsoft isn't famous for its consistency, so it's not exactly unexpected.

eddythompson80 7 hours ago

I’m pretty sure they have used it before, or maybe it was githubnext. I’m also pretty sure I have seen many large companies and organizations launch developer facing tools and stuff through GitHub pages. The structure of GitHub pages is pretty simple. You know the user/org from the domain. I’m still not sure what’s phishy about it. Is it a broken promise?

DSMan195276 5 hours ago

It's phishy because it breaks the rules people are generally told for avoiding phishing links, mainly that they should pay attention to the domain rather than subdomains. Browsers even highlight that part specifically so that you pay attention to it, because you can't fake the real domain. The problem with what GitHub does here is that while `github.github.io` might be the real GitHub, `foobar-github.github.io` is not, because anybody can get a github.io subdomain via their username; that was part of why they made github.io separate. Additionally, they could easily host this via GitHub Pages but still use a custom domain under github.com; they just don't.

I would say that GitHub is particularly bad about this, as they also use `github.blog` for announcements. I'm not sure if they have any others, but that's the problem: you can't expect people to magically know which of your domains are and aren't real if you use more than one. They even announced the github.com SSH key change on github.blog.
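A small Go sketch of why only the label in front of `.github.io` matters: every account gets its own `<account>.github.io`, so a lookalike such as `foobar-github.github.io` belongs to the `foobar-github` account, not to GitHub. `pagesOwner` is a hypothetical helper that only handles default Pages hosts (custom domains are out of scope):

```go
package main

import (
	"fmt"
	"strings"
)

// pagesOwner returns the account that controls a default GitHub Pages host.
// The owner is the single label directly in front of ".github.io"; hosts that
// are not a direct subdomain of github.io are rejected.
func pagesOwner(host string) (string, bool) {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	const suffix = ".github.io"
	if !strings.HasSuffix(host, suffix) {
		return "", false
	}
	owner := strings.TrimSuffix(host, suffix)
	if owner == "" || strings.Contains(owner, ".") {
		return "", false
	}
	return owner, true
}

func main() {
	for _, h := range []string{"github.github.io", "foobar-github.github.io", "github.io.example.com"} {
		owner, ok := pagesOwner(h)
		fmt.Printf("%-26s owner=%q ok=%v\n", h, owner, ok)
	}
}
```

Both `github.github.io` and `foobar-github.github.io` pass the "ends in github.io" eyeball test; only reading the full owner label tells them apart, which is exactly the habit the phishing advice does not train.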

pixl97 45 minutes ago

>It's phishy because it's breaks the rules people are generally told for avoiding phishing links

Bank: Avoid phishing links, this is what they look like.

Also bank: Here is a link from our actual marketing department that looks exactly like phishing.

resquawk 1 hour ago

Hey, sorry, yes, the better link is https://github.github.com/gh-aw/ but we had a redirect set to https://github.github.io/gh-aw/

Both work and we've fixed the redirect now, thanks.

idan 3 hours ago

Any github pages site is, by default, ORGNAME.github.io.

We recently moved this out of the githubnext org to the github org, but short of dedicating some route in github.com/whatever, github.github.io is the domain for pages from the github org.

hmokiguess 7 hours ago

So them using their own product makes it phishy? I don’t get it

It’s not like someone else can or could own this link, could they?

SkyPuncher 5 hours ago

Looks like a pre-release product. This is to lower the branding and reputational risk.

onionisafruit 5 hours ago

This is an extension for the gh CLI that takes Markdown files as input and creates GitHub Actions workflow files from them. Not just any workflow files, but 1000-line beasts that you'll need an LLM to explain.

I tried out `gh aw init` and hit Y at the wrong prompt. It created a COPILOT_GITHUB_TOKEN on the github repo I happened to be in presumably with a token from my account. That's something that really should have an extra confirmation.

resquawk 22 minutes ago

Thanks, this has been changed (no use of local token) and there are now extra confirmations too.

ogig 4 hours ago

What timing. I spent the whole weekend building a CI agentic workflow where I can let CC (Claude Code) run wild with skip-permissions in isolated VMs while working async on a Gitea repo. I leave the CC instance with a decent-sized mission and it iterates until CI is green, then creates a PR for me to merge. I'm moving from talking synchronously to one Claude Code to managing a small group of collaborating Claudes.

qwertox 4 hours ago

Crazy times.

CuriouslyC 6 hours ago

Stuffing agents somewhere they don't belong rather than making the system work better with the agents people already use. Obvious marketing driven cash grab.

maccard 16 minutes ago

> Stuffing agents somewhere they don't belong rather than making the system work better with the agents people already use.

I'm not bullish on LLM based agentic coding, but if there was ever a place to put an agent it would be in a centralised provider that has access to your CI, issues and source code. It seems like a perfect fit.

siscia 4 hours ago

I am somewhat close to what MSFT and GitHub are doing here, mostly because I believe it is a great idea, and I am experimenting with it myself.

Especially from the angle of automatic/continuous improvement (https://github.github.io/gh-aw/blog/2026-01-13-meet-the-work...)

Code is often seen as an artifact that is valuable by itself. This was an incomplete view before, and it is now a completely wrong view.

What is valuable is how code encodes the knowledge of the organization building it.

But what is even more valuable is that knowledge itself, embedded in the people of the organization.

Which is why continuous and automatic improvement of a codebase is so important. We all know that code rots with time and feature requests.

But at the same time, abruptly changing the whole codebase architecture destroys the mental model of the people in the organization.

What I believe will work is a slow stream of small improvements: a stream that can be digested by the people in the organization.

In this context I find it more useful to combine deterministic execution with a sprinkle of intelligence on top: a deterministic system that figures out what is wrong (with whatever definition of wrong makes sense), and then LLMs to actually fix the problem, when necessary.

threecheese 59 minutes ago

We are missing some building blocks IMO. We need a good abstraction for defining the invariants in the structure of a project and communicating them to an agent. Even if we had this, if a project doesn’t already consistently apply those patterns the agent can be confused or misapply something (or maybe it’s mad about “do as I say not as I do”).

I expend a lot of effort preparing instructions to steer agents this way; it’s annoying, actually. Think DeepWiki-style enumeration of how things work, like C4 diagrams for agents.

resquawk 26 minutes ago

Yes, great points.

Agentic workflows can mix algorithmic + agentic steps. There's a design pattern we call "DataOps" which is all about this - algorithmic extraction then an agentic step delivering a safe output.

See https://github.github.com/gh-aw/patterns/dataops/

SkyPuncher 5 hours ago

The landing page doesn't make it clear what value this provides to me as a user. I see all of these things that I can theoretically do, but I don't see (1) actual examples of those things or (2) how this specific agentic workflow helps.

idan 2 hours ago

https://github.github.io/gh-aw/#gallery down the page has a list of concrete applications

For example, https://github.github.io/gh-aw/blog/2026-01-13-meet-the-work... has several examples of agentic workflows for managing issues and PRs, and those examples link to actual agentic workflow files you can read and use as a starting point for your own workflows.

The value is "delegate chores that cannot be handled by a heuristic". We're figuring out how to tell the story as we go, appreciate the callout!

mook 30 minutes ago

Unfortunately only the first one (arborist) actually links to something that the workflow outputs (a created issue), so it's hard to see actual examples of what those things do. Some of the earlier comments said they output giant workflow files, but there weren't really any examples either.

Basically it feels like a long article that says "we have this new thing that does cool things", but never gives enough concrete details. It probably worked great for you, but it needs to communicate to random people off the street what the win is.

resquawk 10 minutes ago

These contain links to sample outputs:

* Quality/Hygiene: https://github.github.com/gh-aw/blog/2026-01-13-meet-the-wor...

* Documentation: https://github.github.com/gh-aw/blog/2026-01-13-meet-the-wor...

* Code improvement: https://github.github.com/gh-aw/blog/2026-01-13-meet-the-wor...

* Refactoring: https://github.github.com/gh-aw/blog/2026-01-13-meet-the-wor...

woodruffw 6 hours ago

I find this confusing: I can see the value in having an LLM assist you in developing a CI/CD workflow, but why would you want one involved in any continuous degree with your CI/CD? Perhaps it’s not as bad as that given that there’s a “compilation” phase, but the value add there isn’t super clear either (why would I check in both the markdown and the generated workflow; should I always regenerate from the markdown when I need changes, etc.).

Given GitHub’s already lackluster reputation around security in GHA, I think I’d like to see them address some of GHA’s fundamental weaknesses before layering additional abstractions atop it.

wiether 4 hours ago

I thought that it was meant to allow non-tech people to start making their own workflows/CI in a no/low-code way, and to compete against successful companies in this market.

But the implementation is comically awful.

Sure, you can "just write natural language" instructions and hope for the best.

But they couldn't fully get away from their old demons and you still have to pay the YAML tax to set the necessary guardrails.

I can't help but laugh at their example: https://github.com/github/gh-aw?tab=readme-ov-file#how-it-wo...

They wrote 16 words in Markdown and... 19 in YAML.

Because you can't trust the agent, you still have to write tons of gibberish YAML.

As far as I understand it, first you grant permissions; here they only grant read permissions.

And then you grant output permissions, which are actually write permissions on a smaller scope than the previous ones.

Obviously they also absolve themselves of anything wrong that could happen by telling users to be careful.

And they also suggest setting up an egress firewall to keep the agents from being too loose: https://github.com/github/gh-aw-firewall

Why set up an actual workflow engine on infrastructure managed by IT, with actual security tooling, when you can just stick together a few bits of YAML and Markdown on GitHub, right?

resquawk 1 hour ago

The egress firewall is active by default, see https://github.github.io/gh-aw/introduction/architecture/

We've fixed the example on the README to be a link, it should be clearer now what's going on.

ljm 2 hours ago

I don't personally want any kind of workflow that spams my repo with gen AI refactorings or doc maintenance either. That is literally just creating overhead for me, and it sounds like an excuse to shoehorn AI into a workflow more than anything else.

resquawk 14 minutes ago

You are 100% in control.

blibble 5 hours ago

> but why would you want one involved in any continuous degree with your CI/CD

because helping you isn't the goal

the goal is to generate revenue by consuming tokens

and a never ending swarm of "AI" "agents" is a fantastic way to do that

resquawk 1 hour ago

We've added an FAQ on determinism here: https://github.github.io/gh-aw/reference/faq/#determinism

mickdarling 6 hours ago

I use an LLM behavior test to see if the semantic responses from LLMs using my MCP server match what I expect. This goes beyond the regex tests: it checks whether the response is semantically appropriate. Sometimes the LLMs kick back an unusual response that technically is a no but effectively is a yes. Different models can also behave semantically differently.

If I had a nice CI/CD workflow that was built into GitHub rather than rolling my own that I have running locally, that might just make it a little more automatic and a little easier.

zozbot234 6 hours ago

> I find this confusing: I can see the value in having an LLM assist you in developing a CI/CD workflow, but why would you want one involved in any continuous degree with your CI/CD?

The sensible case for this is for delivering human-facing project documentation, not actual code. (E.g. ask the AI agent to write its own "code review" report after looking at recent commits.) It's implemented using CI/CD solutions under the hood, but not real CI/CD.

woodruffw 5 hours ago

Sorry, maybe I phrased my original comment poorly: I agree there's value in that kind of "self" code-review or other agent-driven workflow; I'm less clear on how that value is produced (performantly, reliably, etc.) by the architecture described on the site.

resquawk 1 hour ago

For Continuous Documentation examples, see https://github.github.io/gh-aw/blog/2026-01-13-meet-the-work...
