Posted by fzliu 2 days ago
Programming copilots are often sold on how they can automate drudgery and boilerplate, which implies we are incapable of or uninterested in designing programming languages, tools, and patterns which do not require boilerplate or drudgery.
Teaching models to use traditional GUI apps implies we have given up on or are not even bothering to create proper hooks for an automation system to utilise.
Something about it feels wrong to me, because it bakes existing inefficiencies into the system. Can we really not solve the inefficiencies instead of pouring unfathomable amounts of compute into working around them?
Enabling automation will never be zero effort, and anything more than zero effort for something with such a low ROI is a no-go by default. But increasingly, companies actually see automation as a danger to their business models and sometimes even go out of their way to prevent it.
Looking at the screen the same way a user does is the only way to win.
But not all software is written with those UI frameworks. Some use different widget frameworks, some immediate GUIs, others just render a webpage and either use HTML or fully render the controls themselves. And without everybody using the same standard, the only standard we have for parsing their output is the pixels they render to.
> My entire job seems to be repeating variations of "never start by forgetting the user's stated intent only to then attempt to guess it".
Can't wait to replace "apt get install" by "gpt get install" and then have it solve all the dependency errors by itself.
1. Claude for computer use
2. Various startup offerings—if you have recommendations, please list them
3. Established tools like Playwright, Selenium, and WebDriver, combined with screenshots and LLM-based guidance
What tools or approaches are actually working for building useful automation solutions?
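For what it's worth, option 3 tends to reduce to a simple observe/decide/act loop. Below is a minimal sketch of that loop, not anyone's actual product: `Action`, `run_agent`, and the three callables are hypothetical names made up for illustration. In a real setup `take_screenshot` might wrap something like Playwright's `page.screenshot()` and `perform` might wrap `page.mouse.click(x, y)`, while `choose_action` would call whatever model you use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    # Hypothetical action format: "click", "type", or "done".
    kind: str
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    goal: str,
    take_screenshot: Callable[[], bytes],
    choose_action: Callable[[str, bytes], Action],
    perform: Callable[[Action], None],
    max_steps: int = 10,
) -> int:
    """Screenshot -> model picks an action -> driver performs it.

    Stops when the model returns a "done" action or after max_steps.
    Returns the number of actions actually performed.
    """
    steps = 0
    for _ in range(max_steps):
        shot = take_screenshot()             # observe: raw pixels only
        action = choose_action(goal, shot)   # decide: assumed to be an LLM call
        if action.kind == "done":
            break
        perform(action)                      # act: click or type via the driver
        steps += 1
    return steps
```

The `max_steps` cap is there because a model that never says "done" would otherwise loop forever; everything else is plumbing you swap out per tool.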
I've yet to try it but my understanding is the repo here has got working code along with installation instructions:
Slight rough edges (to be expected) and you do need to read the README with attention but it's all par for the course. I had to install einops which wasn't in the requirements.txt and even though I had downloaded the HF models they released, it still needed to pull in another model when I first ran the demo.
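If anyone else hits the missing-dependency problem, a quick pre-flight check along these lines reports which modules are not importable before you launch the demo. This is only a sketch: `missing_modules` is a helper name I made up, and the module list is an assumption (einops is the only one I know to be absent from requirements.txt), so adjust it to whatever the README actually needs.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that are not installed."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module cannot be found
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed list: einops was missing for me; the others are guesses.
    wanted = ["einops", "torch", "transformers"]
    for name in missing_modules(wanted):
        print(f"missing: {name} (try: pip install {name})")
```

It only checks that a module can be found, not that its version is compatible, so it won't catch everything.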
This certainly seems like it has a lot of promise to make that much, much, much easier. Game UIs are less uniform, so this might be harder or not easily applicable, but hopefully
I can't say exactly why. Maybe you feel like you haven't earned it. Maybe it's the idle nature of farming that we really enjoy...
If I don't enjoy the experience anymore that's fine with me too. I think I'd still feel a sense of accomplishment, feel like I'd advanced as a human and mastered my environment and machines for diving in here.
I don't feel the agency I want to have. These games make me want to extend myself, my agency. Playing them manually offers some very low-grade enjoyment, but that sense of missing out gnaws at me, and I'm not at all dissuaded by parent trying to ward me off. If I do end up winning so hard that I don't care anymore, me right now would regard that as a victory condition and relief from this pressure I feel about ineffectively plodding through as I do now.
Copying the repo and downloading the models, through HuggingFace or manually, does not seem to work; you get errors indicating missing files.
More likely they just slipped up with getting everything uploaded properly - it's easily done, and luckily easily corrected, so we'll likely see issues get resolved fairly swiftly.
Looks like a few tweaks were made to the GitHub repo ~13 hours ago, which may explain the issues others had earlier and why it's now fine for me.
EDIT: surely it’s just broken, the repo does include .safetensors weights. Maybe the problem is the “suspicious”-flagged PyTorch weight for “icon detection”, whatever that means?