Posted by kcorbitt 3 days ago
That said, if there isn't already, perhaps there should be a !!!BIG WARNING!!! around leaving it to its own devices... or rather, your devices.
I only access mine from a VM that does just that and I still have to log on every single time.
It is going to be the same with malware.
This is regardless of it being from a trusted machine or merchant from which you’ve purchased before.
There are probably some cases where this is not true (I'm thinking of people without a banking app), but I get the 3D Secure check for every transaction I make, regardless of payment method or vendor.
I can't think of a single bank app/site that requires 2FA on every login; most have a "trusted device" option and that cookie becomes your "something you have" second factor for future logins.
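That trusted-device flow can be sketched roughly like this (a minimal illustration, not any real bank's implementation; the signing scheme, cookie layout, and helper names are all assumptions for the sake of the example):

```python
import hmac, hashlib, secrets

SERVER_KEY = secrets.token_bytes(32)  # per-deployment secret (illustrative)

def issue_trusted_device_cookie(user_id: str) -> str:
    # After the user passes 2FA once, sign a device token tied to the account
    # and hand it back as a long-lived cookie.
    device_id = secrets.token_hex(16)
    sig = hmac.new(SERVER_KEY, f"{user_id}:{device_id}".encode(),
                   hashlib.sha256).hexdigest()
    return f"{user_id}:{device_id}:{sig}"

def is_trusted_device(cookie: str) -> bool:
    # On later logins, a valid cookie stands in as the
    # "something you have" factor, so no fresh 2FA prompt is shown.
    try:
        user_id, device_id, sig = cookie.split(":")
    except ValueError:
        return False
    expected = hmac.new(SERVER_KEY, f"{user_id}:{device_id}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```

The upshot for an autonomous agent: anything that can read that cookie jar effectively holds your second factor.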
"What is my purpose. Existence is pain."
I think in-browser actions are much safer and can be more predictable, with easier-to-implement safeguards, but I would love to see how this concept pans out in the future!
PS: you can check it out on GitHub: https://github.com/SamDc73/WebTalk/
Please let me know what you guys think!
My limited testing has produced okay results for a trivial use case and very disappointing results for a simple use case.
Trivial: what is the time? | Claude: took a screenshot and read the time off the bottom right. | Cost: $0.02
Simple: download a high-resolution image of the Singapore skyline and set it as the desktop wallpaper. | Claude: the description of steps looks plausible, but the actions are wild and all over the place. It opens the National Park Service website somehow, and the only other action it manages is to right-click a couple of times. Failed! | Cost: $0.37
Long way to go before it can be used for even hobby use cases I feel.
PS: is it possible that the screenshots include an image of Agent.exe itself and that is creating a poor feedback loop somehow?
Given time I suspect that strange actions made by AI agents will become the new “ducking” autocorrect.
Finishing up a feature on a side project at 1am.
Think “oh I know, I’ll have Computer Use run some regression tests on it.”
Run Computer Use and walk away to get a drink.
While you’re gone, Computer Use opens a browser and goes to Facebook. Then it Likes a photo that your ex took at the beach… at 1am…
It will be interesting to see how this evolves. The UI-automation use case is different from accessibility due to latency requirements: latency matters a lot for accessibility, but not so much for a UI-automation testing apparatus.
I've often wondered what combining grammar-based speech recognition with an LLM could do for accessibility: low-domain natural-language speech recognition augmented by grammar-based recognition of high-domain commands, for efficiency and accuracy — reducing voice strain and increasing recognition accuracy.
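The hybrid routing idea above can be sketched as a tiny dispatcher: try a fixed command grammar first (fast and deterministic), and only fall back to the slower open-domain model for free-form speech. The grammar patterns, command names, and llm_fallback stub here are all illustrative assumptions, not part of any existing system:

```python
import re

# Small fixed grammar for high-domain commands (illustrative).
# Each pattern maps a matched utterance to a deterministic action string.
COMMAND_GRAMMAR = {
    r"^(open|close)\s+(\w+)$": lambda m: f"{m.group(1)}_app:{m.group(2)}",
    r"^scroll\s+(up|down)$":   lambda m: f"scroll:{m.group(1)}",
    r"^click\s+(\w+)$":        lambda m: f"click:{m.group(1)}",
}

def llm_fallback(utterance: str) -> str:
    # Placeholder for the slower, open-domain LLM interpretation path.
    return f"llm:{utterance}"

def route(utterance: str) -> str:
    text = utterance.strip().lower()
    for pattern, action in COMMAND_GRAMMAR.items():
        m = re.match(pattern, text)
        if m:
            return action(m)          # grammar hit: fast, low-latency path
    return llm_fallback(utterance)    # no match: open-domain path

print(route("scroll down"))
print(route("find that article about e-bikes I read last week"))
```

The design point is that the grammar path keeps latency low and recognition reliable for the commands users repeat all day, which matters far more for accessibility than for batch UI-automation testing.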
Regardless, not once in my life have I ever thought "man it's way too time consuming and onerous for me to spend my money. I wish there was a way for me to spend my money faster and with less oversight."
Like, right now, I want to buy an e-bike under $500, any Chinese brand will do. And I want it to look at Reddit and stuff to see what people have said etc. etc.
But I'm not going to do it because it takes too long. If a machine can do it, fine by me.
Also probably a bad idea for 99+% of people