Posted by suchintan 3 days ago
Our open source repo is at https://github.com/Skyvern-AI/Skyvern, and we're excited to share our cloud version with you (https://app.skyvern.com) :)
Skyvern lets you define a single goal-based prompt (or a series of them) to instruct an agent to complete complex tasks on websites. Here’s a quick demo of Skyvern: https://www.loom.com/share/76b231309df74a528061fcf102e1967f
We built this to solve a specific problem: companies that need browser automations usually have to either hire people and scale out operations teams to do tedious manual work, or hire developers to build automations with tools like UiPath or Selenium.
Code-based solutions always run into the same problems: they’re brittle (wow, this website added a new pop-up dialog and my script broke), and they fail to achieve the same objective across multiple websites (how can I fill out a contact-us form on hundreds of different websites?).
We did a Show HN a few months ago (https://news.ycombinator.com/item?id=39706004), and since then, we’ve onboarded customers for a wide variety of use cases: generating insurance quotes on websites like Geico.com; applying to jobs on websites like lever.co; automating filing permits in local government portals; registering new corporations for employment identification; fetching invoices from hundreds of different portals such as hydroone.com; automating purchasing on a handful of e-commerce websites like zooplus.com; and filling out contact-us forms on a bunch of random SMB websites (such as HVAC websites).
To be able to service all of these, we’ve built and open-sourced quite a few interesting features:
(1) a fully featured React application allowing you to see every action Skyvern is taking in real time;
(2) livestreaming browser instances to allow our users to see what Skyvern is doing when running inside of a Docker container;
(3) authenticated sessions, integrating with Bitwarden and allowing users to specify Email + Phone + QR-code based 2FAs;
(4) “workflows” allowing users to chain multiple goal-based prompts together, which can handle tasks like invoice downloading, or automating purchasing pipelines;
(5) processing HTML elements (e.g. identifying + summarizing SVGs) and performing website interactions (e.g. iterating over dynamic autocompletes to fill in address information correctly);
(6) “cached workflows”, allowing Skyvern to memorize previous interactions (i.e. text inputs) and re-use them in future runs.
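To make the idea in (6) concrete, here is a minimal sketch of what memorizing text inputs and replaying them could look like. It is not Skyvern's implementation: `InputCache`, `fake_model`, and the key scheme (page URL plus field label) are hypothetical names picked for illustration, and the stand-in model call just returns a canned string.

```python
from typing import Callable, Dict, Tuple


class InputCache:
    """Remembers the text typed into a field so later runs can skip the model call."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}
        self.model_calls = 0  # counts cache misses, i.e. how often the model was asked

    def value_for(self, url: str, field: str,
                  ask_model: Callable[[str, str], str]) -> str:
        key = (url, field)
        if key not in self._store:
            # Cache miss: fall back to the (expensive) model and remember its answer.
            self.model_calls += 1
            self._store[key] = ask_model(url, field)
        return self._store[key]


def fake_model(url: str, field: str) -> str:
    # Hypothetical stand-in for an LLM call; a real agent would inspect the page here.
    return f"answer for {field}"


cache = InputCache()
first = cache.value_for("https://example.com/contact", "Company name", fake_model)
second = cache.value_for("https://example.com/contact", "Company name", fake_model)
print(first == second, cache.model_calls)  # True 1
```

The second lookup is served from memory, so the model is only consulted once. A real cache would also need a rule for invalidating entries when the page changes, which this sketch leaves out.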
We’ve also been blessed with a few model advancements to solve some of the cost concerns the community brought up. Skyvern’s token costs went down over 80%, from $15 / 1M tokens (GPT-4V) to $2.50 / 1M tokens (GPT-4o).
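As a quick check on those figures, the relative reduction can be worked out directly. This is only arithmetic on the two per-million-token prices quoted above; the function name is an illustrative assumption.

```python
def percent_reduction(old_price: float, new_price: float) -> float:
    """Relative price drop, as a percentage of the old price."""
    return (old_price - new_price) / old_price * 100


GPT_4V_PER_1M = 15.00  # USD per 1M tokens, as quoted in the post
GPT_4O_PER_1M = 2.50   # USD per 1M tokens, as quoted in the post

reduction = percent_reduction(GPT_4V_PER_1M, GPT_4O_PER_1M)
print(f"{reduction:.1f}% cheaper")  # 83.3% cheaper
```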
Despite the model costs going down 80%, Skyvern is still quite expensive to run, so we give every new user $5 of credits to try it out and see whether it's useful.
We would be honored if you could give it a try at https://app.skyvern.com and share some feedback with us, and we look forward to any and all of your comments!
Like this: Could I use this to pull screenshots or PDFs of my grocery receipts from a major grocery chain?
I never realized how important it is to track invoices in Europe (where VAT needs to be closely tracked), and a large % of vendors require you to log into their portal to fetch them
Any plans on bundling a local LLM / supporting local LLMs?
This isn't true anymore -- we just need to build and launch support for it
What's the use case here exactly? Sorry for being a bit pessimistic, but this sounds like an easy way to automatically send a lot of spam.
I want to use this to automate approving/declining group members for our Facebook group, which is approaching half a million members, and FB admin tools are pretty lacking.
Unfortunately the mobile experience is pretty bad - practically unusable. I'd expect any web application made in the last decade to be mobile-first.
We'll improve it soon!
There are many back office tasks where people copy data from page 1 into a form of page 2.
The craziest one we heard about was this government portal in India that was hard to automate because halfway through the portal you had to refresh the page a bunch of times to get a button to show up
Question: if it's computer-vision based, does that mean it can be trivially ported to support desktop automations?