Launch HN: Skyvern (YC S23) – open-source AI agent for browser automations

Posted by suchintan 10/24/2024

Launch HN: Skyvern (YC S23) – open-source AI agent for browser automations(github.com)

Hey HN, we’re Suchintan and Shu from Skyvern (https://www.skyvern.com). We’re building an open source tool to help companies automate browser-based workflows using LLMs.

Our open source repo is at https://github.com/Skyvern-AI/Skyvern, and we're excited to share our cloud version with you (https://app.skyvern.com) :)

Skyvern allows you to define a single (or a series of) goal-based prompts to instruct an agent to complete complex tasks on websites. Here’s a quick demo of Skyvern: https://www.loom.com/share/76b231309df74a528061fcf102e1967f

We built this to solve a specific problem: building browser automations often requires companies to either hire people and scale out operations teams to do tedious manual work, or hire developers to use products like UI-Path or Selenium to build automations.

Code-based solutions always run into the same problem: they’re brittle (wow this website added a new pop-up dialog and my script broke), and fail to achieve the same objective across multiple websites (how can I fill out a contact-us form on hundreds of different websites?)

We did a Show HN a few months ago (https://news.ycombinator.com/item?id=39706004), and since then, we’ve onboarded customers for a wide variety of use cases: generating insurance quotes on websites like Geico.com; applying to jobs on websites like lever.co; automating filing permits in local government portals; registering new corporations for employment identification; fetching invoices from hundreds of different portals such as hydroone.com; automating purchasing on a handful of e-commerce websites like zooplus.com; and filling out contact us forms on a bunch of random smb websites (such as HVAC websites).

To be able to service all of these, we’ve built and open-sourced quite a few interesting features:

(1) a fully-featured React application allowing you to see every action Skyvern is taking in real-time;

(2) livestreaming browser instances to allow our users to see what Skyvern is doing when running inside of a docker container;

(3) authenticated sessions, integrating with Bitwarden and allowing users to specify Email + Phone + QR-code based 2FAs;

(4) “workflows” allowing users to chain multiple goal-based prompts together, which can handle tasks like invoice downloading, or automating purchasing pipelines;

(5) processing HTML Elements (ex. identifying + summarizing SVGs) and performing website interactions (ex. Iterating over dynamic autocompletes to fill in address information correctly)

(6) “cached workflows”, allowing Skyvern to memorize previous interactions (ie text inputs) and re-use them in future runs.

We’ve also been blessed with a few model advancements to solve some of the cost concerns the community brought up. Skyvern’s token costs went down 80% from $15 / 1M tokens (GPT-4V) to $2.50 / 1M tokens (GPT-4O)

Despite the model costs going down 80%, Skyvern is still quite expensive to run, so we give every new user $5 of credits to try it out and see if it can be useful for you.

We would be honored if you could give it a try at https://app.skyvern.com and share some feedback with us, and we look forward to any and all of your comments!

327 points | 74 commentspage 2

modo_ 10/24/2024|

Congrats on the launch! This is really cool - one of the applications of LLM I find most compelling. I've seen so many back office processes that have hundreds of steps, are incredibly error prone, and traditionally couldn't be automated due to API limitations. Solutions like Skyvern are going to supercharge businesses that have had historically low margins due to the number of humans required. (Not as a replacement for a human, but as a force multiplier)

suchintan 10/24/2024|

The most fascinating part is how tough that work really is. Everyone we've talked to loathes the manual stuff, but until a better solution comes out, you have to allocate X% of your time to it

hannesle 10/30/2024||

Hi, looks cool! Congratulations. Will check it out and maybe add it to https://ai-tools.directory for people looking for such solutions!

drewsonian 10/24/2024||

This is great, and I can think of several business uses and some personal.

Like this: Could I use this to pull screenshots or PDFs of my grocery receipts from a major grocery chain?

suchintan 10/24/2024|

Yes! We're helping a few companies with this right now. This use-case actually surprised me.

I never realized how important it is to track invoices in Europe (where VAT needs to be closely tracked), and a large % of vendors require you to log into their portal to fetch them

delusional 10/24/2024||

The plaintext version of your signup email replaces the ampersand in the url with an & XML entity. You probably don't want that.

suchintan 10/24/2024|

Interesting. We will fix it

jackb4040 10/24/2024||

> You won't be able to run Skyvern unless you enable at least one provider.

Any plans on bundling a local LLM / supporting local LLMs?

suchintan 10/24/2024|

We have an open issue for this right now -- we would LOVE some contributions here. The biggest problem until Llama 3.2 came out was that most (good) open source llms were text-only, and Skyvern needs vision to perform well

This isn't true anymore -- we just need to build and launch support for it

socksy 10/24/2024||

In theory to support ollama all you should need to do is be able to change the URL that would otherwise go to OpenAI, and select the model. The only gotcha is that the llama3.2 builds for ollama are currently text only — however they've just added support for arbitrary hugging face models so you're not limited by the officially supported models.

ganeshkrishnan 10/24/2024||

awesome work. I had the github starred from the day I saw on Show HN but never got around to using it.

I want to use this to automate approving/declining group members for our facebook group which is approaching half million members and fb admin tools are pretty lacking

suchintan 10/24/2024|

Thank you for the star! We had someone talk about us the other day on r/localllama (https://www.reddit.com/r/LocalLLaMA/comments/1g9zhbd/if_your...) and I still couldn't believe that we ever got past 50 stars

imp0cat 10/25/2024||

> how can I fill out a contact-us form on hundreds of different websites?

What's the use case here exactly? Sorry for being a bit pessimistic, but this sounds like an easy way to automatically send a lot of spam.

BrandiATMuhkuh 10/24/2024||

Congratulations on the launch. This is really cool. I was recently tinkering with the same idea. But based on a browser extension.

There are many back office tasks where people copy data from page 1 into a form of page 2.

suchintan 10/24/2024|

Yeah we've been surprised by how many interesting things companies do in the background to keep them running

The craziest one we heard about was this government portal in India that was hard to automate because halfway through the portal you had to refresh the page a bunch of times to get a button to show up

selimthegrim 10/24/2024||

The railway ticket site?

suchintan 10/24/2024||

It was a state level permit website I think. Very interesting!

bluerooibos 10/25/2024||

Looks super interesting!

Unfortunately the mobile experience is pretty bad - practically unusable. I'd expect any web application made in the last decade to be mobile-first.

suchintan 10/25/2024|

Yep. This is totally fair feedback -- we're still a super early product and haven't had a chance to optimize the phone experience.. largely because it's tough to see the magic from the phone

We'll improve it soon!

TZubiri 10/25/2024|

Sounds good.

Question, if it's computer vision based, does that mean that it can be trivially ported to support desktop automations?

More comments...