
Posted by suchintan 10/24/2024

Launch HN: Skyvern (YC S23) – open-source AI agent for browser automations (github.com)
Hey HN, we’re Suchintan and Shu from Skyvern (https://www.skyvern.com). We’re building an open source tool to help companies automate browser-based workflows using LLMs.

Our open source repo is at https://github.com/Skyvern-AI/Skyvern, and we're excited to share our cloud version with you (https://app.skyvern.com) :)

Skyvern allows you to define a single (or a series of) goal-based prompts to instruct an agent to complete complex tasks on websites. Here’s a quick demo of Skyvern: https://www.loom.com/share/76b231309df74a528061fcf102e1967f

We built this to solve a specific problem: building browser automations often requires companies to either hire people and scale out operations teams to do tedious manual work, or hire developers to build automations with products like UiPath or Selenium.

Code-based solutions always run into the same problems: they’re brittle (wow, this website added a new pop-up dialog and my script broke), and they fail to achieve the same objective across multiple websites (how can I fill out a contact-us form on hundreds of different websites?).

We did a Show HN a few months ago (https://news.ycombinator.com/item?id=39706004), and since then, we’ve onboarded customers for a wide variety of use cases: generating insurance quotes on websites like Geico.com; applying to jobs on websites like lever.co; automating permit filing in local government portals; registering new corporations for employment identification; fetching invoices from hundreds of different portals such as hydroone.com; automating purchasing on a handful of e-commerce websites like zooplus.com; and filling out contact-us forms on a bunch of random SMB websites (such as HVAC companies' sites).

To be able to service all of these, we’ve built and open-sourced quite a few interesting features:

(1) a fully-featured React application allowing you to see every action Skyvern is taking in real-time;

(2) livestreaming browser instances to allow our users to see what Skyvern is doing when running inside a Docker container;

(3) authenticated sessions, integrating with Bitwarden and allowing users to specify email-, phone-, and QR-code-based 2FA;

(4) “workflows” allowing users to chain multiple goal-based prompts together, which can handle tasks like invoice downloading, or automating purchasing pipelines;

(5) processing HTML elements (e.g., identifying and summarizing SVGs) and performing website interactions (e.g., iterating over dynamic autocompletes to fill in address information correctly);

(6) “cached workflows”, allowing Skyvern to memorize previous interactions (i.e., text inputs) and re-use them in future runs.
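
If the QR-code-based 2FA in item (3) is standard TOTP (the QR code an authenticator app scans usually encodes an `otpauth://` URI carrying a base32 secret), an automation that holds that secret can compute the current code itself. The sketch below is a generic RFC 6238 implementation using only the Python standard library; it is an assumption about how such codes could be produced, not Skyvern's actual Bitwarden/2FA integration, and the function name `totp` is hypothetical.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, for_time: Optional[float] = None,
         digits: int = 6, step: int = 30) -> str:
    """Hypothetical helper: RFC 6238 time-based one-time password (HMAC-SHA1)."""
    # Authenticator secrets are base32; restore any stripped '=' padding before decoding.
    padded = secret_b32.replace(" ", "").upper()
    padded += "=" * (-len(padded) % 8)
    key = base64.b32decode(padded)

    # The moving factor is the number of `step`-second intervals since the Unix epoch.
    now = time.time() if for_time is None else for_time
    counter = int(now // step)

    # HMAC over the 8-byte big-endian counter, as in RFC 4226 (HOTP).
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()

    # Dynamic truncation: the low nibble of the last byte selects a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

For example, with the RFC 6238 SHA-1 test secret (ASCII `12345678901234567890`, base32 `GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ`), `totp(..., for_time=59, digits=8)` yields `94287082`, matching the RFC's published test vector.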

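Items (4) and (6) describe chaining goal-based prompts and memorizing earlier interactions. The following is a minimal, hypothetical sketch of that idea, not Skyvern's workflow engine or API: `GoalStep`, `Workflow`, and the `agent` callable are invented for illustration. Each step's prompt can reference earlier steps' outputs by name, and results are memoized by the rendered prompt, so a repeat run with identical inputs skips the agent call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class GoalStep:
    """One goal-based prompt in a hypothetical workflow."""
    name: str
    prompt: str  # may reference earlier step outputs as {step_name}


@dataclass
class Workflow:
    steps: List[GoalStep]
    # Hypothetical cache of results keyed by (step name, rendered prompt).
    cache: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def run(self, agent: Callable[[str], str]) -> Dict[str, str]:
        outputs: Dict[str, str] = {}
        for step in self.steps:
            # Fill placeholders with the outputs of steps that already ran.
            prompt = step.prompt.format(**outputs)
            key = (step.name, prompt)
            if key not in self.cache:  # only invoke the agent on a cache miss
                self.cache[key] = agent(prompt)
            outputs[step.name] = self.cache[key]
        return outputs
```

In practice `agent` would drive a browser and call an LLM; here any function from prompt to result works, which makes the chaining and caching logic easy to exercise with a stub.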
We’ve also been blessed with a few model advancements that address some of the cost concerns the community raised. Skyvern’s token costs went down over 80%, from $15 / 1M tokens (GPT-4V) to $2.50 / 1M tokens (GPT-4o).

Despite model costs dropping by over 80%, Skyvern is still quite expensive to run, so we give every new user $5 of credits to try it out and see whether it's useful for you.
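
The percentage follows directly from the two per-token prices quoted above: (15 - 2.50) / 15 works out to about 83%. The snippet below reproduces that arithmetic and, purely as an assumption (the post does not say how credits are metered), converts the $5 credit into tokens at the quoted GPT-4o rate.

```python
# Prices quoted in the post, in USD per 1M tokens.
GPT_4V_PRICE = 15.00
GPT_4O_PRICE = 2.50

# Fractional price reduction from GPT-4V to GPT-4o.
reduction = (GPT_4V_PRICE - GPT_4O_PRICE) / GPT_4V_PRICE
print(f"Price reduction: {reduction:.1%}")  # Price reduction: 83.3%

# Assumption: credits are spent at the raw GPT-4o token price (not stated in the post).
NEW_USER_CREDIT = 5.00
tokens_for_credit = NEW_USER_CREDIT / GPT_4O_PRICE * 1_000_000
print(f"Tokens for $5: {tokens_for_credit:,.0f}")  # Tokens for $5: 2,000,000
```

Real runs also spend tokens on page screenshots and HTML context, so the number of tasks a credit covers depends heavily on the sites involved.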

We would be honored if you could give it a try at https://app.skyvern.com and share some feedback with us, and we look forward to any and all of your comments!

327 points | 74 comments
andychert 10/24/2024|
Do I understand correctly that only the GUI is open source, and you don't release the model itself?
andychert 10/24/2024|
Or do you not have your own model, and instead use GPT-4V to determine the coordinates where the bot should click?
ProofHouse 10/25/2024||
Cool, but the pricing is utterly insane.
shaburn 10/24/2024||
It would be great to have an immutable, blockchain-based event log, ideally encrypted.
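The tamper evidence that makes a blockchain-style log attractive here comes mostly from hash chaining: each entry's hash covers the previous entry's hash, so editing any past entry breaks verification. Below is a minimal, hypothetical sketch of such an append-only log (not a Skyvern feature). It omits distributed consensus and encryption, and it only detects tampering if the latest hash is stored somewhere the writer cannot alter.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def append_event(log: List[Dict[str, Any]], event: Dict[str, Any]) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    payload = json.dumps(event, sort_keys=True)  # canonical serialization
    digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    log.append({"event": event, "prev": prev_hash, "hash": digest})


def verify(log: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited event or hash makes this return False."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        payload = json.dumps(entry["event"], sort_keys=True)
        if hashlib.sha256((prev_hash + payload).encode()).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```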
infocollector 10/24/2024||
Quick question: What does Datadog's ddtrace do in the open-source version?
suchintan 10/24/2024|
Nothing. We use Datadog for our cloud telemetry and haven't built a great way to separate dependencies between the cloud and open-source versions.
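For readers wondering how such a split is commonly handled: one generic Python pattern (not something Skyvern is confirmed to do) is to treat `ddtrace` as an optional import and fall back to a no-op when it is absent. The decorator name `traced` and the span name `example.step` are hypothetical; when the package is installed, the sketch relies on ddtrace's `tracer.wrap()` decorator.

```python
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

try:
    from ddtrace import tracer  # only present when the telemetry extra is installed
except ImportError:
    tracer = None


def traced(name: str) -> Callable[[F], F]:
    """Wrap a function in a Datadog span if ddtrace is available; otherwise no-op."""
    def decorator(func: F) -> F:
        if tracer is None:
            return func  # open-source install: leave the function untouched
        return tracer.wrap(name=name)(func)
    return decorator


@traced("example.step")
def run_step(x: int) -> int:
    return x * 2
```

Packaging can then list `ddtrace` under an optional extra rather than as a hard dependency.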
rokhayakebe 10/24/2024||
Can I use this to make changes to a WordPress website if I give it login credentials?
suchintan 10/24/2024|
Depends on the scope of the changes. What did you have in mind?
rokhayakebe 10/24/2024||
Maybe add a new page or update a link.
biosboiii 10/25/2024||
You can use the official API for that, right? Without having to pay for ChatGPT and click pixels.
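For reference, the API route the commenter mentions could look like the sketch below, assuming the site exposes the standard WordPress REST API (`/wp-json/wp/v2/pages`) and the user has created an Application Password (built into WordPress since 5.6). The helper name `build_create_page_request` and the example values are hypothetical; the request is only constructed, not sent.

```python
import base64
import json
import urllib.request


def build_create_page_request(site: str, user: str, app_password: str,
                              title: str, content: str) -> urllib.request.Request:
    """Build (but do not send) a REST API request that creates a draft page."""
    # Application Passwords authenticate via HTTP Basic auth; use HTTPS only.
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    body = json.dumps({"title": title, "content": content, "status": "draft"}).encode()
    return urllib.request.Request(
        url=f"{site.rstrip('/')}/wp-json/wp/v2/pages",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is a single `urllib.request.urlopen(req)` call; updating a link on an existing page would instead POST the revised `content` to `/wp-json/wp/v2/pages/<id>`.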
drippingfist 10/24/2024||
This is very cool. Do you think I could use it to do UX/UI testing?
suchintan 10/24/2024|
Give it a try! It's very capable of doing simple tasks like logging in and clicking around. You'll need to prompt assertions like "Complete if..." and "Terminate if..."
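To make those assertions concrete, here is a hypothetical helper that appends "Complete if ..." and "Terminate if ..." conditions to a goal, so that success and failure conditions read like test assertions. The function name and exact phrasing are assumptions; the reply only confirms that prompts of this shape are how you express them.

```python
from typing import List


def build_ux_test_prompt(goal: str, complete_if: List[str],
                         terminate_if: List[str]) -> str:
    """Compose a goal-based prompt whose end conditions act as test assertions."""
    lines = [goal.strip()]
    # Success conditions: the agent should stop and report completion.
    lines += [f"Complete if {cond}" for cond in complete_if]
    # Failure conditions: the agent should stop and report termination.
    lines += [f"Terminate if {cond}" for cond in terminate_if]
    return "\n".join(lines)
```

A test harness could then treat a "completed" run as a pass and a "terminated" run as a fail.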
tdsone3 10/25/2024||
Has someone run this on modal.com yet?
Cheesman123 10/24/2024||
Congrats on the launch - love the tool
PeterStuer 10/25/2024||
But will Cloudflare brick it?
ji_zai 10/25/2024|
Congrats!! This is super neat. I've been looking for good ways to have AI browse the internet on my behalf (the way I normally do) and give me a presentation / summary of the highlights, so that I don't have to open myself up as much to social media and the chance for doomscrolling, etc.

I'm going to be playing with this.
