huss97 9 hours ago

Hello Hacker News! We’re Nas and Huss, co-founders of steel.dev (http://steel.dev). Steel is an open-source browser API for AI agents and apps. We make it easy for AI devs to build browser automation into their products without getting flagged as a bot or worrying about browser infra.

Over the last year or so, we’ve built quite a few AI apps that interact with the web and noticed two things: (a) it was magical when you could get an LLM to use the web and it worked, and (b) our browser infra was eating up 80% of our development time. Maintaining our browser infrastructure became its own engineering challenge: keeping browser pools healthy, managing session states and cookies, rotating proxies, handling CAPTCHA solving, and ensuring clean process termination. We got really good at running browser infrastructure at scale, but maintaining it was still stealing time away from building our actual products. So we wanted to build the product we wish we had.

Steel allows you to run any automation logic on our hosted instances of Chromium. When you start a dedicated browser session, you get stealth, proxies, and CAPTCHA solving out of the box. We do this by exposing WebSocket and HTTP endpoints so you can connect to these instances with Puppeteer, Playwright, Selenium (in beta), or raw CDP commands if you’re built like that.
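As a rough sketch of what attaching to a hosted Chromium instance over CDP could look like with Playwright for Python: the `connect.example.invalid` host and the `apiKey` / `sessionId` query parameters below are placeholders invented for illustration, not Steel's documented endpoint. `connect_over_cdp` is Playwright's standard call for attaching to a remote Chromium.

```python
from urllib.parse import urlencode


def build_ws_url(base: str, api_key: str, session_id: str) -> str:
    """Build a WebSocket URL for a remote session (hypothetical URL format)."""
    query = urlencode({"apiKey": api_key, "sessionId": session_id})
    return f"{base}?{query}"


def fetch_title(ws_url: str, page_url: str) -> str:
    """Attach to a remote Chromium over CDP and return the page title."""
    # Imported here so the URL helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(ws_url)
        try:
            # Reuse the remote browser's default context if it exposes one.
            context = browser.contexts[0] if browser.contexts else browser.new_context()
            page = context.new_page()
            page.goto(page_url)
            return page.title()
        finally:
            browser.close()
```

Calling `fetch_title(build_ws_url("wss://connect.example.invalid", key, session), "https://example.com")` would run the automation against the remote browser rather than a local one; the real endpoint and auth scheme should come from the project's docs.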

Behind the scenes, we host several browser instances and route incoming connection requests to one of these instances. Our core design principle was to give every session its own dedicated browser instance and resources (currently 2gb vram and 2gb vcpu) while still allowing for quick session creation and connection times. Our first thought was to have separate nodes running in a Kubernetes cluster, but hosting warm browser instances would be expensive (which would be reflected in the pricing), and the boot times would be too slow to handle the scale that some customers required. We got around this by deploying our browser instance image on a Firecracker VM, taking advantage of the lightning-fast boot times and the ability to share a root FS.
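To make the "one dedicated instance per session" routing idea concrete, here is a toy sketch. It is not Steel's orchestration layer (which has not been open-sourced yet); the `WarmPool` class and the `vm-*` ids are hypothetical names used only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WarmPool:
    """Hands each session its own instance; instances are never shared."""

    idle: deque = field(default_factory=deque)    # booted, unassigned instance ids
    assigned: dict = field(default_factory=dict)  # session_id -> instance id

    def acquire(self, session_id: str) -> str:
        """Route a session to its instance, claiming a warm one on first use."""
        if session_id in self.assigned:
            return self.assigned[session_id]
        if not self.idle:
            raise RuntimeError("no warm instance available; boot another VM")
        instance = self.idle.popleft()
        self.assigned[session_id] = instance
        return instance

    def release(self, session_id: str) -> str:
        """End a session. The instance is handed back for teardown, not reuse."""
        return self.assigned.pop(session_id)
```

In this sketch a released instance is not put back in `idle`, so state such as cookies never leaks between sessions; fast VM boot times are what would make that discard-and-replace approach affordable.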

Today, we’re open-sourcing the code for the Steel browser instance, with plans to open-source the orchestration layer soon. With the open-source repo, you get backwards compatibility with our Node/Python SDKs, a lighter version of our session viewer, and most of the features that come with Steel Cloud. You can run this locally to test Steel out at an individual session level, or one-click deploy to Render/Railway to run it remotely.

We're really happy we get to show this to you all, thank you for reading about it! Please let us know your thoughts and questions in the comments.

overu589 9 hours ago | parent | next [-]

Very interesting. I’m not sure I immediately see your application either; however, I have been having similar thoughts.

After playing a popular indie game (Kenshi), I was wondering about the very simple automation interface the game relies upon. Why not a virtual world (with interfaces attaching any external source) in which business logic agents interact through the available interfaces of the environment, and with other agents? Though tbh, I imagine the entire environment as implemented in layers of YAML-style schemas and profiles, so that all data, whether in a datastore, an active instance, streamed, or serialized, can be related to in the same way: an envelope with attributes, and content specified by the type attribute. The only code would then be the rendering environment, and whatever these agents call for stream processing.

Sort of a gamification of automation, though what can’t be beat is a dead-simple account of what any one thing is doing at a given time.
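A minimal sketch of that envelope idea, assuming JSON as the serialization and a dispatch table keyed on the type attribute. The `Envelope` class, the `order` and `note` types, and the handlers are all made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Envelope:
    """Uniform wrapper: `type` says how to interpret `content`."""

    type: str
    attributes: Dict[str, Any]
    content: Any

    def to_json(self) -> str:
        return json.dumps(
            {"type": self.type, "attributes": self.attributes, "content": self.content},
            sort_keys=True,
        )

    @staticmethod
    def from_json(raw: str) -> "Envelope":
        data = json.loads(raw)
        return Envelope(data["type"], data["attributes"], data["content"])


# One handler per envelope type (hypothetical types).
HANDLERS: Dict[str, Callable[[Envelope], str]] = {
    "order": lambda e: f"order of {e.content['qty']} from {e.attributes['source']}",
    "note": lambda e: f"note: {e.content}",
}


def handle(envelope: Envelope) -> str:
    """Dispatch on the type attribute; unknown types are rejected."""
    handler = HANDLERS.get(envelope.type)
    if handler is None:
        raise ValueError(f"unknown envelope type: {envelope.type}")
    return handler(envelope)
```

Because stored, streamed, and in-memory data all pass through the same `Envelope` shape, a log of envelopes doubles as the simple account of what each agent is doing.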

ohthatsnas 8 hours ago | parent [-]

Hey, Nas here.

The concept of agents running wild with only schemas of how to interact with their environment + one another is something I’ve thought about a lot. With current models, I guess this would just be function calling or forced JSON outputs, but I think it would make for some very interesting results.

With Steel, we’re really just providing a way to do something similar to the “rendering engine” in this scenario, but with the web. So agents can interface with their environments (websites) very easily and at scale (which comes with its own difficulties wrt deploying/managing).

Oras 4 hours ago | parent | prev [-]

If these instances are shared, how do you segregate login details, session cookies, etc.? Are you always running them in incognito?

ohthatsnas 3 hours ago | parent [-]

Instances are not shared. :) Everyone gets a dedicated session with dedicated resources. One session for every machine.