Show HN: Tabstack – Browser infrastructure for AI agents (by Mozilla)
88 points by MrTravisB a day ago | 13 comments

Hi HN,

My team and I are building Tabstack to handle the "web layer" for AI agents. Launch Post: https://tabstack.ai/blog/intro-browsing-infrastructure-ai-ag...

Maintaining a complex infrastructure stack for web browsing is one of the biggest bottlenecks in building reliable agents. You start with a simple fetch, but quickly end up managing a fleet of proxies, handling client-side hydration, debugging brittle selectors, and writing custom parsing logic for every site.

Tabstack is an API that abstracts that infrastructure. You send a URL and an intent; we handle the rendering and return clean, structured data for the LLM.
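To make the "URL plus intent" shape concrete, here is a minimal sketch of what such a request might look like. The field names and the markdown output option are assumptions for illustration, not Tabstack's documented API.

```python
import json

def build_extract_request(url: str, intent: str) -> dict:
    """Build a hypothetical request body asking the service to render
    `url` and return structured data matching the stated `intent`."""
    return {
        "url": url,
        "intent": intent,      # natural-language description of what to extract
        "format": "markdown",  # LLM-friendly output (assumed option name)
    }

payload = build_extract_request(
    "https://example.com/product/42",
    "extract the product name and price",
)
print(json.dumps(payload, indent=2))
```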

How it works under the hood:

- Escalation Logic: We don't spin up a full browser instance for every request (which is slow and expensive). We attempt lightweight fetches first, escalating to full browser automation only when the site requires JS execution/hydration.

- Token Optimization: Raw HTML is noisy and burns context window tokens. We process the DOM to strip non-content elements and return a markdown-friendly structure that is optimized for LLM consumption.

- Infrastructure Stability: Scaling headless browsers is notoriously hard (zombie processes, memory leaks, crashing instances). We manage the fleet lifecycle and orchestration so you can run thousands of concurrent requests without maintaining the underlying grid.
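The escalation idea in the first bullet can be sketched as a two-tier fetch: try the cheap static path, and only pay for a headless browser when the response looks like an unhydrated client-side shell. The detection signals below are illustrative guesses, not Tabstack's actual heuristics.

```python
def needs_browser(html: str) -> bool:
    """Crude signals that a static fetch returned a JS-dependent shell."""
    lowered = html.lower()
    empty_root = (
        '<div id="root"></div>' in lowered
        or '<div id="app"></div>' in lowered
    )
    js_wall = "enable javascript" in lowered
    return empty_root or js_wall

def fetch_page(url: str, static_fetch, browser_fetch) -> str:
    """Cheap path first; expensive headless-browser path only when needed."""
    html = static_fetch(url)
    if needs_browser(html):
        html = browser_fetch(url)
    return html

# Stub fetchers standing in for a real HTTP client and a browser pool:
static = lambda url: '<html><body><div id="root"></div></body></html>'
browser = lambda url: "<html><body><h1>Hydrated content</h1></body></html>"
print(fetch_page("https://example.com", static, browser))
```

The real decision presumably uses richer signals (content length, framework fingerprints, response headers), but the shape is the same: a fast default with an explicit escalation path.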

On Ethics: Since we are backed by Mozilla, we are strict about how this interacts with the open web.

- We respect robots.txt rules.

- We identify our User Agent.

- We do not use requests/content to train models.

- Data is ephemeral and discarded after the task.
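The robots.txt behavior described above (and clarified downthread: only explicit blocks aimed at the Tabstack user agent are honored) can be sketched with the standard library's robots.txt parser. "TabstackBot" is an assumed agent name for illustration.

```python
from urllib import robotparser

# Example robots.txt with a rule explicitly targeting our (assumed) agent:
ROBOTS_TXT = """\
User-agent: TabstackBot
Disallow: /private/

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_fetch(path: str, agent: str = "TabstackBot") -> bool:
    """Check whether the named agent is allowed to fetch the path."""
    return parser.can_fetch(agent, path)

print(may_fetch("/private/report"))  # explicitly blocked for our agent
print(may_fetch("/blog/post"))       # allowed
```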

The linked post goes into more detail on the infrastructure and why we think browsing needs to be a distinct layer in the AI stack.

This is obviously a very new space and we're all learning together. There are plenty of known unknowns (and likely even more unknown unknowns) when it comes to agentic browsing, so we’d genuinely appreciate your feedback, questions, and tips.

Happy to answer questions about the stack, our architecture, or the challenges of building browser infrastructure.

sippeangelo 3 hours ago | parent | next [-]

With all respect to Mozilla, "respects robots.txt" makes this effectively DOA. When initiated by a human, AI agents are a form of user agent like any other, no matter the personal opinion of the content publisher (unlike the egregious automated /scraping/ done for model training).

MrTravisB 3 hours ago | parent | next [-]

This is a valid perspective. Since this is an emerging space, we are still figuring out how to show up in a healthy way for the open web.

We recognize that the balance between content owners and the users or developers accessing that content is delicate. Because of that, our initial stance is to default to respecting websites as much as possible.

That said, to be clear on our implementation: we currently only respond to explicit blocks directed at the Tabstack user agent. You can read more about how this works here: https://docs.tabstack.ai/trust/controlling-access

mossTechnician 3 hours ago | parent | prev | next [-]

I agree with you in spirit, but I find it hard to explain that distinction. What's the difference between mass web scraping and an automated tool using this agent? The biggest differences I assume would be scope and intent... But because this API is open for general development, it's difficult to judge the intent and scope of how it could be used.

jakelazaroff 2 hours ago | parent [-]

What's difficult to explain? If you're having an agent crawl a handful of pages to answer a targeted query, that's clearly not mass scraping. If you're pulling down entire websites and storing their contents, that's clearly not normal use. Sure, there's a gray area, but I bet almost everyone who doesn't work for an AI company would be able to agree whether any given activity was "mass scraping" or "normal use".

1shooner an hour ago | parent [-]

What is worse: 10,000 agents running daily targeted queries on your site, or 1 query pulling 10,000 records to cache and post-process your content without unnecessarily burdening your service?

jakelazaroff an hour ago | parent [-]

I gather you want me to say the first one is worse, but it's impossible to answer with so few details. Like: worse for whom? in what way? to what extent?

If (for instance) my content changes often and I always want people to see an up-to-date version, the second option is clearly worse for me!

1shooner 17 minutes ago | parent [-]

No, I've been turning it over in my mind since this question started to emerge and I think it's complicated, I don't have an answer myself. After all, the first option is really just the correlate to today's web traffic, it's just no longer your traffic. You created the value, but you do not get the user attention.

My apprehension is not with AI agents per se; it is with the current, and likely future, implementation: AI vendors selling the search and re-publication of other parties' content. In this relationship, neither option is great: either these providers are hammering your site on behalf of their subscribers' individual queries, or they are scraping and caching it and reselling potentially stale information about you.

ugh123 3 hours ago | parent | prev | next [-]

100%

observationist 3 hours ago | parent | prev [-]

Exactly. robots.txt with regards to AI is not a standard and should be treated like the performative, politicized, ideologically incoherent virtue signalling that it is.

There are technical improvements to web standards that can and should be made that don't favor adtech and exploitative commercial interests over the functionality, freedom, and technically sound operation of the internet.

shevy-java 2 hours ago | parent | prev | next [-]

Mozilla giving up on Firefox every day ...

srameshc 3 hours ago | parent | prev | next [-]

This looks good, but it would be helpful if the pay-as-you-go pricing had more information about what the actual charges are per unit (or whatever the metric is). I signed up but still cannot find the actual pricing.

Diti 4 hours ago | parent | prev [-]

Pricing page is hidden behind a registration form. Why?

I also wanted to see how/if it handled semantic data (schema.org and Wikidata ontologies), but the hidden pricing threw me off.

MrTravisB 3 hours ago | parent [-]

Thanks for the feedback. We are definitely not trying to hide it. We do have pricing listed in the API section for the different operations, but we could work on making this clearer and easier to parse.

We are simply in an early stage and still finalizing our long-term subscription tiers. Currently, we use a simple credit model which is $1 per 10,000 credits. However, every account receives 50,000 credits for free every month ($5 value). We will have a dedicated public pricing page up as soon as our monthly plans are finalized.

Regarding semantic data, our JSON extraction endpoint is designed to extract any data on the page. That said, we would love to know your specific use cases for those ontologies to see if we can further improve our support for them.