Show HN: CommerceTXT – An open standard for AI shopping context (like llms.txt) (commercetxt.org)
9 points by tsazan 3 days ago | 10 comments

Hi HN, author here.

I built CommerceTXT because I got tired of the fragility of extracting pricing and inventory data from HTML. AI agents currently waste ~8k tokens just to parse a product page, only to hallucinate the price or miss the fact that it's "Out of Stock".

CommerceTXT is a strict, read-only text protocol (CC0 Public Domain) designed to give agents deterministic ground truth. Think of it as `robots.txt` + `llms.txt` but structured specifically for transactions.
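
To make that concrete, here is a sketch of a single product file (say, /products/anvil.txt). The @INVENTORY and @REVIEWS directives are real; the other fields are illustrative, not normative:

```
@PRODUCT
NAME: Acme Anvil
PRICE: 49.99 USD

@INVENTORY
STOCK: 42
UPDATED: 2025-01-15T09:30:00Z

@REVIEWS
RATING: 4.7/5
SOURCE: https://example.com/reviews/anvil
```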

Key technical decisions in v1.0:

1. *Fractal Architecture:* Root -> Category -> Product files. Agents only fetch what they need, saving bandwidth and tokens (see the fetch sketch after this list).

2. *Strictly Read-Only:* v1.0 intentionally excludes transactions/actions to avoid security nightmares. It's purely context.

3. *Token Efficiency:* A typical product definition is ~380 tokens vs ~8,500 for the HTML equivalent.

4. *Anti-Hallucination:* Includes directives like @INVENTORY with timestamps and @REVIEWS with verification sources.
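
A rough sketch of that fetch path, with assumed file locations (the real layout may differ):

```
# Sketch: agent walks Root -> Category -> Product, fetching only
# what it needs. URLs and filenames are assumptions, not the spec.
import urllib.request

def fetch(url: str) -> str:
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

base = "https://shop.example.com"
root = fetch(base + "/commercetxt.txt")           # root index (assumed name)
category = fetch(base + "/categories/tools.txt")  # one category file
product = fetch(base + "/products/anvil.txt")     # one product file
# Each hop is a few hundred tokens; the agent never downloads the
# full catalog or the HTML storefront.
```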

The spec is live and open. I'd love your feedback on the directive structure and especially on the "Trust & Verification" concepts we're exploring.

Spec: https://github.com/commercetxt/commercetxt
Website: https://commercetxt.org

reddalo 2 hours ago | parent | next

We should stop polluting website roots with these files (including llms.txt).

All these files should be registered with IANA and put under the .well-known namespace.

https://en.wikipedia.org/wiki/Well-known_URI

tsazan 2 hours ago | parent

I understand the theoretical argument.

We follow the precedent of robots.txt, ads.txt, and llms.txt.

The reason is friction. Platforms like Shopify and Wix make .well-known folders difficult or impossible for merchants to configure. Root files work everywhere.

Adoption matters more than namespace hygiene.

JimDabell an hour ago | parent

How about following the precedent of all of these users of /.well-known/?

https://en.wikipedia.org/wiki/Well-known_URI#List_of_well-kn...

robots.txt was created three decades ago, when we didn’t know any better.

Moving llms.txt to /.well-known/ is literally issue #2 for llms.txt

https://github.com/AnswerDotAI/llms-txt/issues/2

Please stop polluting the web.

tsazan an hour ago | parent

I prioritize simplicity and adoption for non-technical users over strict IETF compliance right now. My goal is to make this work for a shop owner on Shopify or Wix, not just for sysadmins.

That said, I am open to supporting .well-known as a secondary location in v1.1 if the community wants it.
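
If that lands, discovery would just be an ordered lookup. A minimal sketch, assuming both paths (neither is in the spec yet):

```
# Try the root file first, then /.well-known/ as a fallback.
# Both candidate paths are assumptions, not the published spec.
import urllib.request
from urllib.error import URLError

CANDIDATE_PATHS = ["/commercetxt.txt", "/.well-known/commercetxt"]

def discover(origin: str) -> str | None:
    for path in CANDIDATE_PATHS:
        try:
            with urllib.request.urlopen(origin + path) as resp:
                return resp.read().decode("utf-8")
        except URLError:
            continue  # not here; try the next candidate
    return None

doc = discover("https://shop.example.com")
```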

amitav1 an hour ago | parent | prev | next

Wait, am I dumb, or did the authors hallucinate? @INVENTORY says that 42 are in stock, but the text says "Only 3 left". Am I misunderstanding this or does stock mean something else?

tsazan an hour ago | parent

Good eye. This demonstrates the protocol’s core feature.

The raw data shows 42. We used @SEMANTIC_LOGIC to force a limit of 3. The AI obeys the developer's rules, not just the CSV.

We failed to mention this context in the demo, which caused confusion. We are updating the example to show 42.
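
For reference, the demo paired the two directives roughly like this (simplified; DISPLAY_STOCK here stands in for the actual override key):

```
@INVENTORY
STOCK: 42
UPDATED: 2025-01-15T09:30:00Z

@SEMANTIC_LOGIC
DISPLAY_STOCK: 3
```

An agent reading both reports the capped figure ("Only 3 left"), not the raw count.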

nebezb 30 minutes ago | parent

Ah, so dark patterns then. Baked right into your standard.

tsazan 20 minutes ago | parent

Not dark patterns. Operational logic.

Physical stock rarely equals sellable stock. Items sit in abandoned carts or are held as safety buffers. If you have 42 items and 39 are reserved, telling the user "42 available" is the lie. It causes overselling.

The protocol allows the developer to define the sellable reality.
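
The arithmetic is trivial; a minimal sketch with hypothetical field names:

```
# "Sellable reality": physical stock minus holds.
# Variable names are hypothetical; the numbers are from the example above.
on_hand = 42    # physical units in the warehouse
reserved = 39   # abandoned carts, pending orders, safety buffers

sellable = on_hand - reserved
print(sellable)  # 3 -- the figure agents should report, not 42
```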

Crucially, we anticipated abuse. See Section 9: Cross-Verification.

If an agent detects systematic manipulation (fake urgency that contradicts checkout data), the merchant suffers a Trust Score penalty. The protocol is designed to penalize dark patterns, not enable them.
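
As a hypothetical sketch of that check (invented names and threshold; Section 9 may define it differently), an agent can compare the advertised scarcity against what checkout actually accepts:

```
# Hypothetical cross-verification: flag fake urgency when checkout
# accepts more units than were advertised as remaining.
# Names and the penalty value are invented, not from Section 9.
def trust_penalty(advertised_stock: int, checkout_accepts: int) -> float:
    if checkout_accepts > advertised_stock:
        return 0.2  # manufactured scarcity detected
    return 0.0

trust_score = 1.0 - trust_penalty(advertised_stock=3, checkout_accepts=40)
print(trust_score)  # 0.8 -- merchant penalized
```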

duskdozer an hour ago | parent | prev

I'm not sure I understand the point of this as opposed to something like a JSON file, and also, assuming there is any type of structured format, why one would use an LLM for this task instead of a normal parser.

tsazan 30 minutes ago | parent

You assume JSON is a standalone file. It rarely is.

Even if it were, JSON is verbose. Every bracket and quote costs tokens.

In reality, the data is buried in 1MB+ of HTML. You download a haystack to find a needle.

We fetch a standalone text file. It cuts the syntax tax. It is pure signal.
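
A back-of-envelope illustration of the syntax tax, using an invented record (byte counts are only a crude proxy for tokens):

```
import json

record = {"name": "Acme Anvil", "price": "49.99 USD", "stock": 42}

as_json = json.dumps(record)
as_text = "\n".join(f"{k.upper()}: {v}" for k, v in record.items())

# JSON spends extra bytes on quotes, braces, and commas.
print(len(as_json), len(as_text))
```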