Gwtar: A static efficient single-file HTML format (gwern.net)
112 points by theblazehen 5 hours ago | 27 comments
simonw 3 hours ago

TIL about window.stop() - the key to this entire thing working. It causes the browser to stop loading any more assets: https://developer.mozilla.org/en-US/docs/Web/API/Window/stop

Apparently every important browser has supported it for well over a decade: https://caniuse.com/mdn-api_window_stop

Here's a screenshot illustrating how window.stop() is used - https://gist.github.com/simonw/7bf5912f3520a1a9ad294cd747b85... - everything after <!-- GWTAR END is the tar archive data.

Posted some more notes on my blog: https://simonwillison.net/2026/Feb/15/gwtar/

moritzwarhier 2 hours ago

Not the inverse, but for any SPA (not framework or library) developers seeing this, it's probably worth noting that this is not better than using document.write, window.open and similar APIs.

But could be very interesting for use cases where the main logic lives on the server and people try to manually implement some download- and/or lazy-loading logic.

Still probably bad unless you're explicitly working on init and redirect scripts.

8n4vidtmkvmk 3 hours ago

Neat! I didn't know about this either.

PHP has a similar feature called __halt_compiler() which I've used for a similar purpose. Or sometimes just to put documentation at the end of a file without needing a comment block.

tym0 2 hours ago

I was on board until I saw that those can't easily be opened from a local file. Seems like local access is one of the main use cases for archival formats.

avaer an hour ago

Agreed, I was thinking it's like asm.js where it can "backdoor pilot" [1] an interesting use case into the browser by making it already supported by default.

But not being able to "just" load the file into a browser locally seems to defeat a lot of the point.

[1] https://en.wikipedia.org/wiki/Television_pilot#Backdoor_pilo...

zetanor 3 hours ago

The author dismisses WARC, but I don't see why. To me, Gwtar seems more complicated than a WARC, while being less flexible and while also being yet another new format thrown onto the pile.

simonw 3 hours ago

I don't think you can provide a URL to a WARC that can be clicked to view its content directly in your browser.

zetanor 2 hours ago

At the very least, WARC could have been used as the container ("tar") format after the preamble of Gwtar. But even there, given that this format doesn't work without a web server (unlike SingleFile, mentioned in the article), I feel like there's a lot to gain by separating the "viewer" (Gwtar's javascript) from the content, such that the viewer can be updated over time without changing the archives.

I certainly could be missing something (I've thought about this problem for all of a few minutes here), but surely you could host "warcviewer.html" and "warcviewer.js" next to "mycoolwarc.warc" and "mycoolwarc.cdx" with little to no loss of convenience, and call it a day?

obscurette 2 hours ago

WARC is mentioned with a very specific reason for not being good enough: "WARCs/WACZs achieve static and efficient, but not single (because while the WARC is a single file, it relies on a complex software installation like WebRecorder/Replay Webpage to display)."

mr_mitm an hour ago

Pretty cool. I made something similar (much more hacky) a while ago: https://github.com/AdrianVollmer/Zundler

Works locally, but it does need to decompress everything first thing.

nullsanity 2 hours ago

Gwtar seems like a good solution to a problem nobody seemed to want to fix. However, this website is... something else. It's full of inflated self-importance, overly bountiful prose, and feels like someone never learned to put in the time to write a shorter essay. Even the about page contains a description of the about page.

I don't know if anyone else gets "unemployed megalomaniacal lunatic" vibes, but I sure do.

3rodents 2 hours ago

gwern is a legendary blogger (although blogger feels underselling it… “publisher”?) and has earned the right to self-aggrandize about solving a problem he has a vested interest in. Maybe he’s a megalomaniac and/or unemployed and/or writing too many words but after contributing so much, he has earned it.

TimorousBestie an hour ago

I was more willing to accept gwern’s eccentricities in the past but as we learn more about MIRI and its questionable funding resources, one wonders how much he’s tied up in it.

The Lighthaven retreat in particular was exceptionally shady, possibly even scam-adjacent; I was shocked that he participated in it.

k33n 22 minutes ago

What does any of that have to do with the value of what’s presented in the article?

fluidcruft 2 hours ago

What's up with the non-stop knee-jerk bullshit ad hom on HN lately?

Krutonium 2 hours ago

We're tired, chief.

esseph 2 hours ago

The earth is falling out from under a lot of people, and they're trying to justify their position on the trash heap as the water level continues to rise around it. It's a scary time.

TimorousBestie 41 minutes ago

Technically it’s only an ad hominem when you’re using the insult as a component in a fallacious argument; the parent comment is merely stating an aesthetic opinion with more force than is typically acceptable here.

Retr0id an hour ago

It's fairly common for archivers (including archive.org) to inject some extra scripts/headers into archived pages or otherwise modify the content slightly (e.g. fixing up relative links). If this happens, will it mess up the offsets used for range requests?

spankalee 2 hours ago

I really don't understand why a zip file isn't a good solution here. Just because it requires "special" zip software on the server?

newzino an hour ago

Zip stores its central directory at the end of the file. To find what's inside and where each entry starts, you need to read the tail first. That rules out issuing a single Range request to grab one specific asset.

Tar is sequential. Each entry header sits right before its data. If the JSON manifest in the Gwtar preamble says an asset lives at byte offset N with size M, the browser fires one Range request and gets exactly those bytes.

The other problem is decompression. Zip entries are individually deflate-compressed, so you'd need a JS inflate library in the self-extracting header. Tar entries are raw bytes, so the header script just slices at known offsets. No decompression code keeps the preamble small.
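
To make the offset idea concrete, here is a minimal Python sketch. It is not Gwtar's actual code: the file names, the `build_manifest` helper, and the manifest shape are all made up for illustration. It builds an uncompressed tar in memory, records where each member's data starts, and shows that a plain byte slice (which is what a Range response would return) recovers the file with no decompression step.

```python
import io
import tarfile


def build_tar(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed tar archive in memory (mode "w" = no compression)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def build_manifest(archive: bytes) -> dict[str, tuple[int, int]]:
    """Hypothetical manifest: member name -> (offset of its data, size in bytes)."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        return {m.name: (m.offset_data, m.size) for m in tar.getmembers()}


def range_header(offset: int, size: int) -> str:
    """Value for an HTTP Range header; the end position is inclusive."""
    return f"bytes={offset}-{offset + size - 1}"


# Illustrative assets, not taken from any real archive.
files = {"index.html": b"<h1>hi</h1>", "style.css": b"body { margin: 0 }"}
archive = build_tar(files)
manifest = build_manifest(archive)

offset, size = manifest["style.css"]
# Slicing at the recorded offset yields the raw file bytes directly.
assert archive[offset:offset + size] == files["style.css"]
print(range_header(offset, size))
```

A client holding this manifest only needs to send the printed header value with its request; the server does not need to know anything about tar.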

fluidcruft 37 minutes ago

You can also read a zip sequentially like a tar file. Some info is in the directory only but just for getting file data you can read the file records sequentially. There are caveats about when files appear multiple times but those caveats also apply to processing tar streams.
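
A rough Python sketch of that sequential read, under a few assumptions: the archive was written to a seekable file (so sizes are in the local headers rather than in trailing data descriptors), there is no ZIP64 or encryption, and entry names are ASCII/UTF-8. The `read_sequential` helper is hypothetical. It walks the local file headers from byte 0 and never consults the central directory.

```python
import io
import struct
import zipfile
import zlib

# Fixed 30-byte local file header: signature, version, flags, method,
# mod time, mod date, CRC-32, compressed size, uncompressed size,
# file name length, extra field length (all little-endian).
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIG = 0x04034B50  # b"PK\x03\x04"


def read_sequential(archive: bytes) -> dict[str, bytes]:
    """Read entries front to back using only the local file headers."""
    entries: dict[str, bytes] = {}
    pos = 0
    while pos + LOCAL_HEADER.size <= len(archive):
        (sig, _ver, flags, method, _time, _date, _crc,
         csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(archive, pos)
        if sig != LOCAL_SIG:
            break  # reached the central directory (or something unexpected)
        if flags & 0x08:
            # Streamed zips store sizes after the data; not handled here.
            raise ValueError("entry uses a data descriptor")
        name_start = pos + LOCAL_HEADER.size
        name = archive[name_start:name_start + name_len].decode("utf-8")
        data_start = name_start + name_len + extra_len
        raw = archive[data_start:data_start + csize]
        if method == zipfile.ZIP_STORED:
            entries[name] = raw
        elif method == zipfile.ZIP_DEFLATED:
            entries[name] = zlib.decompress(raw, -15)  # raw deflate stream
        else:
            raise ValueError(f"unsupported compression method {method}")
        pos = data_start + csize
    return entries


# Illustrative archive with one stored and one deflated entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"hello", compress_type=zipfile.ZIP_STORED)
    zf.writestr("b.txt", b"world " * 50, compress_type=zipfile.ZIP_DEFLATED)

extracted = read_sequential(buf.getvalue())
assert extracted == {"a.txt": b"hello", "b.txt": b"world " * 50}
```

Note that the deflated entry still needs an inflate step, while a stored entry can be sliced out directly, much like a tar member.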

O1111OOO 2 hours ago

I gave up a long time ago and started using the "Save as..." on browsers again. At the end of the day, I am interested in the actual content and not the look/feel of the page.

I find it easier to just mass delete assets I don't want from the "pageTitle_files/" directory (js, images, google-analytics.js, etc).

mikae1 an hour ago

Have you tried https://addons.mozilla.org/firefox/addon/single-file/?

If you really just want the text content you could just save markdown using something like https://addons.mozilla.org/firefox/addon/llmfeeder/.

TiredOfLife 19 minutes ago

Save as doesn't work on sites that lazy load.

westurner an hour ago

Does this verify and/or rewrite the SRI integrity hashes when it inlines resources?

Would W3C Web Bundles and HTTP SXG Signed Exchanges solve for this use case?

WICG/webpackage: https://github.com/WICG/webpackage#packaging-tools

"Use Cases and Requirements for Web Packages" https://datatracker.ietf.org/doc/html/draft-yasskin-wpack-us...

renewiltord 2 hours ago

Hmm, I’m interested in this, especially since it applies no compression: delta encoding might be feasible for daily scans of the data. But for whatever reason, Brave on iOS displays a blank page for the example page. Perhaps it’s a mobile rendering issue, because Chrome and Safari on iOS can’t display it either: https://gwern.net/doc/philosophy/religion/2010-02-brianmoria...