lylejantzi3rd 3 days ago

I came up with something similar recently, except it doesn't use template elements. It just uses functions and template literals. The function returns a string, which gets dumped into an existing element's innerHTML. Or, a new div element is created to dump it into. Re-rendering is pretty quick that way.

A significant issue I have with writing code this way is that the functions nest and it becomes very difficult to make them compose in a sane way.

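    function printPosts(posts) {
      // Build the markup for every post, then replace the container's contents in one go.
      // `window.posts` relies on the browser exposing the element with id="posts" as a global.
      window.posts.innerHTML = posts.map((post) => printPost(post)).join("")
    }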

    function printPost(post) {
      return `
        <div class="post" data-guid="${post.guid}">
          <div>
            <img class="avatar" src="https://imghost.com${post.avatar.thumb}"/>
          </div>
          <div class="content">
            <div class="text-content">${post.parsed_text}</div>
            ${post?.image_urls?.length > 0 ? printImage(`https://imghost.com${post.image_urls[0].original}`) : ''}
            ${post?.url_preview ? `<hr/><div class="preview">${printPreview(post.url_preview)}</div>` : ''}
            ${post?.quote_data ? `<hr/><div class="quote">${printQuote(post.quote_data)}</div>` : ''}
            ${post?.filtered ? `<div>filtered by: <b>${post.filtered}</b></div>` : ''}
          </div>
        </div>
      `
    }
chrismorgan 3 days ago | parent | next [-]

This is begging for injection attacks. In this case, for example, if parsed_text and filtered can contain < or &, or if post.guid or post.avatar.thumb can contain ", you’re in trouble.

Generating serialised HTML is a mug’s game when limited to JavaScript. Show me a mature code base where you have to remember to escape things, and I’ll show you a code base with multiple injection attacks.

foota 3 days ago | parent | next [-]

Yeah, OP's code is asking for pain. I suspect there are now developers who've never had to generate HTML outside the confines of a framework and so are completely unaware of the kinds of attacks you need to protect yourself against.

You can do it from scratch, but you essentially need to track the provenance of strings: each one is either plain text that isn't HTML and still needs escaping (e.g., user input), or HTML (either generated, with escaping already done, or static code). It seems like you could build this reasonably simply by using tagged template literals and having, e.g., two different types of strings that are used to track provenance.
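
A minimal sketch of that idea, for illustration only. `SafeHtml`, `escapeHtml`, `render` and `html` are hypothetical names, not part of any library, and the escaping shown only covers HTML text and quoted attribute values (not URLs, scripts or styles).

```javascript
// Marker type for strings already known to be safe HTML (hypothetical name).
class SafeHtml {
  constructor(value) {
    this.value = value;
  }
  toString() {
    return this.value;
  }
}

// Escape the characters that matter in HTML text and quoted attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Decide per value: SafeHtml passes through, everything else is escaped.
function render(value) {
  if (value instanceof SafeHtml) return value.value;
  if (Array.isArray(value)) return value.map(render).join("");
  if (value === null || value === undefined) return "";
  return escapeHtml(value);
}

// Tagged template: the static parts are trusted source code,
// the interpolated values are escaped unless they are SafeHtml.
function html(strings, ...values) {
  let out = strings[0];
  values.forEach((value, i) => {
    out += render(value) + strings[i + 1];
  });
  return new SafeHtml(out);
}

const post = { guid: 'a"b', parsed_text: "<script>alert(1)</script>" };
const markup = html`<div data-guid="${post.guid}">${html`<b>${post.parsed_text}</b>`}</div>`;

console.log(String(markup));
// <div data-guid="a&quot;b"><b>&lt;script&gt;alert(1)&lt;/script&gt;</b></div>
```

Nested `html` results are not escaped a second time because they carry the `SafeHtml` type, which is the provenance tracking described above.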

brigandish 3 days ago | parent [-]

Thus recreating Perl’s taint mode. Everything new is old.

lylejantzi3rd 3 days ago | parent | prev [-]

Posts are sanitized on the server side. This is client side code.

chrismorgan 3 days ago | parent | next [-]

Although appealing, that’s an extremely bad idea, when you’re limited to JavaScript. In a language with a better type system, it can be only a very bad idea.

The problem is that different contexts have different escaping rules. It’s not possible to give a one-size-fits-all answer from the server side. It has to be done in a context-aware way.

Field A is plain text. Someone enters the value “Alpha & Beta”. Now, what does your server do? If it sanitises by stripping HTML characters, you’ve just blocked valid input; not good. If it doesn’t sanitise but instead unconditionally escapes HTML, somewhere, sooner or later, you’re going to end up with an “Alpha &amp; Beta” shown to the user, when the value gets used in a place that isn’t taking serialised HTML. It always happens sooner or later. (If it doesn’t sanitise or escape, and the client doesn’t escape but just drops it directly into the serialised HTML, that’s an injection vulnerability.)

Field B is HTML. Someone enters the value “<img src=/ onerror=alert('pwnd')>”. Now, what does your server do? If it sanitises by applying a tag/attribute whitelist so that you end up with perhaps “<img src="/">”, fine.

krapp 3 days ago | parent | next [-]

Server-side templating frameworks had context-aware escaping strategies for years before front-end frameworks were even a thing. Injection attacks don't persist because this is a hard problem; they persist because, for most webdev projects, security is a lower priority than getting a minimum viable product to market.

The old tried and true strategy of "never sanitize data, push to the database with prepared statements and escape in the templates" is basically bulletproof.

naasking 3 days ago | parent | prev | next [-]

You're unnecessarily complicating this. The server is aware of what fields are HTML so it just encodes the data that it returns like we've been doing for 30 years now. If your point is that this approach is only good with servers that you trust, then that's useful to point out, although we kind of already are vulnerable to server data.

chrismorgan 2 days ago | parent [-]

You’re not getting it: we’re not talking about the server producing templated HTML, which is fine; but rather the server producing JSON, and then the client dropping strings from that object directly into serialised HTML. That’s a problem, because the only way to be safe is to entity-encode everything, but then when you use a string in a context that doesn’t use HTML syntax, you’ll get the wrong result.

It’s not an unnecessary complication. You fundamentally need to know what format you’re embedding something into, in order to encode it, and the server can’t know that.

Depending on what you do, you may want it unencoded, encoded for HTML data or double-quoted attribute value state (& → &amp;, < → &lt;, " → &quot;), encoded for a URL query string parameter value (percent-encoding but with & → %26 as well), and there are several more reasonable possibilities even in the browser frontend context.

These encodings are incompatible, therefore it’s impossible for the server to just choose one and have it work everywhere.
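
A small sketch of the incompatibility, using the "Alpha & Beta" value from earlier in the thread. `escapeHtml` is a hypothetical helper written for this example; `encodeURIComponent` and `JSON.stringify` are the standard built-ins.

```javascript
// Hypothetical helper: encode for HTML data or a double-quoted attribute value.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;");
}

const value = "Alpha & Beta";

const forHtml = escapeHtml(value);          // for element content or attributes
const forQuery = encodeURIComponent(value); // for a URL query parameter value
const forJson = JSON.stringify(value);      // for a JSON document

console.log(forHtml);  // Alpha &amp; Beta
console.log(forQuery); // Alpha%20%26%20Beta
console.log(forJson);  // "Alpha & Beta"

// If the server already HTML-encoded the value and the client encodes again
// at the point of use, the user ends up seeing a literal "&amp;".
console.log(escapeHtml(forHtml)); // Alpha &amp;amp; Beta
```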

naasking 14 minutes ago | parent [-]

> It’s not an unnecessary complication. You fundamentally need to know what format you’re embedding something into, in order to encode it, and the server can’t know that.

There are two cases here:

1. Backend endpoints are specifically tied to the view being generated (returns viewmodels), in which case the server knows what the client is rendering and can encode it. This frankly should be the default approach because it minimizes network traffic and roundtrips. The original code displayed is perfectly fine in this case.

2. Endpoints are generic and the client assembles views by making multiple requests to various endpoints, taking on the responsibilities that server-side frameworks used to handle, including encoding.

hombre_fatal 3 days ago | parent | prev [-]

Server-side sanitization means that your view code is inherently vulnerable to injection. You'll notice in modern systems you don't sanitize data in the database and you don't have to manually sanitize when rendering frontend code. It's like that for a reason.

Server-side sanitization and XSS injection should be left in the 2000s PHP era.

jdsleppy 3 days ago | parent [-]

Where do you suggest we sanitize values? Only in the client, when rendering them?

chrismorgan 3 days ago | parent [-]

Depends on what you mean by sanitising.

If you mean filtering out undesirable parts of a document (e.g. disallowing <script> element or onclick attribute), that should normally be done on the server, before storage.

If instead you mean serialising, writing a value into a serialised document: then this should be done at the point you’re creating the serialised document. (That is, where you’re emitting the HTML.)

But the gold standard is not to generate serialised HTML manually, but to generate a DOM tree, and serialise that (though sadly it’s still a tad fraught because HTML syntax is such a mess; it works better in XML syntax).

This final point may be easier to describe by comparison to JSON: do you emit a JSON response by writing `{`, then writing `"some_key":`, then writing `[`, then writing `"\"hello\""` after carefully escaping the quotation marks, and so on? You can, but in practice it’s very rarely done. Rather, you create a JSON document, and then serialise it, e.g. with JSON.stringify inside a browser. In like manner, if you construct a proper DOM tree, you don’t need to worry about things like escaping.
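
To make the comparison concrete, here is a rough sketch of "build a tree, then serialise it" in plain JavaScript. `h`, `serialize`, `escapeText` and `escapeAttr` are hypothetical names for this example, and it assumes tag and attribute names come from static code, never from user input. In a browser, `document.createElement` and `textContent` play a similar role to `h`.

```javascript
// Elements that have no closing tag in HTML syntax (partial list for the sketch).
const VOID_TAGS = new Set(["br", "hr", "img", "input"]);

// Build a node; strings passed as children are plain text, not markup.
function h(tag, attrs = {}, ...children) {
  return { tag, attrs, children };
}

function escapeText(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function escapeAttr(text) {
  return String(text).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// The serialiser is the only place that knows about escaping rules.
function serialize(node) {
  if (typeof node === "string") return escapeText(node);
  const attrs = Object.entries(node.attrs)
    .map(([name, value]) => ` ${name}="${escapeAttr(value)}"`)
    .join("");
  if (VOID_TAGS.has(node.tag)) return `<${node.tag}${attrs}>`;
  const inner = node.children.map(serialize).join("");
  return `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}

const tree = h(
  "div",
  { class: "post", "data-guid": 'x"y' },
  h("div", { class: "text-content" }, "1 < 2 & 3")
);

console.log(serialize(tree));
// <div class="post" data-guid="x&quot;y"><div class="text-content">1 &lt; 2 &amp; 3</div></div>
```

The code that builds the tree never escapes anything by hand, just as code that builds an object for `JSON.stringify` never escapes quotation marks.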

juliend2 3 days ago | parent [-]

What's wrong with filtering before saving is that if you forget a rule, you have to go back and re-filter the data already saved in the db (with some one-off script).

I think "normally" we should instead filter for XSS injections when we generate the DOM tree, or just before (such as passing backend data to the frontend, if that makes more sense).

zdragnar 3 days ago | parent [-]

Don't forget that different clients or view formats (apps, export to CSV, etc) all have their own sanitization requirements.

Sanitize at your boundaries. Data going to SQL? Apply SQL specific sanitization. Data going to Mongo? Same. HTML, JSON, markdown, CSV? Apply the view specific sanitizing on the way.

The key difference is that, if you deploy a JSON API that is view agnostic, the client now needs to apply the sanitization. That's a requirement of an agnostic API.
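
A rough sketch of what "apply the view-specific step on the way out" can look like for one stored value. The `encoders` object is a hypothetical helper for this example; the CSV rule shown is the usual quote-and-double-the-quotes convention, and real exporters may need more.

```javascript
// One encoder per output boundary (hypothetical helper object).
const encoders = {
  html: (v) =>
    String(v)
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;")
      .replace(/"/g, "&quot;"),
  // Quote the field if it contains a comma, quote or line break,
  // doubling any embedded quotes.
  csv: (v) => {
    const text = String(v);
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  },
  json: (v) => JSON.stringify(v),
};

// Stored as entered, with no encoding applied before storage.
const stored = 'Tom, Dick & "Harry"';

console.log(encoders.html(stored)); // Tom, Dick &amp; &quot;Harry&quot;
console.log(encoders.csv(stored));  // "Tom, Dick & ""Harry"""
console.log(encoders.json(stored)); // "Tom, Dick & \"Harry\""
```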

chrismorgan 2 days ago | parent [-]

Please don’t use the word sanitising for what you seem to be describing: it’s a term more commonly used to mean filtering out undesirable parts. Encoding for a particular serialised format is a completely different, and lossless, thing. You can call it escaping or encoding.

zdragnar 2 days ago | parent [-]

Sanitizing is just a form of encoding that prevents data from becoming executable unintentionally.

chrismorgan 2 days ago | parent [-]

I don’t like how you’re categorising things. Sanitising is absolutely nothing to do with encoding. You can sanitise without encoding, you can encode without sanitising, or you can do both in sequence; and all of these combinations are reasonable and common, in different situations. And sanitising may operate on serialised HTML (risky), or on an HTML tree (both easier and safer).

Saying sanitising is a form of encoding is even less accurate than saying that a paint-mixing stick is a type of paint brush. You can mix paint without painting it, and you can paint without mixing it first.

MrJohz 3 days ago | parent | prev | next [-]

How do you update the html when something changes? For me, that's the most interesting question for these sorts of micro-frameworks - templating HTML or DOM nodes is super easy, but managing state and updates is hard.

dleeftink 3 days ago | parent | next [-]

I find the coroutine/generator approach described in a series of posts by Lorenzo Fox/Laurent Renard to be a promising alternative[0].

It takes a little while to wrap your head around, but it essentially structures component rendering to follow the natural lifecycle of a generator function: each iteration takes as input the state passed in at the previous `yield`, and cleanup can happen automatically in a `finally` block, which runs when the generator's `return()` is called (you can observe the coroutine state update part in this notebook[1]).

This approach amounts to a really terse co-routine microframework [2].
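
A stripped-down sketch of the general idea, not the cofn API. `counter` and its `render` callback are hypothetical names; a real component would write to the DOM rather than push strings into an array.

```javascript
// A component as a generator: each loop iteration is one render,
// and each `yield` suspends until the next state is pushed in.
function* counter(render) {
  let renders = 0;
  try {
    let state = yield; // wait for the first state
    while (true) {
      renders += 1;
      render(`<span>count: ${state.count} (render #${renders})</span>`);
      state = yield; // suspend until the next update
    }
  } finally {
    // Runs when the caller invokes .return(), e.g. on unmount.
    render("");
  }
}

const output = [];
const component = counter((markup) => output.push(markup));

component.next();             // run to the first yield
component.next({ count: 1 }); // first render
component.next({ count: 2 }); // re-render with the new state
component.return();           // triggers the finally block

console.log(output);
// [ '<span>count: 1 (render #1)</span>', '<span>count: 2 (render #2)</span>', '' ]
```

Local variables such as `renders` survive between updates because the generator stays suspended rather than being called again from scratch.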

[0]: https://lorenzofox.dev/posts/component-as-infinite-loop/#:~:...

[1]: https://observablehq.com/d/940d9b77de73e8d6

[2]: https://github.com/lorenzofox3/cofn

lylejantzi3rd 3 days ago | parent | prev [-]

I call printPosts with the new post data. It rewrites the whole chunk in one go, which is pretty snappy. I haven't decided how I'm going to handle more granular updates yet, like comment count or likes.

MrJohz 3 days ago | parent [-]

Yeah, that's a pretty common approach. Unfortunately, assigning to `innerHTML` replaces the existing nodes rather than patching them, so it'll completely reset any UI elements in the region being rerendered.

It also will make it hard to scope anything you want to do to an individual DOM element. If you want granular updates, for example, you want to be able to do something like `document.querySelector(???)` and be certain it's going to refer to, say, a specific text input in your `printPost` template, without worrying about accessing the inputs created by other instances of the `printPost` template. You can do that with unique IDs, but it's fiddly and error-prone.

spankalee 3 days ago | parent | prev | next [-]

You should really check out lit-html[1]. It's not a framework like this README claims. It just renders templates with template literals, but it does so with minimal DOM updates and safely. And it has a number of features for declaratively adding event handlers, setting properties, and dealing with lists.

[1]: https://lit.dev/docs/libraries/standalone-templates/

econ 3 days ago | parent | prev | next [-]

I prefer something like this before building the template string.

image = post.image_urls?.[0] || "";

Then have the printImage function return an empty string if the argument is an empty string.

${printImage(image)}

Easier on the eyes.
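
Spelled out a bit more, as a sketch only. `imagePath`, `escapeAttr` and this version of `printImage` are hypothetical, the `image` class is made up, and the `.original` field and image host are taken from the code at the top of the thread.

```javascript
// Encode a value for use inside a double-quoted attribute.
function escapeAttr(text) {
  return String(text).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// An empty path means "render nothing", so callers need no ternary.
function printImage(path) {
  if (path === "") return "";
  return `<img class="image" src="https://imghost.com${escapeAttr(path)}"/>`;
}

// Optional chaining on an index is written ?.[0], not ?[0].
function imagePath(post) {
  return post.image_urls?.[0]?.original ?? "";
}

const withImage = { image_urls: [{ original: "/a.png" }] };
const withoutImage = {};

console.log(`[${printImage(imagePath(withImage))}]`);
// [<img class="image" src="https://imghost.com/a.png"/>]
console.log(`[${printImage(imagePath(withoutImage))}]`);
// []
```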

hyperhello 3 days ago | parent | prev [-]

I like it. Not only does it move the UI into JavaScript, but it moves the scripting into the HTML!

Koffiepoeder 3 days ago | parent [-]

Have a feeling this will lead to XSS vulnerabilities though.