Remix.run Logo
chrismorgan 4 days ago

reStructuredText and Markdown both have a bad habit of clevernesses that fall down—just in different areas.

Both do at least some degree of only matching delimiters at word boundaries. I consider that to be a huge mistake.

reStructuredText falls for it, but has a universally-applicable workaround (backslash-space as a separator—note that it is not an escaped space, as you might reasonably expect: it’s special-cased to expand to nothing but a syntax separator).

Markdown falls for it inconsistently, which as a user of languages that divide words with spaces, is honestly worse. Its rules are more nuanced, which is generally a bad thing, because it makes it harder to build the appropriate mental model. It was also wildly underspecified, though that’s mostly settled now. For many years, Stack Overflow used at least two, I think three but I can’t remember where the third would have been, mutually-incompatible engines, and underscores and mid-word formatting were a total mess. Python in particular suffered—for many years, in comments it was impossible to get plain-text (i.e. not `-wrapped code) __init__.

In CommonMark, _abc_ and *abc* get you abc, but a*b*c gets you abc while a_b_c gets you a_b_c. That’s an admission of failure in syntax. Hmm… I hadn’t thought of this, but I suppose that makes _ basically untenable in languages with no word separator. Interesting argument against Prettier, which has a badly broken Markdown mode¹, and which insists on _ for emphasis, not *.

In my own lightweight markup language I’ve been steadily making and using for my own stuff for the last five years or so, there’s nothing about word boundaries. a*b*c is abc, and if a dialect² defined _ as emphasis, a_b_c would be abc.

Another example of the cleverness problem in reStructuredText is how hard wrapping is handled. https://docutils.sourceforge.io/docs/ref/rst/restructuredtex... is a good example of how badly wrong this can go. (Markdown has related issues, but a little more constrained. A mid-paragraph line starting with “1. ” or “- ”—both plausible, and the latter certain to occur eventually if you use - as a dash—will start a list.) The solution here is to reject column-based hard-wrapping as a terrible idea. Yes, this is a case where the markup language should tell people “you’re doing it wrong”, because otherwise the markup language will either mangle your content, or become bad; or more likely both.

Meanwhile in Markdown, it tries to be clever around specific HTML tags and just becomes hard to predict.

—⁂—

¹ Prettier’s Markdown formatting is known to mangle content, particularly around underscores and asterisks, and they haven’t done anything about it. The first time I accidentally used it it deleted the rest of a file after some messy bad emphasis stuff from a WYSIWYG HTML → Markdown conversion. That was when I discovered .prettierignore is almost completely broken, too. I came away no longer just unimpressed with some of Prettier’s opinions, but severely unimpressed with the rest of it technically. Why they haven’t disabled it until such things are fixed, I don’t know.

² There’s very little fundamental syntax in it: line break, indent and parsing CSS Counter Styles is about it. The rest is all defined in dialects, for easy extension.