LeifCarrotson a day ago

In case "bases with optional newlines" wasn't obvious to anyone else, a specific example (from Wikipedia) is:

    ;LCBO - Prolactin precursor - Bovine
    MDSKGSSQKGSRLLLLLVVSNLLLCQGVVSTPVCPNGPGNCQVSLRDLFDRAVMVSHYIHDLSS
    EMFNEFDKRYAQGKGFITMALNSCHTSSLPTPEDKEQAQQTHHEVLMSLILGLLRSWNDPLYHL
    VTEVRGMKGAPDAILSRAIEIEEENKRLLEGMEMIFGQVIPGAKETEPYPVWSGLPSLQTKDED
    ARYSAFYNLLHCLRRDSSKIDTYLKLLNCRIIYNNNC*
where "SS...EM", "HL...VT", or "ED...AR" may be part of common subsequences, but they get split because the plaintext file arbitrarily wraps at column 65 so that it renders nicely on a DEC VT100 terminal from the 70s.

Or, for an even simpler example:

    ; plaintext
    GATTAC
    AGATTA
    CAGATT
    ACAGAT
    TACAGA
    TTACAG
    ATTACA
becomes, on disk, something like

    ; plaintext\r\nGATTAC\r\nAGATTA\r\nCAGATT\r\nACAGAT\r\nTACAGA\r\nTTACAG\r\nATTACA\r\n
which is hard to compress, while

    ; plaintext\r\nGATTACAGATTACAGATTACAGATTACAGATTACAGATTACA
is just

    "; plaintext\r\n" + "GATTACA" * 6
and then, if you want, you can reflow the text when it's time to render to the screen.
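
To make that concrete, here is a minimal sketch using Python's standard `zlib`. The `wrap` and `unwrap` helpers and the 6-column width are assumptions picked for illustration, not anything a real sequence format mandates. It compresses the same toy sequence with and without the cosmetic line breaks, then checks that reflowing restores the wrapped layout exactly.

```python
import zlib


def wrap(bases: str, width: int) -> str:
    """Break a sequence into fixed-width lines, each ending in CRLF."""
    return "".join(bases[i:i + width] + "\r\n" for i in range(0, len(bases), width))


def unwrap(body: str) -> str:
    """Drop the cosmetic line breaks to recover the bare sequence."""
    return body.replace("\r\n", "")


header = "; plaintext\r\n"
bases = "GATTACA" * 6                # toy stand-in for a repetitive sequence
wrapped = header + wrap(bases, 6)    # the file as it sits on disk
flat = header + bases                # same data without cosmetic newlines

wrapped_size = len(zlib.compress(wrapped.encode("ascii"), 9))
flat_size = len(zlib.compress(flat.encode("ascii"), 9))
print(f"wrapped: {len(wrapped)} bytes raw, {wrapped_size} bytes compressed")
print(f"flat:    {len(flat)} bytes raw, {flat_size} bytes compressed")

# Reflowing at render time gives back the original layout byte for byte.
assert header + wrap(unwrap(wrapped[len(header):]), 6) == wrapped
```

On an input this small the absolute savings are tiny; the point is only that the line breaks stretch the repeat period from 7 bytes to 56, which gives the compressor less to work with.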

tgtweak 20 hours ago

Feels like it could be an extension to the compression lib (and would retain newlines as such) vs requiring external document tailoring. Also feels like a very specific use case but this optimization might have larger applications outside this narrow field/format.
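
A rough sketch of what such an extension could look like, purely hypothetical: `pack` and `unpack` are made-up names, not part of zlib or any real library. It assumes one header line followed by a body wrapped with CRLF at a uniform width, and a file that ends in CRLF. The wrapper strips the line breaks before compressing and stores the width in a 2-byte prefix so the breaks can be restored on the way out.

```python
import zlib


def pack(text: str) -> bytes:
    """Compress text whose body is CRLF-wrapped at one uniform width."""
    header, _, body = text.partition("\r\n")
    lines = body.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing CRLF leaves one empty element behind
    width = max((len(line) for line in lines), default=0)
    # Only uniform wrapping is handled: every line except the last is full width.
    if any(len(line) != width for line in lines[:-1]):
        raise ValueError("body is not uniformly wrapped")
    payload = header + "\n" + "".join(lines)
    return width.to_bytes(2, "big") + zlib.compress(payload.encode("ascii"), 9)


def unpack(blob: bytes) -> str:
    """Decompress and re-insert the line breaks at the recorded width."""
    width = int.from_bytes(blob[:2], "big")
    header, _, bases = zlib.decompress(blob[2:]).decode("ascii").partition("\n")
    body = ""
    if width:
        body = "".join(bases[i:i + width] + "\r\n" for i in range(0, len(bases), width))
    return header + "\r\n" + body


original = "; plaintext\r\nGATTAC\r\nAGATTA\r\nCAGATT\r\n"
assert unpack(pack(original)) == original
```

A real implementation would also have to cope with mixed line endings, multiple records per file, and lines of irregular length, all of which this sketch rejects or ignores.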

spatoa 5 hours ago

When working with data, I definitely prefer the UI to adapt to the data. I never save anything back just for the sake of display.

Terr_ 19 hours ago

Huh, so in other words: "If you don't arbitrarily interrupt continuous sequences of data with cosmetic noise, they compress better."