cryptonector 3 months ago

Would you rather they reinvent the wheel badly? That's what Protocol Buffers is: badly reinvented ASN.1/DER!

PB is:

  - TLV (tag-length-value), like DER
  - you have to explicitly list the
    tags in the IDL as if it were ASN.1
    in 1984 (but actually, worse,
    because even back then tags were
    not always required in ASN.1, only
    for disambiguation)
  - it's super similar to DER, yet not
    the same (see the byte-level sketch
    below)
  - PB was created in part because ASN.1
    had so little open source tooling,
    but PB had none until they wrote it,
    so they could just as well have
    written the ASN.1 tooling they
    wished they had
smh
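
To make the TLV comparison concrete, here is a minimal, hand-written sketch in C of the same one-field value encoded both ways. The message shape is made up for illustration; the byte values follow the published wire-format rules.

  #include <stdio.h>

  /*
   * Illustrative only: the same logical value, { n = 150 }, encoded
   * by hand.
   *
   * Protobuf (`message M { int32 n = 1; }`):
   *   0x08        key = (field number 1 << 3) | wire type 0 (varint)
   *   0x96 0x01   150 as a base-128 varint, low 7-bit group first
   *
   * DER (`M ::= SEQUENCE { n INTEGER }`):
   *   0x30 0x04   SEQUENCE tag, length 4
   *   0x02 0x02   INTEGER tag, length 2
   *   0x00 0x96   150 big-endian, leading 0x00 so it isn't negative
   *
   * Both lead with a tag; DER always follows with a length, protobuf
   * only for length-delimited (wire type 2) fields.
   */
  static const unsigned char pb_msg[]  = { 0x08, 0x96, 0x01 };
  static const unsigned char der_msg[] = { 0x30, 0x04, 0x02, 0x02, 0x00, 0x96 };

  int main(void) {
      printf("protobuf: %zu bytes, DER: %zu bytes\n",
             sizeof(pb_msg), sizeof(der_msg));
      return 0;
  }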
RainyDayTmrw 3 months ago | parent | next [-]

In complete fairness to PBs, PBs have a heck of a lot less surface area than ASN.1. You could argue, why not use a subset of ASN.1, but it seems people have trouble agreeing on which subset to use.

cryptonector 3 months ago | parent [-]

I don't agree with that. PB is practically the same as DER. All the attack surface area lies in the codec, specifically in the decoder.

mananaysiempre 3 months ago | parent [-]

There are two notions of surface area here: that exposed to the external input, which must be secured, and that exposed to the programmer, which must be understood. As far as the latter is concerned, one can’t really disassociate the encoding of DER from the, well, notation of ASN.1, which, while definitely not as foreign as it may first appear, is still very rich compared to the one Protobufs use. (I do think a good tutorial and a cheat-sheet comparison to more widely used IDLs would help—for certain, obscure dusty corners and jargon-laden specs have never stopped anyone from writing the COM dialect of DCE IDL.)

Even if we restrict ourselves to the former notion, the simple first stage of parsing that handles DER proper is not the only one we have to contend with: we also have to translate things like strings, dates, and times to ones the embedding environment commonly uses. Like, I’m the kind of weird pervert that would find it fun to implement transcoding between T.61 and Unicode faithfully, but has anyone ever actually put T.61 in an ASN.1 T61String? As far as I know, not as far as PKIX is concerned—seemingly every T61String in a certificate just has ISO 8859-1 or *shudder* even Windows-1252 inside it (and that’s part of the reason T61Strings are flat out prohibited in today’s Web PKI, but who can tell about private PKIs?). And I’ll have to answer questions like this about every one of a dozen obscure and/or antiquated data types that core ASN.1 has (EMBEDDED PDV anyone?..).

cryptonector 3 months ago | parent [-]

> There are two notions of surface area here: that exposed to the external input, which must be secured, and that exposed to the programmer, which must be understood. As far as the latter is concerned, one can’t really disassociate the encoding of DER from the, well, notation of ASN.1, [...]

I disagree. I say that as a part-time maintainer of an open source ASN.1 stack that generates ergonomic C from ASN.1.

> I do think a good tutorial and a cheat-sheet comparison [...]

For ASN.1? There's a great deal of content out there, and several books. I'm not sure what more can be done. Tutorials? Look around this thread. People who can't be bothered with docs nowadays also can't be bothered with tutorials -- they just rely on LLMs.

> Like, I’m the kind of weird pervert that would find it fun to implement transcoding between T.61 and Unicode faithfully, but has anyone ever actually put T.61 in an ASN.1 T61String?

Me too, but as you note, no one really does that. My approach for PKIX is to allow only ASCII for string types other than UTF8String.
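
As a sketch of what that policy can look like in code (a hypothetical helper, not from any particular stack):

  #include <stddef.h>

  /*
   * Hypothetical check: accept a legacy string type (T61String,
   * PrintableString, IA5String, ...) only if every byte is plain
   * ASCII; reject anything else rather than guess at Latin-1 or
   * Windows-1252.
   */
  static int
  is_ascii_only(const unsigned char *s, size_t len)
  {
      for (size_t i = 0; i < len; i++)
          if (s[i] > 0x7f)
              return 0;
      return 1;
  }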

> And I’ll have to answer questions like this about every one of a dozen obscure and/or antiquated data types that core ASN.1 has (EMBEDDED PDV anyone?..).

Now do C++!

A "modern" subset of ASN.1 is not that much smaller than x.680 + all of x.681, x.682, and x.683.

flowerthoughts 3 months ago | parent | prev | next [-]

The one thing that grinds my gears about BER/CER/DER is that they managed to come up with two different varint encoding schemes for the tag and length.

cryptonector 3 months ago | parent [-]

Meh. One rarely ever needs tags larger than 30, and even more seldom tags larger than twice that, say.

flowerthoughts 3 months ago | parent [-]

Yeah, but if you're writing a parser for use by others, you have to implement both, even if it's "rarely" used. Or some intern somewhere will have a bad day after getting tasked with "just add this value here, it'll be an easy starter project." :)

cryptonector 3 months ago | parent [-]

And then it's a tiny bit more code. It's really not a problem.
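
For the curious, here is roughly what "both" amounts to, as a hedged sketch (not lifted from any real codec, and with error handling kept minimal): high tag numbers use base-128 octets with a continuation bit, while long-form lengths use a count octet followed by big-endian length octets.

  #include <stddef.h>
  #include <stdint.h>

  /* Tag numbers >= 31: the leading identifier octet has tag bits
   * 11111, then the tag number follows in base-128, high bit set on
   * all but the last octet.  `p` points just past the identifier
   * octet.  Returns octets consumed, 0 on truncation. */
  static size_t
  decode_high_tag_number(const uint8_t *p, size_t len, uint32_t *tag)
  {
      size_t i = 0;
      *tag = 0;
      while (i < len) {
          *tag = (*tag << 7) | (p[i] & 0x7f);
          if ((p[i++] & 0x80) == 0)
              return i;
      }
      return 0;
  }

  /* Lengths >= 128: first length octet is 0x80 | n, then n octets of
   * big-endian length (0x80 alone is BER's indefinite form).  `p`
   * points at the first length octet.  Returns octets consumed, 0 on
   * error. */
  static size_t
  decode_long_form_length(const uint8_t *p, size_t len, uint32_t *out)
  {
      size_t n, i;
      if (len == 0)
          return 0;
      n = p[0] & 0x7f;
      if (n == 0 || n > sizeof(*out) || 1 + n > len)
          return 0;
      *out = 0;
      for (i = 1; i <= n; i++)
          *out = (*out << 8) | p[i];
      return 1 + n;
  }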

mort96 3 months ago | parent | prev [-]

Why wouldn't you want to explicitly number fields? Protocols evolve and get extended over time; making the numbering explicit ensures that there's no accidental backwards-compat breakage from re-ordering fields. Implicit field numbers sound like an excellent reason not to use ASN.1.

This shilling for an over-engineered 80s encoding ecosystem that nobody uses is really putting me off.

cryptonector 3 months ago | parent [-]

> Why wouldn't you want to explicitly number fields? Protocols evolve and get extended over time; making the numbering explicit ensures that there's no accidental backwards-compat breakage from re-ordering fields.

ASN.1 went through this whole evolution, and ended up developing extensive support for extensibility and "automatic tagging" so you don't have to manually tag. That happened because manual tagging a) was annoying, b) led to inconsistent use, c) led to mistakes, and d) was almost completely unnecessary in encoding rules that aren't tag-length-value, like PER and OER.

The fact that you are not yet able to imagine that evolution, and that you are not cognizant of ASN.1's history, proves the point that one should study what came before, before reinventing the wheel [badly].

mananaysiempre 3 months ago | parent | next [-]

I have to admit that I could not make heads or tails of the extension marker stuff in the ASN.1 standards I’ve read (so the essential ones like basic ASN.1 and BER, not the really fun stuff like object classes, macros, or ECN). This is rather unlike the rest of those standards. So, could you elaborate on what those actually do and/or why they’re the right way to do things?

cryptonector 3 months ago | parent [-]

> So, could you elaborate on what those actually do and/or why they’re the right way to do things?

Yes.

TL;DR: formal languages allow you to have tooling that greatly reduces developer load (both work and cognitive load), which yields more complete and correct implementations of the specifications that use them.

  ---
My apologies for the following wall of text, but I hope you can spare the time to read it.

Suppose you don't have formal ways to express certain things like "if you see extra fields at the end of this structure, ignore them", so you write that stuff in English (or French, or...). Not every implementor will be a fluent English (or French, or ...) reader, and even the ones who are might move too fast and break things. If you make something formal in a machine-readable language, then you don't have that problem.

Formalizing things like this adds plenty of value and doesn't cost much as far as the specification language and the specs using it go. It does cost something for tooling to implement it fully, but it's not really that big a deal -- this stuff is a lot simpler than, say, Clang and LLVM.

> I could not make heads or tails of the extension marker stuff

It's like this. Suppose you have a "struct" you might want to add fields to later on:

  SEQUENCE {
     foo UTF8String,
     n   INTEGER
  }
well, then you add an "extensibility marker" to denote this:

  SEQUENCE {
     foo UTF8String,
     n   INTEGER,
     ...
  }
This tells your tooling to ignore and skip over any extensions present when decoding.

But now you want to define some such extensions and leave the result to be extensible, so you write:

  SEQUENCE {
     foo UTF8String,
     n   INTEGER,
     ...,
     [[2: -- first extension
     bar OBJECT IDENTIFIER
     ]],
     ...  -- still extensible!
  }
You can also use extensibility markers in constraints, like:

  -- small integer now, but some day maybe larger
  SmallInt ::= INTEGER (-128..128, ...)
> in the ASN.1 standards I’ve read (so the essential ones like basic ASN.1 and BER, not the really fun stuff like object classes, macros, or ECN). This is rather unlike the rest of those standards.

Extensibility markers are in the base ASN.1 spec, X.680.

The "Information Object System" and "object classes" and such are in x.681, x.682, and x.683. That's all a fancy way of expressing formally "parameterized types" and what kinds of things go in "typed holes", where a "typed hole" is something like a Rust-like "enum with data" where the enum is extensible through external registries. A typical example is PKIX certificate extensions:

  TBSCertificate  ::=  SEQUENCE  {
      version         [0]  Version DEFAULT v1,
      serialNumber         CertificateSerialNumber,
      signature            AlgorithmIdentifier{SIGNATURE-ALGORITHM,
                                {SignatureAlgorithms}},
      issuer               Name,
      validity             Validity,
      subject              Name,
      subjectPublicKeyInfo SubjectPublicKeyInfo,
      ... ,
      [[2:               -- If present, version MUST be v2
      issuerUniqueID  [1]  IMPLICIT UniqueIdentifier OPTIONAL,
      subjectUniqueID [2]  IMPLICIT UniqueIdentifier OPTIONAL
      ]],
      [[3:               -- If present, version MUST be v3 --
      extensions      [3]  Extensions{{CertExtensions}} OPTIONAL
      ]],
      ... }
Here `signature` and `extensions` are typed holes. The `signature` field is an algorithm identifier plus optional, algorithm-specific parameters (the signature byte blob itself lives in the outer Certificate), while `extensions` will be a `SEQUENCE OF` (array of) `Extension`:

  Extensions{EXTENSION:ExtensionSet} ::=
      SEQUENCE SIZE (1..MAX) OF Extension{{ExtensionSet}}
(This means that `Extensions` is an array of at least one item of type `Extension`, where all those elements are constrained by the "object set" identified by the formal parameter `ExtensionSet` -- formal parameter meaning the actual parameter is not specified here; it is specified above, where we saw that `extensions` is `Extensions{{CertExtensions}}`, so the actual parameter is `CertExtensions`.) Here's what `CertExtensions` is:

   CertExtensions EXTENSION ::= {
           ext-AuthorityKeyIdentifier | ext-SubjectKeyIdentifier |
           ext-KeyUsage | ext-PrivateKeyUsagePeriod |
           ext-CertificatePolicies | ext-PolicyMappings |
           ext-SubjectAltName | ext-IssuerAltName |
           ext-SubjectDirectoryAttributes |
           ext-BasicConstraints | ext-NameConstraints |
           ext-PolicyConstraints | ext-ExtKeyUsage |
           ext-CRLDistributionPoints | ext-InhibitAnyPolicy |
           ext-FreshestCRL | ext-AuthorityInfoAccess |
           ext-SubjectInfoAccessSyntax, ... }
where each of those `ext-*` is an information object that looks like this:

   ext-SubjectAltName EXTENSION ::= { SYNTAX
       GeneralNames IDENTIFIED BY id-ce-subjectAltName }
which says that a SAN (subjectAltName) extension is identified by the OID `id-ce-subjectAltName` and consists of a byte blob containing an encoded GeneralNames value:

   GeneralNames ::= SEQUENCE SIZE (1..MAX) OF GeneralName

   GeneralName ::= CHOICE {
        otherName                   [0]  INSTANCE OF OTHER-NAME,
        rfc822Name                  [1]  IA5String,
        dNSName                     [2]  IA5String,
        x400Address                 [3]  ORAddress,
        directoryName               [4]  Name,
        ediPartyName                [5]  EDIPartyName,
        uniformResourceIdentifier   [6]  IA5String,
        iPAddress                   [7]  OCTET STRING,
        registeredID                [8]  OBJECT IDENTIFIER
   }
All the PKIX certificate extensions, and CRL extensions, and attribute certificate extensions and ... extensions are specified like this in, for example, RFC 5912.

If you have a compiler that can handle this, then you can have it generate a decoder that fully decodes the most complex certificate in one go and yields something like a struct (or whatever the host language calls it) that nests all the things. And it can also generate an encoder that takes a value of that sort.

The alternative is that if you want to fish out a particular thing from a certificate, you first have to decode the certificate, then find the extension you want by iterating over the sequence of extensions and looking for the right OID, which gets you the byte blob containing the extension, for which you then have to invoke a second decoder. This is a very manual process: it's error-prone, and it's boring and tedious even for the required extensions.

I want to emphasize how awesome this "decode all the way through, in one invocation" feature is. It really is the most important step to having full implementations of specs.
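
A hedged illustration of the difference, with invented C names standing in for whatever a given compiler would actually generate:

  #include <stddef.h>

  /* Hypothetical generated declarations -- names invented for
   * illustration, not any particular ASN.1 compiler's output. */
  typedef struct Extensions Extensions;
  typedef struct TBSCertificate {
      /* ...other certificate fields... */
      Extensions *extensions;   /* typed hole, already fully decoded */
  } TBSCertificate;

  int decode_TBSCertificate(const unsigned char *, size_t,
                            TBSCertificate *);

  /* One call yields the whole tree, typed holes included; the manual
   * alternative is: decode the outer certificate, loop over the
   * extensions, match OIDs by hand, then call a second decoder on
   * each extension's byte blob. */
  static int
  example(const unsigned char *buf, size_t len)
  {
      TBSCertificate tbs;
      return decode_TBSCertificate(buf, len, &tbs);
  }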

ECN is more obscure and less used. It was intended as a response to hardware designers' demand for "bits on the wire" docs like those for TCP and IP headers. In the 80s and 90s the ITU-T thought they could get ASN.1 used even at layers like IP and TCP, and people working on the Internet said "lay off the crack, that's crazy talk, because hardware needs to efficiently decode packet headers, yo!". The idea was to use ASN.1 and extend it with ways to denote how things would get encoded on the wire, rather than leaving all those details to the encoding rules like BER/DER/CER, PER, OER, XER, JER, etc. Unless you have a need for ECN because you're implementing a protocol that requires it, I would steer clear of it.

As you can tell, the ITU-T is in love with formal languages. And they are quite right to be. Other standards development organizations, like the IETF, sometimes make heavy use of such formal languages and other times not. For example, PKIX, Kerberos, SNMP, etc. all use ASN.1 extensively, and PKIX in particular makes the most sophisticated use of ASN.1 (see RFC 5912!), while things like TLS and SSHv2 have ad-hoc languages for their specifications; in the case of TLS that language is not always used consistently, so it's hard to write a compiler for it, and in the case of SSHv2 the language is much too limited to bother writing a compiler for.

You can tell that ITU-T specs are of much higher quality than IETF specs, but then the ITU-T requires $$$ to participate while the IETF is free, the ITU-T has very good tech writers on staff, and ITU-T participants are often paid specifically for their participation. The IETF has a paid RFC Editor and RFC Production Center with editors, but those editors only get involved at the very end of RFC publication, so they can't possibly produce much better RFCs than the original authors and editors of the Internet-Drafts that precede them; and Internet-Draft authors are rarely paid to work full time on IETF work. Some IETF specs are of very high quality, but few, if any, approach the quality of the ITU-T X.68x series (ASN.1) and X.69x series (ASN.1 encoding rules).

What all of the above says is that we don't always need the highest quality, most formalized specifications, but whenever we can get them, it's really much better than when we can't.

mort96 3 months ago | parent | prev [-]

Sounds like unnecessary complexity which makes it more error prone.

cryptonector 3 months ago | parent | next [-]

> Sounds like unnecessary complexity which makes it more error prone.

No! On the contrary, it makes it less error-prone. Any time you formalize what would have been English (or French, or...) text, things get safer, not riskier.

cryptonector 3 months ago | parent | prev [-]

That's like saying that Rust is unnecessary complexity over C...