mort96 3 months ago

Why wouldn't you want to explicitly number fields? Protocols evolve and get extended over time; making the numbering explicit ensures that there's no accidental backwards-compat breakage from re-ordering fields. Implicit field numbers sound like an excellent reason not to use ASN.1.

This shilling for an over-engineered 80s encoding ecosystem that nobody uses is really putting me off.

cryptonector 3 months ago | parent [-]

> Why wouldn't you want to explicitly number fields? Protocols evolve and get extended over time; making the numbering explicit ensures that there's no accidental backwards-compat breakage from re-ordering fields.

ASN.1 went through this whole evolution, and ended up developing extensive support for extensibility and "automatic tagging" so you don't have to tag manually. That happened because manual tagging a) was annoying, b) led to inconsistent use, c) led to mistakes, and d) was almost completely unnecessary in encoding rules that aren't tag-length-value, like PER and OER.
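For example, with automatic tagging you declare it once in the module header and the tooling assigns distinct context tags for you. A minimal sketch (module and field names made up):

  Example DEFINITIONS AUTOMATIC TAGS ::= BEGIN
      Msg ::= SEQUENCE {
          foo  UTF8String,        -- gets tag [0] automatically
          bar  INTEGER OPTIONAL,  -- gets tag [1] automatically
          baz  INTEGER OPTIONAL   -- gets tag [2] automatically
      }
  END

Without distinct tags, a BER/DER decoder could not tell which of the two optional INTEGERs is present; AUTOMATIC TAGS has the compiler number them, so nobody hand-assigns tags or mis-renumbers them when fields get added.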

The fact that you are not yet able to imagine that evolution, and that you are not cognizant of ASN.1's history, proves the point that one should study what came before, before reinventing the wheel [badly].

mananaysiempre 3 months ago | parent | next [-]

I have to admit that I could not make heads or tails of the extension marker stuff in the ASN.1 standards I’ve read (so the essential ones like basic ASN.1 and BER, not the really fun stuff like object classes, macros, or ECN). This is rather unlike the rest of those standards. So, could you elaborate on what those actually do and/or why they’re the right way to do things?

cryptonector 3 months ago | parent [-]

> So, could you elaborate on what those actually do and/or why they’re the right way to do things?

Yes.

TL;DR: formal languages allow you to have tooling that greatly reduces developer load (both work and cognitive load), which yields more complete and correct implementations of the specifications that use them.

---
My apologies for the following wall of text, but I hope you can spare the time to read it.

Suppose you don't have formal ways to express certain things like "if you see extra fields at the end of this structure, ignore them", so you write that stuff in English (or French, or...). Not every implementor will be a fluent English (or French, or ...) reader, and even the ones who are might move too fast and break things. If you make something formal in a machine-readable language, then you don't have that problem.

Formalizing things like this adds plenty of value and doesn't cost much as far as the specification language and the specs using it go. It does cost something for tooling to implement it fully, but it's not really that big a deal -- this stuff is a lot simpler than, say, Clang and LLVM.

> I could not make heads or tails of the extension marker stuff

It's like this. Suppose you have a "struct" you might want to add fields to later on:

  SEQUENCE {
     foo UTF8String,
     n   INTEGER
  }
Well, then you add an "extensibility marker" to denote this:

  SEQUENCE {
     foo UTF8String,
     n   INTEGER,
     ...
  }
This tells your tooling to ignore and skip over any extensions present when decoding.

But now you want to define some such extensions and leave the result to be extensible, so you write:

  SEQUENCE {
     foo UTF8String,
     n   INTEGER,
     ...,
     [[2: -- first extension
     bar OBJECT IDENTIFIER
     ]],
     ...  -- still extensible!
  }
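X.680 also has a module-level shorthand: declare the module with EXTENSIBILITY IMPLIED and every type in it behaves as if it had a trailing "...". A sketch (module name made up):

  Example DEFINITIONS EXTENSIBILITY IMPLIED ::= BEGIN
      -- every SEQUENCE/SET/CHOICE/ENUMERATED below is treated
      -- as if "..." were written at the end of its definition
      Msg ::= SEQUENCE {
          foo UTF8String,
          n   INTEGER
      }
  END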
You can also use extensibility markers in constraints, like:

  -- small integer now, but some day maybe larger
  SmallInt ::= INTEGER (-128..128, ...)
In PER-style encoding rules that marker is visible on the wire: a one-bit preamble says whether the value is in the root range, so today's small values stay compact while tomorrow's larger ones remain decodable.
> in the ASN.1 standards I’ve read (so the essential ones like basic ASN.1 and BER, not the really fun stuff like object classes, macros, or ECN). This is rather unlike the rest of those standards.

Extensibility markers are in the base ASN.1 spec, X.680.

The "Information Object System" and "object classes" and such are in x.681, x.682, and x.683. That's all a fancy way of expressing formally "parameterized types" and what kinds of things go in "typed holes", where a "typed hole" is something like a Rust-like "enum with data" where the enum is extensible through external registries. A typical example is PKIX certificate extensions:

  TBSCertificate  ::=  SEQUENCE  {
      version         [0]  Version DEFAULT v1,
      serialNumber         CertificateSerialNumber,
      signature            AlgorithmIdentifier{SIGNATURE-ALGORITHM,
                                {SignatureAlgorithms}},
      issuer               Name,
      validity             Validity,
      subject              Name,
      subjectPublicKeyInfo SubjectPublicKeyInfo,
      ... ,
      [[2:               -- If present, version MUST be v2
      issuerUniqueID  [1]  IMPLICIT UniqueIdentifier OPTIONAL,
      subjectUniqueID [2]  IMPLICIT UniqueIdentifier OPTIONAL
      ]],
      [[3:               -- If present, version MUST be v3 --
      extensions      [3]  Extensions{{CertExtensions}} OPTIONAL
      ]],
      ... }
Here `signature` and `extensions` are typed holes. A signature will be some algorithm identifier, optional algorithm parameters, and a signature byte blob, while `extensions` will be a `SEQUENCE OF` (an array of) `Extension`:

  Extensions{EXTENSION:ExtensionSet} ::=
      SEQUENCE SIZE (1..MAX) OF Extension{{ExtensionSet}}
(This means that `Extensions` is an array of at least one item of type `Extension`, where all those elements are constrained by the "object set" named by the formal parameter `ExtensionSet`. Formal parameter meaning: the actual parameter is not given here; it is given above, where we saw that `extensions` is `Extensions{{CertExtensions}}`, so the actual parameter is `CertExtensions`.) Here's what `CertExtensions` is:

   CertExtensions EXTENSION ::= {
           ext-AuthorityKeyIdentifier | ext-SubjectKeyIdentifier |
           ext-KeyUsage | ext-PrivateKeyUsagePeriod |
           ext-CertificatePolicies | ext-PolicyMappings |
           ext-SubjectAltName | ext-IssuerAltName |
           ext-SubjectDirectoryAttributes |
           ext-BasicConstraints | ext-NameConstraints |
           ext-PolicyConstraints | ext-ExtKeyUsage |
           ext-CRLDistributionPoints | ext-InhibitAnyPolicy |
           ext-FreshestCRL | ext-AuthorityInfoAccess |
           ext-SubjectInfoAccessSyntax, ... }
where each of those `ext-*` is an information object that looks like this:

   ext-SubjectAltName EXTENSION ::= { SYNTAX
       GeneralNames IDENTIFIED BY id-ce-subjectAltName }
which says that a SAN (subjectAltName) extension is identified by the OID `id-ce-subjectAltName` and consists of a byte blob containing an encoded GeneralNames value:

   GeneralNames ::= SEQUENCE SIZE (1..MAX) OF GeneralName

   GeneralName ::= CHOICE {
        otherName                   [0]  INSTANCE OF OTHER-NAME,
        rfc822Name                  [1]  IA5String,
        dNSName                     [2]  IA5String,
        x400Address                 [3]  ORAddress,
        directoryName               [4]  Name,
        ediPartyName                [5]  EDIPartyName,
        uniformResourceIdentifier   [6]  IA5String,
        iPAddress                   [7]  OCTET STRING,
        registeredID                [8]  OBJECT IDENTIFIER
   }
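To connect the dots, here -- roughly, simplified from RFC 5912 with the criticality machinery elided -- is the EXTENSION object class itself and the Extension type parameterized by it:

   EXTENSION ::= CLASS {
       &id        OBJECT IDENTIFIER UNIQUE,  -- the extension's OID
       &ExtnType                             -- the type in the hole
   } WITH SYNTAX {
       SYNTAX &ExtnType IDENTIFIED BY &id
   }

   Extension{EXTENSION:ExtensionSet} ::= SEQUENCE {
       extnID     EXTENSION.&id({ExtensionSet}),
       critical   BOOLEAN DEFAULT FALSE,
       extnValue  OCTET STRING (CONTAINING
                  EXTENSION.&ExtnType({ExtensionSet}{@extnID}))
   }

The CONTAINING constraint is the typed hole in action: it ties the byte blob to whichever type the object matching `extnID` declares, and that link is what lets a compiler generate a decoder that opens the hole for you.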
All the PKIX certificate extensions, and CRL extensions, and attribute certificate extensions and ... extensions are specified like this in, for example, RFC 5912.

If you have a compiler that can handle this, then you can have it generate a decoder that fully decodes the most complex certificate in one go and yields something like a struct (or whatever the host language calls it) that nests all the things. And it can also generate an encoder that takes a value of that sort.

The alternative is that if you want to fish out a particular thing from a certificate, you have to first decode the certificate, then find the extension you want by iterating over the sequence of extensions and looking for the right OID, then take the byte blob containing the extension and invoke the right decoder on it. This is a very manual process, it's error-prone, and it's so boring and tedious that implementations often bother only with the required extensions.

I want to emphasize how awesome this "decode all the way through, in one invocation" feature is. It really is the most important step to having full implementations of specs.

ECN is more obscure and less used. It was intended as a response to hardware designers' demand for "bits on the wire" docs like the ones for TCP and IP headers. In the 80s and 90s the ITU-T thought they could get ASN.1 used even at layers like IP and TCP, and people working on the Internet said "lay off the crack, that's crazy talk, because hardware needs to efficiently decode packet headers, yo!". The idea was to extend ASN.1 with ways to denote how things get encoded on the wire, rather than leaving all those details to encoding rules like BER/DER/CER, PER, OER, XER, JER, etc. Unless you need ECN because you're implementing a protocol that requires it, I would steer clear of it.

As you can tell, the ITU-T is in love with formal languages. And they are quite right to be. Other standards development organizations, like the IETF, sometimes make heavy use of such formal languages, and other times not. For example, PKIX, Kerberos, SNMP, etc., all use ASN.1 extensively, and PKIX in particular makes the most sophisticated use of ASN.1 (see RFC 5912!). Meanwhile, things like TLS and SSHv2 have ad-hoc languages for their specifications; in the case of TLS that language is not always used consistently, so it's hard to write a compiler for it, and in the case of SSHv2 the language is much too limited to bother writing a compiler for.

You can tell that ITU-T specs are of much higher quality than IETF specs. But then, the ITU-T requires $$$ to participate while the IETF is free; the ITU-T has very good tech writers on staff, and ITU-T participants are often paid specifically for their participation. The IETF has a paid RFC-Editor, an RFC Production Center, and editors, but they only get involved at the very end of RFC publication, so they can't produce much better RFCs than the original authors and editors of the Internet-Drafts that precede them -- and Internet-Draft authors are rarely paid to work full time on IETF work. Some IETF specs are of very high quality, but few, if any, approach the quality of the ITU-T X.68x series (ASN.1) and X.69x series (ASN.1 encoding rules).

What all of the above says is that we don't always need the highest quality, most formalized specifications, but whenever we can get them, it's really much better than when we can't.

mort96 3 months ago | parent | prev [-]

Sounds like unnecessary complexity which makes it more error prone.

cryptonector 3 months ago | parent | next [-]

> Sounds like unnecessary complexity which makes it more error prone.

No! On the contrary, it makes it less error-prone. Any time you formalize what would have been English (or French, or ...) text, things get safer, not riskier.

cryptonector 3 months ago | parent | prev [-]

That's like saying that Rust is unnecessary complexity over C...