flowerthoughts 3 months ago

Related: if you ever want to create your own serialization format, please at least have a cursory look at the basics of ASN.1. It's very complete, both in terms of textual descriptions (how it started) and breadth of encoding rules (because it's practical).

(You can skip the classes and macros, though they are indeed cool...)

tptacek 3 months ago | parent [-]

This sounds dangerously like a suggestion that more people use ASN.1.

cryptonector 3 months ago | parent | next [-]

Would you rather they reinvent the wheel badly? That's what Protocol Buffers is: badly reinvented ASN.1/DER!

PB is:

  - TLV (tag-length-value), like DER
    (see the sketch below)
  - you have to explicitly list the
    tags in the IDL as if it was ASN.1
    in 1984 (but actually, worse,
    because even back then tags were
    not always required in ASN.1, only
    for disambiguation)
  - it's super similar to DER, yet
    not the same
  - PB was created in part because ASN.1
    had so little open source tooling,
    but PB had none until they wrote it,
    so they could just have written the
    ASN.1 tooling they'd wished they had
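To make the structural similarity concrete, here's a minimal sketch (Python, using the classic field-1/value-150 example from the protobuf docs; a sketch, not a codec) of the same integer framed both ways:

  def pb_varint(n):
      # protobuf's base-128 varint: 7 bits per octet, continuation bit 0x80
      out = bytearray()
      while True:
          b = n & 0x7F
          n >>= 7
          out.append(b | (0x80 if n else 0))
          if not n:
              return bytes(out)

  # PB: key octet is (field_number << 3) | wire_type, then the varint value
  pb = bytes([1 << 3 | 0]) + pb_varint(150)
  assert pb == b'\x08\x96\x01'

  # DER: tag 0x02 (INTEGER), one length octet, then big-endian content
  # with a leading 0x00 so the value isn't read as negative
  der = bytes([0x02, 0x02, 0x00, 0x96])

Same tag-length-value bones, different varint and tag conventions.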
smh
RainyDayTmrw 3 months ago | parent | next [-]

In complete fairness to PBs, PBs have a heck of a lot less surface area than ASN.1. You could argue, why not use a subset of ASN.1, but it seems people have trouble agreeing on which subset to use.

cryptonector 3 months ago | parent [-]

I don't agree with that. PB is practically the same as DER. All the attack surface area lies in the codec, specifically in the decoder.

mananaysiempre 3 months ago | parent [-]

There are two notions of surface area here: that exposed to the external input, which must be secured, and that exposed to the programmer, which must be understood. As far as the latter is concerned, one can’t really disassociate the encoding of DER from the, well, notation of ASN.1, which, while definitely not as foreign as it may first appear, is still very rich compared to the one Protobufs use. (I do think a good tutorial and a cheat-sheet comparison to more widely used IDLs would help—for certain, obscure dusty corners and jargon-laden specs have never stopped anyone from writing the COM dialect of DCE IDL.)

Even if we restrict ourselves to the former notion, the simple first stage of parsing that handles DER proper is not the only one we have to contend with: we also have to translate things like strings, dates, and times to ones the embedding environment commonly uses. Like, I’m the kind of weird pervert that would find it fun to implement transcoding between T.61 and Unicode faithfully, but has anyone ever actually put T.61 in an ASN.1 T61String? As far as I know, not as far as PKIX is concerned—seemingly every T61String in a certificate just has ISO 8859-1 or *shudder* even Windows-1252 inside it (and that’s part of the reason T61Strings are flat out prohibited in today’s Web PKI, but who can tell about private PKIs?). And I’ll have to answer questions like this about every one of a dozen obscure and/or antiquated data types that core ASN.1 has (EMBEDDED PDV anyone?..).

cryptonector 3 months ago | parent [-]

> There are two notions of surface area here: that exposed to the external input, which must be secured, and that exposed to the programmer, which must be understood. As far as the latter is concerned, one can’t really disassociate the encoding of DER from the, well, notation of ASN.1, [...]

I disagree. I say that as a part-time maintainer of an open source ASN.1 stack that generates ergonomic C from ASN.1.

> I do think a good tutorial and a cheat-sheet comparison [...]

For ASN.1? There's a great deal of content out there, and several books. I'm not sure what more can be done. Tutorials? Look around this thread. People who can't be bothered with docs nowadays also can't be bothered with tutorials -- they just rely on LLMs.

> Like, I’m the kind of weird pervert that would find it fun to implement transcoding between T.61 and Unicode faithfully, but has anyone ever actually put T.61 in an ASN.1 T61String?

Me too, but as you note, no one really does that. My approach as to PKIX is to only-allow-ASCII for string types other than UTF8String.

> And I’ll have to answer questions like this about every one of a dozen obscure and/or antiquated data types that core ASN.1 has (EMBEDDED PDV anyone?..).

Now do C++!

A "modern" subset of ASN.1 is not that much smaller than x.680 + all of x.681, x.682, and x.683.

flowerthoughts 3 months ago | parent | prev | next [-]

The one thing that grinds my gears about BER/CER/DER is that they managed to come up with two different varint encoding schemes for the tag and length.
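For the curious, here's a sketch of the two schemes (Python, following X.690 as I remember it; illustrative, not a tested codec):

  def encode_tag(class_and_pc, number):
      # Scheme 1: tag numbers up to 30 fit in the identifier octet;
      # larger ones set the low 5 bits (0x1F) and spill into base-128
      # octets with a continuation bit on all but the last.
      if number <= 30:
          return bytes([class_and_pc | number])
      groups = []
      while number:
          groups.append(number & 0x7F)
          number >>= 7
      groups.reverse()
      body = [g | 0x80 for g in groups[:-1]] + [groups[-1]]
      return bytes([class_and_pc | 0x1F] + body)

  def encode_length(n):
      # Scheme 2: lengths under 128 are a single octet; otherwise one
      # octet holding 0x80 | count, then count big-endian octets. A
      # different varint from the tag's!
      if n < 128:
          return bytes([n])
      octets = n.to_bytes((n.bit_length() + 7) // 8, 'big')
      return bytes([0x80 | len(octets)]) + octets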

cryptonector 3 months ago | parent [-]

Meh. One rarely ever needs tags larger than 30, and even more seldom tags larger than, say, twice that.

flowerthoughts 3 months ago | parent [-]

Yeah, but if you're writing a parser for use by others, you have to implement both, even if it's "rarely" used. Or some intern somewhere will have a bad day after getting tasked with "just add this value here, it'll be an easy starter project." :)

cryptonector 3 months ago | parent [-]

And then it's a tiny bit more code. It's really not a problem.

mort96 3 months ago | parent | prev [-]

Why wouldn't you want to explicitly number fields? Protocols evolve and get extended over time; making the numbering explicit ensures that there's no accidental backwards-compat breakage from re-ordering fields. Implicit field numbers sound like an excellent reason to not use ASN.1.

This shilling for an over-engineered 80s encoding ecosystem that nobody uses is really putting me off.

cryptonector 3 months ago | parent [-]

> Why wouldn't you want to explicitly number fields? Protocols evolve and get extended over time; making the numbering explicit ensures that there's no accidental backwards-compat breakage from re-ordering fields.

ASN.1 went through this whole evolution, and ended up developing extensive support for extensibility and "automatic tagging" so you don't have to manually tag. That happened because the tagging was a) annoying, b) led to inconsistent use, c) led to mistakes, d) was almost completely unnecessary in encoding rules that aren't tag-length-value, like PER and OER.

The fact that you are not yet able to imagine that evolution, and that you are not cognizant of ASN.1's history, proves the point that one should study what came before, before reinventing the wheel [badly].

mananaysiempre 3 months ago | parent | next [-]

I have to admit that I could not make heads or tails of the extension marker stuff in the ASN.1 standards I’ve read (so the essential ones like basic ASN.1 and BER, not the really fun stuff like object classes, macros, or ECN). This is rather unlike the rest of those standards. So, could you elaborate on what those actually do and/or why they’re the right way to do things?

cryptonector 3 months ago | parent [-]

> So, could you elaborate on what those actually do and/or why they’re the right way to do things?

Yes.

TL;DR: formal languages allow you to have tooling that greatly reduces developer load (both work and cognitive load), which yields more complete and correct implementations of specifications that use formal languages.

  ---
My apologies for the following wall of text, but I hope you can spare the time to read it.

Suppose you don't have formal ways to express certain things like "if you see extra fields at the end of this structure, ignore them", so you write that stuff in English (or French, or...). Not every implementor will be a fluent English (or French, or ...) reader, and even the ones who are might move too fast and break things. If you make something formal in a machine-readable language, then you don't have that problem.

Formalizing things like this adds plenty of value and doesn't cost much as far as the specification language and specs using it go. It does cost something for tooling to implement it fully, but it's not really that big a deal -- this stuff is a lot simpler than, say, Clang and LLVM.

> I could not make heads or tails of the extension marker stuff

It's like this. Suppose you have a "struct" you might want to add fields to later on:

  SEQUENCE {
     foo UTF8String,
     n   INTEGER
  }
well, then you add an "extensibility marker" to denote this:

  SEQUENCE {
     foo UTF8String,
     n   INTEGER,
     ...
  }
This tells your tooling to ignore and skip over any extensions present when decoding.
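In a hand-rolled decoder, the marker's semantics come down to something like this (a hypothetical Python fragment; long-form tag numbers omitted for brevity):

  def skip_extensions(buf, pos, end):
      # Inside an extensible SEQUENCE: after the known fields, skip any
      # remaining TLVs up to the end of the SEQUENCE instead of erroring.
      while pos < end:
          pos += 1                         # identifier octet (low-tag form)
          first = buf[pos]; pos += 1
          if first < 0x80:                 # short-form length
              length = first
          else:                            # long form: 0x80 | n, then n octets
              n = first & 0x7F
              length = int.from_bytes(buf[pos:pos + n], 'big')
              pos += n
          pos += length                    # skip the value octets
      return pos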

But now you want to define some such extensions and leave the result to be extensible, so you write:

  SEQUENCE {
     foo UTF8String,
     n   INTEGER,
     ...,
     [[2: -- first extension
     bar OBJECT IDENTIFIER
     ]],
     ...  -- still extensible!
  }
You can also use extensibility markers in constraints, like:

  -- small integer now, but some day maybe larger
  SmallInt INTEGER (-128..128, ...)
> in the ASN.1 standards I’ve read (so the essential ones like basic ASN.1 and BER, not the really fun stuff like object classes, macros, or ECN). This is rather unlike the rest of those standards.

Extensibility markers are in the base ASN.1 spec, x.680.

The "Information Object System" and "object classes" and such are in x.681, x.682, and x.683. That's all a fancy way of expressing formally "parameterized types" and what kinds of things go in "typed holes", where a "typed hole" is something like a Rust-like "enum with data" where the enum is extensible through external registries. A typical example is PKIX certificate extensions:

  TBSCertificate  ::=  SEQUENCE  {
      version         [0]  Version DEFAULT v1,
      serialNumber         CertificateSerialNumber,
      signature            AlgorithmIdentifier{SIGNATURE-ALGORITHM,
                                {SignatureAlgorithms}},
      issuer               Name,
      validity             Validity,
      subject              Name,
      subjectPublicKeyInfo SubjectPublicKeyInfo,
      ... ,
      [[2:               -- If present, version MUST be v2
      issuerUniqueID  [1]  IMPLICIT UniqueIdentifier OPTIONAL,
      subjectUniqueID [2]  IMPLICIT UniqueIdentifier OPTIONAL
      ]],
      [[3:               -- If present, version MUST be v3 --
      extensions      [3]  Extensions{{CertExtensions}} OPTIONAL
      ]],
      ... }
Here `signature` and `extensions` are typed holes. A signature will be some algorithm identifier, optional algorithm parameters, and a signature byte blob. While `extensions` will be a `SEQUENCE OF` (array of) `Extension`:

  Extensions{EXTENSION:ExtensionSet} ::=
      SEQUENCE SIZE (1..MAX) OF Extension{{ExtensionSet}}
(this means that `Extensions` is an array of at least one item of type `Extension`, where all those elements are constrained by the "object set" identified by the formal parameter `ExtensionSet` -- formal parameter meaning: the actual parameter is not specified here, but it is specified above, where we saw that `extensions` is `Extensions{{CertExtensions}}`, so the actual parameter is `CertExtensions`.) Here's what `CertExtensions` is:

   CertExtensions EXTENSION ::= {
           ext-AuthorityKeyIdentifier | ext-SubjectKeyIdentifier |
           ext-KeyUsage | ext-PrivateKeyUsagePeriod |
           ext-CertificatePolicies | ext-PolicyMappings |
           ext-SubjectAltName | ext-IssuerAltName |
           ext-SubjectDirectoryAttributes |
           ext-BasicConstraints | ext-NameConstraints |
           ext-PolicyConstraints | ext-ExtKeyUsage |
           ext-CRLDistributionPoints | ext-InhibitAnyPolicy |
           ext-FreshestCRL | ext-AuthorityInfoAccess |
           ext-SubjectInfoAccessSyntax, ... }
where each of those `ext-*` is an information object that looks like this:

   ext-SubjectAltName EXTENSION ::= { SYNTAX
       GeneralNames IDENTIFIED BY id-ce-subjectAltName }
which says that a SAN (subjectAltName) extension is identified by the OID `id-ce-subjectAltName` and consists of a byte blob containing an encoded GeneralNames value:

   GeneralNames ::= SEQUENCE SIZE (1..MAX) OF GeneralName

   GeneralName ::= CHOICE {
        otherName                   [0]  INSTANCE OF OTHER-NAME,
        rfc822Name                  [1]  IA5String,
        dNSName                     [2]  IA5String,
        x400Address                 [3]  ORAddress,
        directoryName               [4]  Name,
        ediPartyName                [5]  EDIPartyName,
        uniformResourceIdentifier   [6]  IA5String,
        iPAddress                   [7]  OCTET STRING,
        registeredID                [8]  OBJECT IDENTIFIER
   }
All the PKIX certificate extensions, and CRL extensions, and attribute certificate extensions and ... extensions are specified like this in, for example, RFC 5912.

If you have a compiler that can handle this, then you can have it generate a decoder that fully decodes the most complex certificate in one go and yields something like a struct (or whatever the host language calls it) that nests all the things. And it can also generate an encoder that takes a value of that sort.

The alternative is that if you want to fish out a particular thing from a certificate, you first have to decode the certificate, then find the extension you want by iterating over the sequence of extensions looking for the right OID, then invoke the decoder again on the byte blob containing the extension. This is a very manual process, it's error-prone, and it's so boring and tedious that implementors often only bother with the required extensions. A sketch of that two-stage dance follows.
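Something like this, if I have the pyasn1/pyasn1-modules API right (a sketch; `der_bytes` is a placeholder for the certificate's DER encoding):

  from pyasn1.codec.der import decoder
  from pyasn1_modules import rfc5280

  cert, _ = decoder.decode(der_bytes, asn1Spec=rfc5280.Certificate())

  san = None
  for ext in cert['tbsCertificate']['extensions']:
      if ext['extnID'] == rfc5280.id_ce_subjectAltName:
          # Second stage: the extension value is itself a DER blob that
          # must be decoded by hand with the right spec.
          san, _ = decoder.decode(ext['extnValue'].asOctets(),
                                  asn1Spec=rfc5280.SubjectAltName())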

I want to emphasize how awesome this "decode all the way through, in one invocation" feature is. It really is the most important step to having full implementations of specs.

ECN is more obscure and less used. It was intended as a response to hardware designers' demand for "bits on the wire" docs like for TCP and IP headers. In the 80s and 90s the ITU-T thought they could get ASN.1 to be used even at layers like IP and TCP, and people working on the Internet said "lay off the crack that's crazy talk because hardware needs to efficiently decode packet headers yo!". The idea was to use ASN.1 and extend it with ways to denote how things would get encoded on the wire rather than leaving all those details to the encoding rules like BER/DER/CER, PER, OER, XER, JER, etc. Unless you have a need for ECN because you're implementing a protocol that requires it, I would steer clear of it.

As you can tell, the ITU-T is in love with formal languages. And they are quite right to be. Other standards development organizations, like the IETF, sometimes make heavy use of such formal languages, and other times not. For example, PKIX, Kerberos, SNMP, etc., all use ASN.1 extensively, and PKIX in particular makes the most sophisticated use of ASN.1 (see RFC 5912!), while things like TLS and SSHv2 have ad-hoc languages for their specifications; in the case of TLS that language is not always used consistently, so it's hard to write a compiler for it, and in the case of SSHv2 that language is much too limited to bother writing a compiler for.

You can tell that ITU-T specs are of much higher quality than IETF specs, but then the ITU-T requires $$$ to participate while the IETF is free, the ITU-T has very good tech writers on staff, and ITU-T participants are often paid specifically for their participation. The IETF has a paid RFC-Editor and RFC Production Center and editors, but those editors only get involved at the very end of RFC publication, so they can't possibly produce much better RFCs than the original authors and editors of the Internet-Drafts that precede them, and Internet-Draft authors are rarely paid to work full time on IETF work. Some IETF specs are of very high quality, but few, if any, approach the quality of the ITU-T x.68x series (ASN.1) and x.69x series (ASN.1 encoding rules).

What all of the above says is that we don't always need the highest quality, most formalized specifications, but whenever we can get them, it's really much better than when we can't.

mort96 3 months ago | parent | prev [-]

Sounds like unnecessary complexity which makes it more error prone.

cryptonector 3 months ago | parent | next [-]

> Sounds like unnecessary complexity which makes it more error prone.

No! On the contrary, it makes it less error prone. Any time you formalize what would have been English (or French, or...) text, things get safer, not riskier.

cryptonector 3 months ago | parent | prev [-]

That's like saying that Rust is unnecessary complexity over C...

Ekaros 3 months ago | parent | prev | next [-]

Understanding prior art and getting a more comprehensive list of things that need to be considered is always good.

Not doing it is like inventing new programming language after just learning one of them.

RainyDayTmrw 3 months ago | parent | prev [-]

What should people use today, given the choice, that isn't ASN.1?

Edited to add: If they need something with a canonical byte representation, for example for hashing or MAC purposes?

viraptor 3 months ago | parent | next [-]

How much of it do you need in that representation? Usually I see that need in either: X.509, where you're already using DER, or tiny fragments where a custom tag-length-value format would cover almost every usage without having to touch ASN.1.

RainyDayTmrw 3 months ago | parent [-]

All I really need is serialization for structs. I'm trying to avoid inventing my own format, because it seems to be footgun-prone.

wglb 3 months ago | parent | prev | next [-]

Here are some issues: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=asn.1

zzo38computer 3 months ago | parent [-]

Some of the bug reports here are not actually about ASN.1 even if they are in programs that also use ASN.1. However, some are actually about ASN.1.

But, there are bugs in many computer programs, whether or not they use ASN.1, anyways.

CVE-2022-0778 does not seem to be about ASN.1 (although ASN.1 is mentioned in the description); it seems to be a bug in computing a modular square root for non-prime moduli, and those numbers can come from any source and do not necessarily have anything to do with ASN.1.

CVE-2021-3712 does have to do with ASN.1 implementation, but this is a bad assumption in some other parts of the program that use the ASN.1 structure. (My own implementation also does not require the string stored in the ASN1_Value structure to be null-terminated, but none of the functions implicitly null-terminate it or expect it to be. One reason for this is to avoid memory allocations when they are not needed.)

Many programs dealing with OIDs have problems with them because the program is badly designed; a properly designed program (which is not that difficult to do) will not have these problems with OIDs. It is rarely necessary to decode OIDs, except for display (my own implementation limits it to 160 digits per part when displaying an OID, which is probably much more than is needed, but should avoid the problem described in CVE-2023-2650 anyways).

When comparing OIDs, you can compare them in binary format directly (if one is in text format, you should convert that one to binary for the comparison, instead of the other way around).

If you only want to validate OIDs, that can be done without decoding the numbers: check that the encoding is at least one byte long, that the first byte is not 0x80, that the last byte does not have the high bit set, and that any byte which does not have the high bit set is not immediately followed by a byte 0x80. (The same validation can apply to relative OIDs, although some applications may allow relative OIDs to be empty, which is OK; absolute OIDs are never allowed to be empty.) A sketch of that check is below. (Some other reports listed there also relate to OIDs; if the program is well-designed, it will not have these problems, as I described.)
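In Python, that validation rule transcribes directly (a sketch; `content` is the content octets of an absolute OID's TLV):

  def oid_content_is_valid(content):
      # At least one byte, and no leading padding octet 0x80.
      if len(content) < 1 or content[0] == 0x80:
          return False
      # The last byte must terminate a subidentifier (high bit clear).
      if content[-1] & 0x80:
          return False
      # A byte with the high bit clear ends a subidentifier; the next
      # subidentifier must not begin with a padding octet 0x80.
      for prev, cur in zip(content, content[1:]):
          if not (prev & 0x80) and cur == 0x80:
              return False
      return True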

My own implementation never decodes ASN.1 values until you explicitly tell it to do so, with a function according to the type being decoded, and it returns an error condition if the type is incorrect. All values are stored in an ASN1_Value structure which works the same way.

Some of the CVE reports are about things which can occur just as easily in other programs not related to ASN.1. Things such as buffer overflows, improperly determining the length, integer overflows, etc, can potentially occur in any other program too.

None of the things listed in the CVE reports seem to be inherent security issues with ASN.1 itself.

cryptonector 3 months ago | parent | prev [-]

First of all you should never need a canonical representation. If you think you do, you're almost certainly wrong. In particular you should not design protocols so that you have to re-encode things in order to validate signatures.

So then you don't need DER or anything like it.
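For example (a sketch under my assumptions, with `decode()` standing in for whatever decoder you use): authenticate the exact bytes you received, then decode those same bytes; nothing ever gets re-encoded, so no canonical form is needed.

  import hashlib, hmac

  def verify_then_decode(key, received_encoding, received_mac):
      # The MAC is computed over the bytes as they arrived on the wire,
      # never over a re-serialization of the decoded value.
      expected = hmac.new(key, received_encoding, hashlib.sha256).digest()
      if not hmac.compare_digest(expected, received_mac):
          raise ValueError("bad MAC")
      return decode(received_encoding)   # decode() is a placeholder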

Second, ASN.1 is fantastic. You should at least study it a bit before you pick something else.

Third, pick something you have good tooling for. I don't care if it's ASN.1, XDR, DCE RPC / MSRPC, JSON, CBOR, etc. Just make sure you have good tooling. And don't pick XML unless you really need it to interop with things that are already using XML.

EDIT: I generally don't care about downvotes, but in this case I do. Which part of the above was objectionable? Point 1, 2, or 3? My negativity as to XML for protocols? XML for docs is alright.

RainyDayTmrw 3 months ago | parent [-]

Interesting. What do you make of PASETO[1] and specifically PAE[2], then?

[1]: https://github.com/paseto-standard/paseto-spec/blob/master/d... [2]: https://github.com/paseto-standard/paseto-spec/blob/master/d...
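(For reference, PAE is roughly the following; my sketch from memory of the spec, so check the links above:)

  import struct

  def le64(n):
      # PASETO's LE64: unsigned 64-bit little-endian, with the top bit
      # cleared to sidestep signed/unsigned confusion.
      return struct.pack('<Q', n & 0x7FFFFFFFFFFFFFFF)

  def pae(pieces):
      # Length-prefix the count and every piece so no concatenation of
      # attacker-controlled parts can collide with another message.
      out = le64(len(pieces))
      for piece in pieces:
          out += le64(len(piece)) + piece
      return out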

cryptonector 3 months ago | parent [-]

I'll have to read the docs. I'll comment here in a few days.