zzo38computer 3 days ago

I think it would be better to not use Unicode (so that you can use any character set), and to use "0o" instead of "0" prefix for octal numbers. Also, EDN seems to lack a proper format for binary data.

I think ASN.1 (and ASN.1X, my extension that adds a few types such as a key/value list and a TRON string) is better. (I also made up a text-based ASN.1 format called TER, which is intended to be converted to the binary DER format. Extensions and subsets of TER can also be made for specific applications if needed.) (I also wrote a DER decoder/encoder library in C, along with programs that use that library to convert TER to DER and JSON to DER.)

ASN.1 (and ASN.1X) has many types similar to those of EDN, and a comparison can be made:

- Null (called "nil" in EDN) and booleans are available in ASN.1.

- Strings in ASN.1 are fortunately not limited to Unicode; you can also use ISO 2022, as well as octet strings and bit strings. However, there is no "single character" type.

- ASN.1 does have an Enumerated type, although its values are numbers rather than names. The EDN "keywords" type seems to be intended for enumerations.

- The integer and floating point types in ASN.1 are already arbitrary precision. If a reader requires a limited precision (e.g. 64 bits), it is easy to detect an out-of-range value and report an error.

- ASN.1 does not have a separate "list" and "vector" type, but does have a "set" type and a "sequence" type. A key/value list ("map") type is a nonstandard type in ASN.1X, but standard ASN.1 does not have a key/value list type.

- ASN.1 does have tagging, although it works differently from EDN's. ASN.1 already has a date/time type, though, so this extension is not needed. Extensions are possible through application types and private types, as well as by other methods such as External, Embedded PDV, and the nonstandard

- The rational number type (which is in edn.c, although the main EDN specification does not seem to mention it) is not a standard type in ASN.1, but ASN.1X does have such a type.
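As a minimal sketch of the range check mentioned in the integer point above: the content octets of a DER INTEGER are a big-endian two's complement number, so a reader that only supports 64-bit values can decode to an arbitrary-precision integer and then reject anything out of range. The function names here are hypothetical and are not taken from any of the libraries mentioned in this thread.

```python
def decode_der_integer_content(content: bytes) -> int:
    """Decode the content octets of a DER INTEGER (big-endian two's complement)."""
    if not content:
        raise ValueError("INTEGER content must not be empty")
    return int.from_bytes(content, byteorder="big", signed=True)


def require_int64(value: int) -> int:
    """Return value unchanged, or raise if it does not fit in a signed 64-bit integer."""
    if not -(1 << 63) <= value <= (1 << 63) - 1:
        raise OverflowError("integer does not fit in 64 bits")
    return value


# 0x00 0xFF is 255; the leading zero octet keeps the value non-negative.
small = require_int64(decode_der_integer_content(b"\x00\xff"))
```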

(Some people complain that ASN.1 is complicated; this is not wrong, but you will only need to implement the parts that you will use (which is simpler when using DER rather than BER; I think BER is not very good and DER is much better), which ends up making it simpler while also capable of doing the things that would be desirable.)

(But, EDN does solve some of the problems with JSON, such as comments and a proper integer type.)

delaguardo 3 days ago | parent [-]

> EDN seems to lack a proper format for binary data

The best part of EDN is that it is extensible :)

#binary/base64 "SGVsbG8sIHp6bzM4Y29tcHV0ZXIhIEhvdyBhcmUgeW91IGRvaW5nPw=="

This is a tagged literal. If a custom reader is provided for the tag, it is called while the document is being read, and its result can be any type you want.
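A rough sketch of what such a custom reader could look like, written in Python for illustration. The `TAG_HANDLERS` registry and `apply_tag` function are hypothetical names and do not correspond to the API of any particular EDN library; the sketch assumes the reader has already parsed the string literal that follows the tag.

```python
import base64
from typing import Any, Callable, Dict

# Hypothetical registry: tag name (without the leading '#') -> handler.
TAG_HANDLERS: Dict[str, Callable[[Any], Any]] = {
    # The handler receives the already-parsed EDN string and returns bytes.
    "binary/base64": lambda text: base64.b64decode(text, validate=True),
}


def apply_tag(tag: str, element: Any) -> Any:
    """Pass the element that followed the tag to the registered handler."""
    return TAG_HANDLERS[tag](element)


payload = apply_tag(
    "binary/base64",
    "SGVsbG8sIHp6bzM4Y29tcHV0ZXIhIEhvdyBhcmUgeW91IGRvaW5nPw==",
)
```

After this runs, `payload` is a `bytes` object rather than a string, which is the sense in which the result can be any type.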

zzo38computer 3 days ago | parent [-]

OK, this is possible, but it seems like a type that ought to be built in.

Also, if there is no binary file format for the data, then you always have to convert to/from base64 when working with this file, whether or not you would otherwise need to.

Furthermore, this does not work very well when you want to deal with character sets rather than binary data, since (as far as I can tell from the specification) the input will still need to be UTF-8 and follow the EDN syntax of an existing type.

From what I can understand from the specification, the EDN decoder will still need to run and cannot be streamed if the official specification is used (which can make it inefficient), although it would probably be possible to make an implementation that can do this with streaming instead (but I don't know if the existing one does).

So, the extensibility is still restricted. (In my opinion, ASN.1 (and ASN.1X) does it better.)

delaguardo 3 days ago | parent [-]

> From what I can understand from the specification, the EDN decoder will still need to run and cannot be streamed if the official specification is used

Sorry, you have misunderstood it.

There is no enclosing element at the top level. Thus edn is suitable for streaming and interactive applications.

> but I don't know if the existing one does

This implementation does not do streaming for now, but it understands the concept of reading one complete element from a buffer. The only missing part is buffer management.

> So, the extensibility is still restricted.

Could you explain how it is restricted if you are allowed to run whatever you want during reading of edn document? You can even do IO, no restrictions at all!

Consider this:

#init/postgres {:db-spec {:host "..." :port 54321 ,,,} :specs {:user ,,,}} [#user/id 1 #user/id 2 #user/id 3]

This allows you to have a program that looks up a Postgres database while reading a document and validates every returned object against the provided spec (conforming the value).

> In my opinion, ASN.1 (and ASN.1X) does it better.

Please show how it does better. I'm very curious

zzo38computer 3 days ago | parent [-]

I think you might have misunderstood what I meant, because I was unclear. I meant that it would have to decode the entire EDN string literal containing the base64 data before decoding the base64, not that it would have to decode the entire file before doing so. (I might still be wrong.)

Specifically, I refer to what is quoted below:

> Upon encountering a tag, the reader will first read the next element (which may itself be or comprise other tagged elements), then pass the result to the corresponding handler for further interpretation, and the result of the handler will be the data value yielded by the tag + tagged element, i.e. reading a tag and tagged element yields one value.

> If a reader encounters a tag for which no handler is registered, the implementation can either report an error, call a designated 'unknown element' handler, or create a well-known generic representation that contains both the tag and the tagged element, as it sees fit. Note that the non-error strategies allow for readers which are capable of reading any and all edn, in spite of being unaware of the details of any extensions present.

Due to these things, EDN does not have a proper "octet string" type, even if the extension is added.
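The behaviour described in the two quoted paragraphs can be sketched roughly as follows (Python, hypothetical names, not the API of any real EDN reader). The point it illustrates is that the handler, or the generic fallback, only ever receives an element that has already been read as an ordinary EDN value.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class TaggedElement:
    """Generic representation for a tag that has no registered handler."""
    tag: str
    value: Any


def resolve_tag(tag: str, element: Any,
                handlers: Dict[str, Callable[[Any], Any]]) -> Any:
    """Apply the handler for tag, or fall back to the generic representation.

    `element` is the next element, already fully read by the reader.
    """
    handler = handlers.get(tag)
    if handler is None:
        return TaggedElement(tag, element)
    return handler(element)


unknown = resolve_tag("user/id", 1, {})
known = resolve_tag("user/id", 1, {"user/id": lambda n: ("user", n)})
```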

> This implementation does not do streaming for now, but it understands the concept of reading one complete element from a buffer. The only missing part is buffer management.

OK, then it could be improved.

> Could you explain how it is restricted if you are allowed to run whatever you want during reading of edn document? You can even do IO, no restrictions at all!

Perhaps the above explains how it is restricted. It does not prevent you from looking up data in a database, etc; it is the data model of EDN itself which is restricted; it is not restricting what you do with it.

> Please show how it does better. I'm very curious

Since all types use the same framing, you can do "lazy decoding" where appropriate (you can also use custom decoders in any part of the file, and this can depend on the schema). ASN.1 has a built-in octet string type (as well as bit string, unrestricted character string, etc.), and you can add implicit or explicit tagging (I prefer implicit if the underlying type is a sequence or octet string, and explicit otherwise). Together with types such as External (and the nonstandard ASN1_IDENTIFIED_DATA type), this means you can easily define any type and can easily skip past any field of any type.
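To illustrate the "same framing" point, here is a minimal Python sketch that walks a run of DER elements using only the identifier and length octets, without interpreting the contents. It is an assumption-laden sketch, not the C library mentioned earlier: the function names are hypothetical, and it only handles single-octet identifiers (tag numbers below 31).

```python
from typing import Iterator, Tuple


def read_header(data: bytes, offset: int) -> Tuple[int, int, int]:
    """Return (identifier octet, content length, content offset)."""
    identifier = data[offset]
    if identifier & 0x1F == 0x1F:
        raise ValueError("high tag numbers are not handled in this sketch")
    first = data[offset + 1]
    if first < 0x80:
        # Short form: the octet is the length itself.
        return identifier, first, offset + 2
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite length is not allowed in DER")
    # Long form: the next `count` octets hold the length, big-endian.
    length = int.from_bytes(data[offset + 2:offset + 2 + count], "big")
    return identifier, length, offset + 2 + count


def iter_elements(data: bytes) -> Iterator[Tuple[int, memoryview]]:
    """Yield (identifier, content) pairs without decoding the content."""
    view = memoryview(data)
    offset = 0
    while offset < len(data):
        identifier, length, start = read_header(data, offset)
        end = start + length
        if end > len(data):
            raise ValueError("element extends past the end of the data")
        yield identifier, view[start:end]
        offset = end
```

A caller can decode only the elements it cares about and skip the rest, because every element, whatever its type, announces its own length.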

> #init/postgres {:db-spec {:host "..." :port 54321 ,,,} :specs {:user ,,,}} [#user/id 1 #user/id 2 #user/id 3]

Even with TER (the below does not use any extensions to TER itself, but extensions to TER are also possible; even if not, whoever reads the resulting DER can handle the application-specific types as needed), you can write:

  [ [P:(database.example) 54321] [0A:1 0A:2 0A:3] ]

In this case, the "0A:" prefix means application type 0, which has a meaning specific to the application; presumably for this application, application type 0 would correspond to user IDs. This example uses implicit types for the user IDs; if you want explicit types instead, then you can write:

  [ [P:(database.example) 54321] [0A[1] 0A[2] 0A[3]] ]

Or, if you want to extend TER instead, then you might define your own keyword, e.g. "userid{1}" instead of "0A:1" or "0A[1]".
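For readers unfamiliar with the difference, the following Python sketch shows how implicit and explicit application tagging differ at the DER level for a small integer. It assumes that "0A:1" and "0A[1]" correspond to standard implicit and explicit [APPLICATION 0] tagging of an INTEGER; that mapping is an assumption made for illustration, and the helper names are hypothetical.

```python
APPLICATION = 0x40  # class bits for the application class
CONSTRUCTED = 0x20  # set when the content is itself made of encoded elements


def encode_small_integer(value: int) -> bytes:
    """DER-encode an INTEGER in the range 0..127 (one content octet)."""
    if not 0 <= value <= 127:
        raise ValueError("this sketch only handles 0..127")
    return bytes([0x02, 0x01, value])


def implicit_application(tag_number: int, encoded: bytes) -> bytes:
    """Replace the identifier octet; the length and content stay the same."""
    if not 0 <= tag_number <= 30:
        raise ValueError("this sketch only handles tag numbers 0..30")
    constructed = encoded[0] & CONSTRUCTED
    return bytes([APPLICATION | constructed | tag_number]) + encoded[1:]


def explicit_application(tag_number: int, encoded: bytes) -> bytes:
    """Wrap the complete original encoding inside a constructed element."""
    if not 0 <= tag_number <= 30 or len(encoded) > 127:
        raise ValueError("this sketch only handles short tags and lengths")
    header = bytes([APPLICATION | CONSTRUCTED | tag_number, len(encoded)])
    return header + encoded


implicit = implicit_application(0, encode_small_integer(1))  # 40 01 01
explicit = explicit_application(0, encode_small_integer(1))  # 60 03 02 01 01
```

Implicit tagging is more compact but loses the underlying universal type from the encoding, so the reader must know it from the schema; explicit tagging keeps it at the cost of an extra header.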

(TER is not one of the official ASN.1 formats; it is one that I invented for the purpose of having a text format for ASN.1 which can then be converted to DER; most programs would be expected to use DER rather than TER.)

delaguardo 3 days ago | parent [-]

> I meant that it would have to decode the entire EDN string literal containing the base64 data before decoding the base64

Yes, any edn reader implementation will read the complete base64 string from the example before giving it to a custom reader. I understand now what you meant. However, I don't know what I can do about it. I use edn daily, it works great for me, and I have no immediate plans to replace it with something else.

Anyway, the example you shared looks interesting, I'll definitely read more about it. Thank you.