Remix.run Logo
woodruffw 8 hours ago

The flattening out is the problem: most (all?) widely used YAML parsers represent the YAML document using the JSON object model, which means that there's no model or element representation of the anchors themselves.

That in turn means that there's no way to construct a source span back to the anchor itself, because the parsed representation doesn't know where the anchor came from (only that it was flattened).

VGHN7XDuOXPAzol 8 hours ago | parent [-]

This is something that a custom parser library could figure out, no? The same as how you have format-preserving TOML libraries, for instance.

I think it makes way more sense for GitHub to support YAML anchors given they are after all part of the YAML spec. Otherwise, don't call it YAML! (This was a criticism of mine for many years, I'm very glad they finally saw the light and rectified this bug)

woodruffw 7 hours ago | parent [-]

> This is something that a custom parser library could figure out, no? The same as how you have format-preserving TOML libraries, for instance.

Yes, it's just difficult. The point made in the post isn't that it's impossible, but that it significantly changes the amount of of "ground work" that static analysis tools have to do to produce useful results for GitHub Actions.

> I think it makes way more sense for GitHub to support YAML anchors given they are after all part of the YAML spec. Otherwise, don't call it YAML! (This was a criticism of mine for many years, I'm very glad they finally saw the light and rectified this bug)

It's worth noting that GitHub doesn't support other parts of the YAML spec: they intentionally use their own bespoke YAML parser, and they don't have the "Norway" problem because they intentionally don't apply the boolean value rules from YAML.

All in all, I think conformance with YAML is a red herring here: GitHub Actions is already its own thing, and that thing should be easy to analyze. Adding anchors makes it harder to analyze.

VGHN7XDuOXPAzol 7 hours ago | parent [-]

> conformance with YAML

maybe, but not entirely sure. 'Two wrongs don't make a right' kind of thinking on my side here.

But if they call it GFY and do what they want, then that would probably be better for everyone involved.

> they don't have the "Norway" problem because they intentionally don't apply the boolean value rules from YAML.

I think this is YAML 1.2. I have not done or seen a breakdown to see if GitHub is aiming for YAML 1.2 or not but they appear to think that way, given the discussion around merge keys

--

(though it's still not clear why flattening the YAML would not be sufficient for a static analysis tool. If the error report references a key that was actually merged out, I think users would still understand the report; it's not clear to me that's a bad thing actually)

woodruffw 7 hours ago | parent [-]

> But if they call it GFY and do what they want, then that would probably be better for everyone involved.

Yes, agreed.

> I think this is YAML 1.2. I have not done or seen a breakdown to see if GitHub is aiming for YAML 1.2 or not but they appear to think that way, given the discussion around merge keys

I think GitHub has been pretty ambiguous about this: it's not clear to me at all that they intend to support either version of the spec explicitly. Part of the larger problem here is that programming language ecosystems as a whole don't consistently support either 1.1 or 1.2, so GitHub is (I expect) attempting to strike a happy balance between their own engineering goals and what common language implementations of YAML actually parse (and how they parse it). None of this makes for a great conformance story :-)

> (though it's still not clear why flattening the YAML would not be sufficient for a static analysis tool. If the error report references a key that was actually merged out, I think users would still understand the report; it's not clear to me that's a bad thing actually)

The error report includes source spans, so the tool needs to map back to the original location of the anchor rather than its unrolled document position.

(This is table stakes for integration with formats like SARIF, which expect static analysis results to have physical source locations. It's not good enough to just say "there's a bug in this element and you need to find out where that's introduced," unfortunately.)