pacoWebConsult 8 hours ago

Can YAML go away entirely and instead allow pipelines to be defined with an actual language? What benefits does the runner-interpreted yaml-defined pipeline paradigm actually achieve? Especially with runners that can't be executed and tested locally, working with them is a nightmare.

jayd16 8 hours ago | parent | next [-]

Why do we think an arbitrary language is easier to reason about? If it was so easy you could just do it now. The yaml could be extremely simple and just call into your app, but most don't bother.

I'm certainly willing to believe that yaml is not the ideal answer but unless we're comparing it to a concrete alternative, I feel like this is just a "grass is always greener" type take.

VGHN7XDuOXPAzol 7 hours ago | parent | next [-]

Is it actually possible to just have the YAML that calls into your app today, without losing the granularity or other important features?

I am not sure you can do this whilst having the granular job reporting (i.e. either you need one YAML block per job or you have all your jobs in one single 'status' item?) Is it actually doable?

ok123456 6 hours ago | parent | prev | next [-]

You don't have to reason about them.

You write a compiler that enforces stronger invariants above and beyond everything is an array/string/list/number/pointer.

Good general-purpose programming languages provide type systems that do just this. It is criminal that the industry simply ignores this and chooses to use blobs of YAML/JSON/XML with disastrous results, creating ad-hoc programming languages without a type system in their chosen poison.

jayd16 5 hours ago | parent | next [-]

The real issue isn't the code part. You can just call into whatever arbitrary thing you want for the actual script part.

YAML is used for the declarative part of structuring the job graph. The host (in this case, GitHub) would need to call into your code to build the job graph. Which means it would need to compile your code, which means it needs its own build step. This means it would need to run on a build machine that uses minutes because GitHub is not going to just run arbitrary code for free.

There's no guarantee that your arbitrary language is thread safe or idempotent so it can't really run in parallel like how a declarative file could be used.

So now you're in a situation where you add another spin up and tear down step even if your actual graph gen call is zero cost.

There's a reason it works the way it does.
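
To make the declarative point concrete, here is a minimal sketch (an illustration, not how GitHub actually implements it) of why a data-only job graph is cheap for a host to handle: the `needs` edges can be read and ordered without executing any user code. The job names are made up, and `graphlib` is the Python standard library module (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical job graph, shaped like the `needs:` keys of a workflow file.
# This is plain data: the host can inspect it without running user code.
JOBS = {
    "lint": [],
    "build": [],
    "test": ["build"],
    "deploy": ["lint", "test"],
}


def execution_waves(jobs):
    """Group jobs into waves; every job in a wave can run in parallel."""
    sorter = TopologicalSorter(jobs)
    sorter.prepare()  # raises graphlib.CycleError on circular `needs`
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(JOBS), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

If the graph had to come from arbitrary code instead, the host would first need to build and run that code somewhere before it could even schedule the first wave.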

silverwind 3 hours ago | parent | prev | next [-]

There is https://github.com/SchemaStore/schemastore which is effectively a type system for yaml/json.

oblio 3 hours ago | parent | prev [-]

> You write a compiler that enforces stronger invariants above and beyond everything is an array/string/list/number/pointer.

https://www.reddit.com/r/funny/comments/eccj2/how_to_draw_an...

tracker1 6 hours ago | parent | prev | next [-]

I've done exactly this a few times... ensure my scripting host is present then use scripts for everything. I can use the same scripts locally without issue and they work the same on self-hosted runners.

Note: mostly using Deno these days for this, though I will use .net/grate for db projects.

esafak 7 hours ago | parent | prev | next [-]

> If it was so easy you could just do it now.

Some do just that: dagger.io. It is not all roses but debugging is certainly easier.

iLoveOncall 6 hours ago | parent | prev [-]

There is a battle tested example of YAML vs programming languages in CloudFormation templates vs CDK.

I don't think anybody serious has any argument in favor of CloudFormation templates.

biimugan 5 hours ago | parent | prev | next [-]

I agree somewhat with the proposition that YAML is annoying for configuring something like a workflow engine (CI systems) or Kubernetes. But having it defined in YAML is actually preferable in an enterprise context. It makes it trivial to run something like OPA policy against the configuration so that enterprise standards and governance can be enforced.
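
As a rough illustration of why plain data is easy to govern, here is a Python stand-in for such a policy (real OPA policies are written in Rego, which this is not). It assumes the workflow has already been parsed into dicts and lists, and the rule itself (every `uses:` must be pinned to a full commit SHA) is only an example standard. `example-org/setup` is a made-up action name.

```python
import re

# Example rule: a 40-character hex commit SHA after the "@".
PINNED = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow):
    """Return (job, step index, uses) for every step not pinned to a SHA."""
    violations = []
    for job_name, job in workflow.get("jobs", {}).items():
        for index, step in enumerate(job.get("steps", [])):
            uses = step.get("uses")
            if uses is not None and not PINNED.search(uses):
                violations.append((job_name, index, uses))
    return violations


# Hypothetical workflow, already parsed from YAML into plain data.
WORKFLOW = {
    "jobs": {
        "build": {
            "steps": [
                {"uses": "actions/checkout@v4"},
                {"uses": "example-org/setup@" + "a" * 40},
                {"run": "make test"},
            ]
        }
    }
}

if __name__ == "__main__":
    for job, index, uses in unpinned_actions(WORKFLOW):
        print(f"{job}: step {index} uses unpinned action {uses}")
```

The check is a plain walk over nested data. Nothing has to be executed or statically analysed to find the values the policy cares about.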

When something is written in a real programming language (that doesn't just compile down to YAML or some other data format), this becomes much more challenging. What should you do in that case? Attempt to parse the configuration into an AST and operate over the AST? But in many programming languages, the AST can become arbitrarily complex. Behavior can be implemented in such a way as to make it difficult to discover or introspect.

Of course, YAML can become difficult to parse too. If the system consuming the YAML supports in-band signalling -- i.e. proprietary non-YAML directives -- then you would need to first normalize the YAML using that system to interpret and expand those signals. But in principle, that's still at least more tractable than trying to parse an AST.

catlifeonmars 4 hours ago | parent [-]

> If the system consuming the YAML supports in-band signalling -- i.e. proprietary non-YAML directives -- then you would need to first normalize the YAML using that system to interpret and expand those signals.

cough CloudFormation cough

rickette 6 hours ago | parent | prev | next [-]

GitHub Actions originally supported HCL (HashiCorp Configuration Language) instead of YAML. But the YAML force was too strong: https://github.blog/changelog/2019-09-17-github-actions-will....

nothrabannosir 5 hours ago | parent | next [-]

HCL is same s**, different smell. Equally hamstrung. It’s the reason HashiCorp came out with an actually programmable version of the HCL semantics: CDKTF.

freeplay 5 hours ago | parent | prev [-]

If you have worked with HCL in any serious capacity, you'll be happy they didn't go that route.

Here are some fun examples to see why HCL sucks:

- Create an if/elseif/else statement

- Do anything remotely complex with a for loop (tip: you're probably going to have to use `flatten` a lot)

oblio 3 hours ago | parent [-]

Stuff like HCL and Ansible YAML makes me want to require mandatory training in Ant contrib tasks for developers creating them:

https://ant-contrib.sourceforge.net/tasks/tasks/if.html

  <if>
    <equals arg1="${foo}" arg2="bar" />
    <then>
      <echo message="The value of property foo is 'bar'" />
    </then>
    <elseif>
      <equals arg1="${foo}" arg2="foo" />
      <then>
        <echo message="The value of property foo is 'foo'" />
      </then>
    </elseif>
    <else>
      <echo message="The value of property foo is not 'foo' or 'bar'" />
    </else>
  </if>

https://ant-contrib.sourceforge.net/tasks/tasks/for.html

  <for param="file">
    <path>
      <fileset dir="${test.dir}/mains" includes="*.cpp"/>
    </path>
    <sequential>
      <propertyregex override="yes"
        property="program"  input="@{file}"
        regexp=".*/([^\.]*)\.cpp" replace="\1"/>
      <mkdir dir="${obj.dir}/${program}"/>
      <mkdir dir="${build.bin.dir}"/>
      <cc link="executable" objdir="${obj.dir}/${program}"
        outfile="${build.bin.dir}/${program}">
        <compiler refid="compiler.options"/>
        <fileset file="@{file}"/>
        <linker refid="linker-libs"/>
      </cc>
    </sequential>
  </for>

Yes, programming with them was as fun as you're imagining.

0xbadcafebee 4 hours ago | parent | prev | next [-]

A custom language in GHA would be worse. You'd be limited by whatever language they supported, and any problems with it would have to go through their support team. It adds more burden on GHA (they spend more time/money on support) without creating value (new features you want).

You already don't have to use YAML. Use whatever language you want to define the configuration, and then dump it as YAML. By using your own language and outputting YAML, you get to implement any solution you want, and GitHub gets to spend more cycles building features.

Simple example:

  1. Create a couple of inherited Python classes
  2. Write class functions to enable/disable GHA features and validate them
  3. Have the functions store data in the class object
  4. Use a library to output the class as YAML
  5. Now craft your GHA config by simply calling a Python object
  6. Run code, save output file, apply to your repo

I don't know why nobody has made this yet, but it wouldn't be hard. Read GHA docs, write Python classes to match, output as YAML.
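
A minimal sketch of those steps, with assumptions stated up front: the `Workflow` and `Job` classes are hypothetical, only a tiny slice of the workflow format is modelled, inheritance is skipped for brevity, and the output is written as JSON because JSON documents are in practice also valid YAML 1.2. A YAML library could be swapped in for friendlier output.

```python
import json


class Job:
    """One job: a runner label plus an ordered list of steps."""

    def __init__(self, runs_on="ubuntu-latest"):
        self.runs_on = runs_on
        self.steps = []

    def run(self, command, name=None):
        """Validate and record a shell step; returns self for chaining."""
        if not command.strip():
            raise ValueError("a run step needs a non-empty command")
        step = {"run": command}
        if name:
            step = {"name": name, **step}
        self.steps.append(step)
        return self

    def to_dict(self):
        if not self.steps:
            raise ValueError("a job needs at least one step")
        return {"runs-on": self.runs_on, "steps": self.steps}


class Workflow:
    """A workflow: a name, trigger events, and named jobs."""

    def __init__(self, name, on):
        self.name = name
        self.on = list(on)
        self.jobs = {}

    def job(self, job_id, **kwargs):
        if job_id in self.jobs:
            raise ValueError(f"duplicate job id: {job_id}")
        self.jobs[job_id] = Job(**kwargs)
        return self.jobs[job_id]

    def render(self):
        document = {
            "name": self.name,
            "on": self.on,
            "jobs": {jid: job.to_dict() for jid, job in self.jobs.items()},
        }
        # Assumption: the consumer reads YAML 1.2, so JSON syntax is fine.
        return json.dumps(document, indent=2)


if __name__ == "__main__":
    workflow = Workflow("ci", on=["push", "pull_request"])
    workflow.job("test").run("make test", name="Run tests")
    print(workflow.render())
```

Validation happens at build time in ordinary code (empty commands, duplicate job ids), so mistakes surface locally before anything is pushed.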

If you want more than GHA features support [via configuration], use the GHA API (https://docs.github.com/en/rest/actions) or scripted workflows feature (https://github.com/actions/github-script).

tarkaTheRotter 3 hours ago | parent [-]

Hey. I'm currently making Typeflows to solve this (amongst a few other pain points), and am planning to make it available for the JVM (this exists now), TS and Python at least.

There are existing solutions around, but they leave out a bunch of things that are blatantly missing in the space:

- workflow visualisations (this is already working - you can see an example of workflow relationship and breakdowns on a non-trivial example at https://github.com/http4k/http4k/tree/master/.github/typeflo...);

- running workflows through an event simulator so you can tell cause and effect when it comes to what triggers what. Testing workflows anyone? :)

- security testing on workflows - to avoid the many footguns that there are in GHA around secrets etc;

- compliance tests around permitted Action versions;

- publishing of reusable repository files as binary dependencies that can be upgraded and compiled into your projects - including not just GHA actions and workflows but also things like version files, composable Copilot/Claude/Cursor instruction files;

- GitLab, CircleCI, Bitbucket, Azure DevOps support using the same approach and in multiple languages;

Early days yet, but I'm planning to make it free for OSS and paid for commercial users. I'm also dogfooding it on one of my other open source projects to make sure that it can handle non-trivial cases. Lots to do - and hopefully it will be valuable enough for commercial companies to pay for!

Wish me luck!

https://typeflows.io/

time4tea 7 hours ago | parent | prev | next [-]

Have you seen https://typeflows.io/

It takes your programming language version and turns it into GitHub Actions YAML, so you don't need to do any of that sort of thing.

herpdyderp 6 hours ago | parent | next [-]

It has pricing tiers!? That's crazy, just use https://www.npmjs.com/package/github-actions-workflow-ts

tarkaTheRotter 5 hours ago | parent | next [-]

Hey - Typeflows maintainer here. We know that there are other similar libraries out there that do some of the same things as Typeflows, but we're hoping to go much, much further than anything out there to help out teams struggling with their pipelines. Examples of things on the roadmap:

- workflow visualisations (this is already working - you can see an example of workflow relationship and breakdowns on a non-trivial example at https://github.com/http4k/http4k/tree/master/.github/typeflo...);

- running workflows through an event simulator so you can tell cause and effect when it comes to what triggers what;

- security testing on workflows - to avoid the many footguns that there are in GHA around secrets etc;

- compliance tests around permitted Action versions;

- publishing of reusable repository files as binary dependencies that can be upgraded and compiled into your projects - including not just GHA actions and workflows but also things like version files, composable Copilot/Claude/Cursor instruction files;

- GitLab, CircleCI, Bitbucket, Azure DevOps support using the same approach and in multiple languages;

Lots to do - and hopefully it will be valuable enough for commercial companies to pay for!

:)

Arcuru 5 hours ago | parent | prev [-]

Its "pricing tiers" are "always free for OSS" and "TBD" for commercial use.

I like the things I depend on to actually have a funding model, so that's actually more appealing to me than something fully free.

Levitating 7 hours ago | parent | prev [-]

Website barely loads on my old phone and I can't see any examples of the syntax.

tarkaTheRotter 5 hours ago | parent [-]

Hey - maintainer here. Sorry about your bad experience and thanks for mentioning it! The Core Web Vitals test did come back ok - but evidently there's more to do so will get that sorted. (Web design not a strong point!) The code examples should be showing on smaller screens when in landscape on mobile (they looked awful in portrait) - but will look at that as well!

Could I possibly ask you to reply with the model of your phone so I can make sure it works OK after I have fixed it?

bigstrat2003 5 hours ago | parent | prev | next [-]

I agree. I like YAML for a lot of things, but this is very much not one of them. CI pipelines are sufficiently complex that you will very quickly exceed the capabilities of "it's just a simple plain text markup". You need a real programming language.

jbjbjbjb 8 hours ago | parent | prev | next [-]

I couldn’t agree more. I think we should just write our pipelines in languages our teams are familiar with and prioritise being able to run them locally.

delusional 8 hours ago | parent [-]

> prioritise being able to run them locally.

That is the key function any serious CI platform needs to tackle to get me interested. FORCE me to write something that can run locally. I'll accept using containers, or maybe even VMs, but make sure that whatever I build for your server ALSO runs on my machine.

I absolutely detest working on GitHub Actions because all too often it ends up requiring that I create a new repo where I can commit to master (because for some reason everybody loves writing actions that only work on master). Which means I have to move all the fucking secrets too.

Solve that for me PLEASE. Don't give me more YAML features.

freeplay 5 hours ago | parent | next [-]

+1

Working with ADO pipelines is painful.

- Make change locally

- Push change

- Run pipeline

- Wait forever because ADO is slow

- Debug the error caused by some syntax issue in their bastardized version of yaml

- Repeat

cyanydeez 3 hours ago | parent | prev [-]

GitLab does this locally, and I assume in their cloud.

AaronAPU 3 hours ago | parent | prev | next [-]

I generate all my GH YAML files via Python. The thought of writing them by hand makes me want to vomit. One of the best design choices I ever made.

ericHosick 8 hours ago | parent | prev | next [-]

Yes! Hopefully a language that supports code as data (homoiconicity).

ivanjermakov 8 hours ago | parent | prev | next [-]

Notable mentions are Zig build system and nob: https://github.com/tsoding/nob.h.

ZYbCRq22HbJ2y7 8 hours ago | parent | prev | next [-]

You could make a builder to do this for you. It could build your actions in a pre-commit hook or whatever.

Although, I think it is generally an accepted practice to use declarative configuration over imperative configuration? Maybe that is part of what the article is getting at?

baq 6 hours ago | parent [-]

YAML is neither declarative nor imperative. It's just a tree (or graph, with references) serialization to text.

ZYbCRq22HbJ2y7 an hour ago | parent | next [-]

I didn't say anything about YAML? Also, I think the term "declarative configuration" has drifted to mean different things over time depending on where it is used.

zft 4 hours ago | parent | prev | next [-]

Jenkins supports Groovy DSL jobs. I would not say using it made anything easier.

oblio 3 hours ago | parent | next [-]

Well, Groovy is a bit of a basket case programming language, so that doesn't help.

I say this as someone that built entire Jenkins Groovy frameworks for automating large Jenkins setups (think hundreds of nodes, thousands of Jenkins jobs, stuff like that).

wiether 8 hours ago | parent | prev | next [-]

Basically what we ended up doing at work is creating some kind of YAML generator.

We write Bash or Python, and our tool will produce the YAML pipeline reflecting it.

So we don't need to maintain YAML with an over-complicated format.

The resulting YAML is not meant to be read by an actual human since it's absolute garbage, but the code we want to run is running when we want, without having to maintain the YAML.

And we can easily test it locally.

easterncalculus 6 hours ago | parent [-]

I work on a monorepo that does this using TypeScript, for type checking. It's a mess: a huge learning curve for some type checking, and code that very often builds perfectly fine but then fails a type-check in CI.

Honestly, just having a linter should be enough. Ideally, anything complicated in your build should just be put into a script anyways - it minimizes the number of lines in that massive YAML file and the potential for merge conflicts when making small changes.

red_hare 8 hours ago | parent | prev | next [-]

I'm surprised by this take. I love YAML for this use case. Easy to write and read by hand, while also being easy to write and read with code in just about every language.

baq 8 hours ago | parent | next [-]

YAML is a serialization format. I like YAML as much as I like base64, that is I don't care about it unless you make me write it by hand, then I care very much.

GitHub Actions have a lot of rules, logic and multiple sublanguages in lots of places (e.g. conditions, shell scripts, etc.). YAML is completely superficial; XML would be an improvement due to less whitespace sensitivity alone.

pacoWebConsult 8 hours ago | parent | prev | next [-]

Sure, easy to read, but quite difficult to /reason/ about in your head, let alone have proper language server/compiler support given the abstraction over provider events and runner state. I have never written a CI pipeline correctly without multiple iterations of pushing updates to the pipeline definition, and I don't think I'm alone on that.

shadowgovt 8 hours ago | parent | prev | next [-]

Easy to write and read until it gets about a page or two long. Then you have to figure out stuff like "Oh gee, I'm on nesting layer 18, so that's... The object.... That is.... The array of.... The objects of....."

Plus it has exactly enough convenience-feature-related sharp edges to be risky to hand to a newbie, while wearing the dress of something that should be too bog-simple to have that problem. I, too, enjoy languages that arbitrarily decide the Norwegian TLD is actually a Boolean "false."

TheDong 8 hours ago | parent | prev | next [-]

> Easy to write and read by hand, while also being easy to write and read with code in just about every language

Language implementations for yaml vary _wildly_.

What does the following parse as:

    some_map:
      key: value
      no: cap

If I google "yaml online" and paste it in, one gives me:

{'some_map': {False: 'cap', 'key': 'value'}}

The other gives me:

{'some_map': {'false': 'cap', 'key': 'value'}}

... and neither gives what a human probably intended, huh?
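
For anyone wondering where the `False` comes from, here is a small sketch that models only the boolean resolution step, not a real parser. The two tables follow the YAML 1.1 bool type and the YAML 1.2 core schema; real libraries implement slightly different subsets, which is part of the problem.

```python
# Plain (unquoted) scalars that the YAML 1.1 bool type resolves to booleans.
YAML_1_1_BOOLS = {
    **dict.fromkeys(
        ["y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE",
         "on", "On", "ON"],
        True,
    ),
    **dict.fromkeys(
        ["n", "N", "no", "No", "NO", "false", "False", "FALSE",
         "off", "Off", "OFF"],
        False,
    ),
}

# The YAML 1.2 core schema only recognises these spellings.
YAML_1_2_BOOLS = {
    "true": True, "True": True, "TRUE": True,
    "false": False, "False": False, "FALSE": False,
}


def resolve(scalar, table):
    """Resolve an unquoted scalar: a boolean if listed, else the string."""
    return table.get(scalar, scalar)


if __name__ == "__main__":
    for key in ["key", "no", "on"]:
        old = resolve(key, YAML_1_1_BOOLS)
        new = resolve(key, YAML_1_2_BOOLS)
        print(f"{key!r}: YAML 1.1 -> {old!r}, YAML 1.2 core -> {new!r}")
```

Quoting the key (`"no": cap`) sidesteps the resolution step entirely, since only plain scalars are candidates for it.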

karmarepellent 6 hours ago | parent | next [-]

This is why I've become a fan of StrictYAML [0]. Of course it is not supported by many projects, but at least you are given the option to dispense with all the unnecessary features and their associated pitfalls in the context of your own projects.

Most notably it only offers three base types (scalar string, array, object) and moves the work of parsing values to stronger types (such as int8 or boolean) to your codebase where you tend to wrap values parsed from YAML into other types anyway.
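
Roughly what that looks like on the consuming side, sketched in plain Python rather than with the StrictYAML API: every value arrives as a string and the schema decides what it becomes. The field names and the `load_config` helper are made up for illustration.

```python
def to_bool(text):
    """Accept only the spellings this project chose to allow."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"expected 'true' or 'false', got {text!r}")


# Hypothetical schema: field name -> converter from string.
SCHEMA = {"country": str, "retries": int, "verbose": to_bool}


def load_config(raw):
    """Convert a mapping of strings into typed values, failing loudly.

    Unknown keys raise ValueError; a missing key raises KeyError.
    """
    unknown = set(raw) - set(SCHEMA)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    return {key: SCHEMA[key](raw[key]) for key in SCHEMA}


if __name__ == "__main__":
    # What a strings-only parser would hand over: "no" stays "no".
    raw = {"country": "no", "retries": "3", "verbose": "false"}
    print(load_config(raw))
```

The country code survives as a string because nothing ever guessed at its type, and a stray `verbose: no` is rejected instead of being silently coerced.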

Fewer surprises and headaches, but very niche, unfortunately.

[0] https://hitchdev.com/strictyaml/

sauercrowd 8 hours ago | parent | prev | next [-]

That only matters if you're parsing the same yaml file with different parsers, which GitHub doesn't (and I doubt most people do - it's mostly used for config files)

sevensor 8 hours ago | parent [-]

“The meaning of YAML is implementation-defined” is a big reason I stay far away whenever I can.

Perz1val 8 hours ago | parent | prev [-]

The classic Norway bug

Pxtl 8 hours ago | parent | prev [-]

It's less about YAML itself than the MS yaml-based API for interacting with build-servers. It's just so hard to check and test and debug.

everfrustrated 2 hours ago | parent [-]

This, so much this. VS Code has a very good syntax checker for GitHub Actions YAML, so it's not YAML that's the problem.

It's the workflow for developing pipelines that's the problem. If I had something I could run locally, even in a debug, dry-run-only form, that would go a long way toward debugging job dependencies, testing that failure cases flow through conditional logic in the expected manner, etc.
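
A toy version of what such a dry-run could look like, to show the idea rather than any existing tool. It uses a deliberately simplified rule that is an assumption, not GitHub's actual semantics: a job runs only if all of its `needs` succeeded, unless it is flagged `always`. The job names and the `dry_run` helper are made up.

```python
from graphlib import TopologicalSorter


def dry_run(jobs, simulated_failures=frozenset()):
    """Return {job: 'success' | 'failure' | 'skipped'} without running anything.

    jobs maps a job name to {"needs": [...], "always": bool}; both keys
    are optional. simulated_failures names the jobs to pretend failed.
    """
    graph = {name: spec.get("needs", []) for name, spec in jobs.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        spec = jobs[name]
        upstream_ok = all(
            results[dep] == "success" for dep in spec.get("needs", [])
        )
        if not upstream_ok and not spec.get("always", False):
            results[name] = "skipped"
        elif name in simulated_failures:
            results[name] = "failure"
        else:
            results[name] = "success"
    return results


# Hypothetical pipeline: notify should fire even when tests fail.
JOBS = {
    "build": {},
    "test": {"needs": ["build"]},
    "deploy": {"needs": ["test"]},
    "notify": {"needs": ["test"], "always": True},
}

if __name__ == "__main__":
    for job, outcome in dry_run(JOBS, simulated_failures={"test"}).items():
        print(f"{job}: {outcome}")
```

Feeding in different simulated failures answers "what happens if X breaks?" in milliseconds instead of a push-and-wait cycle.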

mhh__ 8 hours ago | parent | prev | next [-]

I'm not convinced there should be anything to define at all versus basically just some extremely broad but bare platform and a slot to stick an executable in.

Pxtl 8 hours ago | parent | prev | next [-]

Yes. Most of my custom pipeline stuff is a thin wrapper around a normal-ass scripting-language because the yaml/macro stuff is so hard to check and debug.

datadrivenangel 8 hours ago | parent | prev | next [-]

Being hard to test locally is the point. Lock-in.

imiric 8 hours ago | parent | prev | next [-]

Agreed. YAML is not a great format to begin with, but using it for anything slightly more sophisticated (looking at you Ansible, k8s, etc.) is an exercise in frustration.

I really enjoyed working with the Earthfile format[1] used for Earthly CI, which unfortunately seems like a dead end now. It's a mix of Dockerfile and Makefile, which made it very familiar to read and write. Best of all, it allowed running the pipeline locally exactly as it would run remotely, which made development and troubleshooting so much easier. The fact GH Actions doesn't have something equivalent is awful UX[2].

Honestly, I wish the industry hadn't settled on GitHub and GH Actions. We need better tooling and better stewards of open source than a giant corporation that has historically been hostile to open source.

[1]: https://earthly.dev/earthfile

[2]: Yes, I'm aware of `act`, but I've had nothing but issues with it.

verdverm 4 hours ago | parent | prev | next [-]

I use CUE and generate the YAML; I don't care what a giant unreadable slop it is anymore.

I use CUE to read yamhell too

giancarlostoro 8 hours ago | parent | prev [-]

Wouldn't Terraform solve this? You can have all your infrastructure as code in a git repo.