Writing C for eBPF is cumbersome and you'd like to avoid it. Okay, that's reasonable. But I don't think it's a good idea to write a compiler that emits eBPF binaries from (a tiny subset of) Python. Why not write the code in pseudo-Python (or whatever language you're comfortable with), have an LLM translate it to C, and paste the result into your source code? That would be much better: there would be fewer layers and a significant reduction in runtime cost.

tecleandor a day ago

I don't understand...

So, instead of having a defined and documented subset of Python that compiles to eBPF in a deterministic way... use an undefined pseudo-language and let the LLM have fun with it, without knowing whether the resulting C is correct?

What would be the advantage?

drivenextfunc a day ago

The behavior of CPython and a few other implementations of Python (such as PyPy) is well documented and well understood. The semantics of the tiny subset of Python that this Python-to-eBPF compiler understands are not. For example, from the fact that it statically compiles a Python-ish AST to LLVM IR, you can get a rough idea that the dynamic elements of Python semantics are unlikely to be supported, but you cannot know exactly which ones without carefully reading the documentation or source code of the compiler. You can guess that globals() or locals() won't work, and maybe .__dict__ won't either, but how about type() or isinstance()? You don't know without digging into the documentation (which may be lacking), because the subset of Python this compiler understands is rather arbitrary.
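To make the guessing game concrete, here is a minimal sketch that scans a program for the dynamic constructs mentioned above. It assumes a hand-picked deny-list: `DYNAMIC_BUILTINS`, `DYNAMIC_ATTRS`, `find_dynamic_features`, and `SAMPLE` are hypothetical names for illustration and say nothing about what the compiler under discussion actually accepts or rejects.

```python
import ast

# Hypothetical deny-list, for illustration only. The real compiler's
# supported subset may differ; check its documentation or source.
DYNAMIC_BUILTINS = {"globals", "locals", "type", "isinstance"}
DYNAMIC_ATTRS = {"__dict__"}


def find_dynamic_features(source):
    """Return sorted (line, name) pairs for constructs on the deny-list."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Direct calls such as isinstance(x, int) or globals().
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DYNAMIC_BUILTINS:
                hits.append((node.lineno, node.func.id + "()"))
        # Attribute lookups such as obj.__dict__.
        elif isinstance(node, ast.Attribute) and node.attr in DYNAMIC_ATTRS:
            hits.append((node.lineno, "." + node.attr))
    return sorted(hits)


# A made-up handler that mixes plain attribute access with dynamic features.
SAMPLE = """\
def handler(ctx):
    pid = ctx.pid
    if isinstance(pid, int):
        return type(pid)
    return ctx.__dict__
"""

for line, name in find_dynamic_features(SAMPLE):
    print(f"line {line}: {name}")
```

A scan like this only reports what is on the list; it cannot tell you which of those constructs a given compiler really rejects, which is the point being made here.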

Also, having an LLM translate Python-ish pseudocode into C does not mean you cannot examine the result before putting it into a program. You can review it manually and modify it as you like. It just takes less time than writing the C code by hand.

tecleandor a day ago

But then we have to write the pseudocode anyway (which my IDE can't check, so I don't know if I have pseudomistakes [sorry for the pun]), run the LLM 'transpile' step (which isn't understood at all), and review the C code anyway, so you have to know eBPF code really well. Would that really save time?

Twirrim a day ago

Are you seriously asking why someone might want to do something guaranteed to behave exactly as they defined it, when they could have an LLM hallucinate code that touches the core of their system, instead?

Why would anyone go with the inaccurate option?

otabdeveloper4 a day ago

LLMs will never be able to write eBPF code.

eBPF is a weird, formally validated secure subset of C. No "normal" C program will ever pass the eBPF validation checks.

nickysielicki a day ago

LLMs can already write eBPF code easily. Try it.

otabdeveloper4 a day ago

> tell me you've never actually developed an eBPF program without telling me you've never actually developed an eBPF program

nickysielicki 18 hours ago

Just try it. Here’s an example that I know it will work flawlessly for, because I used it for exactly this: at $formerjob, all laptops came with a piece of malware called “connections”, which obnoxiously popped up at some point during the day (stealing window/mouse focus) and asked you some asinine survey question about morale on your team and/or the company values. There are a few ways to solve this: AppArmor/SELinux (but this runs the risk of your config file conflicting with the rules shipped by IT), a simple bash script that runs pkill every 5 seconds (poll too slowly and it still steals focus, too quickly and your laptop fans start spinning), etc. A better way is to use a BPF hook on execve.

Ask an LLM to write a simple eBPF program that kills any program with a specific name/path. Even crappy local models can handle this with ease.

If you’re talking about more complicated map-based programs, you’re probably right that it will struggle a bit, but it will still figure it out. The eBPF API is not very different from any other C API at the end of the day. It will do fine without the standard library, if you ask it to.

otabdeveloper4 11 hours ago

By eBPF I mean things like XDP network filters.

The issue here is the static formal validation the kernel does before loading your eBPF program.

(Even humans don't really know how it works. You need to use specific byte-width types and access memory in specific patterns, or the validation will fail.)

nickysielicki 5 hours ago

Respectfully, you don’t know what you’re talking about.

1. If you meant XDP, you should have said XDP, not eBPF.

2. The kernel does that validation on all eBPF code that it loads, regardless of whether XDP is involved.

3. Humans know how it works.

vrighter a day ago

"translated by an llm"

smh my head