ot 2 hours ago

You can go even faster, to about 8ns (almost an additional 10x improvement), by using software perf events: PERF_COUNT_SW_TASK_CLOCK is thread CPU time, and it can be read through a shared page (so no syscall; see perf_event_mmap_page). You then add the delta since the last context switch with a single rdtsc call inside a seqlock.

Unfortunately, this is not well documented, and I'm not aware of any open-source implementations of it.

EDIT: Or maybe not; I'm not sure whether PERF_COUNT_SW_TASK_CLOCK allows selecting only user time. The kernel can definitely do it, but I don't know if the wiring is there. However, this definitely works for overall thread CPU time.

jerrinot 2 hours ago

That's a brilliant trick. The setup overhead and permission requirements for perf_event might be heavy for arbitrary threads, but for long-lived threads it looks pretty awesome! Thanks for sharing!

ot 2 hours ago

Yes, you need some lazy setup in thread-local state to use this. And short-lived threads should be avoided anyway :)
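A rough sketch of what that lazy per-thread setup could look like, assuming Linux with glibc. The helper `thread_clock_page` and the `clk_*` variables are hypothetical names; the `perf_event_open` arguments (pid 0, cpu -1) follow the man page's convention for measuring the calling thread. The call can fail, for instance depending on `/proc/sys/kernel/perf_event_paranoid`, so the result is cached either way and callers need a fallback.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

enum { CLK_UNINIT, CLK_READY, CLK_FAILED };

/* Per-thread state: each thread opens and maps its own event on first use. */
static _Thread_local int clk_state = CLK_UNINIT;
static _Thread_local int clk_fd = -1;
static _Thread_local struct perf_event_mmap_page *clk_page = NULL;

/* Returns this thread's perf_event_mmap_page, or NULL if setup failed.
 * Setup is attempted at most once per thread. */
static struct perf_event_mmap_page *thread_clock_page(void)
{
    if (clk_state != CLK_UNINIT)
        return clk_page;                 /* cached result (NULL on failure) */
    clk_state = CLK_FAILED;              /* assume failure until mapped */

    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_SOFTWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_SW_TASK_CLOCK;

    /* pid = 0, cpu = -1: the calling thread, on whichever CPU it runs. */
    int fd = (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return NULL;

    /* Map only the first (metadata) page, read-only. */
    void *p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE), PROT_READ,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }

    clk_fd = fd;
    clk_page = p;
    clk_state = CLK_READY;
    return clk_page;
}
```

Cleanup at thread exit (`munmap` and `close(clk_fd)`) is left out here. Each thread costs one file descriptor and one mapping, which is part of why this fits long-lived threads better than short-lived ones.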