jhallenworld 5 hours ago
I have been using Walter Bright's libc code from Zortech-C for microcontrollers, where I care about code size more than anything else:

https://github.com/nklabs/libnklabs/blob/main/src/nkprintf_f...
https://github.com/nklabs/libnklabs/blob/main/src/nkstrtod.c
https://github.com/nklabs/libnklabs/blob/main/src/nkdectab.c

nkprintf_fp.c+nkdectab.c: 2494 bytes
schubfach.cc: 10K bytes

The code is small, but there is a giant table of numbers. Also, this is just dtoa, not a full printf formatter. OTOH, the old code is not round-trip accurate. Russ Cox should make a C version of his code.
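For anyone unsure what "round-trip accurate" means here: printing a double and parsing the text back should give the identical double. Below is a minimal sketch of a check, assuming a hosted libc whose `snprintf` and `strtod` are correctly rounded (not a given on small embedded libcs). The helper names `round_trips` and `min_roundtrip_precision` are made up for illustration and are not from any of the libraries linked above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if printing x with `digits` significant
   digits and parsing the text back yields exactly x. Finite x only. */
static int round_trips(double x, int digits)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.*e", digits - 1, x);
    return strtod(buf, NULL) == x;
}

/* Hypothetical helper: smallest printf precision (1..17 significant digits)
   whose output parses back to x. 17 digits always suffice for a finite
   IEEE 754 double when printf and strtod are correctly rounded. */
static int min_roundtrip_precision(double x)
{
    for (int d = 1; d < 17; d++)
        if (round_trips(x, d))
            return d;
    return 17;
}
```

This brute-force loop is slow and only there to pin down the property. Schubfach and similar algorithms produce the shortest round-tripping digits directly, without trial printing.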
magicalhippo 4 hours ago
What about reasonably fast but smallest code, for running on a microcontroller? Anything significantly better in terms of compiled size (including lookup tables)?
vitaut 5 minutes ago
If you compress the table (see my earlier comment) and use plain Schubfach, you can get a really small binary size and decent perf. IIRC Dragonbox with the compressed table was ~30% slower, which is a reasonable price to pay and still faster than most algorithms, including Ryu.
andrepd 5 hours ago
I implemented Teju Jaguá in Rust, based on the original C impl: https://github.com/andrepd/teju-jagua-rs. Comparing to Zmij, I do wonder how much of the speedup comes from the core part of the algorithm (f*2^e -> f*10^e) vs. the printing part of the problem (f*10^e -> decimal string)! Benchmarks on my crate show a comparable amount of time spent on each of those parts.
vitaut 3 minutes ago
I don't have exact numbers, but from measuring perf changes per commit it seemed that most improvements came from "printing" (switching to BCD and SIMD) and basic optimizations like removing poorly predicted conditional branches.
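To make the "printing" half concrete, here is a rough scalar sketch of fixed-width, branch-free digit output using a two-digit lookup table. This is explicitly not the BCD/SIMD code referred to above, just a simpler and widely used technique in the same spirit. The names `DIGIT_PAIRS` and `write8` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* "00", "01", ..., "99" laid out back to back (200 chars plus NUL). */
static const char DIGIT_PAIRS[201] =
    "0001020304050607080910111213141516171819"
    "2021222324252627282930313233343536373839"
    "4041424344454647484950515253545556575859"
    "6061626364656667686970717273747576777879"
    "8081828384858687888990919293949596979899";

/* Writes n (must be < 100000000) as exactly 8 decimal digits, zero padded.
   No NUL terminator is written. There is no data-dependent branch: the
   same four table lookups run for every input. */
static void write8(char *out, uint32_t n)
{
    uint32_t hi = n / 10000;  /* upper four digits */
    uint32_t lo = n % 10000;  /* lower four digits */
    memcpy(out + 0, DIGIT_PAIRS + 2 * (hi / 100), 2);
    memcpy(out + 2, DIGIT_PAIRS + 2 * (hi % 100), 2);
    memcpy(out + 4, DIGIT_PAIRS + 2 * (lo / 100), 2);
    memcpy(out + 6, DIGIT_PAIRS + 2 * (lo % 100), 2);
}
```

A 17-digit significand can be split into such fixed-width chunks, which avoids a digit-at-a-time loop whose trip count depends on the value. The 200-byte table is the cost, which matters for the code-size discussion upthread.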