jasonthorsness 5 days ago

Identifying a representative usage scenario to optimize towards and then implementing that scenario in a microbenchmark test driver are both massively difficult to get right, and a "failure" in this regard, as the author found, can be hard to detect before you sink a lot of time into it.

Even for seemingly trivial scenarios like searching an array, the contents and length of the array make a massive difference in results and how to optimize (as shown in the last section of this write-up where I tried to benchmark search algorithms correctly https://www.jasonthorsness.com/23).
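
A minimal sketch of that sensitivity (not the code from the write-up; the search, sizes, and iteration counts here are made up for illustration, TypeScript under Node 16+ or a browser):

    // Time the same linear search over arrays that differ only in length
    // and in where (or whether) the match falls.
    function linearSearch(haystack: number[], needle: number): number {
      for (let i = 0; i < haystack.length; i++) {
        if (haystack[i] === needle) return i;
      }
      return -1;
    }

    function bench(label: string, haystack: number[], needle: number): void {
      const iterations = 10_000;
      let sink = 0; // keep results live so the loop isn't optimized away
      const start = performance.now();
      for (let i = 0; i < iterations; i++) {
        sink += linearSearch(haystack, needle);
      }
      console.log(`${label}: ${(performance.now() - start).toFixed(1)} ms (sink=${sink})`);
    }

    for (const len of [16, 1_000, 100_000]) {
      const data = Array.from({ length: len }, (_, i) => i);
      bench(`len=${len}, hit at front`, data, 0);      // best case
      bench(`len=${len}, hit at back`, data, len - 1); // worst case
      bench(`len=${len}, miss`, data, -1);             // no match at all
    }

The same code, honestly measured, yields completely different winners depending on which of those rows you decide is "representative."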

I've not seen a perfect solution to this that isn't just "thinking carefully about the test setup" (profile-guided optimization/production profiles replayed for benchmarks seem like maybe it could be an alternative, but I haven't seen that used much).

bgirard 5 days ago | parent | next [-]

> Identifying a representative usage scenario to optimize towards and then implementing that scenario in a microbenchmark test driver are both massively difficult to get right, and a "failure" in this regard, as the author found, can be hard to detect before you sink a lot of time into it.

I can vouch for this. I've been doing web performance for nearly 15 years, and finding clear, representative problems is one of the hardest parts of my job. Once I understood this and worked on getting better at finding representative issues, it became the single biggest boost to my productivity.

jasonthorsness 5 days ago | parent | next [-]

What have you found are the most useful approaches to collecting the representative issues in a way you can reuse and test? I haven’t worked as much in the web space.

bgirard 5 days ago | parent [-]

A combination of: 1) RUM, 2) field tracing, 3) local tracing with throttling (CPU, network), 4) focusing on known problems, or, if I'm on less certain territory, doing more vetting to make sure I'm chasing something real. I'll minimize the time spent on issues that I can only catch in one trace, but sometimes that's all you get, so I'll time-box that work carefully. It's more of an art than a science.
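
For (1), a rough sketch of the collection side (the /rum-collect endpoint is hypothetical; any RUM vendor or backend replaces it):

    // Collect long tasks and largest-contentful-paint in the field,
    // then beacon them home when the page is hidden.
    const samples: Array<{ type: string; start: number; duration: number }> = [];

    const record = (list: PerformanceObserverEntryList) => {
      for (const entry of list.getEntries()) {
        samples.push({
          type: entry.entryType,
          start: entry.startTime,
          duration: entry.duration,
        });
      }
    };

    new PerformanceObserver(record).observe({ type: 'longtask', buffered: true });
    new PerformanceObserver(record).observe({ type: 'largest-contentful-paint', buffered: true });

    // sendBeacon survives page unload, unlike a normal fetch.
    addEventListener('visibilitychange', () => {
      if (document.visibilityState === 'hidden' && samples.length > 0) {
        navigator.sendBeacon('/rum-collect', JSON.stringify(samples));
        samples.length = 0;
      }
    });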

viraptor 5 days ago | parent | prev [-]

Tracing and continuous profiling have made this task significantly easier than it used to be, fortunately.

bgirard 5 days ago | parent | next [-]

It's great. It's much harder on the web because it's difficult to get rich information out of the browser. This is why I contributed to the JS Self-Profiling API in the first place: https://developer.mozilla.org/en-US/docs/Web/API/JS_Self-Pro...
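
A minimal sketch of using it (the Profiler declaration is there because TypeScript's DOM lib doesn't ship these types yet; /profile-collect is hypothetical; the page must be served with a Document-Policy: js-profiling response header):

    // Shape per the spec/MDN; the trace resolves when the profiler stops.
    declare class Profiler {
      constructor(options: { sampleInterval: number; maxBufferSize: number });
      stop(): Promise<unknown>;
    }

    // Profile one interaction in the field and ship the trace home.
    async function traceInteraction(work: () => Promise<void>): Promise<void> {
      const profiler = new Profiler({ sampleInterval: 10, maxBufferSize: 10_000 });
      try {
        await work(); // the operation we want a representative profile of
      } finally {
        const trace = await profiler.stop();
        navigator.sendBeacon('/profile-collect', JSON.stringify(trace));
      }
    }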

CodesInChaos 5 days ago | parent | prev [-]

Can you recommend tools for continuous profiling? I'm mainly interested in Rust and C# on Linux.

I'm not sure it has gotten easier, though. Async code is harder to profile than classic threaded code using blocking IO. And for database-bound code, I'm not sure which databases output detailed enough performance information.

viraptor 5 days ago | parent [-]

https://grafana.com/products/cloud/profiles-for-continuous-p... can be self-hosted as https://grafana.com/oss/pyroscope/

And there's also https://docs.datadoghq.com/profiler/

sfink 5 days ago | parent | prev | next [-]

It's a subset of profile-guided optimization, but fairly frequently I will gather summary histograms from more-or-less representative cases and then use those to validate my assumptions. (And once in a while I'll write a benchmark that produces a similar distribution, but honestly it usually comes down to "I think this case is really rare" and checking whether it is indeed rare.)
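
A sketch of the histogram half of that (the buckets and the quantity measured are made up):

    // Bucket a quantity from representative runs (here, input length) into
    // power-of-two buckets, then check whether the "rare" case is rare.
    function summarize(values: number[]): Map<string, number> {
      const buckets = new Map<string, number>();
      for (const v of values) {
        const bucket = v === 0 ? '0' : `<=2^${Math.ceil(Math.log2(v + 1))}`;
        buckets.set(bucket, (buckets.get(bucket) ?? 0) + 1);
      }
      return buckets;
    }

    const observedLengths = [1, 3, 3, 7, 8, 120, 4]; // gathered from real cases
    console.log(summarize(observedLengths));
    // then validate an assumption like "inputs over 64 are really rare"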

anonymoushn 5 days ago | parent | prev | next [-]

I have asked for something like the ability to apply a filter that maps real strings from real users down to a very small number of character classes, so that I can run a codec on data that is equivalent to user data for the codec's purposes. I have been told this is not a good use of my time, and that if I really care so much I should instead make various guesses about the production distribution of user-created strings (that need to be JSON-escaped), and then we can deploy each of them and keep the best, if we can even tell the difference.
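
The filter itself is simple; the classes just have to preserve exactly what the codec distinguishes. A sketch for the JSON-escaping case (the class choices here are made up):

    // Map each character of a real string to a class that a JSON escaper
    // treats identically, so the transformed corpus exercises the codec
    // the same way without keeping the original text.
    function toCharClasses(s: string): string {
      let out = '';
      for (const ch of s) {
        const code = ch.codePointAt(0)!;
        if (ch === '"' || ch === '\\') out += ch; // escaped as themselves
        else if (code < 0x20) out += '\u0001';    // control: needs \uXXXX
        else if (code < 0x80) out += 'a';         // plain ASCII passthrough
        else out += '\u00e9';                     // non-ASCII passthrough
      }
      return out;
    }

    // e.g. toCharClasses('hi "x"\n') === 'aaa"a"\u0001'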

jasonthorsness 5 days ago | parent [-]

If the real data is sensitive, it's hard to distill test data from it that fully removes the sensitivity while remaining useful. Depending on the domain, even the median string length could be sensitive.

kazinator 5 days ago | parent | prev | next [-]

If they were forking that JVM in order to offer it as a development tool to a broad developer base, then such an optimization might be worth keeping.

Someone out there might have an application in which they are serializing large numbers of large integers.

The case for abandoning it is not as strong, since you don't know who is doing what.

It's a guessing game in programming language run-times and compilers: "will anyone need this improvement?" You have room to invent "yes" vibes. :)

grogers 5 days ago | parent | next [-]

If the distribution of numbers you are serializing is uniformly random, you generally wouldn't choose a variable-length encoding. Sometimes the choice is made for you, of course, but not usually.
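
A quick sketch of why (varintLength is a generic LEB128-style length, not any particular library's):

    // For uniformly random 64-bit values, a varint is longer than a fixed
    // 8-byte encoding almost all the time: most values have high bits set.
    function varintLength(v: bigint): number {
      let len = 1;
      while (v >= 0x80n) {
        v >>= 7n;
        len++;
      }
      return len;
    }

    function randomU64(): bigint {
      // Two 32-bit halves; Math.random is fine for a distribution sketch.
      const hi = BigInt(Math.floor(Math.random() * 0x1_0000_0000));
      const lo = BigInt(Math.floor(Math.random() * 0x1_0000_0000));
      return (hi << 32n) | lo;
    }

    let total = 0;
    const n = 100_000;
    for (let i = 0; i < n; i++) total += varintLength(randomU64());
    console.log(`avg varint bytes: ${(total / n).toFixed(2)} vs 8 fixed`);
    // prints ~9.5: varints only pay off when small values dominate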

necovek 5 days ago | parent | prev [-]

The problem is that this is a complex assembly implementation that needs to be maintained for decades to come. Unless you have a good reason to keep it, you should drop it.

spott 5 days ago | parent | prev [-]

Why is just capturing real values not the answer here?

Something like grabbing .01% of the inputs to that function for a day (or maybe a lower fraction over a week or month).

Is the cost of grabbing this that high?
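
A sketch of what the capture could look like (recordSample is a placeholder for whatever sink fits: a log, a queue):

    const SAMPLE_RATE = 0.0001; // 0.01% of calls

    function recordSample(input: string): void {
      // e.g. append to a log that a benchmark harness replays later
    }

    // Wrap the hot function so a tiny fraction of real inputs is kept.
    function withInputSampling<T>(f: (input: string) => T): (input: string) => T {
      return (input: string): T => {
        if (Math.random() < SAMPLE_RATE) recordSample(input);
        return f(input);
      };
    }

    // usage: const escapeJson = withInputSampling(realEscapeJson);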

adrianN 5 days ago | parent [-]

That works for some cases, but if you have a service that processes customer data you probably don’t want to make copies without consulting with legal.