heipei 5 days ago

Counterpoint: I once wrote a paper on accelerating block ciphers (AES et al.) using CUDA, and while doing so I realised that most (if not all) previous academic work claiming incredible speedups had done so by benchmarking exclusively on zero bytes. Since these block ciphers are implemented using lookup tables, this meant perfect cache hits on every block to be encrypted. Benchmarking on random data painted a very different, and in my opinion more realistic, picture.
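
To make the effect concrete, here is a toy sketch (nothing like the actual CUDA kernels; just a stand-in byte-wise table lookup I made up for illustration) that counts how many distinct table entries the input ever touches:

    import os

    SBOX = bytes(range(256))        # stand-in for a cipher lookup table (real AES T-tables are larger)

    def distinct_entries_touched(data: bytes) -> int:
        # every input byte indexes the table once; count how many entries get hit
        return len({SBOX[b] for b in data})

    zeros = bytes(1 << 20)          # 1 MiB of zero bytes, as in the zero-byte benchmarks
    rand = os.urandom(1 << 20)      # 1 MiB of random bytes

    print("zero-byte input hits", distinct_entries_touched(zeros), "table entry")    # 1
    print("random input hits   ", distinct_entries_touched(rand), "table entries")   # ~256

With all-zero input every block reads the same entry over and over, so the table effectively lives in cache for the whole run; random input scatters across the whole table.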

atiedebee 5 days ago | parent | next [-]

Why not use real-world data instead? Grab a large PDF or archive and use that as the benchmark.

bluGill 5 days ago | parent | next [-]

A common case is to compress data before encrypting it, so random data is realistic. A PDF might allow some optimization (how, I don't know) that is not representative.
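
A rough sketch of why compressed data looks essentially random (toy example using zlib on some made-up structured records; the exact numbers will vary):

    import math, os, zlib
    from collections import Counter

    def bits_per_byte(data: bytes) -> float:
        # empirical byte entropy; 8.0 means the byte frequencies look like random data
        n = len(data)
        return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

    structured = b"".join(b"record %d: value=%d\n" % (i, i * i) for i in range(50000))

    print("structured plaintext:", round(bits_per_byte(structured), 2), "bits/byte")
    print("zlib-compressed:     ", round(bits_per_byte(zlib.compress(structured, 9)), 2), "bits/byte")
    print("os.urandom:          ", round(bits_per_byte(os.urandom(len(structured))), 2), "bits/byte")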

jbaber 5 days ago | parent | prev [-]

Or at least the canonical test vectors. All zeros is a choice.

almostgotcaught 5 days ago | parent | prev | next [-]

There are an enormous number of papers like this. I wrote a paper accelerating a small CNN classifier on an FPGA, and when I compared against the previous SOTA GPU implementation, the numbers were way off from the paper. I did a git blame on their repo and found that after the paper was published they deleted the lines that short-circuited eval if the sample was all zeros (which much of their synthetic data was ¯\_(ツ)_/¯).
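
For illustration only, a purely hypothetical reconstruction of that kind of short-circuit (the names and structure here are invented, not taken from their repo):

    import time

    def classify(sample):
        time.sleep(0.001)           # stand-in for the real inference work
        return 0

    def classify_with_shortcut(sample):
        if not any(sample):         # the sort of line that later gets deleted:
            return 0                # skip all compute when the sample is all zeros
        return classify(sample)

    batch = [[0.0] * 1024 for _ in range(1000)]     # mostly-zero synthetic inputs

    start = time.perf_counter()
    for s in batch:
        classify(s)
    honest = time.perf_counter() - start

    start = time.perf_counter()
    for s in batch:
        classify_with_shortcut(s)
    shortcut = time.perf_counter() - start

    print(f"honest eval:        {honest:.2f}s")
    print(f"short-circuit eval: {shortcut:.2f}s  (a 'speedup' that vanishes on real data)")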

michaelcampbell 3 days ago | parent | prev | next [-]

I'm sure I'm getting this wrong, but I think I remember someone pointing out that no one could come even close to replicating Borland's Turbo Pascal "compiled lines per second" figure, until someone wrote a legitimate Pascal program with essentially that number of lines, each containing only ";", or something along those lines.

It WAS still a great compiler, and way faster than the competition at the time.

Marazan 4 days ago | parent | prev | next [-]

Back when Genetic Algorithms were a hot topic I discovered that a large number of papers discussing optimal parameterisation of the approach (mutation rate, cross-over, population size, etc.) were written using '1-Max' as the "problem" to be solved by the GA. (1-Max being the task of making every bit of the candidate string a 1 rather than a 0.)

This literally made the GA encoding exactly the same as the solution and also very, very obviously favoured techniques that would MAKE ALL THE BITS 1!
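
A minimal toy version of 1-Max (my own sketch, not from any of those papers) shows how trivial the "problem" is: fitness is literally the popcount of the candidate string, so any operator biased toward setting bits to 1 wins by construction:

    import random

    LENGTH, POP, GENERATIONS = 64, 50, 200

    def fitness(bits):
        return sum(bits)            # 1-Max: fitness is just the number of 1 bits

    def mutate(bits, rate=1 / LENGTH):
        return [b ^ (random.random() < rate) for b in bits]

    pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
    for _ in range(GENERATIONS):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: POP // 2]   # truncation selection
        pop = parents + [mutate(random.choice(parents)) for _ in range(POP - len(parents))]

    print("best fitness:", max(fitness(ind) for ind in pop), "out of", LENGTH)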

imtringued 4 days ago | parent [-]

This reminds me of this paper:

https://link.springer.com/article/10.1007/s10710-021-09425-5

They do the convergence analysis on the linear system Ax = 0, which means any iteration matrix (including a zero matrix) that produces shrinking numbers will converge to the obvious solution x = 0, and the genetic program just happens to produce iteration matrices with lots of zeros.
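
A small numeric sketch of the trap (my own toy code, assuming the usual stationary form x_{k+1} = M x_k + N b; the paper's actual setup may differ): on Ax = 0 even the zero matrix looks like a perfect solver, and only a non-zero right-hand side exposes it:

    import numpy as np

    A = np.array([[4.0, 1.0], [1.0, 3.0]])

    def residual_after(M, N, b, steps=50):
        x = np.ones(2)              # arbitrary starting guess
        for _ in range(steps):
            x = M @ x + N @ b       # generic stationary iteration
        return np.linalg.norm(A @ x - b)

    # Jacobi splitting: a legitimate scheme actually derived from A
    D_inv = np.diag(1.0 / np.diag(A))
    M_jac, N_jac = np.eye(2) - D_inv @ A, D_inv

    # "evolved" scheme: the zero matrix, unrelated to A; it just shrinks everything
    M_zero, N_zero = np.zeros((2, 2)), np.zeros((2, 2))

    for b in (np.zeros(2), np.array([1.0, 2.0])):
        print("b =", b,
              "| Jacobi residual:", round(residual_after(M_jac, N_jac, b), 6),
              "| zero-matrix residual:", round(residual_after(M_zero, N_zero, b), 6))

On b = 0 both residuals come out as zero; on b = [1, 2] only the Jacobi scheme actually solves the system.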

rdc12 5 days ago | parent | prev [-]

Do you have a link to that paper?