Rochus a day ago

That's interesting. I made measurements with Mono and CoreCLR some years ago, but only with a single thread, and I came to the conclusion that their performance was essentially the same (see https://rochus.hashnode.dev/is-the-mono-clr-really-slower-th...). Can someone explain what benchmarks were actually used? Was it just the "Simple benchmark code" in listing 1?

to11mtm a day ago | parent | next [-]

I think a lot of the devil is in the details, especially when we look at .NET 8/.NET 10 and the various other performance 'boosts' they have added along the way.

But also, as far as this article goes, it's describing a more specific use case that is fairly 'real world': reading a file (I/O), doing some form of deserialization (likely with a library, unless the format is proprietary), and whatever 'generating a map' means.

Again, this all feels pretty realistic for a use case so it's good food for thought.

> Can someone explain what benchmarks were actually used?

This honestly would be useful in the article itself, as well as, per above, some 'deep dives' into where the performance issues were. Was it in file I/O (possibly interop related)? Was it due to some pattern in the serialization library? Was it the object allocation pattern (when I think of C# code friendly to Mono, I think of Cysharp libraries, which sometimes do curious things)? Not diving deeper into the profiling doesn't help anyone know where the focus needs to be (unless it's a more general thing, in which case I'd hope for a better deep dive on that aspect).

Edited to add:

Reading your article again, I wonder whether your compiler is just not doing the right things to take advantage of the performance boosts available via CoreCLR?

E.g. can you do things like stackalloc temp buffers to avoid allocation, and does the stdlib do those things where it is advantageous?
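(Not from the article, just a sketch of the kind of stackalloc pattern I mean; the names here are made up.)

    using System;
    using System.Text;

    static class StackAllocSketch
    {
        // The scratch buffer lives on the stack instead of the heap, so this
        // helper creates no intermediate GC allocation per byte formatted.
        public static string HexDump(ReadOnlySpan<byte> data)
        {
            Span<char> scratch = stackalloc char[2];
            var sb = new StringBuilder(data.Length * 3);
            foreach (byte b in data)
            {
                b.TryFormat(scratch, out _, "X2");
                sb.Append(scratch).Append(' ');
            }
            return sb.ToString();
        }
    }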

Also, I know I vaguely hit on this above, but I'm also wondering whether the generated IL is just 'not hitting the pattern': a lot of CoreCLR's best magic kicks in when things are arranged a specific way in the IL, based on how Roslyn outputs it, and even for the 'expected' C# case, deviations can break the optimization.

WorldMaker 6 hours ago | parent | next [-]

> Reading your article again, I wonder whether your compiler is just not doing the right things to take advantage of the performance boosts available via CoreCLR?

> E.x. can you do things like stackalloc temp buffers to avoid allocation, and does the stdlib do those things where it is advantageous?

The C# standard lib (often called the base class library, or BCL) has seen a ton of internal Span<T>/Memory<T>/stackalloc adoption in .NET 6+, with each release adding more. Things like file I/O and serialization/deserialization in particular see notable performance improvements just from upgrading to each new .NET version: .NET 10 is faster than .NET 9 with a lot of the same code, and so forth.
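(As an illustrative snippet of my own, not something from the article: .NET 6 added handle-based file reads that fill a caller-supplied span without allocating an intermediate array.)

    using System;
    using System.IO;
    using Microsoft.Win32.SafeHandles;

    static class SpanFileSketch
    {
        // Reads the first bytes of a file directly into the destination span
        // using the .NET 6+ RandomAccess API (no temporary byte[]).
        public static int ReadHead(string path, Span<byte> destination)
        {
            using SafeFileHandle handle = File.OpenHandle(path);
            return RandomAccess.Read(handle, destination, fileOffset: 0);
        }
    }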

Mono still benefits from some of these BCL improvements (as more of the BCL is shared than not these days, and Blazor WASM for the moment is still more Mono than CoreCLR so some investment has continued), but not all of them and not always in the same ways.

Rochus a day ago | parent | prev [-]

The goal of my compiler is not to get maximum performance out of either CoreCLR or Mono. Just look at it as a random compiler which is not C#, and especially not MS's C#, which is highly in sync with and optimized for specific features of the CoreCLR engine (features which might appear in a future ECMA-335 standard). So the test essentially was to see what both CoreCLR and Mono do with non-optimized CIL generated by a compiler other than their own. This is a legitimate test case, because ECMA-335 and its compatible CLRs were built exactly for this use case. Yes, the CIL output of my compiler could be improved a lot, and I could even get more performance out of e.g. CoreCLR by using the engine-specific knowledge (which you cannot find in the standard) that the MS C# compiler also uses. But that was not my goal. Both engines got the same CIL code, and I just measured how fast it ran on each engine on the same machine.

LeFantome 6 hours ago | parent | prev | next [-]

I think the “some years ago” is pretty relevant.

.NET has heavily invested in performance. If I understand your article correctly, you tested .NET 5, which at this point is much slower than .NET 10.

I also think it matters what you mean by “Mono”. Mono, the original stand-alone project, has not seen meaningful updates in many years. But Mono is also one of the two runtimes in the currently shipping .NET, and I suspect that runtime has received a lot of love that may not have flowed back to the original Mono project.

eterm a day ago | parent | prev [-]

What's going on with the Mandelbrot result in that post?

I don't believe such a large regression from .NET Framework to CoreCLR.

Rochus 3 hours ago | parent | next [-]

The Mono and .NET 4 times were too short; the true time is unknown. I only left the Mandelbrot result in because I got a decent-looking figure for CoreCLR, but the actual factor relative to Mono is unreliable. Even if the Mono result were 1, the factor would still be seven. I have no idea why it is that much faster.

to11mtm a day ago | parent | prev [-]

NGL, it would be nice if there was a clear link to the benchmark cases used, both for OP and for the post you're replying to... Kinda get it in OP's case tho.

Rochus 9 hours ago | parent [-]

I measured the raw horsepower of the JIT engine itself, not the speed of the standard library (BCL). My results show that the Mono engine is surprisingly capable when executing pure IL code, and that much of the 'slowness' people attribute to Mono actually comes from the libraries, not the runtime itself.

In contrast, the posted article uses a very specific, non-standard, "apples-to-oranges" benchmark. It is essentially comparing a complete game engine initialization against a minimal console app (as far as I understand), which explains the massive 3x-15x differences reported. The author is actually measuring "Unity Engine Overhead + Mono vs. Raw .NET", not "Mono vs. .NET" as advertised. The "15x" figure very likely comes from the specific microbenchmark (a struct-heavy loop) where Mono's optimizer fails, extrapolated to imply the whole runtime is that much slower.
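(For illustration only, since the article doesn't show its code: a "struct-heavy loop" in that sense typically looks something like the following, where the outcome depends on how well the JIT keeps small value types in registers.)

    using System;

    struct Vec3
    {
        public float X, Y, Z;
        public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }
        public static Vec3 operator +(Vec3 a, Vec3 b)
            => new Vec3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }

    static class StructLoopSketch
    {
        // Tight loop over small value types: a strong JIT keeps the
        // accumulator in registers, a weaker one copies the struct
        // on every iteration.
        public static Vec3 Sum(Vec3[] items)
        {
            Vec3 acc = default;
            for (int i = 0; i < items.Length; i++)
                acc += items[i];
            return acc;
        }
    }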

eterm 5 hours ago | parent [-]

Can we reproduce your results for Mandelbrot?

Rochus 4 hours ago | parent [-]

You can find all the necessary information/data in the article (see the references). Finding the same hardware that I used might be an issue, though. Concerning Mandelbrot, I wouldn't spend too much time on it, because the runtime was so short for some targets that it likely has a big error margin compared to the other results. For my purpose this is not critical, because of the geometric mean over all factors.
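(To make that concrete with made-up numbers: the geometric mean averages the logarithms of the per-benchmark factors, so one unreliable outlier shifts the aggregate far less than it would in an arithmetic mean.)

    using System;
    using System.Linq;

    static class GeoMeanSketch
    {
        // Geometric mean of per-benchmark speedup factors.
        public static double Of(params double[] factors)
            => Math.Exp(factors.Select(Math.Log).Average());
    }

    // GeoMeanSketch.Of(1.1, 0.9, 7.0) ≈ 1.9  (the arithmetic mean would be 3.0)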