iliketrains 20 hours ago

Author here, thanks for your perspective. Here are some thoughts:

> approach of separating the simulation and presentation layers isn't all that uncommon

I agree that some level of separation is not that uncommon, but games usually depend on things from their respective engine, especially datatypes (e.g. Vector3) or math libraries. The reason I mention that our game is unique in this way is that its non-rendering code does not depend on any Unity types or DLLs. And I think that is quite uncommon, especially for a game made in Unity.

> Most games don't ship on the mono backend, but instead on il2cpp

I think this really depends. If we take absolute numbers, roughly 20% of Unity games on Steam use IL2CPP [1]. Of course, many simple games won't be using it, so the sample is skewed if we want to measure "how many players play games with IL2CPP tech". But there are still many, and higher performance of managed code would certainly have an impact.

We don't use IL2CPP because we use many features that are not compatible with it. For example DLC and mods loading at runtime via DLLs, reflection for custom serialization, things like [FieldOffset] for efficient struct packing and for GPU communication, etc.
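For anyone unfamiliar with the pattern, here is a minimal illustrative struct (an example, not the game's actual code): an explicit layout pins every field to a byte offset, so the struct can mirror a GPU-side buffer layout exactly.

```csharp
using System.Runtime.InteropServices;

// Illustrative only: explicit layout with fixed byte offsets,
// e.g. to match a 16-byte GPU buffer element exactly.
[StructLayout(LayoutKind.Explicit, Size = 16)]
public struct PackedInstance
{
    [FieldOffset(0)]  public float X;
    [FieldOffset(4)]  public float Y;
    [FieldOffset(8)]  public float Z;
    [FieldOffset(12)] public uint  ColorRgba; // packed RGBA color in the last 4 bytes
}
```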

Also, having managed code makes the game "hackable". Some modders use IL injection to hook into places where our APIs don't allow. This is good and bad, but so far it has allowed modders to progress faster than we expected, so it's a net positive.

> In modern Unity, if you want to achieve performance, you'd be better off taking the approach of utilizing the burst compiler and HPC#

Yeah, and I really wish we did not need to do that. Burst and HPC# are messy and add a lot of unnecessary complexity and artificial limitations.

The thing is, if Mono and modern .NET were both equally "slow", then sure, let's do some HPC# tricks to get high performance. But they are not: modern .NET is fast, and Unity devs cannot take advantage of it, which is frustrating.

By the way, the final trace with parallel workers was just C#'s worker threads and thread pool.
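A minimal sketch of that plain-.NET approach (illustrative, not the game's actual code):

```csharp
using System;
using System.Threading.Tasks;

// Sketch: chunk the per-element work across the standard .NET
// thread pool via Parallel.For; no Unity job system involved.
static class ParallelSim
{
    public static void Step(float[] state)
    {
        Parallel.For(0, state.Length, i =>
        {
            // placeholder per-element work
            state[i] = (float)Math.Sqrt(state[i]) * 0.5f;
        });
    }
}
```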

> Profiling the editor is always a fools errand

Maybe, but we (devs) spend 99% of our time in the editor. And perf gains in the editor usually translate to the Release build with very similar percentage gains (I know this is generally not true, but in my experience it is). We have done many significant optimizations before, and measurements from the editor were always a useful indicator.

What is not very useful is Unity's profiler, especially with "deep profile" enabled. It adds a constant cost per method call, greatly exaggerating the cost of small methods. So we have our own tracing system that does not do this.
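A sketch of what explicit scoped tracing can look like (hypothetical, not our actual system): only scopes the developer opts into are measured, so small methods carry no per-call overhead.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch: a disposable scope records elapsed time only
// where it is explicitly placed, unlike per-method instrumentation.
public readonly struct TraceScope : IDisposable
{
    readonly string _name;
    readonly long _start;

    public TraceScope(string name)
    {
        _name = name;
        _start = Stopwatch.GetTimestamp();
    }

    public void Dispose()
    {
        long ticks = Stopwatch.GetTimestamp() - _start;
        double ms = ticks * 1000.0 / Stopwatch.Frequency;
        // A real system would push into a preallocated ring buffer
        // instead of formatting and printing a string here.
        Console.WriteLine($"{_name}: {ms:F3} ms");
    }
}

// Usage:
// using (new TraceScope("Simulation.Tick"))
//     simulation.Tick();
```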

> I've seen a lot of mention around GC through this comment section, and professional Unity projects tend to go out of their way to minimize these at runtime

Yes, minimizing allocations is key, but there are many cases where they are hard to avoid. Things like string processing for UI generate a lot of garbage every frame. And there are APIs that simply don't have an allocation-free option. CoreCLR would allow us to further cut down on allocations and would make better APIs available.
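The usual workaround, as a rough sketch (hypothetical example, not our UI code), is to reuse one buffer instead of producing a new string every frame:

```csharp
using System.Text;

// Hypothetical sketch: append digits into a reused StringBuilder
// instead of string.Format/interpolation, which allocate every call.
public class UiCounter
{
    readonly StringBuilder _sb = new StringBuilder(32);

    public StringBuilder Format(int itemCount)
    {
        _sb.Clear();
        _sb.Append("Items: ");
        _sb.Append(itemCount); // appends digits without creating a string
        return _sb;            // e.g. TextMeshPro's SetText accepts a
                               // StringBuilder, avoiding the string entirely
    }
}
```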

Just the fact that the current GC is non-moving means that memory consumption goes up over time due to fragmentation. We have had numerous reports of "memory leaks" where players see memory consumption climb after repeated load/quit-to-menu cycles.

Even if we got fast CoreCLR C# code execution, these issues would persist, so an improved GC would be next on the list.

[1] https://steamdb.info/stats/releases/?tech=SDK.UnityIL2CPP

animal531 7 hours ago | parent | next [-]

What I agree on is that if we had modern .NET available we'd get a free 2-3x improvement, which would definitely be great. BUT having said that, if you're into performance but unwilling to use the tools available, then that's on you.

From the article it seems that you're using some form of threading to create things, but you don't really specify which and/or how.

The default C# implementations are usually quite poor performance-wise. For example, against the default thread pool I can definitively say that I've achieved a 3x speedup by using my own thread pool implementation, which would yield about the same 30s -> 12s reduction.
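For illustration, a minimal dedicated pool of that kind might look like this (a sketch, not my actual implementation):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Sketch: long-lived workers pull items from one blocking queue,
// avoiding pool ramp-up and work-stealing overhead. A real
// high-performance pool would also avoid the Action allocations.
public sealed class FixedThreadPool : IDisposable
{
    readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();
    readonly Thread[] _threads;

    public FixedThreadPool(int workerCount)
    {
        _threads = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _threads[i] = new Thread(() =>
            {
                foreach (var job in _work.GetConsumingEnumerable())
                    job();
            }) { IsBackground = true };
            _threads[i].Start();
        }
    }

    public void Enqueue(Action job) => _work.Add(job);

    public void Dispose()
    {
        _work.CompleteAdding();
        foreach (var t in _threads) t.Join();
    }
}
```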

Burst threading/scheduling is also generally a lot better than the standard one. If I feed it a logic-heavy method (so no vectorization), I can beat it by a bit, but nowhere near the 3x I get over the normal thread pool.

But then if your generation is number-heavy (vs. logic-heavy), having used Burst you could probably drop that calculation time down to 2-3 seconds (much the same as if you used Vector256<T> numerics).

Finally, you touch on GC; that's definitely a problem. The Mono variant has been upgraded by Unity over time, but C# remains C#, which was never meant for gaming. Even if we had access to the modern one, there would still be issues with it. The same goes for all the other C# libraries: they never considered gaming a target, where what we want is extremely fast access/latency with no hiccups. C# in the business world doesn't really care if it loses 16ms (or 160ms) here and there due to garbage; it's usually not a problem there.

Coding in Unity means having to go over every instance of allocation outside of startup and eliminating it. You mention APIs that still need to allocate, which I've never run into myself. Again, modern .NET isn't going to simply make those go away.

mrsmrtss 5 hours ago | parent [-]

Regarding GC pauses, there is an interesting alternative GC with ultra-low pauses for .NET called Satori. It's primarily discussed here https://github.com/dotnet/runtime/discussions/115627, and the GC itself can be found here https://github.com/VSadov/Satori

timmytokyo 20 hours ago | parent | prev | next [-]

>We don't use IL2CPP because we use many features that are not compatible with it. For example DLC and mods loading at runtime via DLLs, reflection for custom serialization, things like [FieldOffset] for efficient struct packing and for GPU communication, etc.

FieldOffset is supported by IL2CPP at compile time [0]. You can also install new DLLs and force the player to restart if you want downloadable mod support.

It's true that you can't do reflection for serialization, but there are better, more performant alternatives for that use case, in my experience.

[0] https://docs.unity3d.com/Manual/scripting-restrictions.html

iliketrains 19 hours ago | parent | next [-]

> You can also install new DLLs and force the player to restart if you want downloadable mod support.

I am not aware of an easy way to load (managed) mods as DLLs to IL2CPP-compiled game. I am thinking about `Assembly.LoadFrom("Mod.dll")`.

Can you elaborate how this is done?

> there are better, more performant alternatives for that use case, in my experience.

We actually use reflection to emit optimal code for generic serializers that avoid boxing and increase performance.
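As a rough illustration of the technique (a hypothetical sketch, not our actual serializer): reflect over a type once, compile a strongly typed delegate, and pay no reflection or boxing cost per call.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical sketch: assumes T has public instance fields of
// primitive types for which BinaryWriter.Write overloads exist.
static class SerializerCache<T>
{
    public static readonly Action<BinaryWriter, T> Write = Build();

    static Action<BinaryWriter, T> Build()
    {
        var writer = Expression.Parameter(typeof(BinaryWriter), "w");
        var value  = Expression.Parameter(typeof(T), "v");
        var calls  = new List<Expression>();
        foreach (var f in typeof(T).GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            // Resolve the matching Write overload once, at build time.
            var m = typeof(BinaryWriter).GetMethod("Write", new[] { f.FieldType });
            calls.Add(Expression.Call(writer, m, Expression.Field(value, f)));
        }
        return Expression.Lambda<Action<BinaryWriter, T>>(
            Expression.Block(calls), writer, value).Compile();
    }
}

// Usage: SerializerCache<MyData>.Write(binaryWriter, instance);
// Note: Compile() needs a JIT, which is exactly what IL2CPP lacks.
```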

There may be alternatives, we explored things like FlatBuffers and their variants, but nothing came close to our system in terms of ease of use, versioning support, and performance.

If you have some suggestions, I'd be interested to see what options are out there for C#.

> FieldOffset is supported by IL2CPP at compile time

You are right, I misremembered this one; you cannot get it via reflection, but it works.

timmytokyo 19 hours ago | parent [-]

>I am not aware of an easy way to load (managed) mods as DLLs to IL2CPP-compiled game. I am thinking about `Assembly.LoadFrom("Mod.dll")`.

Ah, I was thinking native DLLs (which is what we're using on a project I'm working on). I think you're right that it's impossible for an IL2CPP-built player to interoperate with a managed (Mono) DLL.

>If you have some suggestions [re: serialization], I'd be interested to see what options are out there for C#.

We wrote a custom, garbage-free JSON serializer/deserializer that uses a fluent API style. We also explored a custom codegen solution (similar to FlatBuffers or protobuf) but abandoned it because the expected perf (and ergonomic) benefits would have been minor. The trickiest part with Unity codegen is generating code that creates little to no garbage.
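A rough sketch of that style (hypothetical API, not our actual library):

```csharp
using System.Text;

// Hypothetical sketch: a fluent writer appending into one reusable
// buffer, so serializing an object produces no garbage per call.
// (String escaping omitted for brevity.)
public sealed class JsonWriter
{
    readonly StringBuilder _buf = new StringBuilder(256);
    bool _needComma;

    public JsonWriter Begin()
    {
        _buf.Clear();
        _buf.Append('{');
        _needComma = false;
        return this;
    }

    public JsonWriter Field(string name, int value)
    {
        if (_needComma) _buf.Append(',');
        _buf.Append('"').Append(name).Append("\":").Append(value);
        _needComma = true;
        return this;
    }

    public StringBuilder End()
    {
        _buf.Append('}');
        return _buf;
    }
}

// Usage: writer.Begin().Field("hp", 100).Field("ammo", 42).End();
```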

mastax 19 hours ago | parent | prev [-]

Does Unity have source generator support? It could make for a good alternative to reflection.

CreepGin 19 hours ago | parent [-]

Yes and it works well IME. https://docs.unity3d.com/6000.3/Documentation/Manual/roslyn-...

Now that I think about it, writing SourceGenerators is actually a great fit for AI agents.
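For reference, a minimal generator looks roughly like this (illustrative sketch):

```csharp
using Microsoft.CodeAnalysis;

// Illustrative sketch: emits a source file at compile time, which can
// replace work that would otherwise be done via runtime reflection.
[Generator]
public class HelloGenerator : ISourceGenerator
{
    public void Initialize(GeneratorInitializationContext context) { }

    public void Execute(GeneratorExecutionContext context)
    {
        context.AddSource("Generated.g.cs",
            "public static class Generated { public const string Info = \"compile-time\"; }");
    }
}
```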

luaKmua 20 hours ago | parent | prev [-]

Hey there, always appreciate a dialog.

Per the separation, I think this was far more common both in older Unity games and in professional settings.

For games shipping on Mono on Steam, that statistic isn't surprising to me given the number of indie games on there and Unity's prevalence in that environment. My post should generally be read in a professional setting (i.e., career game devs). The IL injection is a totally reasonable consideration, but it does (currently) lock you out of platforms where AOT is a requirement. You can also support mods/DLC via Addressables, and there has been improvement in modding tools for IL2CPP; however, you're correct that it's not nearly as easy.

Going to completely disagree that Burst and HPC# are unnecessary and messy, for a few reasons. The restrictions that HPC# enforces are essentially the same ones you already have if you want to write performant C#: you simply use Unity's allocators for your memory up front and then operate on those. Depending on how you do this, you can either eliminate your per-frame allocations or likely eliminate some of the fragmentation you were referring to. Modern .NET is fast, of course, but it's not Burst-compiled-HPC# fast; there are so many things the compiler and LLVM can do based on those assumptions. Agreed that C# strings are always a pain if you actually need to interpolate things at runtime. We always try to avoid these as much as we can, and intern common ones.
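For concreteness, a minimal Burst job using the standard Unity Jobs API:

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// Minimal example: HPC#-compatible code operates on unmanaged data in
// native containers, which is what lets Burst/LLVM assume no aliasing,
// no GC interaction, and vectorize aggressively.
[BurstCompile]
public struct ScaleJob : IJobParallelFor
{
    public NativeArray<float> Values;
    public float Factor;

    public void Execute(int index) => Values[index] *= Factor;
}

// Scheduling, e.g. from a MonoBehaviour:
// new ScaleJob { Values = values, Factor = 2f }
//     .Schedule(values.Length, 64).Complete();
```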

The fragmentation you mention after large operations is (in my experience) indicative of save/load systems, or possibly level-init code, doing tons of allocations and causing the heap to froth up. That, or tons of reflection stuff, which is also usually a no-no for runtime perf code. The memory profiler used to have a helpful fragmentation view for that, but Unity removed it, unfortunately.

Rohansi 18 hours ago | parent | next [-]

> Modern .Net is fast, of course, but it's not burst compiled HPC# fast.

Sure, but the fact that modern .NET is competitive with Burst is exactly what makes this disappointing. If I'm going to go through the trouble of writing code in a different (and not portable!) way, then it had better be significantly faster. Especially when most code cannot be written as Burst jobs unless you use their (new) ECS.

https://github.com/tbg10101/dotnet-burst-comparison

gr4vityWall 5 hours ago | parent [-]

I wonder what those benchmarks would look like with .NET 10 and an AVX512-capable CPU.

iliketrains 16 hours ago | parent | prev | next [-]

> Going to completely disagree that Burst and HPC# are unnecessary and messy.

Making managed code Burst-compatible comes with real constraints that go beyond "write performant C#". In Burstable code, you generally can't interact with managed objects or GC-dependent APIs, so the design is pushed towards unmanaged structs in native collections. And this design spreads: the more logic is to be covered by Burst, the more things have to be broken down into native containers of unmanaged structs.

I agree that designing things in a data-oriented way is good, but why force this additional boundary and these special types on devs instead of just letting them write it in C#? Writing Burstable code can increase complexity: one has to manage memory/lifetimes, data layout, job-friendly boundaries, copying data between native and managed collections, etc., not just "write fast C#".
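A small example of that boundary cost (a hypothetical sketch):

```csharp
using System.Collections.Generic;
using Unity.Collections;

// Hypothetical sketch: managed gameplay state must be copied into a
// native container before a Burst job can touch it, and the container's
// lifetime then has to be managed by hand.
public static class Boundary
{
    public static NativeArray<float> ToNative(List<float> managed, Allocator allocator)
    {
        var native = new NativeArray<float>(managed.Count, allocator);
        for (int i = 0; i < managed.Count; i++)
            native[i] = managed[i]; // element-by-element copy across the boundary
        return native;              // caller is responsible for Dispose()
    }
}
```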

In a complex simulation game, my experience is that some things definitely fit the "raw data, batch processing" model, but not all gameplay/simulation logic does: things like inheritance, events, graphs, AI (the dumb "game" kind, no NN), UI, exceptions, etc. And on top of it all, debugging complications.

Wouldn't you be relieved by the announcement "C# is now as fast as Burst, have fun!"? You'd be able to do the same data-oriented design where necessary, but keep all the other things handy when needed. It's so close, yet so far!

> The fragmentation you mention

What you say makes sense. I've actually spent a lot of time debugging this, and I did find some "leaks" where references to "dead objects" were keeping them from being GC'd. But after sorting all these out, Unity's memory profiler showed that "Empty Heap Space" was the culprit; that one kept increasing after every iteration. My running theory is that the heap just gets more and more fragmented, and some static objects randomly scattered around it keep it from being shrunk. ¯\_(ツ)_/¯

CreepGin 19 hours ago | parent | prev [-]

Yeah, to me, Burst+Jobs and compute shaders are so easy to work with in Unity that I haven't felt the need to squeeze more perf out of C# in a long time.

For modding and OTA stuff, I just use a scripting language with good interop (I made OneJS partially for this purpose). No more AOT issues and no more waiting for domain reload, etc.