andrewstuart 10 days ago

How is there so much duplication?

breve 10 days ago | parent | next [-]

They duplicate files to reduce load times. Here's how Arrowhead Game Studios themselves tell it:

https://www.arrowheadgamestudios.com/2025/10/helldivers-2-te...

imtringued 10 days ago | parent [-]

I don't think this is the real explanation. If they gave the filesystem a list of files to fetch in parallel (async file IO), the concept of "seek time" would become almost meaningless. This optimization will make fetching from both HDDs and SSDs faster. They would be going out of their way to make their product worse for no reason.
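
As a rough illustration of that idea (not Arrowhead's code; a thread pool stands in for true async file IO, and the worker count is arbitrary), here's a sketch of handing the OS a whole batch of reads at once so the IO scheduler and drive command queue can reorder them:

  # Sketch: issue reads for a whole asset list at once and let the OS and
  # drive command queue reorder them, instead of reading one file at a time.
  from concurrent.futures import ThreadPoolExecutor
  from pathlib import Path

  def load_assets(paths: list[Path], workers: int = 32) -> dict[Path, bytes]:
      with ThreadPoolExecutor(max_workers=workers) as pool:
          blobs = pool.map(lambda p: p.read_bytes(), paths)
          return dict(zip(paths, blobs))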

toast0 2 days ago | parent | next [-]

Solid state drives tend to respond well to parallel reads, so it's not so clear. If you're reading one at a time, sequential access is going to be better though.

But for a mechanical drive, you'll get much better throughput on sequential reads than random reads, even with command queuing. I think earlier discussion showed it wasn't very effective in this case, and taking 6x the space for a marginal benefit for the small % of users with mechanical drives isn't worthwhile...

seg_lol 2 days ago | parent [-]

Every storage medium, including RAM, benefits from sequential access. But it doesn't have to be super long sequential access; the seek time (or block-open time) just needs to be short relative to the next block read.

Xss3 10 days ago | parent | prev | next [-]

If they fill your hard drive, you're less likely to install other games. If you see a huge install size, you're less likely to uninstall with plans to reinstall later, because that'd take a long time.

ukd1 2 days ago | parent [-]

Unfortunately this actually is believable. SMH.

pixl97 2 days ago | parent | prev | next [-]

>If they gave the filesystem a list of files to fetch in parallel (async file IO)

This does not work if you're doing tons of small IO and you want something fast.

Let's say we're on an HDD with 200 IOPS and we need to read 3000 small files scattered randomly across the drive.

Well, at minimum this is going to take 15 seconds, plus any additional seek time.

Now, let's say we zip up those files into a solid archive. You'll read it in half a second. The problem comes in when different levels each need a different set of 3000 files. Then you end up duplicating a bunch of stuff.

Now, where this typically falls apart for modern game assets is that individual assets are getting very large, which makes seek time a much smaller share of the total read time.
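
A quick back-of-the-envelope version of that comparison (the 200 IOPS and 3000 files are the figures above; the archive size and sequential rate are assumptions for illustration):

  iops = 200                   # random reads/second on a slow HDD
  small_files = 3000           # assets needed by one level
  random_seconds = small_files / iops                   # ~15 s just in seeks

  archive_mb = 50              # assumed size of a solid per-level archive
  sequential_mb_per_s = 100    # assumed HDD sequential throughput
  archive_seconds = archive_mb / sequential_mb_per_s    # ~0.5 s

  print(f"random: {random_seconds:.0f} s, solid archive: {archive_seconds:.1f} s")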

imtringued 2 days ago | parent [-]

I haven't found any asynchronous IOPS numbers for HDDs anywhere. The IOPS figures on the internet are just 1000 ms divided by the seek time, with an 8 ms seek time for moving from the outer to the inner track, which is only really relevant for the synchronous file IO case.

For asynchronous IO you can just do inward/outward passes to amortize the seek time over multiple files.

While it may not have been obvious, I have taken archiving or bundling of assets into a bigger file for granted. The obvious benefit is that the filesystem knows it should store the game's files contiguously. This has nothing to do with file duplication though, and is somewhat beside the point, because it costs nothing and only has benefits.

The asynchronous file IO case for bundled files is even better, since you can just hand the internal file offsets to the async file IO operations and get all the relevant data in parallel. Your only constraint is then deciding on an optimal lower bound for the block size, which is high for HDDs and low for SSDs.
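
For illustration, a minimal sketch of that offset-based approach, assuming a pack file with an index of (offset, length) entries (the index shape and names are invented; os.pread is POSIX, so Windows would use overlapped IO instead):

  # Hypothetical bundle reader: one big pack file plus an index of
  # (offset, length) pairs, fetched with parallel positional reads so the
  # OS and drive can schedule them freely.
  import os
  from concurrent.futures import ThreadPoolExecutor

  def load_bundle(bundle_path: str, index: dict[str, tuple[int, int]]) -> dict[str, bytes]:
      fd = os.open(bundle_path, os.O_RDONLY)
      try:
          with ThreadPoolExecutor(max_workers=16) as pool:
              futures = {name: pool.submit(os.pread, fd, size, offset)
                         for name, (offset, size) in index.items()}
              return {name: fut.result() for name, fut in futures.items()}
      finally:
          os.close(fd)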

gruez 2 days ago | parent | next [-]

>I haven't found any asynchronous IOPS numbers for HDDs anywhere. The IOPS figures on the internet are just 1000 ms divided by the seek time, with an 8 ms seek time for moving from the outer to the inner track, which is only really relevant for the synchronous file IO case.

>For asynchronous IO you can just do inward/outward passes to amortize the seek time over multiple files.

Here's a random blog post that has benchmarks for a 2015 HDD:

https://davemateer.com/2020/04/19/Disk-performance-CrystalDi...

It shows 1.5 MB/s for random 4K performance at high queue depth, which works out to just under 400 IOPS. Queue depth 1 (so effectively synchronous) performance is around a third of that.
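
For reference, converting that throughput figure to IOPS is just:

  random_4k_mb_per_s = 1.5
  iops = random_4k_mb_per_s * 1024 / 4    # 4 KiB blocks -> ~384 IOPS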

pixl97 2 days ago | parent | prev [-]

>I haven't found any asynchronous IOPS numbers for HDDs anywhere.

As the other user stated, just look up CrystalDiskMark results for both HDDs and SSDs and you'll see hard drives do about a third of a MB/s on random 4K IO, while the same hard drive will do hundreds of MB/s on a contiguous read. For things like this, reading a zip and decompressing in memory is "typically" (again, you have to test this) orders of magnitude faster.

jayd16 2 days ago | parent | prev | next [-]

The technique has the most impact on games running off physical disc.

It's a well known technique but happened to not be useful for their use case.

extraduder_ire 2 days ago | parent | prev | next [-]

"97% of the time: premature optimization is the root of all evil."

crest 2 days ago | parent | prev | next [-]

The idea is to duplicate assets so loading a "level" is just sequential reading from the file system. It's required on optical media and can be very useful on spinning disks too. On SSDs it's insane. The logic should've been the other way around: do a speed test on start and offer to "optimise for spinning media" if the performance metrics look like it would help.

If the game were ~20 GB instead of ~150 GB, almost no player with the required CPU+GPU+RAM combination would be forced to put it on an HDD instead of an SSD.
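
A sketch of what such a startup check could look like (purely illustrative: the sample count and threshold are guesses, and a real test would also need to defeat the OS page cache):

  # Hypothetical heuristic: time a few scattered small reads from an existing
  # game file and only offer the "optimise for spinning media" path if random
  # access looks slow.
  import os, random, time

  def looks_like_spinning_media(path: str, samples: int = 64, block: int = 4096) -> bool:
      size = os.path.getsize(path)
      with open(path, "rb", buffering=0) as f:
          start = time.perf_counter()
          for _ in range(samples):
              f.seek(random.randrange(0, max(1, size - block)))
              f.read(block)
          elapsed = time.perf_counter() - start
      # More than ~2 ms per random 4K read suggests a mechanical drive.
      return elapsed / samples > 0.002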

immibis 2 days ago | parent [-]

This idea of one continuous block per level dates back to the PS1 days.

Hard drives are much, much faster than optical media - on the order of 80 seeks per second and 300 MB/s sequential versus, like, 4 seeks per second and 60 MB/s sequential (for DVD-ROM).

You still want to load sequential blocks as much as possible, but you can afford to have a few. (Assuming a traditional engine design, no megatextures etc) you probably want to load each texture from a separate file, but you can certainly afford to load a block of grass textures, a block of snow textures, etc. Also throughput is 1000x higher than a PS1 (300 kB/s) so you can presumably afford to skip parts of your sequential runs.

immibis a day ago | parent [-]

I meant to write that you probably DON'T want to load each texture from a separate file, but it would be fine to have them in blocks.

jy14898 10 days ago | parent | prev [-]

The post stated that it was believed duplication improved loading times on computers with HDDs rather than SSDs

dontlaugh 10 days ago | parent | next [-]

Which is true. It’s an old technique going back to CD games consoles, to avoid seeks.

SergeAx 10 days ago | parent [-]

Is it really possible to control file locations on an HDD via the Windows NTFS API?

dontlaugh 10 days ago | parent | next [-]

No, not at all. But by putting every asset a level (for example) needs in the same file, you can pretty much guarantee you can read it all sequentially without additional seeks.

That does force you to duplicate some assets a lot. It's also more important the slower your seeks are. This technique is perfect for disc media, since it has a fixed physical size (so wasting space on it is irrelevant) and slow seeks.
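
As a sketch of that layout (not any particular engine's format; the structure and names are invented), a per-level pack can just be a small index followed by the assets written back to back, with shared assets copied into every pack that needs them, which is exactly where the duplication comes from:

  # Minimal per-level pack: a JSON index, then all assets back to back,
  # so loading the level is one mostly-sequential read.
  import json, struct
  from pathlib import Path

  def write_level_pack(pack_path: Path, assets: dict[str, bytes]) -> None:
      index, offset = {}, 0
      for name, data in assets.items():
          index[name] = (offset, len(data))    # offsets within the data section
          offset += len(data)
      index_bytes = json.dumps(index).encode()
      with open(pack_path, "wb") as f:
          f.write(struct.pack("<I", len(index_bytes)))   # 4-byte index length
          f.write(index_bytes)
          for data in assets.values():
              f.write(data)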

viraptor 2 days ago | parent [-]

> by putting every asset a level (for example) needs in the same file, you can pretty much guarantee you can read it all sequentially

I'd love to see it analysed. Specifically, the average number of non-sequential jumps vs the overall size of the level. I'm sure you could avoid jumps within megabytes. But if someone ever got closer to filling up the disk in the past, the chances of contiguous gigabytes are much lower. This paper effectively says that if you have long files, there are almost guaranteed to be gaps https://dfrws.org/wp-content/uploads/2021/01/2021_APAC_paper... so at that point, you may be better off preallocating the individual files and eating the cost of switching between them.

toast0 2 days ago | parent | next [-]

From that paper, table 4, large files had an average # of fragments around 100, but a median of 4 fragments. A handful of fragments for a 1 GB level file is probably a lot less seeking than reading 1 GB of data out of a 20 GB aggregated asset database.

But it also depends on how the assets are organized: you can probably group the level-specific assets into a sequential section, and maybe shared assets could be grouped so related assets are sequential.
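
Rough numbers for that comparison, with everything (seek time, fragment count, piece size) assumed just to show the order of magnitude:

  seek_s = 0.010                       # assumed average seek time

  # A mostly-sequential 1 GB level pack with a median of 4 fragments:
  pack_seek_s = 4 * seek_s             # ~0.04 s spent seeking

  # The same 1 GB pulled as scattered 4 MB pieces from a 20 GB shared database:
  pieces = 1024 // 4
  scattered_seek_s = pieces * seek_s   # ~2.6 s spent seeking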

dontlaugh 2 days ago | parent | prev | next [-]

Sure. I’ve seen people that do packaging for games measure various techniques for hard disks typical of the time, maybe a decade ago. It was definitely worth it then to duplicate some assets to avoid seeks.

Nowadays? No. Even those with hard disks will have lots more RAM and thus disk cache. And you are even guaranteed SSDs on consoles. I think in general no one tries this technique anymore.

wcoenen 2 days ago | parent | prev | next [-]

> But if someone ever got closer to filling up the disk in the past, the chances of contiguous gigabytes are much lower.

By default, Windows automatically defragments filesystems weekly if necessary. It can be configured in the "defragment and optimize drives" dialog.

pixl97 2 days ago | parent [-]

Not 'full' defragmentation. Microsoft did a study and found that beyond 64 MB slabs of contiguous file data you don't gain much, so they don't bother getting gigabytes fully defragmented.

https://web.archive.org/web/20100529025623/http://blogs.tech...

An old article on the process.

justsomehnguy 2 days ago | parent | prev | next [-]

> But if someone ever got closer to filling up the disk in the past, the chances of contiguous gigabytes are much lower

Someone installing a 150 GB game surely has 150 GB+ of free space, and there would be a lot of contiguous free space.

jayd16 2 days ago | parent | prev [-]

It's an optimistic optimization so it doesn't really matter if the large blobs get broken up. The idea is that it's still better than 100k small files.

toast0 2 days ago | parent | prev [-]

Not really. But when you write a large file at once (like with an installer), you'll tend to get a good amount of sequential allocation (unless your free space is highly fragmented). If you load that large file sequentially, you benefit from drive read ahead and OS read ahead --- when the file is fragmented, the OS will issue speculative reads for the next fragment automatically and hide some of the latency.

If you break it up into smaller files, those are likely to be allocated all over the disk; plus you'll have delays on reads because Windows Defender makes opening files slow. If you have a single large file that contains all resources, even if that file is mostly sequential, there will be sections you don't need, and read-ahead caching may work against you, as it will tend to read things you don't need.
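
A sketch of leaning on that behaviour: read the pack front to back in large chunks and, where available, tell the OS the access is sequential (os.posix_fadvise is POSIX-only; the rough Windows analogue is opening with FILE_FLAG_SEQUENTIAL_SCAN):

  # Read one large pack file front to back in big chunks so OS and drive
  # read-ahead work with us rather than against us.
  import os

  def read_pack_sequentially(path: str, chunk: int = 8 * 1024 * 1024) -> bytes:
      fd = os.open(path, os.O_RDONLY)
      try:
          if hasattr(os, "posix_fadvise"):
              os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
          parts = []
          while True:
              data = os.read(fd, chunk)
              if not data:
                  break
              parts.append(data)
          return b"".join(parts)
      finally:
          os.close(fd)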

pjc50 2 days ago | parent | prev | next [-]

Key word is "believed". It doesn't sound like they actually benchmarked.

wongogue 2 days ago | parent [-]

There is nothing to believe. Random 4K reads on an HDD are slow.

debugnik 2 days ago | parent [-]

I assume asset reads nowadays are much heavier than 4 kB though, especially if assets meant to be loaded together are bundled together in one file. So games now should be spending less time seeking relative to their total read size. Combined with HDD caches and parallel reads, this practice of duplicating over 100 GB across bundles is most likely cargo-culting by now.
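
Rough amortisation arithmetic under assumed HDD figures (10 ms seek, 150 MB/s sequential), just to show how quickly bigger reads shrink the seek share:

  seek_s = 0.010                # assumed seek time
  throughput_mb_s = 150         # assumed sequential throughput

  for read_mb in (0.004, 1, 16):        # 4 kB, 1 MB and 16 MB reads
      transfer_s = read_mb / throughput_mb_s
      seek_share = seek_s / (seek_s + transfer_s)
      print(f"{read_mb} MB read: seek is {seek_share:.0%} of the time")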

Which makes me think: have there been any advances in disk scheduling in the last decade?

khannn 2 days ago | parent | prev [-]

Who cares? I've installed every graphically intensive game on SSDs since the original OCZ Vertex was released.

teamonkey 2 days ago | parent [-]

Their concern was that one person in a squad loading from an HDD could slow down level loading for the whole squad, even for players using SSDs, so they used a very normal and time-tested optimisation technique to prevent that.

khannn 2 days ago | parent [-]

Their technique makes it so that a normal person with a base ~512 GB SSD can't reasonably install the game. Heck of a job, Brownie.

teamonkey 2 days ago | parent [-]

Nonsense. I play it on a 512GB SSD and it’s fine.

khannn 13 hours ago | parent [-]

It's already hard for me to use a laptop with Win11 and just one game (BG3) installed on a 512 GB SSD.