IshKebab 5 days ago

LFS is bad. The server implementations suck. It conflates object contents with the storage method. It's opt-in, in a terrible way - if you do the obvious thing you get tiny text files instead of the files you actually want.

I dunno if their solution is any better but it's fairly unarguable that LFS is bad.
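For context, the "tiny text files" are LFS pointer files, which have a fixed first line per the git-lfs pointer spec. A rough sketch of spotting one (the helper name is mine, purely illustrative):

```python
def looks_like_lfs_pointer(data: bytes) -> bool:
    """Heuristic check for a Git LFS pointer file.

    Real pointers are tiny text files whose first line names the
    LFS spec version.
    """
    # Pointers are small; anything large is real content.
    if len(data) > 1024:
        return False
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return text.startswith("version https://git-lfs.github.com/spec/v1")

# Example pointer in the shape `git lfs clean` produces:
pointer = (
    b"version https://git-lfs.github.com/spec/v1\n"
    b"oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393\n"
    b"size 12345\n"
)
print(looks_like_lfs_pointer(pointer))         # True
print(looks_like_lfs_pointer(b"\x89PNG\r\n"))  # False
```

This is the failure mode described above: clone without git-lfs installed and these pointers are what lands in your working tree.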

jayd16 5 days ago | parent | next [-]

It does seem like this proposal has exactly the same issue. Unless this new method blocks cloning when unable to access the promisors, you'll end up with similar problems of broken large files.

cowsandmilk 5 days ago | parent [-]

How so? This proposal doesn’t require you to run `git lfs install` to get the correct files…

jayd16 5 days ago | parent | next [-]

If the architecture is irrelevant and it's just a matter of turning it on by default they could have done that with LFS long ago.

thayne 5 days ago | parent [-]

Git lfs can't do it by default because:

1. It is a separate tool that has to be installed separately from git

2. It works by using git filters and git hooks, which need to be set up locally.

Something built in to git doesn't have those problems.
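For reference, the local setup in question is what `git lfs install` writes into your git config, along the lines of:

```ini
[filter "lfs"]
	clean = git-lfs clean -- %f
	smudge = git-lfs smudge -- %f
	process = git-lfs filter-process
	required = true
```

Without these filter entries (plus the hooks), a clone checks out the raw pointer files, which is exactly why it can't work by default.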

xg15 5 days ago | parent [-]

But then they could have just taken the LFS plugin and made it a core part of git, if that were the only problems.

thayne 2 days ago | parent [-]

If it didn't have those problems, it wouldn't really be git lfs, it would be something else.

vlovich123 5 days ago | parent | prev [-]

And what happens when an object is missing from the cloud storage or that storage has been migrated multiple times and someone turns down the old storage that’s needed for archival versions?

atq2119 5 days ago | parent [-]

You obviously get errors in that case, which is not great.

But GP's point was that there is an entire other category of errors with git-lfs that is eliminated with this more native approach. Git-lfs lets you get into an inconsistent state, e.g. when you interrupt a git action; that just doesn't happen with native git.

jayd16 5 days ago | parent [-]

It's yet to be seen what it actually eliminates and what they're willing to actually enable by default.

The architecture does seem to still be in the general framing of "treat large files as special and host them differently." That is the crux of the problem in the first place.

I think it would shock no one to find that the official system also needs to be enabled and also falls back to a mode where it supports fetching and merging pointers without full file content.

I do hope all the UX problems will be fixed. I just don't see them going away naturally, and we have to trust that the git maintainers will make the commands enjoyable, seamless, and safe.
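For what it's worth, the native mechanism (partial clone with a promisor remote) needs no client-side enablement today; a sketch against a throwaway local "server" repo (paths and names are mine):

```shell
set -e
dir=$(mktemp -d); cd "$dir"

# Build a repo to act as the server, with one "large" file.
git init -q server
printf 'pretend this is huge\n' > server/big.bin
git -C server add big.bin
git -C server -c user.email=a@example.com -c user.name=a \
    commit -q -m "add big file"
# The server side must permit filtered fetches:
git -C server config uploadpack.allowFilter true

# blob:none defers all blob downloads until first use.
git clone -q --filter=blob:none "file://$dir/server" lazy

# The clone records the promisor remote and its filter:
git -C lazy config --get remote.origin.promisor           # prints "true"
git -C lazy config --get remote.origin.partialclonefilter # prints "blob:none"
```

Missing blobs are then fetched lazily on checkout, which is also where the failure mode discussed above (promisor unreachable) would surface.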

ozim 5 days ago | parent | prev [-]

I think maybe the answer is not storing large files in the repo but managing those separately.

Mostly I have not run into such a use case, but in general I don't see any upside in trying to shove big files in alongside code within repositories.

tsimionescu 5 days ago | parent [-]

That is a complete no-go for many use cases. Large files can have exactly the same use cases as your code: you need to branch them, you need to know when and why they changed, you need to check how an old build with an old version of the large file worked, etc. Just because code tends to be small doesn't mean that all source files for a real program are going to be small too.

ozim 5 days ago | parent [-]

Yeah but GIT is not the tool for that.

That is why I don't understand why people "need to use GIT".

You can still do something else, like keeping versions and tracking those versions in many different ways.

You can store a reference in repo like a link or whatever.

da_chicken 4 days ago | parent | next [-]

A version control system is a tool for managing a project, not exclusively a tool for managing source code.

Wanting to split a project across multiple storage spaces is inherently hostile to managing it. People want everything together because keeping it together is a basic function of managing a project of digital files. Tracking asset versions and linking them to release numbers and build plans is simply a requirement.

That's what actual, real projects demand. Any project that involves digital assets is going to involve binary, often large, data files. Any project that involves large tables of pre-determined or historic data will involve large files, text or binary, that the project requires. Such projects won't have everything encompassed as small text files; it's weird when that's true of a project. The Linux kernel is a fairly unique case in that it has no graphics and no large, predetermined data blocks, and not all projects that need to be managed by git share 100% of its attributes.

This idea that everything in a git project must be a small text file is incredibly bizarre. Are you making a video game? A website? A web application? A data-driven API? Does it have geographic data? Does it require images? Video? Music or sound? Are you providing static documentation that must be included?

So the choices are:

1. Git is a useful general-purpose VCS for real-world projects.

2. Git does not permit binary or large files.

Tracking versioning on large files is not some massively complex problem. Not needing to care about diffing and merging simplifies how those files are managed.
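To the point that it isn't massively complex: a toy content-addressed store (all names are mine, purely illustrative) covers dedup, versioning, and retrieval in a few lines:

```python
import hashlib
import os
import tempfile


class BlobStore:
    """Toy content-addressed store: blobs keyed by their SHA-256 digest.

    Versioning a large file is then just keeping an ordered list of
    digests; no diffing or merging is needed.
    """

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        path = os.path.join(self.root, digest)
        if not os.path.exists(path):  # identical content is stored once
            with open(path, "wb") as f:
                f.write(data)
        return digest

    def get(self, digest: str) -> bytes:
        with open(os.path.join(self.root, digest), "rb") as f:
            return f.read()


store = BlobStore(tempfile.mkdtemp())
v1 = store.put(b"big binary, version 1")
v2 = store.put(b"big binary, version 2")
print(store.put(b"big binary, version 1") == v1)  # deduped: True
```

This is roughly the model both git itself and LFS-style tools use; the hard parts are transport and garbage collection, not the versioning.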

ozim 4 days ago | parent | next [-]

That's what I disagree with. For me Git is for managing source code. Everything else is trying to fit a square peg through a round hole.

There are other tools for managing projects and better ways to version large files or binary assets.

Git is great at handling text changes and that’s it. It sucks with binary blobs.

biggusdickus69 4 days ago | parent | prev [-]

Git is an SCM, not a VCS. By design.

IshKebab 5 days ago | parent | prev | next [-]

> Yeah but GIT is not the tool for that.

Yes, because Git currently is not good at tracking large files. That's not some fundamental property of Git; it can be improved.

Btw it isn't GIT.

ozim 4 days ago | parent [-]

I beg to differ, both on it being an improvement and on poor large-file tracking being a flaw of Git.

rcxdude 3 days ago | parent [-]

ok, but does it affect you if it also addresses other people's use-cases?

tsimionescu 5 days ago | parent | prev | next [-]

The important point is that you don't want two separate histories. Maybe if your use case is very heavy on large files, you can choose a different SCM, which is better at this use case (SVN, Perforce). But using different SCMs for different files is a recipe for disaster.

mafuy 5 days ago | parent | prev | next [-]

Git is the right tool. It's just bad at this job.

jayd16 5 days ago | parent | prev [-]

That's pretty much what git LFS is...