jauer 5 days ago

TFA asserts that Git LFS is bad for several reasons, including that it's proprietary with vendor lock-in, which I don't think is fair to claim. GitHub provided an open client and server, which negates that.

LFS does break disconnected/offline/sneakernet operations, which wasn't mentioned and is not awesome, but those are niche workflows. It sounds like that would also be broken with promisors.

The `git partial clone` examples are cool!

The description of Large Object Promisors makes it sound like they take the client-side complexity in LFS, move it server-side, and then increase the complexity? Instead of the client uploading to a git server and to an LFS server, it uploads to a git server which in turn uploads to an object store, but the client will download directly from the object store? Obviously different tradeoffs there. I'm curious how often people will get bitten by uploading to public git servers which upload to hidden promisor remotes.
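For context, the promisor machinery builds on partial clone, which already works today. A minimal sketch, assuming git >= 2.22 and using a local bare repo as a stand-in for a hosted server:

```shell
# Partial ("blobless") clone: the plumbing that large object promisors extend.
set -e
tmp=$(mktemp -d) && cd "$tmp"

# A bare repo standing in for the hosted remote
git init -q --bare server.git
git -C server.git config uploadpack.allowFilter true   # let clients request object filters

# Push a largish file to it
git clone -q server.git work
cd work
git config user.email you@example.com && git config user.name you
head -c 100000 /dev/zero > big.bin                     # pretend this is a large asset
git add big.bin && git commit -qm 'add big.bin' && git push -q origin HEAD
cd ..

# Blobless clone: commits and trees come down now, blobs are fetched on demand;
# git marks origin as a promisor remote that "promises" to serve missing objects
git clone -q --filter=blob:none "file://$tmp/server.git" lazy
git -C lazy config remote.origin.promisor              # prints "true"
```

The large-object-promisor proposal adds a second, blob-only promisor remote (the object store) alongside the main one, which is where the upload/download asymmetry above comes from.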

IshKebab 5 days ago | parent | next [-]

LFS is bad. The server implementations suck. It conflates object contents with the storage method. It's opt-in, in a terrible way - if you do the obvious thing you get tiny text files instead of the files you actually want.
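For reference, the "tiny text file" is an LFS pointer: without the smudge filter installed, checkout leaves something like this in place of the real content (oid and size here are illustrative):

```
version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345
```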

I dunno if their solution is any better, but it's fairly unarguable that LFS is bad.

jayd16 5 days ago | parent | next [-]

It does seem like this proposal has exactly the same issue. Unless this new method blocks cloning when it can't access the promisors, you'll end up with similar problems of broken large files.

cowsandmilk 5 days ago | parent [-]

How so? This proposal doesn’t require you to run `git lfs install` to get the correct files…

jayd16 5 days ago | parent | next [-]

If the architecture is irrelevant and it's just a matter of turning it on by default, they could have done that with LFS long ago.

thayne 5 days ago | parent [-]

Git lfs can't do it by default because:

1. It is a separate tool that has to be installed separately from git

2. It works by using git filters and git hooks, which need to be set up locally.

Something built in to git doesn't have those problems.
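Concretely, `git lfs install` has to write filter configuration into each user's global git config, and the repo needs a `.gitattributes` entry per tracked pattern; roughly:

```
# .gitattributes (committed to the repo)
*.bin filter=lfs diff=lfs merge=lfs -text

# ~/.gitconfig (written by `git lfs install`, needed on every machine)
[filter "lfs"]
    clean = git-lfs clean -- %f
    smudge = git-lfs smudge -- %f
    process = git-lfs filter-process
    required = true
```

If the second half is missing on a given machine, checkouts silently produce pointer files instead of content, which is the failure mode being discussed.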

xg15 5 days ago | parent [-]

But then they could have just taken the LFS plugin and made it a core part of git, if those were the only problems.

thayne 2 days ago | parent [-]

If it didn't have those problems, it wouldn't really be git lfs, it would be something else.

vlovich123 5 days ago | parent | prev [-]

And what happens when an object is missing from the cloud storage, or that storage has been migrated multiple times and someone turns down the old storage that's needed for archival versions?

atq2119 5 days ago | parent [-]

You obviously get errors in that case, which is not great.

But GP's point was that there is an entire other category of errors with git-lfs that is eliminated with this more native approach. Git-lfs allows you to get into an inconsistent state, e.g. when you interrupt a git operation, that just doesn't happen with native git.

jayd16 5 days ago | parent [-]

It's yet to be seen what it actually eliminates and what they're willing to actually enable by default.

The architecture does seem to still be in the general framing of "treat large files as special and host them differently." That is the crux of the problem in the first place.

I think it would shock no one to find that the official system also needs to be enabled and also falls back to a mode where it supports fetching and merging pointers without full file content.

I do hope all the UX problems will be fixed. I just don't see them going away naturally, and we have to put our trust in the hope that the git maintainers will make enjoyable, seamless, and safe commands.

ozim 5 days ago | parent | prev [-]

I think maybe the answer is not storing large files in the repo but managing them separately.

Mostly I haven't run into such a use case, but in general I don't see any upside in trying to shove big files in together with code in repositories.

tsimionescu 5 days ago | parent [-]

That is a complete no-go for many use cases. Large files can have exactly the same use cases as your code: you need to branch them, you need to know when and why they changed, you need to check how an old build with an old version of the large file worked, etc. Just because code tends to be small doesn't mean that all source files for a real program are going to be small too.

ozim 5 days ago | parent [-]

Yeah but GIT is not the tool for that.

That is why I don’t understand why people „need to use GIT”.

You can still do something else, like keeping versions and keeping track of those versions in many different ways.

You can store a reference in repo like a link or whatever.

da_chicken 4 days ago | parent | next [-]

A version control system is a tool for managing a project, not exclusively a tool for managing source code.

Wanting to split up the project into multiple storage spaces is inherently hostile to managing the project. People want it together because keeping it together is a basic function of managing a project of digital files. The need to track and maintain digital version numbers and link them to release numbers and build plans is just a requirement.

That's what actual, real projects demand. Any project that involves digital assets is going to involve binary, often large, data files. Any project that involves large tables of pre-determined or historic data will involve large files, text or binary, containing data the project requires. Such projects won't have everything encompassed as text files; it's weird when that's true for a project. The Linux kernel is a somewhat unique case because it doesn't have graphics or large, predetermined data blocks. Well, not all projects that need to be managed by git share 100% of the attributes of the Linux kernel.

This idea that everything in a git project must be small text files is incredibly bizarre. Are you making a video game? A website? A web application? A data-driven API? Does it have geographic data? Does it require images? Video? Music or sound? Are you providing static documentation that must be included?

So the choices are:

1. Git is a useful general-purpose VCS for real-world projects.

2. Git does not permit binary or large files.

Tracking versioning on large files is not some massively complex problem. Not needing to care about diffing and merging simplifies how those files are managed.

ozim 4 days ago | parent | next [-]

That’s what I disagree with. For me Git is for managing source code. Everything else is trying to fit a square peg through a round hole.

There are other tools for managing projects and better ways to version large files or binary assets.

Git is great at handling text changes and that’s it. It sucks with binary blobs.

biggusdickus69 4 days ago | parent | prev [-]

Git is an SCM, not a VCS. By design.

IshKebab 5 days ago | parent | prev | next [-]

> Yeah but GIT is not the tool for that.

Yes, because Git currently is not good at tracking large files. That's not some fundamental property of Git; it can be improved.

Btw it isn't GIT.

ozim 4 days ago | parent [-]

I beg to differ on whether that is an improvement, or whether not being good at tracking large files is a flaw of Git.

rcxdude 3 days ago | parent [-]

OK, but does it affect you if it also addresses other people's use cases?

tsimionescu 5 days ago | parent | prev | next [-]

The important point is that you don't want two separate histories. Maybe if your use case is very heavy on large files, you can choose a different SCM, which is better at this use case (SVN, Perforce). But using different SCMs for different files is a recipe for disaster.

mafuy 5 days ago | parent | prev | next [-]

Git is the right tool. It's just bad at this job.

jayd16 5 days ago | parent | prev [-]

That's pretty much what git LFS is...

AceJohnny2 5 days ago | parent | prev | next [-]

Another way that LFS is bad, as I recently discovered, is that the migration will pollute the `.gitattributes` of ancestor commits that do not contain the LFS objects.

In other words, if you migrate a repo that has commits A->B->C, and C adds the large files, then commits A & B will gain a `.gitattributes` referring to the large files that do not exist in A & B.

This is because the migration function carries its `.gitattributes` structure backwards as it walks the history, for caching purposes, and does not cross-reference it against the current commit.

actinium226 5 days ago | parent [-]

That doesn't sound right. There's no way it's adding a file to previous commits, that would change the hash and thereby break a lot of things.

AceJohnny2 5 days ago | parent [-]

`git lfs migrate` rewrites the commits to convert large files in the repo to/from LFS pointers, so yes, it does change the hashes. That's a well-documented effect.

https://github.com/git-lfs/git-lfs/blob/main/docs/man/git-lf...

Now, granted, usually people run migrate only to convert new local commits, so by nature of the ref include/exclude system it will not touch older commits. But in my case I was converting an entire repo into one using LFS. I hoped it would preserve the commits in a base branch that didn't contain large files, but to my disappointment I got the aforementioned .gitattributes pollution.
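For the record, the whole-history conversion being described looks roughly like this (requires git-lfs; `*.bin` is a placeholder pattern, and you want a backup first since every hash changes):

```shell
# Rewrites every commit on every local ref, converting matching files to pointers
git lfs migrate import --everything --include='*.bin'
```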

actinium226 5 days ago | parent [-]

From the documentation, like 2 paragraphs in:

> In all modes, by default git lfs migrate operates only on the currently checked-out branch, and only on files (of any size and type) added in commits which do not exist on any remote. Multiple options are available to override these defaults.

Were your remotes not configured correctly?

AceJohnny2 5 days ago | parent [-]

Let me repeat myself:

> But in my case I was converting an entire repo into one using LFS.

then check out the section in the manual "INCLUDE AND EXCLUDE REFERENCES"

actinium226 4 days ago | parent [-]

OK, and your main complaint was that it added .gitattributes to all previous commits? If someone were to go back and add a .bin to the earlier commits, you would still want it in LFS, right? I'm not sure what "cross-referencing against the current commit" would mean in that case. I don't see why you would want to use the .gitattributes from a different branch, like main or something. It seems very un-git-like for an operation to reference another branch without being explicitly told to do so.

But anyway, yes, LFS rewrites history if you want to apply it to history. I agree it's sub-par; it's disruptive and risks breaking links to specific git hashes.

rcxdude 3 days ago | parent [-]

The issue is that migration is unlike starting to use LFS on a repo going forward: the metadata propagates 'backwards in time' instead of reflecting what is actually in the repo at that commit.

gradientsrneat 5 days ago | parent | prev | next [-]

> LFS does break disconnected/offline/sneakernet operations which wasn't mentioned and is not awesome

Yea, I had the same thought. And TBD on large object promisors.

Git annex is somewhat more decentralized as it can track the presence of large files across different remotes. And it can pull large files from filesystem repos such as USB drives. The downside is that it's much more complicated and difficult to use. Some code forges used to support it, but support has since been dropped.
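For comparison, annex's location tracking looks roughly like this (requires git-annex; the remote name `usbdrive` is illustrative):

```shell
git annex add big.iso                  # content moves into .git/annex, a symlink is committed
git annex whereis big.iso              # lists which remotes currently have the content
git annex get big.iso --from usbdrive  # pull the content from a USB-drive remote
```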

cma 5 days ago | parent | prev [-]

Git LFS didn't work over SSH; you had to get an SSL cert, which GitHub knew was a barrier for people self-hosting at home. I think GitLab finally got it patched for SSH, though.

remram 5 days ago | parent [-]

letsencrypt launched 3 years before git-lfs

cma 5 days ago | parent | next [-]

That already requires a domain name and a more complicated setup without a public static IP in home environments, and in corporate environments now you're dealing with a whole process etc. that might be easier to get through by... paying for GitHub LFS.

I think it is a much bigger barrier than SSH, and I have seen it be one on short-timeline projects where it's getting set up for the first time: they just end up paying GitHub's crazy per-GB costs, or building rat's nests of tunnel/VPN configurations for different repos to keep encrypted remote access, with a whole lot more trouble than just an SSH path.

IndrekR 5 days ago | parent | prev [-]

Let's Encrypt was founded in 2012 but became available in the wild in December 2015; git-lfs in mid-2014. So the same era in general.

remram 3 days ago | parent [-]

You're right, I had the wrong date for LFS on GitHub.