| ▲ | amluto 7 hours ago |
| For literally decades, I’ve observed that there are systems that make each operation cheap and systems that work hard to scale out, and the former frequently seem to wildly outperform the latter. GitHub, for example, seems to implement the main repository /pulls page as a search query, which is hinted at by the prefilled search bar and was mostly confirmed last week when the search backend failed and pull requests didn’t load. But it could have been implemented as a plain API call that just loads open pull requests; that API exists and did not go down. If GitHub focused a bit on identifying the high-level operations that make up, say, the top 95% of their traffic (page loads including the resulting API calls, for example) and making them efficient, I bet they could get a 5x or better reduction in backend load by simplifying them. (Don’t even get me started on the diff viewer. I realize that much of its awfulness is the horribly inefficient front end, which does not directly load the back end, but I expect there is plenty of room for improvement there too. The plain git command-line features are very fast.) |
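The contrast amluto draws can be sketched with GitHub's public REST API: listing a repo's open PRs is a plain endpoint, while the same question routed through search goes to a separate backend. This sketch only builds the two request URLs (no network calls); the endpoint paths are GitHub's documented ones, but the helper function names are invented for illustration.

```python
from urllib.parse import quote

def list_pulls_url(owner: str, repo: str) -> str:
    # Plain listing endpoint: a direct query scoped to one repository's PRs.
    return f"https://api.github.com/repos/{owner}/{repo}/pulls?state=open"

def search_pulls_url(owner: str, repo: str) -> str:
    # Search endpoint: the same data routed through the search backend,
    # a separate system that can fail independently of the primary store.
    q = quote(f"repo:{owner}/{repo} is:pr is:open")
    return f"https://api.github.com/search/issues?q={q}"
```

Both return the open PRs for one repo, but an outage of the search system only takes down the second path.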
|
| ▲ | mnky9800n 7 hours ago | parent | next [-] |
| Are you telling me you don’t want a chat interface to greet you when you log in to GitHub? |
| |
| ▲ | amluto 6 hours ago | parent [-] | | That’s sort of orthogonal. But if GitHub actually invoked an LLM on initial page load, that would be about par for the course, and it would be amusing for GitHub to then complain that they’ve grown so quickly that their systems can’t keep up. |
|
|
| ▲ | stabbles 6 hours ago | parent | prev | next [-] |
| I noticed the same: https://news.ycombinator.com/item?id=47940213. My working hypothesis is that, since a filter was always required (PRs and issues are likely rows in the same database table, with a boolean to distinguish them), someone thought it would be good to use the search API uniformly. But search operates on a derived index of the underlying data, in contrast to the specific APIs that list issues and PRs directly. |
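A minimal sketch of that hypothesis, with sqlite3 standing in for the real database (table and column names are invented, and GitHub's actual schema is unknown): a direct query reads the primary table, while a search path reads a derived index that can go stale or fail on its own.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, is_pr INTEGER, state TEXT)")
con.executemany("INSERT INTO items VALUES (?, ?, ?)",
                [(1, 1, "open"), (2, 0, "open"), (3, 1, "closed")])

# Search path: a derived copy of the data, built at some earlier point in time.
con.execute("CREATE TABLE search_index AS SELECT * FROM items")

# A new PR arrives after the index was built.
con.execute("INSERT INTO items VALUES (4, 1, 'open')")

# Direct listing reads the primary table and is always current: sees PRs 1 and 4.
direct = [r[0] for r in con.execute(
    "SELECT id FROM items WHERE is_pr = 1 AND state = 'open'")]

# The derived index lags behind the primary data: it does not see PR 4.
searched = [r[0] for r in con.execute(
    "SELECT id FROM search_index WHERE is_pr = 1 AND state = 'open'")]
```

Real search indexes are updated asynchronously rather than snapshotted once, but the failure mode is the same: the derived copy can lag or go down while the primary table stays healthy.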
| |
▲ | munk-a 6 hours ago | parent [-] | | Working in an organization without a mono-repository, I've actually found it extremely difficult to keep tabs on PRs and issues across multiple repositories. For a problem that should be solved by a "for me" page that just lists all your active incoming and outgoing PRs, their multi-page solution involving search filters that often need to be reset feels extremely weak. I've worked on large multi-tenant systems before, and a page where you can "SELECT * FROM everything LIMIT 10" is the absolute last thing you want to give to users. It is bizarre to me that so much of their tooling defaults to acting across the whole of GitHub's data without guiding the user towards (or, as far as I can tell, even making available) a way to easily scope requests down, outside of a complex search filter. | | |
| ▲ | davideg 5 hours ago | parent [-] | | Do you mean like https://github.com/pulls and https://github.com/issues ? These are in the top left hamburger menu from the Home dashboard (edit: actually on all pages). | | |
▲ | munk-a 5 hours ago | parent [-] | | Hey, that's awesome; never mind me. I just got tripped up by their UI. There's probably a fair argument about how discoverable these are (especially given their labeling as "All Issues" and "All Pull Requests"), but that tip is quite helpful to me personally. Thanks for sharing it, I really appreciate it! | | |
| ▲ | amluto 2 hours ago | parent [-] | | And yet these are still (apparently) implemented as search queries instead of direct database queries. | | |
▲ | munk-a 2 hours ago | parent [-] | | There may be some magic they do to better optimize within-user searches. That's something they could hide in implementation details, so we can't be sure unless they spill the beans, but it's feasible, especially with the default search parameters they're using. I'd still love something a bit more obvious and intuitive, but if it's just a UX failure, that makes me feel a lot better. |
|
| ▲ | wavemode 7 hours ago | parent | prev | next [-] |
| Git itself is, fundamentally, a computationally inefficient way to store and retrieve information. If the problem to solve were simply "store and version this text", 14 billion commits in a year would not even be considered a lot. In other words, a centralized version control system built from the ground up to operate at scale would do far more for scalability than anything GitHub could possibly do to optimize their Git operations. Every major tech company (Amazon, Meta, Google, etc.) is already doing something like this internally. It would, however, require people to start using a GitHub-specific client rather than traditional git+ssh (though that client could still maintain a git repo locally, for compatibility). |
| |
▲ | munk-a 6 hours ago | parent | next [-] | | I can guarantee you one thing: GitHub's problem isn't coming from git. Considering all the CI/CD pipelines, PR and issue discussions, social media tracking, and other rich data that GitHub hosts, I would be gobsmacked if their true issue were the actual meat and potatoes of running git. | |
| ▲ | stabbles 6 hours ago | parent | prev [-] | | What are you referring to when you say it's "fundamentally computationally inefficient"? It's pretty efficient because it's content-addressed, plus optimizations to reduce storage and data transfer with packfiles. | | |
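The content addressing stabbles mentions works by naming each object with a hash of its bytes; for a blob, git hashes a small header plus the contents. A minimal sketch of that scheme (this mirrors git's actual blob hashing, but is illustrative, not a reimplementation of git):

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    # Git object id for a blob: SHA-1 over "blob <size>\0" + contents.
    # Identical contents always hash to the same id, so storage and
    # transfer naturally deduplicate.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()
```

Because the id is derived from the contents, two repositories holding the same file store it once and can skip re-transferring it; packfiles then delta-compress related objects on top of that.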
▲ | galangalalgol 6 hours ago | parent [-] | | I suspect they were referring to some of the things git does to allow for non-centralized version control. There are simplifications available if you just want a centralized system, like CVS was. |
|
|
|
| ▲ | the_sleaze_ 7 hours ago | parent | prev [-] |
| I think you need to broaden your focus here - I can't really remember any significant downtime before the Microsoft acquisition and the data supports my memories. Microsoft bought GitHub and migrated it to Azure, which explains the findings. The query performance was fine before they started serving from Azure. I mean, honestly, as though there isn't one single person competent enough to read some logs and horizontally scale a few read-only DBs to meet demand? That's not it. |
| |
| ▲ | AlexB138 7 hours ago | parent | next [-] | | > I think you need to broaden your focus here - I can't really remember any significant downtime before the Microsoft acquisition and the data supports my memories. This is the opposite of my recollection, actually. I distinctly remember having conversations about Github struggling to scale well before MS was involved, and people claiming that MS had somehow saved Github because it had stabilized and begun adding features again. > The query performance was fine before they started serving from Azure. This may be correct though. The Azure migration seems more aligned with the timeline of struggling to scale. | | |
| ▲ | the_sleaze_ 5 hours ago | parent [-] | | > I distinctly remember having conversations about Github struggling to scale well before MS was involved Do you have any sources to back your claim up? At what point did Github fail to scale their search endpoints? > This may be correct It is. |
| |
| ▲ | nvme0n1p1 6 hours ago | parent | prev | next [-] | | I don't know why this is downvoted. The data backs you up: https://damrnelson.github.io/github-historical-uptime/ | | |
| ▲ | evanelias 5 hours ago | parent [-] | | I'm skeptical about that page's accuracy. For example, if you go to the breakdown tab, it shows Actions having 100% availability when the graph starts (Apr 2016), yet Actions didn't even exist until late 2018, and wasn't GA until a full year after that. So if the math behind the "average" tab is treating NULLs as 100% uptime, this just isn't a correct measurement. The page also notes it obtains its data from the official status page, but big tech companies have been known to under-report outages. My general sense is they've gotten better about this in recent years; if so, that means historical data will give an erroneously rosy picture of uptime. | | |
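The averaging pitfall evanelias describes fits in a few lines: months before a service existed (NULL) should be excluded, not counted as 100% uptime. The numbers here are invented for illustration.

```python
# Monthly uptime percentages; None = the service didn't exist yet.
uptimes = [None, None, 99.0, 97.0]

# Treating NULL as 100% uptime inflates the average ...
naive = sum(100.0 if u is None else u for u in uptimes) / len(uptimes)

# ... versus averaging only the months the service actually ran.
actual = [u for u in uptimes if u is not None]
correct = sum(actual) / len(actual)
```

Here the naive average reports 99.0% while the service's real average over its lifetime is 98.0%; the longer the pre-launch gap, the rosier the naive number looks.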
▲ | the_sleaze_ 5 hours ago | parent [-] | | I think we can agree the data is correct enough to identify a trend with strong statistical significance, no? Enough to draw a conclusion. | |
| ▲ | evanelias 5 hours ago | parent [-] | | We can clearly draw a conclusion that their availability is getting worse, but that's not what your original comment claimed. You said "I can't really remember any significant downtime before the Microsoft acquisition and the data supports my memories", but my memories differ (as do other commenters), and the accuracy of the supporting data seems questionable. | | |
|
|
| |
▲ | philistine 7 hours ago | parent | prev [-] | | I mean, are any of the other forges, which I presume are also seeing an exponential increase in commits, also failing as hard as GitHub? | |
▲ | the_sleaze_ 5 hours ago | parent [-] | | I totally agree. You would expect a similar increase and a similar degradation in GitLab, which we do not see. |
|
|