| ▲ | skylerwiernik 7 hours ago |
| It's extremely interesting how fast this happened. Either AI use surged massively in the last quarter, or this is a very sneaky move by Anthropic. Looking at my own stats, I don't think I'm using Claude Code much more than I used to, but my commits have gone way up. I have a feeling they've tuned the models recently to commit more often, which gives the illusion of more work being done. |
|
| ▲ | crystal_revenge 6 hours ago | parent | next [-] |
| > Either AI use surged massively in the last quarter |
| December 2025 is considered by many to be a major step function in agentic coding (due both to improvements in harnesses and in the LLMs themselves). I know my own coding has changed forever since then. Before, I was basically always hands-on-keyboard while working with AI. Now I'm running experiments with multiple agents over the weekend, only periodically checking in to see if they have questions or need further instruction. The last quarter is where I personally first started to see how this was all going to change things (despite having worked on both the research and product sides of AI for the last few years). |
| > I have a feeling they've tuned the models recently to commit more often, which gives the illusion of more work being done. |
| Agents certainly are committing more often, but I know that, at least for these projects, real work is being done. An example: I had an agent auto-researching a forecast I was working on. This is something I've done manually for over a decade now. The iteration process is tedious and time-consuming, and would often take weeks of setting up and ultimately poorly documenting many, many experiments to see what works. Now I can "set it and forget it" and get the same results in hours (with much more surface area covered and much better documentation). Each experiment is a branch (or work-tree), so yes, there are a lot of commits happening, but the results are measurably real. |
| I often think the big divide in success with agents is whether or not the quality of one's work can be objectively measured. For those of us doing work that can be measured, the impact of agents is still hard to comprehend. |
| |
| ▲ | overfeed 4 hours ago | parent | next [-] | | > Each experiment is a branch (or work-tree) so yes there are a lot of commits happening, but the results are measurably real. |
| If you are correct, and GitHub is scaling its compute mostly as a reaction to this externality (agents churning through code that will mostly be discarded), then you can look forward to getting billed for your usage. After all, it is hard to build a scalable system without back-pressure. | | |
| ▲ | crystal_revenge 3 hours ago | parent [-] | | I've already started moving my personal projects off GitHub and onto Forgejo running on my homelab. I know a lot of people doing the same. With a hermes-agent for a sysadmin I can debug problems from my phone, so I wouldn't be surprised if I have more "9s" than GH. But if it ends up costing extra for GH, especially for work usage, then it's just a simple calculation of "is this worth it?", which I suspect for most cases will be 'yes'. | | |
| ▲ | overfeed 3 hours ago | parent [-] | | > [...] it's just a simple calculation of "is this worth it?" which I suspect for most cases will be 'yes' |
| Once the landgrab-stage flat pricing goes away, it will become a case-by-case calculation, because unsupervised agents can (and will) run up your bill with zero understanding of the business value of what they're instructed to solve. | | |
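The back-pressure point can be made concrete with a sketch. This is a hypothetical hard spend cap wrapped around an unsupervised agent loop; the price, the cap, and the per-step token count are all invented for illustration, not any provider's real API or rates:

```python
PRICE_PER_1K_TOKENS = 0.01  # assumed blended rate, USD (illustrative)

class BudgetExceeded(Exception):
    pass

class SpendGuard:
    """Tracks cumulative spend and raises once a hard cap is crossed."""
    def __init__(self, cap_usd):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, tokens):
        self.spent_usd += tokens / 1000 * PRICE_PER_1K_TOKENS
        if self.spent_usd > self.cap_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} > cap ${self.cap_usd:.2f}")

guard = SpendGuard(cap_usd=5.00)
steps_run = 0
for _ in range(10_000):
    tokens_used = 8_000  # stand-in for a real agent step's token count
    try:
        guard.charge(tokens_used)
    except BudgetExceeded:
        break  # the back-pressure: stop before the bill runs away
    steps_run += 1
```

Without something like this gate, the loop would happily burn through all 10,000 steps regardless of whether any of them produced value.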
| ▲ | crystal_revenge an hour ago | parent [-] | | > with zero understanding of the business value |
| What kind of products/services are you building where you aren't able to tie your eval suite to business value? If you can't, why are you building whatever it is you're building in the first place? By far one of the biggest changes I think we'll see in things built by agents is a reduction of the gap between code and value. The first stage is to make it possible to measure quality (evals), and the second stage is to more closely align measurable quality with value. The business value of the tokens spent on my team was discussed on my first day. |
| > Once the landgrab-stage flat-pricing goes away |
| Aside from the above point, I'm already running local LLMs on my homelab that, while not quite what I want for truly production work, have been able to iterate on and solve real, non-trivial research tasks for effectively zero cost (the energy cost was roughly on par with running an old light bulb). The way open, local models have been developing, there will be many cases where, if proprietary providers overcharge, it won't be a deal breaker to just switch to local models. Not to mention that there are plenty of open but non-local models that are already 5x cheaper and roughly on par with the mainstream model providers. |
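The "tie your eval suite to business value" idea can be sketched as an eval suite where each case carries a dollar-weighted impact, so an agent's score is reported in the unit stakeholders actually care about. All case names and dollar figures below are invented for illustration:

```python
def run_evals(predict, cases):
    """Report captured vs. total business value across an eval suite."""
    captured = sum(c["value_usd"] for c in cases
                   if predict(c["input"]) == c["expected"])
    total = sum(c["value_usd"] for c in cases)
    return captured, total

cases = [
    {"input": "refund request", "expected": "route_to_billing", "value_usd": 50},
    {"input": "outage report",  "expected": "page_oncall",      "value_usd": 500},
    {"input": "feature idea",   "expected": "log_feedback",     "value_usd": 5},
]

# A toy "agent" that only handles the single high-value case correctly:
# dollar-weighting makes it score far better than raw accuracy would suggest.
toy_agent = {"outage report": "page_oncall"}.get
captured, total = run_evals(toy_agent, cases)
```

The toy agent gets 1 of 3 cases right but captures $500 of $555 in weighted value; that gap between raw accuracy and dollar-weighted score is exactly the alignment being argued for.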
|
|
| |
| ▲ | iknowstuff 4 hours ago | parent | prev [-] | | What's your setup? How are your agents not running out of context and becoming dumb as a rock after ~100k tokens? Do you have a heartbeat thing that spawns more agents every time? | | |
| ▲ | crystal_revenge 4 hours ago | parent | next [-] | | The most important thing for any agentic task is to build up and continuously record context as a project develops. The start of basically any project involves building up and documenting context around the project itself (and, for a new company, the organization itself). This is kept at multiple levels of granularity (cross-project, project-specific, task-specific, and human-readable documentation). All experiments are planned out and documented as they go. This becomes extremely important because, after a weekend of running experiments, stakeholders (and I) often have questions; with everything in memory or some other stored context, it's trivial to get answers to all sorts of questions. Maybe it's because of this, but in both Claude Code and Codex I haven't run into any issues with models getting "dumb as a rock"; even after compaction (or occasional full terminal crashes) they seem to have no trouble marching on. | |
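A minimal sketch of the "build up and keep recording context" pattern described above: append structured notes as experiments run, then reload them to seed a fresh agent session. The file name and fields here are assumptions for illustration, not any tool's real format:

```python
import json
from pathlib import Path

NOTES = Path("project_context.jsonl")
NOTES.unlink(missing_ok=True)  # start fresh for this demo only

def record(kind, text):
    """Append one context note (e.g. a decision or an experiment result)."""
    with NOTES.open("a") as f:
        f.write(json.dumps({"kind": kind, "text": text}) + "\n")

def load(kind=None):
    """Reload notes, optionally filtered by kind, to seed a new session."""
    if not NOTES.exists():
        return []
    notes = [json.loads(line) for line in NOTES.read_text().splitlines()]
    return [n for n in notes if kind is None or n["kind"] == kind]

record("experiment", "branch exp-7: swapped loss function, +2% accuracy")
record("decision", "keep the exp-7 approach; document it in the README")
```

Because the notes survive compaction and crashes, a new session can be primed with `load()` output instead of relying on whatever survived in the context window.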
| ▲ | martinald 4 hours ago | parent | prev [-] | | Opus has 1M context now. In my experience it starts getting increasingly dumb after about 700k, but below that it is very usable. I don't think I've ever run out of context window since they brought that out. |
|
|
|
| ▲ | martinald 6 hours ago | parent | prev | next [-] |
| Many things at once, I suspect: |
| 1. Models have got way better, which means you are far more likely to get something working. I used to have little 'tool'/'weekend project' ideas all the time that wouldn't get off the starting blocks before; now it often takes a few minutes to build them, and once I've built them I tend to want to have them saved on GitHub. Quite how useful they turn out to be is another question though... |
| 2. Related: because the models are a lot better, I can generate far more code per unit time. On Sonnet last year I'd have to babysit the model and constantly 'steer' it, which meant a lot of the CC time was actually me reviewing it. Now with Opus 4.7 it can often just churn away for 10-30 minutes and get something reasonable. |
| 3. Most importantly, just the volume of new users of coding agents - loads of new developers shipping far more, far more frequently. |
| 4. Many users who were not on GitHub are now signing up and pushing code to it. "Vibe coders", basically, who don't have SWE experience and whose agent tells them git would be a good idea. |
| Each of these alone would be a big increase in scale, but combined it is very, very high. |
|
| ▲ | tossandthrow 7 hours ago | parent | prev [-] |
| I don't think commits per se put pressure on the infrastructure. More likely it's pulls and pushes and, naturally, the CI minutes they identify as the main issue. |
| |
| ▲ | NewJazz 6 hours ago | parent [-] | | But CI only increased by a factor of 2 since last year. Did they really not foresee that happening? And how does that affect git and API operations? | | |
| ▲ | munk-a 6 hours ago | parent [-] | | It really shouldn't. The technical summary they released[1] is a very interesting read from a software engineering perspective. They seem to have been blindsided by the increased traffic, and they give stats related to commits/PRs (which should be relatively cheap for GitHub to process) without any insight into their web traffic or details on how much Actions is costing them. If they were being fully transparent, they'd release information about their request response times and the resourcing needed to fulfill them. Their current path to resolution is to migrate their codebase to a new language[2], continue to drop their in-house ops for Azure resources, and get off MySQL. Maybe one or two of those steps are legitimately a good idea - I don't have the inside scoop - but technology migrations are always fraught with issues. It's quite possible these changes are just a result of them vibe-coding a mature codebase into a new language. 1. https://github.blog/news-insights/company-news/an-update-on-... 2. I'll grant that Ruby isn't the best language to use at scale, but I think we're all old enough to realize that language choice is far less impactful on performance than code quality. | | |
| ▲ | hosh 5 hours ago | parent | next [-] | | Azure's core hypervisor orchestrator was half-baked at launch and has never been fixed. This long-read blog series explains a lot for me - for example, why the FedRamp certification program was never able to get a straight answer from Azure about how it handled secrets. https://isolveproblems.substack.com/p/how-microsoft-vaporize... https://www.kunalganglani.com/blog/microsoft-fedramp-failure... | |
| ▲ | evanelias 5 hours ago | parent | prev | next [-] | | > migrate their codebase to a new language[2], continue to drop their inhouse ops for Azure resources and get off MySQL The recent blog post you're linking to mentioned moving data only for webhooks off MySQL, not all relational data used by the entire site; and moving "performance or scale sensitive code out of Ruby", again not the entire codebase. Do you have an official source suggesting these migrations are more comprehensive than that? | | |
| ▲ | munk-a 5 hours ago | parent [-] | | I do not know - this is the only source I'm aware of and the wording is vague enough that the above is just my interpretation of it. It could be highly targeted but the manner of wording indicates a strong preference that smells of a large migration. | | |
| ▲ | evanelias 5 hours ago | parent [-] | | What part of the wording gives you that impression? On these topics, the post literally just says the following: "bottlenecks that appeared faster than expected from moving webhooks to a different backend (out of MySQL)" "Similarly, we accelerated parts of migrating performance or scale sensitive code out of Ruby monolith into Go" (in a paragraph specifically about "critical services like git and GitHub Actions") Both of those sound highly targeted to me! | | |
| ▲ | munk-a 5 hours ago | parent [-] | | > While we were already in progress of migrating out of our smaller custom data centers into public cloud, we started working on path to multi cloud. This longer-term measure is necessary to achieve the level of resilience, low latency, and flexibility that will be needed in the future. |
| That paragraph read, to me at least, as saying that the initial targeted changes were just the tip of the iceberg and that much heavier lifting than initially budgeted was now in scope. | | |
| ▲ | evanelias 4 hours ago | parent [-] | | "smaller custom data centers into public cloud" is talking about their Azure migration, so "multi cloud" would almost certainly mean extending a presence into AWS and/or GCP (or maybe others like OCI). I'm sorry but I really don't see how you're drawing conclusions about this meaning a move off of Ruby and MySQL entirely. That's a huuuge logical leap away from what is written in this post, and you originally stated it in a way that indicated this was a fact. |
|
|
|
| |
| ▲ | spockz 6 hours ago | parent | prev [-] | | Re 2, I would generally agree, and there is a lot that can be done with caching. However, since writing services in Rust and Golang, I've found there is a whole other tier of speed. Architecture matters, code quality also matters, but Golang and Rust help a lot in making very fast services. | | |
| ▲ | munk-a 6 hours ago | parent [-] | | Yeah, I don't disagree. To clarify: Rust, Golang, etc. give you a very noticeable advantage when it comes to writing good, performant software, with the assumption that you're putting in the effort on the design side. But poorly written Rust is likely going to be indistinguishable from poorly written Ruby. |
|
|
|
|