crystal_revenge 6 hours ago

> Either AI use surged massively in the last quarter

December 2025 is considered by many to be a major step-function improvement in agentic coding (due both to improvements in harnesses and in the LLMs themselves). I know my coding has changed forever since then.

Before, I was basically always hands-on keyboard while working with AI. Now I'm running experiments with multiple agents over the weekend, only periodically checking in to see if they have questions or need further instruction.

The last quarter is when I personally first started to see how this was all going to change things (despite having worked on both the research and product sides of AI for the last few years).

> I have a feeling they've tuned the models recently to commit more often, which gives the illusion of more work being done.

Agents certainly are committing more often, but I know that, at least for these projects, real work is being done. An example: I had an agent auto-researching a forecast I was working on. This is something I've done manually for over a decade. The iteration process is tedious and time-consuming, and would often take weeks of setting up, and then poorly documenting, many, many experiments to see what works. Now I can "set it and forget it" and get the same results in hours (with much more surface area covered and much better documentation). Each experiment is a branch (or worktree), so yes, there are a lot of commits happening, but the results are measurably real.
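
A minimal sketch of that branch-per-experiment pattern (the repo path, branch names, and experiment names here are hypothetical, and the real setup drives this through an agent harness rather than a bare script):

```python
import subprocess
from pathlib import Path

REPO = Path("~/forecast-research").expanduser()  # hypothetical repo

def spawn_experiment(name: str) -> Path:
    """Create an isolated worktree + branch for one experiment.

    Each agent commits freely in its own worktree, so parallel runs
    never collide and every result stays attributable to a branch.
    """
    worktree = REPO.parent / f"exp-{name}"
    subprocess.run(
        ["git", "-C", str(REPO), "worktree", "add",
         "-b", f"experiment/{name}", str(worktree)],
        check=True,
    )
    return worktree

for exp in ["baseline", "seasonal-decomp", "wider-priors"]:
    path = spawn_experiment(exp)
    # ...launch an agent in `path` with the experiment brief...
```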

I often think the big divide in success with agents is whether or not the quality of one's work can be objectively measured. For those of us doing work that can be measured, the impact of agents is still hard to comprehend.

overfeed 4 hours ago

> Each experiment is a branch (or worktree), so yes, there are a lot of commits happening, but the results are measurably real.

If you are correct, and GitHub is scaling its compute mostly as a reaction to this externality (agents churning through code that will mostly be discarded), then you can look forward to getting billed for your usage. After all, it is hard to build a scalable system without back-pressure.

crystal_revenge 4 hours ago

I've already started moving my personal projects off GitHub and onto Forgejo running on my homelab. I know a lot of people doing the same. With a hermes-agent as a sysadmin I can debug problems from my phone, so I wouldn't be surprised if I have more "9s" than GH.

But if GH ends up costing extra, especially for work usage, then it's just a simple calculation of "is this worth it?", which I suspect in most cases will be 'yes'.

overfeed 3 hours ago

> [...]it's just a simple calculation of "is this worth it?", which I suspect in most cases will be 'yes'

Once the landgrab-stage flat-pricing goes away, it will become a case-by-case calculation, because unsupervised agents can (and will) run up your bill with zero understanding of the business value of what they're instructed to solve.

crystal_revenge an hour ago

> with zero understanding of the business value

What kind of products/services are you building where you aren't able to tie your eval suite to business value? And if you can't, why are you building whatever it is you're building in the first place?

By far one of the biggest changes I think we'll see in things built by agents is a shrinking gap between code and value. The first stage is making it possible to measure quality at all (evals); the second stage is more closely aligning that measurable quality with value. The business value of the tokens spent on my team was discussed on my first day.
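
As a toy illustration of that second stage (all names and dollar figures here are made up), an eval suite can carry an explicit value estimate per check, so a run's pass/fail results roll up into something a stakeholder can read directly:

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    name: str
    passed: bool
    value_usd: float  # estimated business value if this behavior works

# Hypothetical results from one agent run
results = [
    EvalCase("forecast_mae_under_threshold", True, 5_000.0),
    EvalCase("report_renders_cleanly", True, 500.0),
    EvalCase("handles_missing_regions", False, 2_000.0),
]

captured = sum(c.value_usd for c in results if c.passed)
at_stake = sum(c.value_usd for c in results)
print(f"Value captured: ${captured:,.0f} of ${at_stake:,.0f} "
      f"({captured / at_stake:.0%})")
```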

> Once the landgrab-stage flat-pricing goes away

Aside from the above point, I'm already running local LLMs on my homelab that, while not quite what I want for true production work, have been able to iterate on and solve real, non-trivial research tasks at effectively zero cost (the energy cost was roughly on par with running an old light bulb).

The way open, local models have been developing, there will be many cases where, if proprietary providers overcharge, it won't be a deal-breaker to just switch to a local model. Not to mention there are plenty of open but non-local models that are already 5x cheaper and roughly on par with the mainstream model providers.
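
If your tooling already speaks an OpenAI-compatible API, switching is mostly a one-line change. A sketch assuming a local server such as Ollama (its default port shown; llama.cpp's llama-server works the same way), with the model name being whatever you have pulled locally:

```python
from openai import OpenAI

# Same client code, pointed at a local OpenAI-compatible server.
# Ollama ignores the API key, but the client requires one.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

resp = client.chat.completions.create(
    model="qwen2.5-coder:32b",  # hypothetical: any locally pulled model
    messages=[{"role": "user", "content": "Summarize this experiment log."}],
)
print(resp.choices[0].message.content)
```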

iknowstuff 4 hours ago

What's your setup? How do your agents not run out of context and become dumb as a rock after ~100k tokens? Do you have some heartbeat that spawns new agents each time?

crystal_revenge 4 hours ago

The most important thing for any agentic task is to build up and continue to record context as a project develops.

The start of basically any project involves building up and documenting context around the project itself (and, for a new company, the organization itself). This context is kept at multiple levels of granularity (cross-project, project-specific, task-specific, and human-readable documentation). All experiments are planned out and documented as they go.

This becomes extremely important because, after a weekend of running experiments, stakeholders (and I) often have questions; with everything in memory or some other stored context, it's trivial to get answers to all sorts of them.
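
One lightweight way to do this (a sketch rather than my actual setup; the file layout and field names are made up) is to treat memory as plain append-only files, one per level of granularity, that every agent writes to as it works:

```python
import datetime
import json
from pathlib import Path

MEMORY = Path("memory")  # e.g. org.jsonl, project.jsonl, tasks/*.jsonl

def record(level: str, note: dict) -> None:
    """Append a timestamped entry to one granularity level,
    e.g. 'org', 'project', or 'tasks/exp-seasonal-decomp'."""
    path = MEMORY / f"{level}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    note["ts"] = datetime.datetime.now(datetime.timezone.utc).isoformat()
    with path.open("a") as f:
        f.write(json.dumps(note) + "\n")

record("project", {
    "event": "experiment_planned",
    "branch": "experiment/seasonal-decomp",
    "hypothesis": "seasonal decomposition lowers forecast error",
})
```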

Maybe it's because of this, but in both Claude Code and Codex I haven't run into any issues with models getting "dumb as a rock"; even after compaction (or the occasional full terminal crash), they seem to have no trouble marching on.

martinald 4 hours ago

Opus has a 1M-token context now. In my experience it starts getting increasingly dumb after about 700k, but below that it is very usable. I don't think I've ever run out of context window since they brought that out.