| ▲ | jasonthorsness a day ago |
| “GitHub says Claude Sonnet 4 soars in agentic scenarios and will introduce it as the base model for the new coding agent in GitHub Copilot.” Maybe this model will push the “Assign to Copilot” feature closer to the dream of having package upgrades and other mostly-mechanical stuff handled automatically. This tech could lead to a huge revival of older projects as the maintenance burden falls. |
|
| ▲ | rco8786 a day ago | parent | next [-] |
| It could be! But that's also what people said about all the models before it! |
| |
| ▲ | kmacdough a day ago | parent [-] | | And they might all be right! > This tech could lead to... I don't think he's saying this is the version that will suddenly trigger a renaissance. Rather, it's one solid step that makes the path ever more promising. Sure, everyone gets a bit overexcited with each release until they find the bounds. But the bounds are expanding, and the need for careful prompt engineering is diminishing. Ever since 3.7, Claude has been a regular part of my process for the mundane, and so far 4.0 seems to take less fighting. A good question is when AI will be able to take a basic prompt, gather its own requirements, and build a meaningful PR from it. I suspect that's still at least a couple of paradigm shifts away. But those seem to be coming every year or faster. | | |
| ▲ | sagarpatil a day ago | parent [-] | | Did you not see the live stream? They took a feature request for Excalidraw (table support), and Claude 4 worked on it for 90 minutes; the PR worked as expected. I’m not sure if they were using Sonnet or Opus. | | |
| ▲ | andrepd 14 hours ago | parent [-] | | Pre-prepared demos don't impress me. | | |
| ▲ | spiderfarmer 3 hours ago | parent [-] | | By that logic, athletes don’t impress you. Movies don’t impress you. Theater doesn’t impress you. Your date won’t impress you. Becoming a parent won’t impress you. Most things in life take years of preparation. |
|
|
|
|
|
| ▲ | max_on_hn a day ago | parent | prev | next [-] |
| I am incredibly eager to see what affordable coding agents can do for open source :) In fact, I should really be giving away CheepCode[0] credits to open source projects. Pending any sort of formal structure, if you see this comment and want free coding agent runs, email me and I’ll set you up! [0] My headless coding agents product, similar to “assign to copilot” but working from your task board (Linear, Jira, etc.) on multiple tasks in parallel. So far, simple/routine features are already quite successful. In general, the better the tests, the better the resulting code (and yes, it can and does write its own tests). |
| |
| ▲ | troupo 17 hours ago | parent | next [-] | | > I am incredibly eager to see what affordable coding agents can do for open source :) Oh, we know exactly what they will do: they will drive devs insane: https://www.reddit.com/r/ExperiencedDevs/comments/1krttqo/my... | | |
| ▲ | losvedir 6 hours ago | parent [-] | | I dunno, looking through those issues I'd be more annoyed by all the randos grandstanding in my PRs. | | |
| ▲ | troupo 5 hours ago | parent [-] | | And not by all the "fix this - i fixed - no you didn't - here's the fix — there's no fix" back-and-forth with the AI? There's very little grandstanding in the comments. They are all very tame, all things considered. |
|
| |
| ▲ | dr_dshiv 18 hours ago | parent | prev [-] | | Especially since the EU just made open-source contributors liable for cybersecurity (Cyber Resilience Act). Just let AI contribute and you're good. | | |
| ▲ | hn111 17 hours ago | parent [-] | | Didn’t they make an exception for open-source projects? https://opensource.org/blog/the-european-regulators-listened... | | |
| ▲ | dr_dshiv 9 hours ago | parent | next [-] | | “Anyone opensourcing anything while in the course of ‘commercial activity’ will be fully liable. Effectively they rugpulled the Apache2 / MIT licenses... all open source released by small businesses is fucked. Where there was no red tape, now there is infinite liability.” This is my current understanding, from a friend, not a lawyer. Would appreciate any insight from folks here. | | | |
| ▲ | andrepd 14 hours ago | parent | prev [-] | | Yeah, just the usual HN FUD about the EU. | |
|
|
|
|
| ▲ | epolanski 15 hours ago | parent | prev | next [-] |
| > having package upgrades and other mostly-mechanical stuff handled automatically Those are already mostly non-issues, solved by bots. In any case, where I think AI could help here would be in summarizing changes, conflicts, and impact on the codebase, and possibly also in conducting security scans. |
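For reference, a minimal sketch of the kind of bot setup being described, as a Renovate `renovate.json` (the automerge rule is illustrative, one common configuration among many):

```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended"],
  "packageRules": [
    {
      "matchUpdateTypes": ["minor", "patch"],
      "automerge": true
    }
  ]
}
```

With this, minor and patch bumps merge on their own once CI passes; major bumps still open a PR for a human (or, as suggested above, an AI summarizer) to review.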
|
| ▲ | BaculumMeumEst a day ago | parent | prev | next [-] |
| Anyone see news of when it’s planned to go live in Copilot? |
| |
| ▲ | vinhphm a day ago | parent | next [-] | | The option just showed up in the Copilot settings page for me | |
| ▲ | BaculumMeumEst a day ago | parent | next [-] | | Same! Rock and roll! | | | |
| ▲ | bbor a day ago | parent | prev [-] | | Turns out Opus 4 starts at their $40/mo ("Pro+") plan, which is sad, and they serve o4-mini and Gemini as well, so it's a bit less exclusive than this announcement implies. That said, I have a random question for any Anthropic-heads out there: GitHub says "Claude Opus 4 is hosted by Anthropic PBC. Claude Sonnet 4 is hosted by Anthropic 1P."[1] What's Anthropic 1P? Based on the only Kagi result being a deployment tutorial[2] and the fact that GitHub negotiated a "zero retention agreement" with the PBC but not with whatever "1P" is, I'm assuming it's a spinoff cloud company that only serves Claude...? No mention on Wikipedia or in any business docs I could find, either. Anyway, off to see if I can access it from inside SublimeText via LSP! [1] https://docs.github.com/en/copilot/using-github-copilot/ai-m... [2] https://github.com/anthropics/prompt-eng-interactive-tutoria... | |
| ▲ | Workaccount2 a day ago | parent | next [-] | | Google launched Jules, its Gemini coding agent, two days ago[1]. I was accepted into the beta pretty quickly, and you get 5 free tasks a day. So far I have found it pretty powerful; it's also the first time an LLM has ever stopped while working to ask me a question or for clarification. [1] https://jules.google/ | |
| ▲ | l1n a day ago | parent | prev [-] | | 1P = Anthropic's first-party API, i.e. not through Bedrock or Vertex |
|
| |
| ▲ | minimaxir a day ago | parent | prev [-] | | The keynote confirms it is available now. | | |
| ▲ | jasonthorsness a day ago | parent [-] | | Gotta love keynotes with concurrent immediate availability | | |
| ▲ | brookst a day ago | parent [-] | | Not if you work there | | |
| ▲ | echelon a day ago | parent [-] | | That's just a few weeks of DR + prep, a feature freeze, and oncall with bated breath. Nothing any rank and file hasn't been through before with a company that relies on keynotes and flashy releases for growth. Stressful, but part and parcel. And well-compensated. | | |
| ▲ | brookst a day ago | parent [-] | | Sometimes. When things work great. Sometimes you just hear “BTW your previously-soft-released feature will be on stage day after tomorrow, probably don’t make any changes until after the event, and expect 10x traffic” |
|
|
|
|
|
|
| ▲ | phito 5 hours ago | parent | prev | next [-] |
| I don't see how an LLM could do better than a bot, e.g. Renovate |
|
| ▲ | ModernMech a day ago | parent | prev | next [-] |
| That's kind of my benchmark for whether or not these models are useful. I've got a project that needs some extensive refactoring to get working again: mostly upgrading packages, but it will also require updating the code to some new language semantics that didn't exist when it was written. So far, current AI models can make essentially zero progress on this task. I'll keep trying until they can! |
| |
| ▲ | yosito a day ago | parent | next [-] | | Personally, I don't believe AI is ever going to get to that level. I'd love to be proven wrong, but I really don't believe that an LLM is the right tool for a job that requires novel thinking about out-of-the-ordinary problems, like all the weird edge cases and poor documentation that come up when trying to upgrade old software. | |
| ▲ | 9dev a day ago | parent | next [-] | | Actually, I think the opposite: Upgrading a project that needs dependency updates to new major versions—let’s say Zod 4, or Tailwind 3—requires reading the upgrade guides and documentation, and transferring that into the project. In other words, transforming text. It’s thankless, stupid toil. I’m very confident I will not be doing this much more often in my career. | | |
| ▲ | mikepurvis a day ago | parent | next [-] | | Absolutely, this should be exactly the kind of task a bot should be perfect for. There's no abstraction, no design work, no refactoring, no consideration of stakeholders, just finding instances of whatever is old and busted and changing it for the new hotness. | | |
| ▲ | maoberlehner 17 hours ago | parent | next [-] | | It seems logical, but my experience is the complete opposite, and I think it's an inherent problem with the technology. "Upgrade from Library v4 to Library v5" probably heavily triggers all the weights related to "Library," which are most likely a cocktail of training data from all of its versions. (It makes me wonder how LLMs write code for one version as consistently as they do - I assume the weights related to a particular version are reinforced by every token matching that version's syntax - and I guess this is exactly the problem for these kinds of tasks.) For the (complex) upgrade use case, LLMs fail completely in my tests. I think the only way they can succeed here is by searching for (and finding!) an explicit upgrade guide that describes how to go from v4 to v5, with all the edge cases relevant to your project in it. More often than not, a guide like that just does not exist. And then you need (human?) ingenuity, not just "rename `oldMethodName` to `newMethodName`" - when talking about a major upgrade like Angular 0 to Angular X, or Vue 2 to Vue 3, and so on. |
| ▲ | dvfjsdhgfv a day ago | parent | prev [-] | | So that was my conviction, too. However, in my tests it seems like upgrading to a version a model hasn't seen is for some reason problematic, in spite of giving it the complete docs, examples of new API usage etc. This happens even with small snippets, even though they can deal with large code fragments with older APIs they are very "familiar" with. | | |
| ▲ | mikepurvis a day ago | parent [-] | | Okay so less of a "this isn't going to work at all" and more just not ready for prime-time yet. |
|
| |
| ▲ | cardanome a day ago | parent | prev | next [-] | | Theoretically we don't even need AI. If semantics were defined well enough and maintainers actually cared about and properly tracked breaking changes, we could have tools that automatically upgrade our code: just a bunch of simple scripts that perform text transformations. The problem is purely social. There are language ecosystems where great care is taken not to break stuff, and where you can let your project rot for a decade or two, come back to it, and it will still compile perfectly with the newest release. And then there is the JS world, where people introduce churn just for the sake of their ego. Maintaining a project is orders of magnitude more complex than creating a new greenfield project. It takes a lot of discipline. There is just a lot, a lot of context to keep in mind that really challenges even the human brain. That is why we see so many useless rewrites of existing software: it is easier, more exciting, and most importantly something to brag about on your CV. AI will only cause more churn, because it makes churn easier to create, ultimately leaving humans with more maintenance work and less fun time. | |
| ▲ | afavour a day ago | parent [-] | | > and maintainers actually concerned about and properly tracking breaking changes we could have tools that automatically upgrade our code In some cases, perhaps. But breaking changes usually aren't "we renamed methodA to methodB"; they're "we changed the functionality for X, Y, Z reasons". It would be very difficult to declaratively write out how someone should change their code to accommodate that - it might change their approach entirely! | |
| ▲ | mdaniel a day ago | parent [-] | | There are programmatic upgrade tools, some projects ship them even right now https://github.com/codemod-com/codemod I think there are others in that space but that's the one I knew of. I think it's a relevant space for Semgrep, too, but I don't know if they are interested in that case |
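As a concrete illustration of those "simple scripts", here's a minimal codemod sketch in the jscodeshift style - the method names are hypothetical placeholders, not any real library's API:

```typescript
// Renames every call to `.oldMethodName(...)` into `.newMethodName(...)`,
// leaving receivers and arguments untouched.
import type { API, FileInfo } from "jscodeshift";

export default function transformer(file: FileInfo, api: API): string {
  const j = api.jscodeshift;

  return j(file.source)
    // Match only method calls whose property is named `oldMethodName`.
    .find(j.CallExpression, {
      callee: { type: "MemberExpression", property: { name: "oldMethodName" } },
    })
    .forEach((path) => {
      const callee = path.node.callee;
      if (callee.type === "MemberExpression" && callee.property.type === "Identifier") {
        callee.property.name = "newMethodName";
      }
    })
    .toSource();
}
```

A transform like this is typically run over a whole tree with something like `npx jscodeshift -t rename-method.ts src/`; as the comments above note, the hard part is the breaking changes that aren't mechanical renames.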
|
| |
| ▲ | MobiusHorizons 21 hours ago | parent | prev | next [-] | | Except that for breaking changes you frequently need to know why it was done the old way in order to know what behavior it should have after the update. |
| ▲ | yosito a day ago | parent | prev [-] | | That assumes accurate documentation, upgrade guides that cover every edge case, and the miracle of package updates not causing a cascade of unforeseen compatibility issues. | | |
| |
| ▲ | csomar 20 hours ago | parent | prev | next [-] | | That's the easiest task for an LLM to do. Upgrading from x.y to z.y is for the most part syntax changes. The issue is that most of the documentation sucks, and the LLM doesn't have access to it in the first place. Coding LLMs should interact with LSPs like humans do: you ask the LSP for all possible functions, you read the function docs, and then you type from the available list of options. LLMs can in theory do that, but everyone is busy burning GPUs. |
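For illustration, a rough sketch of that interaction over LSP's JSON-RPC protocol (the server binary and file path are assumptions, and a real client would first send `initialize` and `textDocument/didOpen`, omitted here for brevity):

```typescript
import { spawn } from "node:child_process";

// Spawn a language server over stdio; typescript-language-server is just an
// assumed example, any LSP-compliant server speaks the same protocol.
const server = spawn("typescript-language-server", ["--stdio"]);

// LSP messages are JSON-RPC bodies framed by a Content-Length header.
function send(msg: object): void {
  const body = JSON.stringify(msg);
  server.stdin.write(`Content-Length: ${Buffer.byteLength(body)}\r\n\r\n${body}`);
}

// Ask which symbols are valid at a cursor position -- the same completion
// list a human sees in their editor.
send({
  jsonrpc: "2.0",
  id: 1,
  method: "textDocument/completion",
  params: {
    textDocument: { uri: "file:///project/src/app.ts" }, // hypothetical file
    position: { line: 10, character: 8 },                // zero-based cursor
  },
});

// The response enumerates what the installed version of each dependency
// actually exposes, so nothing has to be guessed from training data.
server.stdout.on("data", (chunk) => console.log(chunk.toString()));
```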
| ▲ | dakna a day ago | parent | prev | next [-] | | Google demoed an automated version upgrade for Android libraries during I/O 2025. The agent does multiple rounds and checks error messages during each build until all dependencies work together. Agentic Experiences: Version Upgrade Agent https://youtu.be/ubyPjBesW-8?si=VX0MhDoQ19Sc3oe- | | |
| ▲ | yosito a day ago | parent [-] | | So it works in controlled and predictable circumstances. That doesn't mean it works in unknown circumstances. |
| |
| ▲ | a day ago | parent | prev [-] | | [deleted] |
| |
| ▲ | mewpmewp2 13 hours ago | parent | prev | next [-] | | I think this type of thing needs an agent with access to the documentation, so it can read about the nuances of the language and package versions, and definitely a way to investigate types and interfaces. The problem is that the training data is so mixed that the AI can easily confuse versions, APIs, etc. |
| ▲ | tmpz22 a day ago | parent | prev [-] | | And IMO it has a long way to go. There is a lot of nuance in orchestrating dependencies that can cause subtle, not easily remedied errors in an application. For example, a lot of LLMs (I've seen it in Gemini 2.5 and Claude 3.7) will write calls to non-existent methods in dynamic languages. While these runtime errors are often auto-fixable, sometimes they aren't, and breaking out of an agentic workflow to deep-dive the problem is quite frustrating - not least because agentic coding entices us into being so lazy. | |
| ▲ | mikepurvis a day ago | parent | next [-] | | "... and breaking out of an agentic workflow to deep dive the problem is quite frustrating" Maybe that's the problem that needs solving then? The threshold doesn't have to be "bot capable of doing entire task end to end", like it could also be "bot does 90% of task, the worst and most boring part, human steps in at the end to help with the one bit that is more tricky". Or better yet, the bot is able to recognize its own limitations and proactively surface these instances, be like hey human I'm not sure what to do in this case; based on the docs I think it should be A or B, but I also feel like C should be possible yet I can't get any of them to work, what do you think? As humans, it's perfectly normal to put up a WIP PR and then solicit this type of feedback from our colleagues; why would a bot be any different? | | |
| ▲ | dvfjsdhgfv a day ago | parent [-] | | > Maybe that's the problem that needs solving then? The threshold doesn't have to be "bot capable of doing entire task end to end", like it could also be "bot does 90% of task, the worst and most boring part, human steps in at the end to help with the one bit that is more tricky". Still, the big short-term danger is that you're left with code that seems to work well but has subtle bugs in it, and the long-term danger is that you're left with a codebase you're not familiar with. | |
| ▲ | mikepurvis a day ago | parent [-] | | Being left with an unfamiliar codebase is always a concern and comes about through regular attrition, particularly if adequate review is not in place or people are cycling in and out of the org too fast for proper knowledge transfer (so, cultural problems, basically). If anything, I'd bet that agent-written code will get better review than average, because the turnaround time on fixes is fast and no one will sass you for nit-picking, so it's "worth it" to look closely and ensure it's done just the way you want. |
|
| |
| ▲ | jasonthorsness a day ago | parent | prev | next [-] | | The agents will definitely need a way to evaluate their work just as well as a human would - whether that's a full test suite, tests plus directions for some manual verification, or whatever. If they can't use the same tools as a human would, they'll never be able to improve things safely. |
| ▲ | soperj a day ago | parent | prev [-] | | > not least because agentic coding entices us into being so lazy. Any coding I've done with Claude has been to ask it to build specific methods; if you don't understand what's actually happening, then you're building something that's unmaintainable. I feel like it reduces typing and syntax errors, though sometimes it leads me down the wrong path. | |
| ▲ | weq 21 hours ago | parent [-] | | I can just imagine it now: you launch your first AI-coded product and get a bug in production, and the only way the AI can fix the bug is to rewrite and redeploy the app with a different library. You then proceed to show the changelog to the CCB for approval, including explaining the fix to the client and trying to convey its risk profile for their sign-off. "Yeah, we solved the duplicate-name-in-the-table issue by moving database engines and UI frameworks to ones more suited to the task" |
|
|
|
|
| ▲ | ed_elliott_asc a day ago | parent | prev [-] |
| Until it pushes a severe vulnerability which takes a big service down. |