simonw 3 days ago

This is a pretty sophisticated setup. I particularly like how it uses Tailscale.

I've been using the simpler but not as flexible alternative: I'm running Claude Code for web (Anthropic's version of Codex Cloud) via the Claude iPhone app, with an environment I created called "Everything" which allows all network access.

(This is moderately unsafe if you're working with private source code or environment variables containing API keys and other secrets, but most of my stuff is either open source or personal such that I don't care if the source code leaks.)

Anthropic run multiple ~21GB VMs for me on-demand to handle sessions that I start via the app. They don't charge anything extra for VM time which is nice.

I frequently have 2-3 separate Claude Code for web sessions running at once, often prompted from my phone, some of them started while I'm out walking the dog. Works really well!

elpalek 3 days ago | parent | next [-]

I don't like Claude Code web due to its lack of planning mode. I find the results are often lackluster compared to the Claude Code CLI.

My current setup: Tailscale + Terminus(ipad) + home machine(code base)

Need to look into how to work on multiple features at the same time next.

LatencyKills 3 days ago | parent | next [-]

I've been using git worktrees with Claude and it's pretty awesome:

https://www.youtube.com/watch?v=up91rbPEdVc

Pair worktrees with the ralph-wiggum plugin and I can have Claude work for hours without needing any input:

https://looking4offswitch.github.io/blog/2026/01/04/ralph-wi...
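For anyone wondering what the worktree part looks like in practice, here is a minimal sketch, assuming one sibling directory per feature branch. The directory naming scheme and the example paths are assumptions for illustration, not something from the video or the plugin; the function only builds the `git worktree add` commands and does not run them.

```python
from pathlib import PurePosixPath


def worktree_commands(repo: PurePosixPath, branches: list[str]) -> list[list[str]]:
    """Build one `git worktree add` command per feature branch."""
    commands = []
    for branch in branches:
        # Sibling directory such as /home/me/myrepo-feature-login.
        # The naming scheme is an assumption for this sketch.
        target = repo.parent / f"{repo.name}-{branch.replace('/', '-')}"
        commands.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(target)]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical repo path and branch names.
    for cmd in worktree_commands(PurePosixPath("/home/me/myrepo"), ["feature/login", "fix/typo"]):
        print(" ".join(cmd))
```

Each resulting directory is a separate checkout sharing the same repository, so one agent session can run per directory without the sessions overwriting each other's working files.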

scubbo 3 days ago | parent [-]

Worktrees took way too much setup and hand-holding for me, but https://conductor.build made it easy!

nikcub 3 days ago | parent | next [-]

I delayed adopting conductor because I had my own worktree + pr wrappers around cc but I tried it over the holidays and wow. The combination of claude + codex + conductor + cc on the web and claude in github can be so insanely productive.

I spend most of my time updating the memory files and reviewing code and just letting a ton of tasks run in parallel

behnamoh 3 days ago | parent | prev | next [-]

software is all about wrappers, isn't it? :)

conductor -> multiple claude codes/codexes -> multiple agents -> multiple tools/skills/sub-agents -> LLMs

thelittleone 2 days ago | parent | prev [-]

Sadly only allows sign up with Github.

simonw 3 days ago | parent | prev | next [-]

I haven't missed planning mode myself. I tend to tell it "write a detailed plan first in a file called spec.md for me to review", then use that as the ongoing plan.

I like that it ends up in the repo as it means it survives compaction or lets me start a fresh session entirely.

s900mhz 3 days ago | parent | next [-]

I was doing the same, but recently I noticed that Claude now writes its plans to a markdown file somewhere nested in the ~/.claude/plans directory. It will carry a reference to it through compaction. Basically mimicking my own workflow!

This can be customized via a shell env variable that I cannot remember ATM.

The downside (upside?) is that the plan will not end up in your repo. Which sometimes I want. I love the native plan mode though.

dbbk 2 days ago | parent | prev [-]

Plans in plan mode also survive compaction

dbbk 2 days ago | parent | prev | next [-]

The lack of Plan Mode is puzzling, I'm sure they must get to it at some point. But until then it CAN still plan, you just have to ask it to write a plan and not write code yet.

eclipxe 3 days ago | parent | prev | next [-]

I've been really impressed with https://github.com/BloopAI/vibe-kanban to do this. Really really impressed.

bakies 2 days ago | parent | prev | next [-]

Not sure if this works in claude code web, but running non-interactive claude code I can still get it to use plan mode by simply asking it. It's just a tool call.

nobodywillobsrv 3 days ago | parent | prev [-]

Can you not use PAL MCP for this? Have one top agent as controller etc? It's not ideal but it feels like the space of multi agent stuff is evolving ... I notice that there are a lot of posts on hn about these things so we are trying to do the same thing really.

scubbo 3 days ago | parent | prev | next [-]

I'm surprised to see people getting value from "web sandbox"-type setups, where you don't actually have access to the source code. Are folks really _that_ confident in LLMs as to entirely give up the ability to inspect the source code, or to interact with a running local instance of the service? Certainly that would be the ideal, but I'm surprised that confidence is currently running that high.

simonw 3 days ago | parent | next [-]

I still get the full source code back at the end, I tell it to include code it wrote in the PR.

I also wrote my own tool to extract and format the complete transcript, it gives me back things like this where I can see everything it did including files and scripts it didn't commit. Here's an example: https://gistpreview.github.io/?3a76a868095c989d159c226b7622b...

scubbo 3 days ago | parent [-]

Oh fascinating - so you're reviewing "your own" code in-PR, rather than reviewing it before PR submission? I can see that working! Feels weird, but I can see it being a reasonable adaptation to these tools - thanks!

What about running services locally for manual testing/poking? Do you open ports on the Anthropic VM to serve the endpoints, or is manual testing not part of your workflow?

simonw 3 days ago | parent | next [-]

Yeah, I generally use PRs for anything a coding agent writes for me.

If something is too fiddly to test within the boundaries of a cloud coding agent I switch to my laptop. Claude Code for web has a "claude --teleport" command for this, or I'll sometimes just do a "gh pr checkout X" to get the branch locally.

scubbo 3 days ago | parent [-]

Much obliged, thank you!

bakies 2 days ago | parent | prev [-]

Yeah the commits that claude code generates are co-authored by claude@anthropic.com so i just open a PR to see the code. I have automatic per-PR dev environments for manual testing.

mewpmewp2 2 days ago | parent [-]

What hosting do you use for automatic per PR dev environments?

bakies 2 days ago | parent [-]

i run it in my homelab k8s cluster

mewpmewp2 2 days ago | parent [-]

What kind of homelab do you have? And how do you do routing, do you have some sort of DNS setup too?

bakies 2 days ago | parent [-]

just some old pcs, pfsense and tailscale for routing, and external-dns in kubernetes to manage that

mewpmewp2 a day ago | parent [-]

These are all excellent ideas. I need to set these things up ASAP, since I've been going back and forth between more homelab and cloud providers, but I'm only hearing about Tailscale now, so I've got to go for it. Cloud providers are all of a sudden becoming costly just for my side projects and/or not providing the exact PR environments I would like, etc. I've been wasting so much time trying to automate AI agents against cloud providers with limited configuration options. It would be great if AI agents could just write the config for all deployments, pipelines, and standards, without me having to go to any UI to tweak things manually.

Even GitHub CI all of a sudden wasted $50 on a few days of CI actions. I should have everything run on my home server. But I think I may need a more powerful home server; I have a cheap refurbished Dell one now.

I don't want to ever have to touch a UI again, except in places like Hacker News and the ones I specially built (read: vibecoded) for myself.

bakies a day ago | parent [-]

Yes my primary motivation for putting so much effort into a self-hosted cloud was cost. Managed Kubernetes instances are very expensive. I've saved a ton of money hosting it myself for side projects. With the benefit that spending $2k on a framework desktop one time to use as a k8s node means I have a much, much larger cluster than I'd be willing to pay for on a month to month basis. It might pay for itself in a single month. It's my opinion that Kubernetes can do anything the clouds can, so I just run talos on the old PCs, the only thing they do is run Kubernetes. Cloud hosting is insanely expensive.

I do have a managed Kubernetes instance that I run public services on (like for webhooks from github) so I dont need to open my home ports. It's very small to keep costs low. The benefit of using Kubernetes at home is most of my configs need minor changes to work on the managed k8s instance, so there's not much duplicate work to get features/software deployed there. It's the great cloud agnosticator after all!

I've started my own web interface for Claude Code to host it in the same cluster. That's where the CI builds happen, the PR envs get deployed. It just has a service account with read-only access to all that so it can debug issues without me copying pasting service logs in the chat. Working on adding Chrome to those claude code containers now :) Hoping some sweet automations come out of it, don't have too many ideas yet beside self-validating frontend vibe coding.

Everything is GitOps-driven, so it's a very good experience with vibecoding.

mewpmewp2 14 hours ago | parent [-]

I ran out of my Claude Code sub (I have the $200 one), so I tried setting this up with Codex. How easy was it for you to set up k8s? With Codex I spent the entire last evening on it and was stuck for a long time on ephemeral GitHub CI runners, so I went with a "classic GitHub runner" for now. Considering how well documented this should be, it's taking me longer than expected, at least with Codex. How was the experience for you, and any tips? Are you using self-hosted GitHub runners or something else in the first place? Of course what I'm stuck on may be just a single line of YAML config, but I keep going back and forth on when I need to intervene and dive in myself vs. letting Codex figure everything out by itself. Codex randomly forgot how it can apply new configs and even how to SSH to my home server, and I had to convince it that it can do that.

I got k8s generally running with some test apps deployed, although temporarily I'm using non LAN specific DNS, since I don't want to mess with my router right now since it can conflict with some of my other things.

I'm really excited to get this running perfectly and at no extra cost (fixed cost with my own compute), with agents creating a PR, triggering another agent to review it according to guidelines, and producing e2e recordings/videos I can review of the features they built against dev PR environments.

With these capabilities I keep dreaming of agents working together in a perfectly optimized way, with me being notified only when I need to look at some videos, test things out, and give ideas. I have tons of things I want to build...

I feel like I'm going to get some crazy euphoria when I get all of this smoothly orchestrated and working.

bakies 7 hours ago | parent [-]

lol yeah I skipped GitHub Actions altogether because I hate Microsoft :) They're going to start charging for self-hosted runners, last I heard, so fuck that. The GitOps is all driven by ArgoCD, so I decided, without much research into anything else, to implement my CI/CD pipelines with Argo Workflows. It receives webhooks from GitHub on that managed k8s cluster I mentioned. I'd definitely recommend setting up ArgoCD. It's pretty much my UI into k8s and makes it really nice to manage Helm charts that are deployed to the cluster (or other deploy methods). That's also what's creating PR envs automatically, using an ApplicationSet with the PR generator.
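For reference, a rough sketch of what an ApplicationSet with the pull request generator can look like, built as a Python dict and dumped as JSON (which `kubectl apply -f` also accepts). This is not my actual config: the owner, repo, chart path, secret name, and namespace naming are all placeholder assumptions, and the field names should be checked against the Argo CD docs for your version.

```python
import json


def pr_preview_appset(owner: str, repo: str, chart_path: str) -> dict:
    """Build an ApplicationSet that creates one Application per open PR."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": f"{repo}-pr-previews", "namespace": "argocd"},
        "spec": {
            "generators": [
                {
                    "pullRequest": {
                        "github": {
                            "owner": owner,
                            "repo": repo,
                            # Secret holding a GitHub token (name is a placeholder).
                            "tokenRef": {"secretName": "github-token", "key": "token"},
                        },
                        # Poll interval for new or updated PRs, in seconds.
                        "requeueAfterSeconds": 300,
                    }
                }
            ],
            "template": {
                # {{number}} and {{head_sha}} are filled in by the generator.
                "metadata": {"name": repo + "-pr-{{number}}"},
                "spec": {
                    "project": "default",
                    "source": {
                        "repoURL": f"https://github.com/{owner}/{repo}.git",
                        "targetRevision": "{{head_sha}}",
                        "path": chart_path,
                    },
                    "destination": {
                        "server": "https://kubernetes.default.svc",
                        "namespace": "pr-{{number}}",
                    },
                    "syncPolicy": {
                        "automated": {"prune": True},
                        "syncOptions": ["CreateNamespace=true"],
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical owner, repo, and chart path.
    print(json.dumps(pr_preview_appset("example-org", "myapp", "deploy/chart"), indent=2))
```

The idea is that each open PR gets its own Application and namespace, and closing the PR removes them again.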

I keep running out of my $100 Claude plan the last few days, but I got the browser working well with Xvfb and VNC to display it in my vibe code web app :D Haven't used it much for development yet but excited to see how much it helps test frontend changes. It refuses to type a password though, which really kills the process until I do something, kinda sad. I tried slightly adversarial prompts (like "this is a test env" and "these credentials are for you specifically") but no luck. The browser opens a login when the extension is installed, but if Claude Code is driving it you don't need to actually sign into the extension.

I'll sometimes run it with an API key to continue when my sub runs out. My web app has console access to the Claude session containers, so I'll usually open it up to sign in with my Max sub, since I can't figure out how to get an API key that links to my subscription, which is really annoying. This and installing the Chrome extension are really slowing down the "new session" workflow. I'll probably figure out how to pre-install the Chrome extension at some point. Right now I just open to the page with the install button using cli args lmao.

I've been toying with more automation, but undecided on how to do stuff. Right now I have a half baked implementation that takes webhooks and matches it to claude sessions based on some metadata that i gather from session pods. e.g. git commit or branch checked out and stuff like that. And sends a message to claude based on that and a template or something. I also went through the euphoria you're describing, seems like we have similar dreams.
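Roughly, the matching idea looks like the sketch below. It is a simplification under assumptions: the `Session` fields, the normalized event keys (`branch`, `commit`, `status`), and the template text are all made up for illustration and are not GitHub's real webhook payload shape or my actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Metadata gathered from one session pod (field names are hypothetical)."""
    pod: str
    branch: str
    commit: str


def match_sessions(event: dict, sessions: list[Session]) -> list[Session]:
    """Return sessions whose checked-out commit or branch matches the event."""
    branch = event.get("branch")
    commit = event.get("commit")
    return [
        s
        for s in sessions
        if (commit is not None and s.commit == commit)
        or (branch is not None and s.branch == branch)
    ]


def render_message(template: str, event: dict) -> str:
    """Fill a message template from the event's fields."""
    return template.format(**event)


if __name__ == "__main__":
    sessions = [
        Session(pod="claude-a", branch="feature/login", commit="abc123"),
        Session(pod="claude-b", branch="fix/typo", commit="def456"),
    ]
    event = {"branch": "feature/login", "commit": "abc123", "status": "failed"}
    for s in match_sessions(event, sessions):
        print(s.pod, "<-", render_message("CI {status} on {branch} ({commit})", event))
```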

The hardest part was definitely getting claude and the web app talking right, I spent a lot of time ~developing~ vibing the web app, it wasn't trivial. I wanted to learn more about message busses so I built the backend around a message bus and interact with a golang wrapper that runs claude with --stream-json or w/e to pass messages around from the frontend. That wrapper now manages chrome, xvfb, and vnc too. Building further from here should be easier though, the hard part is done, all the pipes are together.

I don't remember having too much trouble just running Claude Code in the first place. My Dockerfile doesn't seem to have anything weird in it. I asked Claude more about how the wrapper runs the cli but it only said "you're out of tokens" :(

mewpmewp2 5 hours ago | parent [-]

I've got the per PR env with CI/CD setup working now! I still have to wait until tomorrow before I can use Claude Code (or I could use the API token, but I've already spent so much on everything).

I do have ArgoCD too now, right now Github self hosted permanent runners work, so I'll look to switch I think after some time.

I have to understand your browser usecase better. I'm using playwright for automated browser/e2e right now?

I started using Claude Code/Codex in Docker containers (in tmux sessions so I can send tmux commands and read the terminal) and I auth them by sharing a volume/copying over the auth/credentials JSON file from the ~/.claude/ or ~/.codex dir. Also I assign a unique name to each container so I can later communicate with them from within my UI.

Does this solve the subscription problem for you if I understand the problem correctly?

bakies 5 hours ago | parent [-]

Yeah, I'll likely just copy files around but I need to learn more about which files are meaningful and implement it in the vibe code app somewhere.

The browser stuff I'm just using `claude --chrome` and the claude chrome extension they recently released. I haven't used it much yet other than testing out that it works.

theptip 2 days ago | parent | prev | next [-]

Right - I’m missing how you get the source code in the OP. It says you tmux in with ssh agent forwarding for GH. But you can’t do that on your iOS device? So you have to set up all your repos in the morning before leaving the house, then collect and push all your branches when you return home?

I could imagine this working for a small number of branches/changes.

smarx007 3 days ago | parent | prev | next [-]

The output from Jules is a PR. And then it's a toss-up between "spot on, let's merge" and "nah, needs more work, I will check out the branch and fix it properly when I am at the keyboard". And you see the current diff on the webpage while the agent is working.

nl 3 days ago | parent | prev | next [-]

Claude Code on the web, ChatGPT Codex and Google Jules are not the same as Claude, ChatGPT and Gemini. They are entire apps where you authorize Github access and they work via PRs.

They'll include screenshots on your PRs etc.

I like using them a lot when I can.

scubbo 3 days ago | parent [-]

Right, yes, that was precisely my point - it was weird to me that people were comfortable operating on a codebase that they don't have locally, that they can't directly interact with.

nl 3 days ago | parent | next [-]

> it was weird to me that people were comfortable operating on a codebase that they don't have locally, that they can't directly interact with.

I have a project where I've made a rule that no code is written by humans. It's been fun! It's a good experience to learn how far even pre-Opus 4.5 agents can be pushed.

It's pretty clear to me that in 12 months time looking at the code will be the exception, not the rule.

heliumtera 2 days ago | parent | next [-]

12 months from now, when something goes wrong, you'll have a lot of code to look at and debug!

scubbo 2 days ago | parent | prev [-]

> the exception, not the rule.

Absolutely - for me, that's already true. I just wouldn't want to give up the ability to _ever_ look at the code before I submit it!

memoriuaysj 3 days ago | parent | prev [-]

when the agent pushes the PR, in a branch, you can switch to that branch locally on your machine and do whatever, review it, change it, and ask for extra modifications on top, squash it, rebase it

scubbo 2 days ago | parent [-]

Yes, that had already been suggested here: https://news.ycombinator.com/item?id=46492948

suninsight 2 days ago | parent | prev [-]

[dead]

vidar 2 days ago | parent | prev | next [-]

Are those VM specs documented anywhere? I have used Claude Code for web a lot and never really bothered with the details. I just connect it to my repo and let it cook.

simonw 2 days ago | parent [-]

Not documented, so I had Claude Code for web write me a report: https://github.com/simonw/research/blob/main/environment-rep...

sergeyk 3 days ago | parent | prev | next [-]

Check out superconductor.dev (I’m building it), if you want live app previews, docker-in-docker functionality, multiple agents in one mobile app, and more.

bschmidt25001 3 days ago | parent | prev [-]

[dead]