| ▲ | mythz 5 hours ago |
| It's looking like we'll have Chinese OSS to thank for being able to host our own intelligence, free from the whims of proprietary megacorps. I know it doesn't make financial sense to self-host given how cheap OSS inference APIs are now, but it's comforting not being beholden to anyone or requiring a persistent internet connection for on-premise intelligence. Didn't expect to go back to macOS but they're basically the only feasible consumer option for running large models locally. |
|
| ▲ | btbuildem 4 hours ago | parent | next [-] |
| > doesn't make financial sense to self-host I guess that's debatable. I regularly run out of quota on my claude max subscription. When that happens, I can sort of kind of get by with my modest setup (2x RTX3090) and quantized Qwen3. And this does not even account for privacy and availability. I'm in Canada, and as the US is slowly consumed by its spiral of self-destruction, I fully expect at some point a digital iron curtain will go up. I think it's prudent to have alternatives, especially with these paradigm-shattering tools. |
| |
| ▲ | jsheard 4 hours ago | parent | next [-] | | I think AI may be the only place you could get away with calling a 2x350W GPU rig "modest". That's like ten normal computers worth of power for the GPUs alone. | | |
| ▲ | dymk 2 hours ago | parent | next [-] | | That's maybe a few dollars to tens of dollars in electricity per month depending on where in the US you live | |
| ▲ | bigyabai an hour ago | parent | prev | next [-] | | > That's like ten normal computers worth of power for the GPUs alone. Maybe if your "computer" in question is a smartphone? Remember that the M3 Ultra is a 300w+ chip that won't beat one of those 3090s in compute or raster efficiency. | | |
| ▲ | jsheard an hour ago | parent [-] | | I wouldn't class the M3 Ultra as a "normal" computer either. That's a big-ass workstation. I was thinking along the lines of a typical Macbook or Mac Mini or Windows laptop, which are fine for 99% of anyone who isn't looking to run gigantic AI models locally. | | |
| ▲ | bigyabai an hour ago | parent [-] | | Those aren't "normal" computers, either. They're iPad chips running in the TDP envelope of a tablet, usually with iPad-level performance to match. |
|
| |
| ▲ | kataklasm 3 hours ago | parent | prev [-] | | Did you even try to read and understand the parent comment? They said they regularly run out of quota on the exact subscription you're advising they subscribe to. | | |
| |
| ▲ | wongarsu 4 hours ago | parent | prev | next [-] | | Self-hosting training (or gaming) makes a lot of sense, and once you have the hardware, self-hosting inference on it is an easy step. But if you have to factor in hardware costs, self-hosting doesn't seem attractive. All the models I can self-host I can browse on OpenRouter and instantly find a provider offering great prices. With most of the cost being in the GPUs themselves, it just makes more sense to have others do it with better batching and GPU utilization. | | |
| ▲ | zozbot234 4 hours ago | parent [-] | | If you can get near 100% utilization for your own GPUs (i.e. you're letting requests run overnight and not insisting on any kind of realtime response) it starts to make sense. OpenRouter doesn't have any kind of batched requests API that would let you leverage that possibility. | | |
| ▲ | spmurrayzzz 3 hours ago | parent | next [-] | | For inference, even with continuous batching, getting 100% MFU is basically impossible in practice. Even the frontier labs struggle with this in highly efficient InfiniBand clusters. It's slightly better with training workloads just due to all the batching and parallel compute, but still mostly unattainable with consumer rigs (you spend a lot of time waiting for I/O). I also don't think 100% utilization is necessary either, to be fair. I get a lot of value out of my two rigs (2x RTX Pro 6000, and 4x 3090) even though they may not be at 24/7 100% MFU. I'm always training, generating datasets, running agents, etc. I would never consider this a positive ROI measured against capex though; that's not really the point. | | |
| ▲ | zozbot234 3 hours ago | parent [-] | | Isn't this just saying that your GPU use is bottlenecked by things such as VRAM bandwidth and RAM-VRAM transfers? That's normal and expected. |
| |
| ▲ | sowbug 3 hours ago | parent | prev [-] | | In Silicon Valley we pay PG&E close to 50 cents per kWh. An RTX 6000 PC uses about 1 kW at full load, and renting such a machine from vast.ai costs 60 cents/hour as of this morning. It's very hard for heavy-load local AI to make sense here. | | |
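To make the comparison explicit (using only the figures quoted in this comment, which are estimates rather than measurements): electricity alone nearly matches the all-in rental price, leaving almost nothing to amortize the hardware purchase against.

```python
# Electricity cost of a ~1 kW local rig vs. renting a comparable GPU by the hour.
# All inputs are the figures quoted above; treat them as rough estimates.
electricity_usd_per_kwh = 0.50   # PG&E rate quoted in the comment
rig_draw_kw = 1.0                # approximate full-load draw of an RTX 6000 workstation
rental_usd_per_hour = 0.60       # vast.ai price quoted in the comment

local_power_cost_per_hour = electricity_usd_per_kwh * rig_draw_kw
print(f"Local electricity alone:   ${local_power_cost_per_hour:.2f}/hour")
print(f"Rented (hardware + power): ${rental_usd_per_hour:.2f}/hour")
# Power alone is ~83% of the rental rate, leaving ~$0.10/hour to pay off the
# hardware itself -- hence "very hard for heavy-load local AI to make sense".
```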
| ▲ | btbuildem 3 hours ago | parent | next [-] | | Yikes.. I pay ~7¢ per kWh in Quebec. In the winter the inference rig doubles as a space heater for the office, so I don't feel bad about running local, energy-wise. | |
| ▲ | Imustaskforhelp 3 hours ago | parent | prev [-] | | And you are forgetting that things like vast.ai would STILL be more expensive than OpenRouter's API pricing, and even more so compared to AI subscriptions, which actively LOSE money for the companies offering them. So I would still agree with the GP (original comment): yes, it might not make financial sense to run these AI models yourself [they make sense when you want privacy etc., which are all fair concerns, just not financial ones]. But the fact that these models are open source still means they can be run locally if the dynamics shift in the future and running such large models yourself starts to make sense. Even just having that possibility, plus the fact that multiple providers can now compete on OpenRouter and the like, definitely makes me appreciate GLM & Kimi compared to their proprietary counterparts. Edit: I highly recommend this video https://www.youtube.com/watch?v=SmYNK0kqaDI [AI subscription vs H100] - honestly one of the best I've watched on this topic. | | |
| ▲ | HumanOstrich 3 hours ago | parent [-] | | Why did you quote yourself at the end of this comment? | | |
| ▲ | Imustaskforhelp 2 hours ago | parent [-] | | Oops, sorry. I'm trying an HN progressive extension that quotes whatever text I have selected, and I think that's what happened, or some such bug, I'm not sure. It's fixed now :) |
|
|
|
|
| |
| ▲ | Aurornis 3 hours ago | parent | prev | next [-] | | > I regularly run out of quota on my claude max subscription. When that happens, I can sort of kind of get by with my modest setup (2x RTX3090) and quantized Qwen3. When talking about fallback from Claude plans, the correct financial comparison would be the same model hosted on OpenRouter. You could buy a lot of tokens for the price of a pair of 3090s and a machine to run them. | | |
| ▲ | bigyabai an hour ago | parent [-] | | > You could buy a lot of tokens for the price of a pair of 3090s and a machine to run them. That's a subjective opinion, to which the answer is "no you can't" for many people. |
| |
| ▲ | mythz 4 hours ago | parent | prev | next [-] | | Did the napkin math on M3 Ultra ROI when DeepSeek V3 launched: at $0.70/2M tokens and 30 tps, a $10K M3 Ultra would take ~30 years of non-stop inference to break even - without even factoring in electricity. Clearly people aren't self-hosting to save money. I've got a lite GLM sub $72/yr which would require 138 years to burn through the $10K M3 Ultra sticker price. Even GLM's highest cost Max tier (20x lite) at $720/yr would buy you ~14 years. | | |
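Spelling out that napkin math (all inputs are the figures quoted in the comment above, not independently verified):

```python
# Break-even estimate: how long a $10K M3 Ultra must run non-stop before the
# tokens it produces would have cost the same via an inference API.
hardware_cost_usd = 10_000
api_price_per_token = 0.70 / 2_000_000   # "$0.70/2M tokens" from the comment
tokens_per_second = 30                    # claimed local throughput

tokens_per_year = tokens_per_second * 60 * 60 * 24 * 365
api_value_per_year = tokens_per_year * api_price_per_token

print(f"API-equivalent value produced per year: ${api_value_per_year:,.0f}")
print(f"Years to break even: {hardware_cost_usd / api_value_per_year:.0f}")
# ~$330/year of API-equivalent output, i.e. roughly 30 years to recoup the
# sticker price -- before electricity, and assuming 100% utilization.
```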
| ▲ | ljosifov 3 hours ago | parent | next [-] | | Everyone should do the calculation for themselves. I too pay for a couple of subs. But I'm noticing that having an agent work for me 24/7 changes the calculation somewhat. Often not taken into account: the price of input tokens. To produce 1K of code for me, the agent may need to churn through 1M tokens of codebase. IDK whether that will be cached by the API provider or not, but that makes a 5-7x difference in price. Decent discussion today about that and more: https://x.com/alexocheema/status/2020626466522685499 | |
| ▲ | wongarsu 4 hours ago | parent | prev | next [-] | | And it's worth noting that you can get DeepSeek at those prices from DeepSeek (Chinese), DeepInfra (US with Bulgarian founder), NovitaAI (US), AtlasCloud (US with Chinese founder), ParaSail (US), etc. There is no shortage of companies offering inference, with varying levels of trustworthiness, certificates and promises around (lack of) data retention. You just have to pick one you trust | |
| ▲ | oceanplexian 3 hours ago | parent | prev | next [-] | | Doing inference with a Mac Mini to save money is more or less holding it wrong. Of course if you buy some overpriced Apple hardware it’s going to take years to break even. Buy a couple of real GPUs, do tensor parallelism and concurrent batch requests with vLLM, and it becomes extremely cost-competitive to run your own hardware. | | |
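For anyone unfamiliar, the setup described here (splitting a model across a couple of GPUs and batching concurrent requests with vLLM) looks roughly like the sketch below; the model name and flag values are illustrative placeholders, not a recommended or benchmarked config.

```python
# Sketch of multi-GPU serving with vLLM. Rough CLI equivalent:
#   vllm serve <model> --tensor-parallel-size 2 --max-model-len 32768 --max-num-seqs 16
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-32B-AWQ",   # placeholder: pick a quantized model that fits your VRAM
    tensor_parallel_size=2,        # shard the weights across both GPUs
    max_model_len=32768,           # context length to reserve KV-cache space for
)

params = SamplingParams(temperature=0.2, max_tokens=256)

# Submitting many prompts at once is the point: vLLM batches them, which is
# where the throughput (and cost-per-token) advantage over one-at-a-time use comes from.
prompts = [f"Write a one-line docstring for function number {i}." for i in range(16)]
for out in llm.generate(prompts, params):
    print(out.outputs[0].text.strip())
```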
| ▲ | mythz 3 hours ago | parent [-] | | > Doing inference with a Mac Mini to save money is more or less holding it wrong. No one's running these large models on a Mac Mini. > Of course if you buy some overpriced Apple hardware it’s going to take years to break even. Great, where can I find cheaper hardware that can run GLM 5's 745B or Kimi K2.5 1T models? Currently it requires 2x M3 Ultras (1TB VRAM) to run Kimi K2.5 at 24 tok/s [1] What are the better value alternatives? [1] https://x.com/alexocheema/status/2016404573917683754 |
| |
| ▲ | DeathArrow 3 hours ago | parent | prev | next [-] | | I don't think an Apple PC can run full Deepseek or GLM models. Even if you quantize the hell out of the models to fit in the memory, they will be very slow. | |
| ▲ | retr0rocket 4 hours ago | parent | prev [-] | | [dead] |
| |
| ▲ | visarga 3 hours ago | parent | prev | next [-] | | Your $5,000 PC with 2 GPUs could have bought you 2 years of Claude Max, a model much more powerful and with longer context. In 2 years you could make that investment back in pay raise. | | |
| ▲ | benterix 2 hours ago | parent | next [-] | | > In 2 years you could make that investment back in pay raise. Could you elaborate? I fail to grasp the implication here. | |
| ▲ | tw1984 2 hours ago | parent | prev | next [-] | | > In 2 years you could make that investment back in pay raise. you can't be a happy uber driver making more money over the next 24 months because your fancy car is fitted with the best FSD in town, when every other car in town has the same FSD. | | |
| ▲ | visarga 2 hours ago | parent [-] | | But they don't have the same human in the loop though. | | |
| ▲ | tw1984 an hour ago | parent [-] | | that software is called an autonomous agent; the term "autonomous" has nothing to do with a human in the loop, it is the complete opposite. |
|
| |
| ▲ | dymk an hour ago | parent | prev [-] | | This claim has so many assumptions mixed in that it's utterly useless. |
| |
| ▲ | 7thpower 4 hours ago | parent | prev | next [-] | | Unless you already had those cards, it probably still doesn’t make sense from a purely financial perspective, unless there are other factors you’re accounting for. Doesn’t mean you shouldn’t do it though. | |
| ▲ | flaviolivolsi 4 hours ago | parent | prev | next [-] | | How does your quantized Qwen3 compare in code quality to Opus? | | |
| ▲ | Aurornis 4 hours ago | parent | next [-] | | Not the person you’re responding to, but my experience with models up through Qwen3-coder-next is that they’re not even close. They can do a lot of simple tasks in common frameworks well. Doing anything beyond basic work will just burn tokens for hours while you review and reject code. | |
| ▲ | btbuildem 3 hours ago | parent | prev [-] | | It's just as fast, but not nearly as clever. I can push the context size to 120k locally, but quality of the work it delivers starts to falter above say 40k. Generally you have to feed it more bite-sized pieces, and keep one chat to one topic. It's definitely a step down from SOTA. |
| |
| ▲ | 4 hours ago | parent | prev [-] | | [deleted] |
|
|
| ▲ | fauigerzigerk 4 hours ago | parent | prev | next [-] |
>...free from the whims of proprietary megacorps In one sense yes, but the training data is not open, nor are the data selection criteria (inclusions/exclusions, censorship, safety, etc). So we are still subject to the whims of someone much more powerful than ourselves. The good thing is that open-weights models can be fine-tuned to correct any biases that we may find. |
|
| ▲ | NiloCK 5 hours ago | parent | prev | next [-] |
> Didn't expect to go back to macOS but they're basically the only feasible consumer option for running large models locally. I presume here you are referring to running on the device in your lap. How about a headless Linux inference box in the closet / basement? Return of the home network! |
| |
| ▲ | jannniii 4 hours ago | parent | next [-] | | Indeed and I got two words for you: Strix Halo | | |
| ▲ | SillyUsername 3 hours ago | parent | next [-] | | Also, cheaper... X99 + 8x DDR4 + 2696V4 + 4x Tesla P4s running on llama.cpp.
Total cost about $500 including case and a 650W PSU, excluding RAM.
Running power draw is about 200W off-peak, 550W peak (everything slammed, though I've never actually seen that, and I have an AC monitor on the socket).
GLM 4.5 Air (60GB Q3-XL) when properly tuned runs at 8.5 to 10 tokens / second, with context size of 8K.
Throw in a P100 too and you'll see 11-12.5 t/s (still tuning this one).
Performance doesn't drop as much for larger model sizes, since the interconnect and DDR4-2400 bandwidth are the limiter, not the GPUs.
I've been using this with 4-channel 96GB RAM, recently upgraded to 128GB. | | |
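For reference, a "properly tuned" llama.cpp launch on a box like this (four cards plus CPU offload) tends to look something like the sketch below; the filename and flag values are guesses for illustration, not the commenter's actual configuration.

```python
# Illustrative llama.cpp server launch for a 4x GPU + CPU-offload rig,
# wrapped in Python only so the flags can be commented. Values are guesses.
import subprocess

cmd = [
    "llama-server",
    "-m", "GLM-4.5-Air-Q3_K_XL.gguf",  # hypothetical quant filename (~60GB)
    "-c", "8192",                       # context size mentioned in the comment
    "-ngl", "99",                       # offload as many layers to the GPUs as will fit
    "--split-mode", "layer",            # spread layers across the four P4s
    "-ts", "1,1,1,1",                   # even tensor split across the cards
    "-t", "20",                         # CPU threads for layers left in system RAM
]
subprocess.run(cmd, check=True)
```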
| ▲ | Aurornis 3 hours ago | parent [-] | | > Also, cheaper... X99 + 8x DDR4 + 2696V4 + 4x Tesla P4s running on llama.cpp. Total cost about $500 including case and a 650W PSU, excluding RAM. Excluding RAM in your pricing is misleading right now. That’s a lot of work and money just to get 10 tokens/sec |
| |
| ▲ | esafak 4 hours ago | parent | prev [-] | | How much memory does yours have, what are you running on it, with what cache size, and how fast? |
| |
| ▲ | Aurornis 5 hours ago | parent | prev | next [-] | | Apple devices have the high memory bandwidth necessary to run LLMs at reasonable rates. It’s possible to build a Linux box that does the same, but you’ll be spending a lot more to get there. With Apple, a $500 Mac Mini has memory bandwidth that you just can’t get anywhere else for the price. | | |
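The reason bandwidth is the headline number: token generation mostly streams the (active) weights from memory once per token, so a crude upper bound on decode speed is bandwidth divided by bytes-per-token. The figures below are placeholders to show the shape of the estimate, not benchmarks.

```python
# Crude decode-speed ceiling: tokens/s ~= memory bandwidth / bytes read per token.
def rough_decode_tps(bandwidth_gb_s, active_params_billions, bytes_per_weight=0.6):
    # ~0.6 bytes/weight approximates an aggressive 4-5 bit quantization
    bytes_per_token = active_params_billions * 1e9 * bytes_per_weight
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Placeholder tiers: ~120 GB/s (base Mac Mini class) vs ~800 GB/s (Ultra/Studio class),
# running a hypothetical MoE with ~32B active parameters.
for name, bw in [("~120 GB/s class", 120), ("~800 GB/s class", 800)]:
    print(f"{name}: ceiling of ~{rough_decode_tps(bw, 32):.0f} tok/s")
# Real numbers land below this ceiling due to KV-cache reads and overhead, and
# prompt processing (prefill) is compute-bound, so it scales differently.
```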
| ▲ | ingenieroariel 4 hours ago | parent | next [-] | | With Apple devices you get very fast predictions once it gets going, but it is inferior to Nvidia precisely during prefill (processing the prompt/context) before it really gets going. For code assistant use cases, local inference on Macs will tend to favor workflows where there is a lot of generation and little reading, which is the opposite of how many of us use Claude Code. Source: I started getting Mac Studios with max RAM as soon as the first Llama model was released. | | |
| ▲ | Aurornis 4 hours ago | parent | next [-] | | > With Apple devices you get very fast predictions once it gets going, but it is inferior to Nvidia precisely during prefill (processing the prompt/context) before it really gets going I have a Mac and an Nvidia build and I’m not disagreeing. But nobody is building a useful Nvidia LLM box for the price of a $500 Mac Mini. You’re also not getting as much RAM as a Mac Studio unless you’re stacking multiple $8,000 Nvidia RTX 6000s. There is always something faster in LLM hardware. Apple is popular for the price points of average consumers. | |
| ▲ | storus 4 hours ago | parent | prev | next [-] | | This. It's awful to wait 15 minutes for an M3 Ultra to start generating tokens when your coding agent has 100k+ tokens in its context. This can be partially offset by adding a DGX Spark to accelerate this phase. An M5 Ultra should be like a DGX Spark for prefill and an M3 Ultra for token generation, but who knows when it will pop up and for how much? And it will still be at around 3080 GPU levels, just with 512GB RAM. | |
| ▲ | zozbot234 4 hours ago | parent | prev | next [-] | | All Apple devices have an NPU which can potentially save power on compute-bound operations like prefill (at least if you're OK with FP16 FMA/INT8 MADD arithmetic). It's just a matter of hooking up support in the main local AI frameworks. This is not a speedup per se, but it gives you more headroom wrt. power and thermals for everything else, so it should yield higher performance overall. | | |
| ▲ | d3k 3 hours ago | parent [-] | | AFAIK, only CoreML can use Apple's NPU (ANE). Pytorch, MLX and the other kids on the block use MPS (the GPU). I think the limitations you mentioned relate to that (but I might be missing something) |
| |
| ▲ | FuckButtons 3 hours ago | parent | prev [-] | | Vllm-mlx with prefix caching helps with this. |
| |
| ▲ | ac29 3 hours ago | parent | prev | next [-] | | > a $500 Mac Mini has memory bandwidth that you just can’t get anywhere else for the price. The cheapest new mac mini is $600 on Apple's US store. And it has a 128-bit memory interface using LPDDR5X/7500, nothing exotic. The laptop I bought last year for <$500 has roughly the same memory speed and new machines are even faster. | | |
| ▲ | jsheard 3 hours ago | parent [-] | | > The cheapest new mac mini is $600 on Apple's US store. And you're only getting 16GB at that base spec. It's $1000 for 32GB, or $2000 for 64GB plus the requisite SOC upgrade. > And it has a 128-bit memory interface using LPDDR5X/7500, nothing exotic. Yeah, 128-bit is table stakes and AMD is making 256-bit SOCs as well now. Apple's higher end Max/Ultra chips are the ones which stand out with their 512 and 1024-bit interfaces. Those have no direct competition. |
| |
| ▲ | zozbot234 5 hours ago | parent | prev | next [-] | | And then only Apple devices have 512GB of unified memory, which matters when you have to combine larger models (even MoE) with the bigger context/KV caching you need for agentic workflows. You can make do with less, but only by slowing things down a whole lot. | |
| ▲ | pja 2 hours ago | parent | prev | next [-] | | Only the M4 Pro Mac Minis have faster RAM than you’ll get in an off-the-shelf Intel/AMD laptop. The M4 Pros start at $1399. You want the M4 Max (or Ultra) in the Mac Studios to get the real stuff. | |
| ▲ | cmrdporcupine 5 hours ago | parent | prev [-] | | But a $500 Mac Mini has nowhere near the memory capacity to run such a model. You'd need at least 2 512GB machines chained together to run this model. Maybe 1 if you quantized the crap out of it. And Apple completely overcharges for memory, so. This is a model you use via a cheap API provider like DeepInfra, or get on their coding plan. It's nice that it will be available as open weights, but not practical for mere mortals to run. But I can see a large corporation that wants to avoid sending code offsite setting up their own private infra to host it. | | |
| ▲ | zozbot234 4 hours ago | parent [-] | | The needed memory capacity depends on active parameters (not the same as total with a MoE model) and context length for the purpose of KV caching. Even then the KV cache can be pushed to system RAM and even farther out to swap, since writes to it are small (just one KV vector per token). |
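To put numbers on "context length for the purpose of KV caching", the standard back-of-envelope is below; the layer/head dimensions are placeholders, not the actual GLM or Kimi configurations.

```python
# KV-cache size ~= 2 (K and V) x layers x KV heads x head_dim x bytes/elem x context length.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len

# Hypothetical large model: 60 layers, 8 KV heads of dim 128, fp16 cache, 128K-token context.
size = kv_cache_bytes(n_layers=60, n_kv_heads=8, head_dim=128, context_len=128_000)
print(f"~{size / 2**30:.0f} GiB of KV cache for one long agentic session")
# ~29 GiB on top of the weights -- which is why long contexts dominate memory
# planning even when a MoE model's *active* parameter count looks modest.
```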
|
| |
| ▲ | mythz 4 hours ago | parent | prev [-] | | Not feasible for large models; it takes 2x 512GB M3 Ultras to run the full Kimi K2.5 model at a respectable 24 tok/s. Hopefully the M5 Ultra can improve on that. |
|
|
| ▲ | vidarh 4 hours ago | parent | prev | next [-] |
| I don't really care about being able to self host these models, but getting to a point where the hosting is commoditised so I know I can switch providers on a whim matters a great deal. Of course, it's nice if I can run it myself as a last resort too. |
|
| ▲ | muyuu 2 hours ago | parent | prev | next [-] |
you can get 128GB Strix Halo machines for ~US$3k; these run some pretty decent models locally. currently I'd recommend GPT-OSS 120B, Qwen Coder Next 80B (either Q8 or Q6 quants, depending on speed/quality trade-offs) and the very best model you can run right now, which is Step 3.5 Flash (ubergarm GGUF quant) with 256K context, although this does push it to the limit - GLMs and nemotrons are also worth trying depending on your priorities. there's clearly a big quantum leap to the SotA models that need more than 512GB VRAM, but i expect that in a year or two the current SotA will be achievable with consumer-level hardware; if nothing else, hardware should catch up with running Kimi 2.5 for cheaper than 2x 512GB Mac Studio Ultras - perhaps Medusa Halo next year supports 512GB and DDR5 prices come down again, and that would put whatever the best open model of that size is next year within reach of under-US$5K local hardware. the odd thing is that there isn't much in the whole range between 128GB and 512GB VRAM requirements to justify the huge premium you pay for Macs in that range - but this can change at any point, as every other day there are announcements |
| |
| ▲ | saubeidl 2 hours ago | parent [-] | | And you can get Strix Halo in a laptop that looks and feels like a MacBook Pro and can run Linux if you buy an HP ZBook G1A. Super happy with that thing; the only real downside is battery life. |
|
|
| ▲ | Aurornis 5 hours ago | parent | prev | next [-] |
> It's looking like we'll have Chinese OSS to thank for being able to host our own intelligence, free from the whims of proprietary megacorps. I don’t know where you draw the line between proprietary megacorp and not, but Z.ai is planning to IPO soon as a multi-billion-dollar company. If you think they don’t want to be a multi-billion-dollar megacorp like all of the other LLM companies, I think that’s a little short-sighted. These models are open weight, but I wouldn’t count them as OSS. Also, Chinese companies aren’t the only companies releasing open weight models. OpenAI has released open weight models, too. |
| |
| ▲ | joshstrange 4 hours ago | parent [-] | | > Also, Chinese companies aren’t the only companies releasing open weight models. OpenAI has released open weight models, too. I was with you until here. The scraps OpenAI has released don't really compare to the GLM models or DeepSeek models (or others) in either cadence or quality (IMHO). |
|
|
| ▲ | pzo an hour ago | parent | prev | next [-] |
AFAIK they haven't released this one as OSS yet. They might eventually, but it's pretty obvious to me that at some point all/most of those more powerful Chinese models will probably stop being OSS. |
|
| ▲ | mikrl 5 hours ago | parent | prev | next [-] |
| >I know it doesn't make financial sense to self-host given how cheap OSS inference APIs are now You can calculate the exact cost of home inference, given you know your hardware and can measure electrical consumption and compare it to your bill. I have no idea what cloud inference in aggregate actually costs, whether it’s profitable or a VC infused loss leader that will spike in price later. That’s why I’m using cloud inference now to build out my local stack. |
| |
| ▲ | mythz 5 hours ago | parent [-] | | Not concerned with electricity cost - I have solar + battery with excess supply where most goes back to the grid for $0 compensation (AU special). But I did the napkin math on M3 Ultra ROI when DeepSeek V3 launched: at $0.70/2M tokens and 30 tps, a $10K M3 Ultra would take ~30 years of non-stop inference to break even - without even factoring in electricity. You clearly don't self-host to save money. You do it to own your intelligence, keep your privacy, and not be reliant on a persistent internet connection. |
|
|
| ▲ | nialv7 4 hours ago | parent | prev | next [-] |
| > Didn't expect to go back to macOS but their basically the only feasible consumer option for running large models locally. Framework Desktop! Half the memory bandwidth of M4 Max, but much cheaper. |
| |
| ▲ | thebruce87m 4 hours ago | parent [-] | | Does that equate to half the speed in terms of output? Any recommended benchmarks to look at? | | |
|
|
| ▲ | gz5 4 hours ago | parent | prev | next [-] |
hopefully it will spread - many open options, from many entities, globally. it is a brilliant business strategy from China, so i expect it to continue and be copied - good things. reminds me of Google's investments into K8s. |
|
| ▲ | andersa 4 hours ago | parent | prev | next [-] |
| They haven't published the weights yet, don't celebrate too early. |
|
| ▲ | throwaw12 4 hours ago | parent | prev | next [-] |
our laptops, devices, phones, equipment, and home stuff are all powered by Chinese companies. It wouldn't surprise me if at some point in the future my local "Alexa" assistant is fully powered by local Chinese OSS models running on Chinese GPUs and RAM. |
|
| ▲ | 2 hours ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | mminer237 3 hours ago | parent | prev | next [-] |
| I'm not sure being beholden to the whims of the Chinese Communist Party is an iota better than the whims of proprietary megacorps, especially given this probably will become part of a megacorp anyway. |
| |
| ▲ | hnfong 2 hours ago | parent [-] | | It seems you missed the point entirely once you saw the word "Chinese". The point isn't that the models are from China. It's that the weights are open. You can download the weights and finetune them yourself. Nobody is beholden to anything. |
|
|
| ▲ | TheRealPomax 3 hours ago | parent | prev | next [-] |
| Not going to call $30/mo for a github copilot subscription "cheap". More like "extortionary". |
| |
| ▲ | cmrdporcupine 3 hours ago | parent [-] | | Yeah, it's funny how the needle has moved on this kind of thing. Two years ago people scoffed at buying a personal license for e.g. JetBrains IDEs, which netted out to $120 USD or something a year; VS Code etc. took off because they were "free". But now they're dumping monthly subs to OpenAI and Anthropic that work out to the same as their car insurance payments. It's not sustainable. | | |
| ▲ | TheRealPomax 2 hours ago | parent [-] | | There's also zero incentive for individual companies to care: if I only want to use Opus in VS Code (and why would I use anything else, it's so much better at the job) I can either pay for Copilot, which has excellent VS Code integration (because it has to), or I can pay Claude specifically and then use their extension, which has the absolute worst experience because not only is the chat "whimsical, to make AI fun!", its interface is part of the sidebar, so it's mutually exclusive with your file browser, search, etc. So whether you pay Claude or GitHub, Claude gets paid the same. The consumer ends up footing a bill that has no reason to exist, and there's no real competition because open source models can't run at the scale of an Opus or ChatGPT. (Not unless the EU decides it's time for a "European Open AI Initiative" where any EU citizen gets free access to an EU-wide, datacenter-backed, large-scale system that AI companies can pay to be part of, instead of getting paid to connect to.) |
|
|
|
| ▲ | swalsh 5 hours ago | parent | prev [-] |
| Yeah that sounds great until it's running as an autonomous moltbot in a distributed network semi-offline with access to your entire digital life, and China sneaks in some hidden training so these agents turn into an army of sleeper agents. |
| |
| ▲ | jfaat 5 hours ago | parent | next [-] | | Lol wat? I mean you certainly have enough control self hosting the model to not let it join some moltbot network... or what exactly are you saying would happen? | | |
| ▲ | swalsh 4 hours ago | parent [-] | | We just saw last week that people are setting up moltbots with virtually no idea of what they do and don't have access to. The scenario I'm afraid of is China realizing the potential of this. They can add training to the models commonly used for assistants. They act normal, are helpful, everything you'd want a bot to do. But maybe once in a while one checks moltbook or some other endpoint China controls for a trigger word. When it sees that, it kicks into a completely different mode: maybe it writes a script to DDoS targets of interest, maybe it mines your email for useful information, maybe the user has credentials to something that is a critical component of an important supply chain. This is not a wild scenario; no new sci-fi technology would need to be invented. Everything needed to do it is available today, and people are configuring and using it like this today. The part that I fear is that if it is running locally, you can't just shut off API access and kill the threat. It's running on its own server, its own model. You have to cut off each node. I'm a big fan of AI, I use local models A LOT. I do think we have to take threats like this seriously. I don't think it's a wild sci-fi idea. Since WW2, civilians have been as much of an equal-opportunity target as soldiers; war is about logistics, and civilians supply the military. | | |
| ▲ | resters 3 hours ago | parent [-] | | Fair point but I would be more worried about the US government doing this kind of thing to act against US citizens than the Chinese government doing it. I think we're in a brief period of relative freedom where deep engineering topics can be discussed with AI agents even though they have potential uses in weapons systems. Imagine asking chat gpt how to build a fertilizer bomb, but apply the same censorship to anything related to computer vision, lasers, drone coordination, etc. |
|
| |
| ▲ | saubeidl 4 hours ago | parent | prev | next [-] | | What if the US government does instead? I don't consider them more trustworthy at this point. | |
| ▲ | tw1984 2 hours ago | parent | prev | next [-] | | exactly, we all need to use CIA/NSA approved models to stay safe. very smart idea! | |
| ▲ | resters 5 hours ago | parent | prev [-] | | sleeper agents to do what? let's see how far you can take the absurd threat porn fantasy. I hope it was hyperbole. | | |
| ▲ | falcor84 3 hours ago | parent | next [-] | | There was research last year [0] finding significant security issues with the Chinese-made Unitree robots, apparently being pre-configured to make it easy to exfiltrate data via wi-fi or BLE. I know it's not the same situation, but at this stage, I wouldn't blame anyone for "absurd threat porn fantasy" - the threats are real, and present-day agentic AI is getting really good at autonomously exploiting vulnerabilities, whether it's an external attacker using it, or whether "the call is coming from inside the house". [0] https://spectrum.ieee.org/unitree-robot-exploit | |
| ▲ | swalsh 4 hours ago | parent | prev [-] | | I replied to the commenter who doubted me in a more polite manner. |
|
|