| ▲ | ValentineC 4 hours ago |
| > I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers. Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China? Many lower-budget individuals are now moving to China open weight models like DeepSeek. I wonder if China's really subsidising the providers, or if inferencing costs are actually much lower, and Anthropic/OpenAI are just making sure no money's left on the table for their eventual IPOs. |
|
| ▲ | vidarh an hour ago | parent | next [-] |
| We can tell that the inferencing costs for many of these models are low enough that these models are being sold close to real costs on the basis that many of them are open weight and available from third party providers who have no incentive to subsidize them. I think the frontier labs will need to drop their high per-token prices at least for their low and mid-level models for the reason that several Chinese models (at least Qwen, DeepSeek, Kimi and GLM) are "close enough" that with the right harness they are cost effective alternatives. They won't necessarily need to close the gap - at least not yet -, because these models won't necessarily compete at the same token counts. E.g. at least some of them need to do far more work to solve the same problems. But, yeah, the prices will come down one way or the other. At the same time, even the subscriptions for the cheap Chinese models are probably subsidised, and those subscriptions are likely to get less generous over time. |
|
| ▲ | dgellow 3 hours ago | parent | prev | next [-] |
| One aspect Paul Kedrosky mentioned recently is the concept of „duration mismatch“. The price per token goes down over time (either because the AI vendor reduces due to competition pressure, or because customers are now incentivized to use older cheaper models). But datacenters are financed through debt, with the assumption their revenue increases over time. Quoting him: „[AI vendors are] paying for a fixed cost with a depreciating commodity“[0]. So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt. 0: https://youtu.be/wGZboZcSGDY?is=64GuKyqBh_4aSjTE |
| |
| ▲ | missedthecue an hour ago | parent | next [-] | | "So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt." Not necessarily, the bond holders could simply take a massive hair cut and lose shitloads of money. On the topic of bubbles and exuberance, Jeff Bezos made the salient point that there was a massive over-invested biotech boom in the 1990s and tons of sophisticated investors ended up losing lots of money. But humanity still kept the medical advancements made by the boom. Stocks going down didn't un-research drugs, and it won't un-research new GPUs or un-build datacenters. | | |
| ▲ | biztos an hour ago | parent [-] | | In order to not un-build the data centers, they at least have to make more than it costs to operate them, and also not have some attractive liquidation value (the land, maybe). I could imagine something like “inference is done at home or in China, that’s the price to beat” and it’s not worth keeping all those GPUs cool out in Nevada. | | |
| ▲ | missedthecue an hour ago | parent [-] | | But the parent comment was that one of the bigger costs in these data centers was the interest expense on the borrowed money. A restructuring removes or heavily reduces that amount. The fiber laid during the dotcom bubble never paid back the investors or lenders, but it's still profitably connecting customers all these years later. | | |
| ▲ | nothercastle 15 minutes ago | parent | next [-] | | It’s true that once built, the data center can operate right up to a financed data center value of zero. The investors will lose money but the costs of AI will go down as they do. | |
| ▲ | lazide 10 minutes ago | parent | prev [-] | | Yup, that is the real economic benefit of bankruptcy - a reset. |
|
|
| |
| ▲ | geysersam an hour ago | parent | prev | next [-] | | Current AI datacenter/model development investment rate is roughly 1T/year. That's a lot. But the US economy is 33T/year. So the investment pays back (roughly) over ten years if, each year, the AI investments increase overall productivity by 0.6%, assuming the AI companies can capture half of the value of that productivity gain. > „[AI vendors are] paying for a fixed cost with a depreciating commodity“ That's just a confusing way to say you don't think future models will be worth the development costs.
Because if future models are significantly better, why would the price of tokens to access those models depreciate? | | |
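The payback arithmetic in the comment above can be checked with a rough sketch. It uses only the comment's round numbers and assumes one reading of the claim: a single year's $1T of investment produces a lasting 0.6% productivity gain, half of which the AI companies capture each year. Interest, discounting, and GDP growth are ignored, so treat the result as an order-of-magnitude check rather than a forecast.

```python
# Back-of-the-envelope check of the payback claim above.
# All inputs are the comment's round numbers, not measured data.
annual_investment = 1.0e12   # $1T invested per year (figure from the comment)
us_gdp = 33.0e12             # $33T/year US economy (figure from the comment)
productivity_gain = 0.006    # 0.6% productivity gain (figure from the comment)
capture_share = 0.5          # share of the gain captured by AI companies (from the comment)
payback_years = 10           # payback window (from the comment)

# Revenue captured each year from one year's worth of investment.
captured_per_year = us_gdp * productivity_gain * capture_share

# Undiscounted total over the payback window (assumption: the gain persists).
total_captured = captured_per_year * payback_years

print(f"captured per year: ${captured_per_year / 1e9:.0f}B")
print(f"captured over {payback_years} years: ${total_captured / 1e12:.2f}T")
print(f"fraction of investment recovered: {total_captured / annual_investment:.2f}")
```

Under these assumptions one year of investment is recovered almost exactly over the decade, which matches the "roughly" in the comment. Swapping in the lower $300B to $500B investment figure mentioned in the reply shortens the payback proportionally.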
| ▲ | jiggawatts 44 minutes ago | parent [-] | | The $1T number seems more promises than reality, which is closer to the $300B to $500B level. Still a big number, but between a third and a half of the value used in the popular media. |
| |
| ▲ | bijowo1676 3 hours ago | parent | prev [-] | | Do GPU chips really depreciate physically? There are no moving parts, and I don't think memory chips or GPU chips deteriorate naturally. I think it's only accounting depreciation. I have been using my laptop for a decade, so what is stopping datacenters from using the purchased GPU chips for a decade? | | |
| ▲ | bgnn 2 hours ago | parent | next [-] | | Chips age and fail with age. You can check hot-carrier injection, bias-temperature instability and electromigration as they are the main aging mechanisms. All of these are a linear function of time but exponential in temperature. The 90-100C these chips are running at is really tough, so they are likely to fail at a rate in the range of a couple of percent to 10% in 2-3 years, depending on the margins they have in the design. The solder joints are notorious for failing at a high rate too. | | | |
| ▲ | Aurornis 3 hours ago | parent | prev | next [-] | | There are data centers that use and rent out 10 year old server GPUs. They can't run larger modern models. They can't run smaller models as fast as newer servers. So their remaining market is applications where customers are okay with older, smaller models and slower performance. They have to price the service lower than competitors due to the lower performance. The older GPUs are less efficient so it costs them more to keep them running. They're paid off, but they're taking up valuable power, space, and cooling in a data center. Eventually there is a tipping point where it's better to replace that space and power budget with something new that has more demand. The parts are sold off on the open market. There's an equilibrium demand for the parts from other data centers keeping older servers running and from hobby people who are okay with a jet engine sounding toaster of a GPU running in their home. | | |
| ▲ | jmalicki 2 hours ago | parent [-] | | As long as the demand for GPUs keeps increasing, there are more data centers being built to house them. When you have waitlists for many many months for Blackwell GPUs, keeping the old ones around as long as customers are willing to pay for them is great. If I as a customer have a use case for a machine learning model I developed awhile ago, so an insect identification model, I had an ML researcher/eng develop it back in 2019, and it runs fine on a 2018-era T4 GPU (NVidia 2080 era), why mess with it? | | |
| ▲ | HumanOstrich an hour ago | parent [-] | | We aren't talking about insect identification models from 2019. | | |
| ▲ | jmalicki 34 minutes ago | parent [-] | | What do you think are running on the T4 GPUs in AWS? A lot of the use cases I know of for them are mid-level computer vision models that don't need to be frontier level. |
|
|
| |
| ▲ | munk-a 3 hours ago | parent | prev | next [-] | | In addition to the physical depreciations other comments mentioned I'd also mention that old chips will settle into a low price and then actually go up on a per unit basis if you're trying to buy a significant amount of them. With a limitation on fabrication facilities continuing to pump out older cards is an opportunity cost to the manufacturers that would prefer to be producing newer cards. If you were in a place where you suddenly wanted to buy 10,000 3080s, as an example, I'm not certain if the market could actually fulfill that demand and no one with the ability to increase the available supply to meet that demand actually wants to do so. Chips do wear out and need to be replaced (entropy do be like that and durability is not a primary concern for chip design) so you'll need to refresh your stock and, even if you don't need cutting edge models, the price of all chips at scale will go up over time. It may feel unintuitive since, when the PS3 was released PS1s were extremely cheap - but if you're struggling to understand this effect from your experiences in the consumer market you're actually looking at the price factor that starts making antiques increase in value since at a certain point they become scarce goods. The market price for an NES is higher today than it was in 2003 because the price had already bottomed out from demand from the general consumer market but the demand remaining (speedrunners and the like) is now fixed or growing while the supply is inevitably shrinking. | |
| ▲ | vb-8448 3 hours ago | parent | prev | next [-] | | I used to work in datacenters, during spinning disk era we had technicians from vendors basically every couple of days to replace some broken part. When the massive switch to ssd happened instead of having them every couple of days it was 3 or 4 times per month. Despite no moving parts things broke anyway and, even if it doesn't break, the vendor can make you change the technology just by playing with maintenance cost of the older one, limiting or removing spare parts from the market. | |
| ▲ | tardedmeme 2 hours ago | parent | prev | next [-] | | Gradually, and especially when hot. Modern chips are pretty close to the physical limits of how small they can be made, and that means atomic/chemical effects like electromigration are accounted for and determine the lifetime. Every extra 10 degrees Celsius of temperature doubles the speed of chemical reactions. When they stray too close to the line ... you get Intel's 13/14th gen chips that wear out after 1-2 years instead of 10-20 years. Intel calls it "Vmin drift" because that doesn't sound scary, but the actual point is that various wear-out mechanisms push the chip outside of its design envelope - increasing the voltage or lowering the clock speed may get it to run for a while longer, but you're living on borrowed time as the various circuits just stop working right and you get unpredictable instruction mis-execution: https://fgiesen.wordpress.com/2025/05/21/oodle-2-9-14-and-in... | | |
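The "every extra 10 degrees doubles the rate" heuristic from the comment above can be turned into a small calculation. This is a minimal sketch of that rule of thumb only, not a calibrated Arrhenius model; the 10-year rating and the 60 °C reference temperature are hypothetical values picked for illustration.

```python
# Rule-of-thumb sketch: wear-out rate doubles for every 10 °C increase.
# This is the heuristic quoted in the comment, not a calibrated Arrhenius model.

def acceleration_factor(temp_c: float, reference_c: float) -> float:
    """How many times faster wear-out proceeds at temp_c than at reference_c."""
    return 2.0 ** ((temp_c - reference_c) / 10.0)


def expected_lifetime_years(reference_life_years: float, temp_c: float, reference_c: float) -> float:
    """Reference lifetime scaled down by the acceleration factor."""
    return reference_life_years / acceleration_factor(temp_c, reference_c)


# Hypothetical example: a part rated for 10 years at 60 °C, run at 90 °C instead.
factor = acceleration_factor(90.0, 60.0)
life = expected_lifetime_years(10.0, 90.0, 60.0)
print(f"acceleration: {factor:.0f}x, lifetime: {life:.2f} years")
```

With these made-up inputs, a 30 °C increase gives an 8x acceleration, so a nominal decade of life shrinks to a bit over a year. That is the same order of magnitude as the 1-2 years versus 10-20 years contrast in the comment.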
| ▲ | bijowo1676 an hour ago | parent [-] | | sounds like planned depreciation on Intel's part, they definitely do not design server grade chips for longevity since that would harm their own revenues |
| |
| ▲ | malfist 2 hours ago | parent | prev | next [-] | | They do degrade physically, but the bigger thing is they stop being competitive quickly. Each year or so we see a doubling of GPU speeds for the same amount of power. If you build a 100MW data center with GPU compute and three years later a new data center opens with the same GPU cost and the same electricity cost as yours, but can do twice as much compute, you quickly lose business unless the market is so constrained that customers can't afford to be picky. But the moment there's slack in the market you'll see major migrations off of providers that have the same cost but half, or a quarter, of the performance. So when you see someone talking about GPUs fully depreciating in value in 1-3 years this is what they're talking about. Right now it's not a big deal because there's no slack in the market. But once there is, the bottom will drop out. | |
| ▲ | ozim 9 minutes ago | parent | prev | next [-] | | Transistors do wear out. Not going to elaborate as it is easy to ask GPT | |
| ▲ | mattalex an hour ago | parent | prev | next [-] | | Nothing is stopping them, it's just not worth it: Have a look at e.g. vast.ai's pricing (https://vast.ai/pricing). The V100 (2017 -> 9 years old) can be rented from $0.02 to $0.37/h (right now I can find a V100 with a Xeon Gold 6140 and 48GB RAM for $0.165/h). Let's assume the guy you rent it to pins it at its 250W TDP and let's ignore the running costs of CPU/RAM/etc...
Then you draw 1/4 kWh for that compute hour. The industrial electricity prices in the US vary between 7.5 and 25 ct per kWh (depending on state, time of day, etc...), so at 100% efficiency, assuming nothing ever breaks, and the CPU consumes 0W you earn about 14ct/h. And remember: V100 hours are sometimes sold at 1/10th the price. If I pick average conditions you need to start thinking of whether it is worth it to rent them out: Usually it isn't unless you have them anyways and just sell idle capacity. It's barely worth it to run them in a pure "is it profitable" sense, and if we also account for the opportunity cost of taking up a slot in your datacenter it ceases to be worth it really quickly. | |
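The margin estimate in the comment above can be reproduced with a short sketch. The rental prices, the 250W TDP, and the electricity range are the comment's own figures; the function name `hourly_margin` is made up for this example. As in the comment, CPU and RAM power, cooling, and failures are all ignored.

```python
# Sketch of the V100 rental margin arithmetic from the comment above.
# Prices and power draw are the comment's figures; CPU/RAM power, cooling,
# and failures are ignored, exactly as the comment does.

def hourly_margin(rental_usd_per_hour: float, power_watts: float, electricity_usd_per_kwh: float) -> float:
    """Rental income minus electricity cost for one hour at full load."""
    energy_kwh = power_watts / 1000.0  # kWh consumed in one hour
    return rental_usd_per_hour - energy_kwh * electricity_usd_per_kwh


TDP_WATTS = 250.0

for rental in (0.165, 0.02):           # listed price and the cheapest quoted price
    for electricity in (0.075, 0.25):  # cheap and expensive industrial power
        margin = hourly_margin(rental, TDP_WATTS, electricity)
        print(f"rental ${rental:.3f}/h, power ${electricity:.3f}/kWh -> margin ${margin:+.4f}/h")
```

At the $0.165/h listing the margin is roughly 10 to 15 cents per hour, in line with the "about 14ct/h" figure. At the $0.02/h end the margin is close to zero with cheap power and negative with expensive power, before counting any other cost.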
| ▲ | whateverboat 3 hours ago | parent | prev | next [-] | | Today's data center GPUs are essentially overclocked, and so run at the limit of what the chip materials can physically handle, and therefore degrade over time. For example, GH200s operate at 1kW/superchip but the actual safe power is somewhere around 650W, which would allow them to function for a decade or more. But that leads to around a 15% slowdown, and that is unacceptable in today's competition. So current GPUs are destined to be depreciating assets. In future, we might have fixed cost GPUs but not today. | | |
| ▲ | missedthecue an hour ago | parent | next [-] | | I would presume the reason they are overclocked is because they are trying to make up for the shortage. In time, the shortage of computing components will be remedied, and tokens produced at lower power pulls will be cheaper. | |
| ▲ | bijowo1676 an hour ago | parent | prev [-] | | I think it's reasonable to give up 15% of speed for a decade more lifetime. This change in depreciation alters the economics of GPUs. | | |
| |
| ▲ | numpad0 2 hours ago | parent | prev | next [-] | | Chips do deteriorate and fail naturally at datacenter scale or in timescales of decades, though not exactly like on financial reports. Leakage current increases or electromigration occurs at junctions, or whatever those words mean. And yeah, it does feel like GPUs will start losing value more slowly going forward, with Moore's Law being dead for a while. It used to be that 3-5 year old GPUs were more useful as space heaters than GPUs, but that's much less the case today. | |
| ▲ | dgellow 3 hours ago | parent | prev | next [-] | | GPUs do depreciate indeed, but here the depreciating commodity is the token, not the hardware. You sell cheaper tokens with the same hardware. | |
| ▲ | threetonesun 3 hours ago | parent | prev | next [-] | | I assumed the issue was similar to crypto mining, where given finite amounts of space and power it makes sense to always be running the latest and most powerful GPUs instead of keeping older hardware running. There's definitely a secondary market for these GPUs as well. | |
| ▲ | bigfishrunning 3 hours ago | parent | prev | next [-] | | Your laptop doesn't have a 100% duty cycle. If you ran it like a data center it would indeed wear out much faster. | |
| ▲ | foobarian 3 hours ago | parent | prev | next [-] | | > There are no moving parts, I dont think memory chips or GPU chips deteriorate naturally I believe they do, but I too would love to know more details because there are several ways this can happen. Electromigration, package failures, VRAM failures, dielectric breakdown... Hopefully there will be studies soon similar to that old Google paper on HDD failures! | | |
| ▲ | hgoel 29 minutes ago | parent [-] | | Currently it's a pretty big ask to look at the several hundred billion transistors and the interconnects between them to find what broke. Though, those capabilities are maybe just a few years out, funnily it's taking AI to make it potentially doable. |
| |
| ▲ | manyatoms 2 hours ago | parent | prev | next [-] | | the hardware itself is still useful, but random failures happen every so often, so if you're trying to run a fixed sized fleet then your fleet shrinks when you can't get spares any more | |
| ▲ | sandworm101 3 hours ago | parent | prev [-] | | Yes, even if the hardware is untouched. As technology advances, the power cost per compute cycle goes down. A GPU using old tech costs progressively more to operate compared to the newer models. So its value goes down over time = depreciation. As for duty cycles, the chips are perfectly happy at 100% operation. Cooling and power components fail, not the chips. But it costs manpower to repair such things and manpower is inconvenient these days. A GPU with any sort of fault just gets dumped. |
|
|
|
| ▲ | satvikpendem 2 hours ago | parent | prev | next [-] |
| Don't worry, they'll just lobby to ban Chinese models instead to keep their token revenues high. > Compounding the problem, labs in China often release dual-use capable models as open-weight. Once a model is open-weight, safeguards that do exist can be removed, making the model available to any state or non-state actor to use for malicious purposes, including the cyber and CBRN misuse those safeguards were built to prevent. https://www.anthropic.com/research/2028-ai-leadership |
| |
| ▲ | CuriouslyC 2 hours ago | parent [-] | | If you do the math, they don't have a choice. If China captures America's AI market it'll cause a major depression. They'll give it the BYD treatment, though it'll be a lot less effective. | | |
| ▲ | WarmWash an hour ago | parent | next [-] | | They'll ban them because (unless run locally or self-hosted) they are just data capture tools for China. | | |
| ▲ | dakolli 5 minutes ago | parent [-] | | Please explain to me how that works. If I download a GGUF file and run inference with it, how is it collecting and sending data back to China? This makes no sense. 99% of the people using Chinese models are using them via Western inference providers who are running them and serving them to people over OpenRouter or whatever. If anyone is stealing your data it would be an American or European inference provider. A model has no ability to send data anywhere. China bad by default, right? |
| |
| ▲ | arealaccount 43 minutes ago | parent | prev [-] | | The “you wouldn’t download a car” meme applies here |
|
|
|
| ▲ | Animats 3 hours ago | parent | prev | next [-] |
| Raise them, more likely. NVidia says that GPU hardware prices won't decrease until at least 2030. The world is out of fab capacity. |
| |
| ▲ | EA-3167 2 hours ago | parent [-] | | Seriously, they’re trying to justify trillion+ IPO’s while setting piles of money on fire, prices aren’t going DOWN. | | |
| ▲ | dakolli 2 minutes ago | parent | next [-] | | They aren't going down, but in the meantime they'll cover their ass by bribing their way into the S&P 500 and then use your 60 year old mother's 401k and teacher's pension to fund their risky capital expenditure. | |
| ▲ | criddell 2 hours ago | parent | prev [-] | | Today's frontier models will be tomorrow's low-end option. I think whatever model you are using today will be less expensive to use a year or two from now. | | |
| ▲ | missedthecue an hour ago | parent [-] | | Last year's o3 was more expensive than 5.5 is. Whatever model we are using now will probably be more expensive than next year's leading models will be. | | |
| ▲ | Insanity an hour ago | parent [-] | | Price per M/tokens is also a fuzzy metric when newer models reason longer, and then burn more tokens while doing so. |
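The point about per-token price being a fuzzy metric can be made concrete by pricing a whole task instead of a token. This is a minimal sketch; the prices and token counts are made up for illustration and do not describe any real model.

```python
# Sketch: why price per million tokens alone can mislead.
# The prices and token counts below are hypothetical, not real model data.

def cost_per_task(price_per_million_usd: float, tokens_per_task: int) -> float:
    """Dollar cost of one task given a per-million-token price."""
    return price_per_million_usd * tokens_per_task / 1_000_000


# Hypothetical older model: pricier tokens, shorter reasoning.
older = cost_per_task(price_per_million_usd=10.0, tokens_per_task=20_000)

# Hypothetical newer model: half the token price, three times the tokens.
newer = cost_per_task(price_per_million_usd=5.0, tokens_per_task=60_000)

print(f"older model: ${older:.2f} per task")
print(f"newer model: ${newer:.2f} per task")
```

In this made-up case the newer model is cheaper per token yet costs more per task, so comparisons across model generations need the token count per task as well as the list price.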
|
|
|
|
|
| ▲ | freediddy 3 hours ago | parent | prev | next [-] |
| Most sane US companies will disallow use of cloud-based Chinese AI providers, because everything including code, data, PII, etc is being sent to them. |
| |
| ▲ | eikenberry 3 hours ago | parent | next [-] | | Then don't use the cloud-based Chinese providers, use cloud-based US/EU providers using Chinese models. The interesting Chinese models are all open, making this issue mostly moot. | |
| ▲ | ceejayoz 3 hours ago | parent | prev | next [-] | | Saner companies ask the same question about models from their own country too. | |
| ▲ | rd 3 hours ago | parent | prev | next [-] | | I wonder if I could start a US-based company with good data regulation and just serve open-weight models at a competitive price. I feel like the real barrier is just that most companies willing to adopt AI usage enough to make it worth it at this point don't want to be using inferior models. | | |
| ▲ | tokioyoyo 3 hours ago | parent | next [-] | | Yes, you can. There are multiple inference providers out there. The problem is, it’s hard to beat the Chinese providers in cost. And you also have to compete with frontier model providers’ subsidized offerings. | |
| ▲ | CobrastanJorji 3 hours ago | parent | prev | next [-] | | Here's a free startup idea: operate an open-weight model service, and offer "Verified AI Integrity," which signs the input tokens, the seed for the randomness in selecting outputs, and the model ID, proving that the result of the call to AI was completely "organic" and was not interfered with. Your main audience would be snake oil salesmen trying to prove their AI products are unbiased and not under the thumb of any outside influence. This doesn't address the biases of the model itself, but that's not your business. Your business is selling tokens and security certificates. If you can get the right angel investor, you could maybe have your new standard required for some government applications. | |
| ▲ | mediaman 3 hours ago | parent | prev | next [-] | | There are plenty of US-based inference providers available, including AWS, that serve Chinese models at competitive prices (vs frontier US models). They also have lots of usage. Not necessarily for coding, but for other enterprise tasks. | |
| ▲ | fg137 2 hours ago | parent | prev [-] | | It's called AWS. Bedrock is right there. Price or data policy is never the issue. The models themselves are the problem -- most large US companies are not going to touch them. Source: directly involved in these discussions. You can downvote as much as you'd like but you can't ignore the facts. |
| |
| ▲ | tmp10423288442 2 hours ago | parent | prev | next [-] | | There are some objections here saying that some US firms are using Chinese AI providers, but I wonder if any of those are subject to compliance. Large firms that are disproportionately responsible for AI spending are all subject to compliance. | |
| ▲ | amunozo 3 hours ago | parent | prev | next [-] | | You can run DeepSeek as it's open weights, unlike Claude or GPT. | |
| ▲ | cheeze 3 hours ago | parent | prev [-] | | Deepseek has some models in Bedrock. There is definitely a huge market for a "good enough" model running within the country of the company |
|
|
| ▲ | testdelacc1 4 hours ago | parent | prev | next [-] |
| Per token costs will fall, but the harnesses will get more token hungry. Instead of just centering the div it’ll spin up a battery of agents to architect, critique, advise, code, review, refactor and so on. |
| |
| ▲ | sevenzero 4 hours ago | parent | next [-] | | I wish I could disable most of these. I already hate all the "oh you're actually right, let me fix that" nonsense. Then it proceeds to burn 50k tokens on the git history instead of copying logic A from a different part of the codebase to logic B, where I want that exact logic without having to write the boilerplate myself... | | |
| ▲ | apsurd 4 hours ago | parent | next [-] | | Makes me think of how my Claude.md file specifies using the built-in framework code generators (Rails). Those generators are deterministically right every time. I wonder how often the agent actually follows the guidance. I do see it follow the guidance when I look, but it doesn't seem to do so every time. | | |
| ▲ | thefunnyman 3 hours ago | parent [-] | | This is tricky since it can and will ignore your md directions. When possible I try to lean on tool call hooks or skills that invoke deterministic scripts. As much as you can remove the "choice" the better though still there's a lot of randomness in how reliably it invokes skills ime. |
| |
| ▲ | sfn42 3 hours ago | parent | prev [-] | | A lot of the time if you're copying code from one place to another what you actually want to do is abstract it so you can reuse it in both places. The LLM can easily do this type of stuff, just tell it and it'll happily do it. This is exactly what I mean when I tell people they need to work closer with the AI, tell it how to do things. Don't just tell it what to do and get frustrated when it does it differently than you would. A good way to achieve this without writing huge prompts is tell it to plan the change first. Just give it some vague low-effort directions. It'll usually get most things right, you tell it what you want different and once you're happy you tell it to go ahead. | | |
| ▲ | sevenzero 3 hours ago | parent [-] | | Nah the codebase is legacy fucked and I cant be bothered to try and optimize business flows without the fear of other stuff breaking. Claude 100% of the time even thinks we use laravel despite the project being some old lumen codebase, so most of laravels features are not available. It also gets the PHP version we are using wrong 100% of the time. |
|
| |
|
|
| ▲ | SecretDreams 4 hours ago | parent | prev | next [-] |
| > Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China? I genuinely do not know how prices can get lower from the current major providers in NA without the whole market collapsing. Everyone is spending copious amounts of money to presumably make more money back. |
| |
| ▲ | aDyslecticCrow 3 hours ago | parent | next [-] | | An inference-only platform selling good open weight model inference without the research overhead could capture a lot of the market for smaller-model uses (Haiku, Gemini Flash). Diffusion transformers and clever caching, both of which are improving at a high rate, can drop inference costs even lower. The biggest reason large models are unattainable for local applications is the lack of hardware with a large amount of unified/graphics memory (and the cost of the platforms that do have it). Once the memory slog goes back to normal and hardware manufacturers adapt to demand, we may see consumer hardware with large memory capacity, effectively opening the door for slow but usable frontier model inference (assuming improvements in model efficiency and compute capacity). At that point, inference becomes a race to the bottom. The large labs hope they can attain a leap in capability (which is increasingly looking bleak, with an average catch-up of just a few months) or market dominance through integration (integration in platforms and OS, exclusive deals with companies or governments). For coding agents, I suspect no player will manage to lock in enough of the market to enforce pricing much higher than the true inference cost, and catering to programmers becomes an unsustainable proposition. We will instead be further hit with a lot of AI integrated into our other tooling costs, such as GitHub, Microsoft suite, G-suite, which will use their market position to force in AI functions as a value-add in the total cost without giving the option to exclude them. | | |
| ▲ | pianopatrick 2 hours ago | parent | next [-] | | AI may get so commoditized for certain use cases that you will not even be able sell inference at a profit. AI might be bundled in with other services, just like cursor bundles in their own AI model for auto complete with their editor. I.e. cameras might have AI for image recognition bundled in etc. | | |
| ▲ | HDThoreaun an hour ago | parent [-] | | Agreed, this is where Google is really, really set up to win the market. They can combine a Gemini subscription with a moderately more expensive Google Workspace and steal MSFT's entire $50 billion enterprise productivity software market. MSFT is quickly trying to get Copilot into a good enough state, but without TPUs I think it'll be tough for them to serve a good enough model at a price people will accept. |
| |
| ▲ | SecretDreams 3 hours ago | parent | prev [-] | | I agree with all of this. So my question remains the same: How are the players investing 100s of billions in buildout going to hope to make this back? Market capture looks bleak, inference looks like a race to the bottom. End users look like they could be beneficiaries. Where do the big boys go? | | |
| ▲ | CuriouslyC 2 hours ago | parent [-] | | The American big boys are hoping to create "labor as a service" rather than sell tools. You don't hire an accountant that uses Claude, you hire Claude and it just does everything, without the visibility of current agents. They'll need to make it remote and obfuscated to protect their secret sauce from distillation and reverse engineering. It'll be really expensive, and be focused on enabling rich business types and upper managers. |
|
| |
| ▲ | HDThoreaun an hour ago | parent | prev [-] | | Prices can go down while the number of tokens sold increases, so that profit increases. The labs' number one goal right now is moving past software engineers so that every white collar worker in the country finds AI assistants indispensable. Speculation here, but I think OpenAI/Anthropic API inference is insanely profitable, it just needs more volume to amortize the training costs. | | |
| ▲ | SecretDreams an hour ago | parent [-] | | > Speculation here, but I think OpenAI/Anthropic API inference is insanely profitable, it just needs more volume to amortize the training costs. Well, they just rent their hardware, so I'm not so sure. But they'll both be public soon and we should get that breakout in their cost structures, somewhat. |
|
|
|
| ▲ | cyanydeez 4 hours ago | parent | prev | next [-] |
| I'd be amazed if any American business will send data to China |
| |
| ▲ | linkregister 3 hours ago | parent | next [-] | | HuggingFace offers DeepSeek as one of its models— it's pretty simple to spin up instances under your control. I'm not sure about OpenRouter but I wouldn't be surprised if they offer a US-based provider of DeepSeek. For reference, Cursor has their first own light fork of Kimi that they use as their baseline coding and review model. | | |
| ▲ | dghlsakjg 3 hours ago | parent [-] | | The majority of Deepseek providers on OpenRouter for v4 pro are in the US. Especially interesting is that they are in the same ballpark for pricing. | | |
| ▲ | eikenberry 3 hours ago | parent [-] | | They are in the same ballpark for deepseek-v4-flash, but deepseek-v4-pro from deepseek is still around 1/2 of the alternatives. | | |
| ▲ | dghlsakjg 3 hours ago | parent [-] | | I'm pretty sure that Deepseek said that pricing was promotional. Be curious to see if it lasts. V3 pricing from them was right in line with what the commodity providers are charging. | | |
| ▲ | eikenberry 2 hours ago | parent [-] | | They announced a few weeks back that the promotional pricing was permanent. |
|
|
|
| |
| ▲ | alpinisme 3 hours ago | parent | prev | next [-] | | “Any” is a very high bar Unless laws prevent it, I don’t see why a substantial minority wouldn’t buy services from where they can get them at a similar quality and much lower price. | |
| ▲ | dkersten 3 hours ago | parent | prev | next [-] | | Together.ai provide many open weights models and as far as I’m aware their servers are US based (the company certainly is) | |
| ▲ | lowbloodsugar 3 hours ago | parent | prev [-] | | Any IT cost center will send to the lowest bidder. This isn’t intellectual property: it’s annoying shit that is an unwelcome cost of doing business. China might copy our tedious scripts? Will they make a product out of it? Can I buy it and fire my IT staff? Great! Not everyone using AI is using it to code core value IP. |
|
|