fraXis 5 hours ago

How are you accessing their API? Through OpenRouter, or direct? Are you using DeepSeek v4 Pro? $2 seems a lot cheaper than my own experience accessing them through OpenRouter for over 100 million tokens, but I am using OpenRouter to access v4 pro.

crazylogger an hour ago | parent | next [-]

Cache hit rate dominates the total cost of a long agent session, and it largely depends on the provider. DeepSeek's native deployment is probably much better than third-party providers in this regard. For v4 Pro there is a whopping >100x price difference between normal input tokens and cached input tokens.
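
To make the point concrete, here is a rough sketch of how the hit rate feeds into the input bill. The function name and the prices are made up for illustration (they are not DeepSeek's or OpenRouter's actual rates), and it ignores output tokens entirely.

```python
def blended_input_cost(total_tokens, hit_rate, miss_price_per_m, hit_price_per_m):
    """Input cost in dollars for a session, given the fraction of tokens served from cache.

    Prices are per million tokens. All values passed in are assumptions,
    not any provider's published rates.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    hit_tokens = total_tokens * hit_rate
    miss_tokens = total_tokens - hit_tokens
    return (hit_tokens * hit_price_per_m + miss_tokens * miss_price_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical prices with a 100x gap between uncached and cached input.
    for rate in (0.0, 0.5, 0.9, 0.99):
        cost = blended_input_cost(100_000_000, rate, 1.00, 0.01)
        print(f"hit rate {rate:.2f}: ${cost:.2f}")
```

With a gap that large, the uncached remainder accounts for nearly all of the bill, so two providers with the same list price can end up far apart once their hit rates differ.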

apatheticonion 3 hours ago | parent | prev | next [-]

I am using Flash and accessing the API directly via VS Code Insiders and occasionally Zed (it's buggy but I keep coming back to it because I want it to succeed).

Unless you need enterprise multi-model management, I don't see the point in OpenRouter, as it just adds cost overhead and you can self-host an OpenRouter alternative (LiteLLM, Bifrost, etc). Running an LLM gateway locally is kind of nice, as it allows you to normalize your configurations against your internal gateway, but I haven't really needed to.

fc417fc802 5 hours ago | parent | prev [-]

Pro is substantially more expensive than Flash. In addition, there's wide variance in price, with DeepSeek themselves providing the cheapest tokens last I checked (but they train on them). Caching policy also varies by provider. TTL can be as low as 5 minutes or as high as 24 hours, and reading from the cache might or might not reset the timer. Whether or not you get a hit makes (IIRC) a 10x (edit: it's actually 50x) price difference in the case of DeepSeek themselves.
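
A toy simulation of why the TTL and the reset-on-read behavior matter. This is a simplified model with hypothetical names: it treats the whole prompt prefix as a single cache entry and does not reflect how any particular provider actually implements caching.

```python
def count_cache_hits(request_times, ttl, reset_on_read):
    """Count cache hits for requests at the given times (same unit as ttl).

    The first request always misses and writes the cache entry. A later
    request hits if it arrives before the entry expires. Whether a hit
    extends the expiry depends on reset_on_read.
    """
    hits = 0
    expiry = None
    for t in sorted(request_times):
        if expiry is not None and t < expiry:
            hits += 1
            if reset_on_read:
                expiry = t + ttl  # a read refreshes the timer
        else:
            expiry = t + ttl  # a miss rewrites the entry
    return hits


if __name__ == "__main__":
    times = [0, 4, 8, 12]  # minutes between agent turns, illustrative only
    print(count_cache_hits(times, ttl=5, reset_on_read=True))
    print(count_cache_hits(times, ttl=5, reset_on_read=False))
```

With a 5 minute TTL and a request every 4 minutes, a provider that resets the timer on read keeps the entry alive for the whole session, while one that does not forces a periodic full-price re-read of the prefix.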