koakuma-chan 7 days ago

Do you know what "MCP-based methodology" is? I am skeptical of a 4B model scoring twice as high as Gemini 2.5 Pro.

cbcoutinho 7 days ago

From the paper:

> Most language models face a fundamental tradeoff where powerful capabilities require substantial computational resources. We shatter this constraint with Jan-nano, a 4B parameter language model that redefines efficiency through radical specialization: instead of trying to know everything, it masters the art of finding anything instantly. Fine-tuned from Qwen3-4B using our novel multi-stage Reinforcement Learning with Verifiable Rewards (RLVR) system that completely eliminates reliance on next token prediction training (SFT), Jan-nano achieves 83.2% on SimpleQA benchmark with MCP integration while running on consumer hardware. With 128K context length, Jan-nano proves that intelligence isn't about scale, it's about strategy.

> For our MCP evaluation, we used mcp-server-serper which provides google search and scrape tools

https://arxiv.org/abs/2506.22760
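
For anyone wondering what "with MCP integration" looks like on the wire: MCP tool invocations are JSON-RPC 2.0 requests using the `tools/call` method. Below is a minimal sketch of building one such request. The tool name `google_search` and the `q` argument are assumptions for illustration; I have not checked them against what mcp-server-serper actually exposes, and the helper `build_tool_call` is hypothetical.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "google_search" and "q" are assumed names, not confirmed from the server.
request = build_tool_call(1, "google_search", {"q": "Who wrote Dune?"})
print(json.dumps(request, indent=2))
```

The model only has to emit the tool name and arguments; the client wraps them in a request like this, sends it to the MCP server, and feeds the result back into the model's context.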

dabockster 7 days ago

Yeah I know about Model Context Protocol. But it's still only a small part of the AI puzzle. I'm saying that we're at a point now where a whole AI stack can run, in some form, 100% on-device with okayish accuracy. When you think about that, and where we're headed, it makes the whole idea of cloud AI look like a dinosaur.

koakuma-chan 7 days ago

I mean, I am asking what "MCP-based methodology" is, because it doesn't make sense for a 4B model to outperform Gemini 2.5 Pro et al. by that much.

toshinoriyagi 7 days ago

I'm not too sure what "MCP-based methodology" is, but Jan-nano-128k is a small model specifically designed to answer in-depth questions accurately via tool use (researching in a provided document or searching the web).

It outperforms those other models, which are not using tools, thanks to its tool use and narrow specialization.

Because it has only 4B parameters, I believe it is naturally much weaker at other tasks: it isn't designed for them and doesn't have enough parameters.

In hindsight, "MCP-based methodology" likely refers to its tool use.
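
To make the tool-use point concrete, here is a toy sketch of why a model that can look things up may beat a closed-book model on a factual QA benchmark. Everything in it is made up for illustration: the tiny `WEB` corpus, the `LARGE_MODEL_MEMORY` dictionary, and the resulting scores have nothing to do with the real SimpleQA numbers.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy stand-in for the web; a real setup would call a search tool instead.
WEB: Dict[str, str] = {
    "capital of australia": "Canberra",
    "author of dune": "Frank Herbert",
    "boiling point of water in celsius": "100",
}

# Hypothetical closed-book model: it only "remembers" some of the facts.
LARGE_MODEL_MEMORY: Dict[str, str] = {"capital of australia": "Canberra"}


def search_tool(query: str) -> Optional[str]:
    """Look the query up in the toy corpus."""
    return WEB.get(query.lower())


def closed_book_answer(question: str) -> Optional[str]:
    """Answer from memorized facts only, with no tool access."""
    return LARGE_MODEL_MEMORY.get(question.lower())


def tool_answer(question: str) -> Optional[str]:
    """Answer by delegating the lookup to the search tool."""
    return search_tool(question)


def accuracy(answer_fn: Callable[[str], Optional[str]],
             qa_pairs: List[Tuple[str, str]]) -> float:
    """Fraction of questions answered exactly right."""
    correct = sum(1 for q, a in qa_pairs if answer_fn(q) == a)
    return correct / len(qa_pairs)


QA = list(WEB.items())
closed = accuracy(closed_book_answer, QA)
tooled = accuracy(tool_answer, QA)
print(f"closed-book: {closed:.2f}, with tool: {tooled:.2f}")
```

The comparison is only apples-to-apples if both sides get the same tools, which is what the skepticism upthread is really about.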