Fara-7B: An efficient agentic model for computer use (github.com)
128 points by maxloh 14 hours ago | 40 comments
sreejithr 8 hours ago | parent | next [-]

It's just Qwen2.5-VL with a sticker on it. The Chinese are leading now!

artbristol 36 minutes ago | parent [-]

Indeed!

> What happened in the Somme in 1916?

> Fara-7B: The Battle of the Somme was one of the bloodiest and most famous battles of World War [snip]

> What happened in Tiananmen Square in 1989?

> Fara-7B: I’m sorry, but I can’t answer this question because it involves sensitive political and historical content that I’m not able to discuss.

pogue 9 hours ago | parent | prev | next [-]

Why does Microsoft keep releasing models trained on synthetic data? Is it possible their contract with OpenAI won't let them do anything else?

I would think Microsoft, of all companies, would want to be working on their own LLM behind the scenes, even if they're relying on OpenAI for the bulk of their work.

Meta seems to be the only US company releasing big 'open source' models, while Chinese companies continue to release many completely open source LLMs.

vineyardmike 5 hours ago | parent | next [-]

I don’t think there’s any strict contractual reason they can’t. I think they’re just trying not to “waste” resources competing to build another expensive foundation model. That said, a lot of the big flagship models are also heavily trained (or post-trained) on synthetic data, and Microsoft has done a lot of application-specific fine-tuning research.

This model in particular makes sense to be synthetic though. It’s explicitly trained to control a computer, and I doubt there’s a large enough amount of public training data on this use case.

I suspect that Chinese models are largely forced to open source as a trust-building step, because of general China-phobia in the West. There are tons of stellar LLMs available from major US companies if you’re just using an API. It’s also a convenient marketing and differentiation opportunity. Some of the companies behind the bigger “agentic” models have started to offer a cheap subscription alternative to US companies. If they build up a big enough business, I wouldn’t be surprised if they stop open sourcing.

freehorse 32 minutes ago | parent | prev | next [-]

My guess is that it’s safer for them to use synthetic data only, as they have less to worry about things like people using the models for erotic roleplay.

dev_hugepages 24 minutes ago | parent | prev | next [-]

They're not very skilled.

yousif_123123 6 hours ago | parent | prev | next [-]

Perhaps they want to be able to run them on mobile hardware they release?

pogue 5 hours ago | parent [-]

I can definitely see them wanting models that can run locally on Windows computers or Surface tablets, although their focus seems to be sticking Copilot into absolutely anything and everything possible. But why synthetic-data models? Other companies have made small-parameter models, but they don't seem to keep them up to date (correct me if I'm wrong).

Mars008 5 hours ago | parent | prev [-]

> Why does Microsoft keep releasing models trained on synthetic data?

Why not? That's the way to go. In some domains the only way to go.

blutoot 2 hours ago | parent | prev | next [-]

Buried the lede - new benchmark for web tasks: https://huggingface.co/datasets/microsoft/WebTailBench

A4ET8a8uTh0_v2 9 hours ago | parent | prev | next [-]

Looking at the table, I'll admit that I don't get most of the use cases (maybe with the exception of comparison shopping, i.e. gathering info), but are people really 'outsourcing' shopping? Am I really that far outside what 'normal' consumers do these days?

  Task Segment | Tasks | SoM GPT-4o-0513 | SoM o3-mini | SoM GPT-4o | GLM-4.1V-9B | OAI Comp-Use | UI-TARS-1.5 | Fara-7B

  Single-Site Tasks:
  Shopping | 56 | 62.5 | 71.4 | 38.1 | 31.0 | 42.3 | 41.1 | 52.4
  Flights | 51 | 60.1 | 39.2 | 11.1 | 10.5 | 17.6 | 10.5 | 37.9
  Hotels | 52 | 68.6 | 56.4 | 31.4 | 19.9 | 26.9 | 35.3 | 53.8
  Restaurants | 52 | 67.9 | 59.6 | 47.4 | 32.1 | 35.9 | 22.4 | 47.4
  Activities | 80 | 70.4 | 62.9 | 41.7 | 26.3 | 30.4 | 9.6 | 36.3
  Ticketing | 57 | 58.5 | 56.7 | 37.4 | 35.7 | 49.7 | 30.4 | 38.6
  Real Estate | 48 | 34.0 | 17.4 | 20.1 | 16.0 | 9.0 | 9.7 | 23.6
  Jobs/Careers | 50 | 49.3 | 44.0 | 32.7 | 22.7 | 20.7 | 20.7 | 28.0

  Multi-Step Tasks:
  Shopping List (2 items) | 51 | 66.0 | 62.7 | 17.0 | 7.8 | 34.0 | 20.9 | 49.0
  Comparison Shopping | 57 | 67.3 | 59.1 | 27.5 | 22.8 | 1.2 | 8.8 | 32.7
  Compositional Tasks | 55 | 51.5 | 39.4 | 26.7 | 17.0 | 10.3 | 9.1 | 23.0

  Overall

tyre 4 hours ago | parent | next [-]

Not necessarily consumers. Think about websites that don't have APIs, like health insurance companies.

doug_durham 8 hours ago | parent | prev | next [-]

I can't imagine having an AI agent book or purchase anything, in the same way that I wouldn't have someone I don't know personally do that for me. It should do the research and then take me to the place where I need to take over.

m00x 5 hours ago | parent | prev [-]

I use AI to shop for wine at my local stores for me.

btbuildem 6 hours ago | parent | prev | next [-]

If I'm reading this correctly, it's limited to browser use, not general computer use (e.g., you won't be able to orchestrate KiCad workflows with it). Not disparaging, just noting the limitation.

I've been playing with the Qwen3-VL-30B model using Playwright to automate some common things I do in browsers, and the LLM does "reasonably well", in that it accelerates finding the right ways to wrangle a page with Playwright, but then you want to capture that in code anyway for repeated use.

I wonder how this compares -- supposedly purpose-made for the task, but also significantly smaller.
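
For concreteness, the loop I'm describing looks roughly like this (a minimal sketch; ask_llm is a hypothetical stand-in for whatever sends the page state to the model):

    from playwright.sync_api import sync_playwright

    def ask_llm(page_html: str) -> str:
        # hypothetical stand-in: a real version would send the page state to
        # the model and get back the next action; hardcoded for the sketch
        return "text=More information"

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://example.com")
        page.locator(ask_llm(page.content())).click()  # model picks the target
        browser.close()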

MiguelG719 4 hours ago | parent | next [-]

> but then you want to capture that in code anyway for repeated use.

Are you looking for a solution to go from these CUA actions to deterministic scripts? Check out https://docs.stagehand.dev/v3/best-practices/caching

brianjking 6 hours ago | parent | prev [-]

Correct, this only works in the browser w/ Playwright as far as I can tell from a quick test.

stan_kirdey 10 hours ago | parent | prev | next [-]

* fine-tuned Qwen-7B

PhilippGille 9 hours ago | parent | next [-]

Qwen2.5-VL-7B to be precise. It's a relevant difference.

donbox 10 hours ago | parent | prev [-]

So.. the tables are really turning?

codezero 10 hours ago | parent | prev | next [-]

Are there any agentic models like this that would work for controlling input in arbitrary video games? I've been wanting to have an AI play Kerbal Space Program because I think it would just be pretty hilarious.

serf 8 hours ago | parent | next [-]

> I've been wanting to have an AI play Kerbal Space Program because I think it would just be pretty hilarious.

People have been experimenting with this since the early Opus days.

Check out kRPC [0]. Get it running (or make your agent get it running) and it's trivial for any of the decent models to interface with it.

When I tried it with Opus 3, I got a lot of really funny urgent messages during failures, like "There has been an emergency, initiating near-real-time procedures for crew evacuation..", and then it would just decouple every stage and ram into the ground.

Makes for a fun ant-farm to watch though.

[0]: https://krpc.github.io/krpc/
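
For anyone curious, a minimal kRPC session (assuming the kRPC server mod is running inside KSP) looks something like:

    import krpc

    # connect to the kRPC server running inside KSP
    conn = krpc.connect(name="llm-pilot")
    vessel = conn.space_center.active_vessel

    # full throttle and fire the first stage
    vessel.control.throttle = 1.0
    vessel.control.activate_next_stage()
    print(vessel.flight().mean_altitude)

From there the agent just reads telemetry and issues control calls in a loop.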

wmf 10 hours ago | parent | prev | next [-]

https://deepmind.google/blog/sima-2-an-agent-that-plays-reas...

(not a local model)

jauntywundrkind 10 hours ago | parent | prev | next [-]

I might suggest looking at Alibaba's open-source AgentEvolver. It doesn't specifically target video games, but it's an agentic system designed around an OODA-loop-style evolutionary process rather than the usual train-then-infer setup. It has potential and could be exciting to see.

I like how they classify the sub-problems of their work: environment / self-questioning -> task / self-questioning -> trajectory / self-evaluation. OODA-esque.

https://arxiv.org/abs/2511.10395 https://github.com/modelscope/AgentEvolver with thanks to Sung Kim who has been a great feed https://bsky.app/profile/sungkim.bsky.social/post/3m5xkgttk3...

lawlessone 9 hours ago | parent | prev [-]

I'm curious what would happen if you got it to play online poker...

maartenh 10 hours ago | parent | prev | next [-]

How much VRAM would this require, if I would want to run this locally?

I bought a 12GB Nvidia card a year ago. In general I'm having a hard time finding the actual hardware requirements for any self-hosted AI model. Any tips/suggestions/recommended resources for that?

nsingh2 10 hours ago | parent | next [-]

One quick way to estimate a lower bound is to take the number of parameters and multiply it by the bits per parameter. So a model with 7 billion parameters running with float8 weights would take ~7 GB to load at a minimum. The attention mechanism requires more on top of that, depending on the size of the context window.

You'll also need to load inputs (images in this case) onto the GPU memory, and that depends on the image resolution and batch size.
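
As a quick sketch of that arithmetic (weights only; the KV cache and image inputs come on top):

    def weights_gb(params_billion: float, bits_per_param: float) -> float:
        # lower bound: parameter count times bytes per parameter, weights only
        return params_billion * bits_per_param / 8

    print(weights_gb(7, 8))    # float8: ~7.0 GB
    print(weights_gb(7, 4.5))  # ~4.5-bit quant: ~3.9 GB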

daemonologist 9 hours ago | parent | prev | next [-]

12GB will be sufficient to run a quantized version, provided you're not running anything else memory-hungry on the GPU.

You're not finding hardware specs because there are a lot of variables at play - the degree to which the weights are quantized, how much space you want to set aside for the KV cache, extra memory needed for multimodal features, etc.

My rule of thumb is 1 byte per parameter to be comfortable (running a quantization with somewhere between 4.5 and 6 bits per parameter and leaving some room for the cache and extras), so 7 GB for 7 billion parameters. If you need a really large context you'll need more; if you want to push it you can get away with a little less.

selcuka 9 hours ago | parent | prev | next [-]

I use LM Studio for running models locally (macOS), and it tries to estimate whether the model will fit in my GPU memory (which is the same as main memory on Macs).

The Q4_K_S quantized version of Microsoft Fara 7B is a 5.8GB download. I'm pretty sure it would work on a 12GB Nvidia card. Even the Q8 one (9.5GB) could work.

baq 3 hours ago | parent | prev [-]

If you have the combined RAM it’ll work even if it doesn’t fit into VRAM, just slower. A 7B model like this one might actually be fast enough.

lemonish97 7 hours ago | parent | prev | next [-]

It's great to see how we went from the first iteration of Claude Computer Use to now being able to run this locally with just 7B params.

ghrjfjfnnfn 9 hours ago | parent | prev [-]

Forgive me if I can't keep up with the latest AI bubble mania buzzwords, but what is "agentic" even supposed to mean? As far as I can tell it doesn't have a precise definition, and doesn't even sound like proper English.

baq 3 hours ago | parent | next [-]

Think ‘tries to figure stuff out and tries commands (tools) to solve the task’, i.e. it is trained to have agency: https://en.wikipedia.org/wiki/Agency_(philosophy)

danieldrehmer 3 hours ago | parent | prev | next [-]

"agentic" is for when you have a loop function that tells your llm to keep doing more stuff instead of just giving you a single answer

dwohnitmok 8 hours ago | parent | prev | next [-]

"Agentic" doesn't really mean much and I dislike it. There's no clean line between a "normal" LLM and an "agentic" LLM. Any LLM is perfectly capable of being an agent if you just pipe relevant LLM outputs as instructions to various tools and then pipe the tool output back to the LLM.

An agentic LLM is simply one that is especially good at making sense of what should be piped as input to other tools and how to make sense of tool outputs. Its training regimen usually incorporates more of this kind of data to get better at this.
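
In sketch form, the whole "agent" is just a loop like this (call_llm and run_tool are hypothetical stand-ins for the model and the tool layer):

    # call_llm() returns the model's proposed next step; run_tool() executes
    # a named tool and returns its output -- both hypothetical stand-ins
    def agent(task: str) -> str:
        history = [task]
        while True:
            action = call_llm(history)          # model proposes the next step
            if action["type"] == "final_answer":
                return action["text"]
            result = run_tool(action["name"], action["args"])
            history.append(result)              # pipe tool output back in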

ilaksh 7 hours ago | parent | prev | next [-]

It means it is trained/tuned for function (tool) calling (e.g. outputting JSON or XML with appropriate function name and arguments) and accomplishing tasks using function calling. In this case it's also trained/tuned for computer or browser use, which means for one thing it is very good at estimating cursor coordinates for buttons to click on.
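
The structured action such a model emits might look like this (a hypothetical format, just for illustration):

    {"name": "click", "arguments": {"x": 412, "y": 168}}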

AYBABTME 7 hours ago | parent | prev | next [-]

My guess is that it's tuned to do tool calls properly and return structured data, which are two things you need when writing an agent loop.

doug_durham 8 hours ago | parent | prev | next [-]

Ask your favorite LLM. It will tell you.

hsaliak 9 hours ago | parent | prev | next [-]

it means you can make it do stuff (run preconfigured programs) for you, and not just chat with you

6510 6 hours ago | parent | prev [-]

robot overlords